
Periodically importing big data (JSON) into Firebase

Updated: 2022-06-18 21:17:00

I'm finally posting this answer, as it aligns with the new Google Cloud Platform tooling of 2017.

The newly introduced Google Cloud Functions have a limited run time of approximately 9 minutes (540 seconds). However, Cloud Functions can create a Node.js read stream from Cloud Storage like so (@google-cloud/storage on npm):

var gcs = require('@google-cloud/storage')({
  projectId: 'grape-spaceship-123',
  // keyFilename is only needed when running outside the project;
  // a function deployed in the same project authenticates automatically.
  keyFilename: '/path/to/keyfile.json'
});

// Reference an existing bucket. 
var bucket = gcs.bucket('json-upload-bucket');

var remoteReadStream = bucket.file('superlarge.json').createReadStream();

Even though it is a remote stream, it is highly efficient. In tests I was able to parse JSON files larger than 3 GB in under 4 minutes while doing simple JSON transformations.

As we are now working with Node.js streams, any JSONStream library can transform the data on the fly (JSONStream on npm), processing it asynchronously just like a large array, with event streams (event-stream on npm).

var JSONStream = require('JSONStream')
var es = require('event-stream')

remoteReadStream.pipe(JSONStream.parse('objects.*'))
  .pipe(es.map(function (data, callback) {
    console.error(data)
    // Insert data into Firebase here.
    callback(null, data) // ! Return data if you want to make further transformations.
  }))

Return only null in the callback at the end of the pipe to prevent a memory leak from blocking the whole function.

If you do heavier transformations that require a longer run time, either use a "job db" in Firebase to track where you are, perform only e.g. 100,000 transformations per invocation and call the function again, or set up an additional function that listens to inserts into a "forimport db" and finally transforms the raw JSON object records into your target format and production system asynchronously. This splits import from computation.
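A minimal sketch of the chunking arithmetic behind such a "job db" (the `offset` field, `nextBatch` helper, and batch size are assumptions for illustration, not an established schema): the job record stores an offset, each invocation processes one batch and advances it, and the function re-triggers itself until the offset reaches the total.

```javascript
// Illustrative "job db" chunking: given a stored offset and a batch size,
// compute the record range for this invocation and the next offset.
const BATCH_SIZE = 100000;

function nextBatch(job, total) {
  const start = job.offset;
  const end = Math.min(start + BATCH_SIZE, total);
  return {
    start,
    end,               // process records [start, end)
    nextOffset: end,
    done: end >= total // if false, invoke the function again
  };
}

// Example: a 250,000-record import takes three invocations.
let job = { offset: 0 };
const total = 250000;
const batches = [];
for (;;) {
  const b = nextBatch(job, total);
  batches.push([b.start, b.end]);
  job = { offset: b.nextOffset }; // in practice: write the offset back to Firebase
  if (b.done) break;
}
console.log(batches); // [ [ 0, 100000 ], [ 100000, 200000 ], [ 200000, 250000 ] ]
```

In practice the offset write-back would go through the Firebase SDK, so a crashed or timed-out invocation resumes from the last committed batch.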

Additionally, you can run Cloud Functions code in a Node.js App Engine app, but not necessarily the other way around.