Loading data from Google Cloud Storage into BigQuery using Java

Updated: 2023-01-06 21:38:01


I want to upload data from Google Cloud Storage to BigQuery, but I can't find any Java sample code describing how to do this. Could someone please give me a hint as to how to do this?

What I actually want to do is transfer data from Google App Engine tables to BigQuery (and sync on a daily basis) so that I can do some analysis. I use the Google Cloud Storage service in Google App Engine to write (new) records to files in Google Cloud Storage, and the only missing part is appending the data to tables in BigQuery (or creating a new table on the first write). Admittedly I can manually upload/append the data using the BigQuery browser tool, but I would like it to be automatic; otherwise I need to do it manually every day.

I don't know of any Java samples for loading tables from Google Cloud Storage into BigQuery. That said, if you follow the instructions for running query jobs here, you can run a load job instead with the following:

// These classes come from the com.google.api.services.bigquery client library.
import java.util.ArrayList;
import java.util.List;

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.*;

Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationLoad loadConfig = new JobConfigurationLoad();
config.setLoad(loadConfig);

job.setConfiguration(config);

// Set where you are importing from (i.e. the Google Cloud Storage paths).
List<String> sources = new ArrayList<String>();
sources.add("gs://bucket/csv_to_load.csv");
loadConfig.setSourceUris(sources);

// Describe the resulting table you are importing to:
TableReference tableRef = new TableReference();
tableRef.setDatasetId("myDataset");
tableRef.setTableId("myTable");
tableRef.setProjectId(projectId);
loadConfig.setDestinationTable(tableRef);

List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
TableFieldSchema fieldFoo = new TableFieldSchema();
fieldFoo.setName("foo");
fieldFoo.setType("STRING");
TableFieldSchema fieldBar = new TableFieldSchema();
fieldBar.setName("bar");
fieldBar.setType("INTEGER");
fields.add(fieldFoo);
fields.add(fieldBar);
TableSchema schema = new TableSchema();
schema.setFields(fields);
loadConfig.setSchema(schema);

// Also set custom delimiter or header rows to skip here....
// [not shown].
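// For example (a sketch; these setters are on JobConfigurationLoad, and
// WRITE_APPEND matches the daily-append use case described in the question):
loadConfig.setFieldDelimiter("|");               // default delimiter is ","
loadConfig.setSkipLeadingRows(1);                // skip one header row
loadConfig.setWriteDisposition("WRITE_APPEND");  // append instead of overwrite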

Bigquery.Jobs.Insert insert = bigquery.jobs().insert(projectId, job);
insert.setProjectId(projectId);
JobReference jobRef = insert.execute().getJobReference();

// ... see rest of codelab for waiting for job to complete.

For more information on the load configuration object, see the javadoc here.
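The "wait for the job to complete" step mentioned above can be sketched roughly as follows, assuming the same authorized `bigquery` client and `projectId` as in the listing; the one-second polling interval is an arbitrary choice:

// Poll the job until BigQuery reports it as DONE.
String jobId = jobRef.getJobId();
while (true) {
    Job polled = bigquery.jobs().get(projectId, jobId).execute();
    String state = polled.getStatus().getState();
    if ("DONE".equals(state)) {
        if (polled.getStatus().getErrorResult() != null) {
            // The load failed; the error result explains why.
            throw new RuntimeException(
                "Load failed: " + polled.getStatus().getErrorResult().getMessage());
        }
        break;  // load completed successfully
    }
    Thread.sleep(1000);  // wait before polling again
}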