How to load multiple CSV files (with different schemas) into BigQuery

Updated: 2023-12-02 20:21:04

The way I would automate this is to list all the files in a given bucket (or one of its subfolders) and, as a simplifying assumption, use each file's name as the name of the target table to load it into. Here is how:

gsutil ls gs://mybucket/subfolder/*.csv | awk '{n=split($1,A,"/"); split(A[n],B,"."); print "mydataset."B[1]" "$0}' | xargs -I{} sh -c 'bq --location=US load --replace=false --autodetect --source_format=CSV {}'
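
If the one-liner is hard to read or debug, the same idea can be written as a small bash loop. This is only a sketch under the same assumptions as above (gsutil and bq installed and authenticated, one table per file, name cut at the first "."); the function names table_name_for and load_uris and the DRY_RUN switch are hypothetical helpers added for illustration, not part of gsutil or bq.

```shell
#!/usr/bin/env bash
# Sketch of the one-liner above as a readable loop.
# Assumptions: gsutil and bq are installed and authenticated; DATASET and
# LOCATION are placeholders (defaulting to the values used above).
set -euo pipefail

DATASET="${DATASET:-mydataset}"
LOCATION="${LOCATION:-US}"

# gs://mybucket/subfolder/orders.csv -> orders
# Takes the object name after the last "/" and cuts it at the first ".",
# which is what the awk script does.
table_name_for() {
  local object="${1##*/}"
  printf '%s\n' "${object%%.*}"
}

# Reads gs:// URIs (one per line) on stdin and loads each file into
# DATASET.<table name>. With DRY_RUN=1 the bq commands are only printed.
load_uris() {
  local uri table
  local -a cmd
  while IFS= read -r uri; do
    [ -n "$uri" ] || continue
    table="$(table_name_for "$uri")"
    cmd=(bq --location="$LOCATION" load --replace=false --autodetect \
      --source_format=CSV "$DATASET.$table" "$uri")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

To use it, add a final line such as gsutil ls "gs://mybucket/subfolder/*.csv" | load_uris to the script; setting DRY_RUN=1 first lets you check the generated commands before anything is loaded. Passing the table and the URI as separate array elements also avoids the extra sh -c quoting step of the xargs version.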

Make sure to replace US (the location), mydataset and the gs://mybucket/subfolder path with your own values. Also, please note the following assumptions:

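The first row of each CSV is assumed to be a header row, so it is used for the column names. The --autodetect flag lets BigQuery infer the header and the schema of each file separately, which is what allows files with different schemas to be loaded.
The command uses the --replace=false flag, which means data is appended every time you run it. If you want to overwrite instead, set it to true and each run will replace the existing data in every target table.
The CSV file name (the part before the first ".", e.g. orders for orders.csv) is used as the table name. You can modify the awk script to derive the table name in any other way.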
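
As one example of changing the naming rule, here is a sketch of a replacement for the awk step, wrapped in a hypothetical shell function to_load_args. It assumes you would rather keep the whole file name minus the trailing ".csv" and turn any remaining dots into underscores, since dots are not allowed in BigQuery table names; mydataset is the same placeholder as above.

```shell
#!/usr/bin/env bash
# Hypothetical alternative naming rule for the awk step: instead of cutting
# at the first ".", drop only the trailing ".csv" and turn any remaining "."
# into "_" (BigQuery table names cannot contain dots), e.g.
#   gs://mybucket/subfolder/sales.2023.csv -> mydataset.sales_2023
# Input: gs:// URIs on stdin; output: "<dataset.table> <uri>" lines for xargs.
to_load_args() {
  awk '{
    n = split($1, parts, "/")
    name = parts[n]
    sub(/\.csv$/, "", name)
    gsub(/\./, "_", name)
    print "mydataset." name " " $1
  }'
}
```

It slots into the original pipeline in place of the inline awk: gsutil ls gs://mybucket/subfolder/*.csv | to_load_args | xargs -I{} sh -c 'bq --location=US load --replace=false --autodetect --source_format=CSV {}'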
