Updated: 2023-12-02 20:21:04
The way I would automate this is to read all the files from a given bucket (or one of its subfolders) and, as an assumption, use each file's name as the name of the target table to load into. Here is how:
gsutil ls gs://mybucket/subfolder/*.csv | awk '{n=split($1,A,"/"); q=split(A[n],B,"."); print "mydataset."B[1]" "$0}' | xargs -I{} sh -c 'bq --location=US load --replace=false --autodetect --source_format=CSV {}'
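For readability, the same pipeline can be sketched as a small shell loop. This is only a sketch under the same assumptions as the one-liner: the function names table_for_uri and load_all are hypothetical helpers introduced here, the bq flags are the ones used above, and the table name is taken as the part of the filename before the first dot, which matches what the awk script does.

```shell
#!/bin/sh
# Hypothetical helper: derive "dataset.table" from a gs:// URI.
# Usage: table_for_uri URI DATASET
table_for_uri() {
  file=${1##*/}      # strip everything up to the last "/"
  table=${file%%.*}  # keep the part before the first "."
  printf '%s.%s\n' "$2" "$table"
}

# Hypothetical helper: load every object matched by a gsutil pattern.
# Usage: load_all PATTERN DATASET
load_all() {
  gsutil ls "$1" | while IFS= read -r uri; do
    bq --location=US load --replace=false --autodetect \
      --source_format=CSV "$(table_for_uri "$uri" "$2")" "$uri"
  done
}
```

It would be invoked as: load_all 'gs://mybucket/subfolder/*.csv' mydataset. Quoting the pattern leaves the wildcard for gsutil to expand rather than the local shell.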
Make sure to replace location and mydataset with your desired values. Also, please take note of the following assumptions:
- The command is written with the --replace=false flag, meaning data will be appended every time you run the command. If you want to overwrite instead, just set it to true and all tables' data will be overwritten on each run.
- The part of the filename before .csv is used as the table name. You can modify the awk script to change it to any other alternative.
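As one illustration of changing the naming rule, the awk stage could be swapped for something like the following. This is a hypothetical variant, not part of the original command: the function name name_tables, the raw_ prefix, and the choice to replace every character other than letters, digits, and underscores with an underscore are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical variant of the awk stage.
# Reads gs:// URIs on stdin and prints "dataset.raw_<name> <uri>" per line.
# Usage: gsutil ls PATTERN | name_tables DATASET
name_tables() {
  awk -v ds="$1" '{
    n = split($1, A, "/")              # A[n] is the filename
    split(A[n], B, ".")                # B[1] is the part before the first dot
    t = B[1]
    gsub(/[^A-Za-z0-9_]/, "_", t)      # conservative table-name characters only
    print ds ".raw_" t " " $0
  }'
}
```

Its output has the same "table uri" shape as the original awk stage, so it can be piped into the same final xargs step.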