且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

使用mongoimport将csv数据作为数组导入mongodb中

更新时间:2022-11-25 20:51:46

对我来说,最简单的方法是弄清楚如何为 mongoexport ,以查看CSV格式是什么.

To me, the simplest way to work out how to format your "CSV" for mongoimport is to simply create a collection then use mongoexport on it to see what the CSV format should look like.

因此从外壳创建您的文档:

So create your document from the shell:

db.newcol.insert({
  id:"122234343",
  name: "name1",
  children: ["222334444","333344444"]
})

然后退出外壳并运行mongoexport:

 mongoexport -d test -c testcol --fields id,name,children --type csv > out.csv

哪个将显示为:

id,name,children
122234343,name1,"[""222334444"",""333344444""]"

其中数组"用字符串"表示,并使用引号""以其转义形式表示.

Where the "array" is represented with a "string" and using the quotes "" in their escaped form.

现在这是一个使用mongoimport的明确位置,所以现在只需导入"即可进行测试:

That now is a pretty clear place to use mongoimport from, so just "import" now to test:

mongoimport -d test -c newcol --headerline --type csv out.csv

重新输入外壳并查看新集合中的文档:

Re-enter the shell and see the document(s) in the new collection:

db.newcol.findOne()
{
        "_id" : ObjectId("59476924d6eb0b7d6ac37e02"),
        "id" : 122234343,
        "name" : "name1",
        "children" : "[\"222334444\",\"333344444\"]"
}

所有内容均已存在,这些子项被列为字符串"而不是数组.但这并不是一个真正的问题,因为我们已经导入了数据,而现在正是我们进行实际转换的原因:

So everything is there, BUT the children are listed as a "string" rather than an array. But this is not really a problem, since we got the data imported and now it's just up us to now actually transform it:

var ops = [];
db.testcol.find({ "children": { "$type": 2} }).forEach(doc => {
  var children = doc.children.split(',').map( e => e.replace(/"|\[|\]|\\/gm,'').toString() );
  ops.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      "update": { "$set": { "children": children } }
    }
  });

  if ( ops.length >= 1000 ) {
    db.newcol.bulkWrite(ops);
    ops = [];
  }             
});

if ( ops.length > 0 ) {
  db.newcol.bulkWrite(ops);
  ops = [];
}

因此,这将迭代通过

So that is going to iterate anything that was imported into the collection that has a BSON type of 2 which is "string" via the $type query operator.

然后,我们将字符串取为字符串,将其拆分为数组,然后剥离其他字符以仅保留所需的值.

Then we take the string, split it as an array and strip the other characters to only leave the value you want.

使用 .bulkWrite() 提交这些更新以一种有效的方式,而不是每个请求都写每个文档.实际上,它们是按1000个批次批量发送到服务器的.

Using .bulkWrite() you commit those updates in an efficient way, rather than writing every single document per request. They are actually sent in batches of 1000 to the server.

最终结果是原始所需形式的文档:

The end result is the document in the original wanted form:

db.testcol.findOne()
{
        "_id" : ObjectId("5947652ccb237bd6e4e902a5"),
        "id" : "122234343",
        "name" : "name1",
        "children" : [
                "222334444",
                "333344444"
        ]
}

这就是我逐步"介绍如何计算CSV格式,导入它,然后将数据转换"为所需状态的逐步".

So that is my "step by step" of how you can work out your CSV format, import it and then "transform" the data into the state that you need it.