MongoDB aggregation on subdocuments in an array

Updated: 2022-11-13 22:38:09

MapReduce is slow, but it can handle very large data sets. The Aggregation framework on the other hand is a little quicker, but will struggle with large data volumes.
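For comparison, a rough MapReduce version of the same kind of count might look like the sketch below (this is my own illustration, not part of the original answer; it assumes the db.so collection and items.sku field used in the sample further down, and counts each SKU at most once per document):

var map = function () {
    var seen = {};
    this.items.forEach(function (item) {
        // emit each SKU at most once per document
        if (!seen[item.sku]) {
            seen[item.sku] = true;
            emit(item.sku, 1);
        }
    });
};

var reduce = function (key, values) {
    // sum up the per-document emits for each SKU
    return values.reduce(function (a, b) { return a + b; }, 0);
};

db.so.mapReduce(map, reduce, {
    query: { status: "active" },
    out: { inline: 1 }
});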

The trouble with the structure you've shown is that you need to "$unwind" the arrays to crack open the data. This means creating a new document for every array item, and with the aggregation framework it needs to do this in memory. So if you have 1000 documents with 100 array elements each, it needs to build a stream of 100,000 documents in order to $group and count them.
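To see what that means, here is roughly what $unwind does to a single (trimmed-down) document, as a sketch:

// Input document:
// { _id: 42, items: [ { sku: '00e8da9b' }, { sku: '0ab42f88' } ] }

db.so.aggregate([ { $unwind: "$items" } ]);

// Output: one document per array element
// { _id: 42, items: { sku: '00e8da9b' } }
// { _id: 42, items: { sku: '0ab42f88' } }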

You might want to consider whether there's a schema layout that will serve your queries better, but if you want to do it with the Aggregation framework, here's how you could do it (with some sample data so the whole script will drop into the shell):

db.so.remove({});   // clear out any previous sample documents
db.so.ensureIndex({ "items.sku": 1 }, { unique: false });   // createIndex on newer shells
db.so.insert([
    {
        _id: 42,
        last_modified: ISODate("2012-03-09T20:55:36Z"),
        status: 'active',
        items: [
            { sku: '00e8da9b', qty: 1, item_details: {} },
            { sku: '0ab42f88', qty: 4, item_details: {} },
            { sku: '0ab42f88', qty: 4, item_details: {} },
            { sku: '0ab42f88', qty: 4, item_details: {} },
        ]
    },
    {
        _id: 43,
        last_modified: ISODate("2012-03-09T20:55:36Z"),
        status: 'active',
        items: [
            { sku: '00e8da9b', qty: 1, item_details: {} },
            { sku: '0ab42f88', qty: 4, item_details: {} },
        ]
    },
]);


db.so.runCommand("aggregate", {
    pipeline: [
        {   // optional filter to exclude inactive elements - can be removed    
            // you'll want an index on this if you use it too
            $match: { status: "active" }
        },
        // unwind creates a doc for every array element
        { $unwind: "$items" },
        {
            $group: {
                // group by unique SKU, but you only wanted to count a SKU once per doc id
                _id: { _id: "$_id", sku: "$items.sku" },
            }
        },
        {
            $group: {
                // group by unique SKU, and count them
                _id: { sku:"$_id.sku" },
                doc_count: { $sum: 1 },
            }
        }
    ]
    //,explain:true
})
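On newer shells the same pipeline can also be run through the aggregate() helper, which is the more common entry point today (identical stages, just a different wrapper):

db.so.aggregate([
    { $match: { status: "active" } },
    { $unwind: "$items" },
    { $group: { _id: { _id: "$_id", sku: "$items.sku" } } },
    { $group: { _id: { sku: "$_id.sku" }, doc_count: { $sum: 1 } } }
]);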

Note that I've $group'd twice, because you said that a SKU can only be counted once per document, so we first need to work out the unique doc/sku pairs and then count them up.
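Tracing that through the sample data above (my own working, not part of the original answer): the first $group collapses the repeated 0ab42f88 entries inside document 42, and the second $group counts documents per SKU:

// After the first $group: one document per unique doc/sku pair
// { _id: { _id: 42, sku: '00e8da9b' } }
// { _id: { _id: 42, sku: '0ab42f88' } }
// { _id: { _id: 43, sku: '00e8da9b' } }
// { _id: { _id: 43, sku: '0ab42f88' } }

// After the second $group: the per-SKU document counts
// { _id: { sku: '00e8da9b' }, doc_count: 2 }
// { _id: { sku: '0ab42f88' }, doc_count: 2 }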

If you want the output a little different (in other words, EXACTLY like in your sample), we can $project them.
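For instance, a final stage along these lines, appended to the pipeline above, would flatten the result (a sketch; the exact field names you want in the output are an assumption on my part):

{
    $project: {
        _id: 0,                 // drop the compound group key
        sku: "$_id.sku",        // pull the SKU up to the top level
        doc_count: 1            // keep the count as-is
    }
}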