且构网

Merging two collections in MongoDB

Updated: 2022-11-17 13:45:30

This is similar to a question that was asked on the MongoDB-users Google Groups.
https://groups.google.com/group/mongodb-user/browse_thread/thread/60a8b683e2626ada?pli=1

The answer references an on-line tutorial which looks similar to your example: http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/

For more information on MapReduce in MongoDB, please see the documentation: http://www.mongodb.org/display/DOCS/MapReduce

Additionally, there is a useful step-by-step walkthrough of how a MapReduce operation works in the "Extras" Section of the MongoDB Cookbook article titled, "Finding Max And Min Values with Versioned Documents": http://cookbook.mongodb.org/patterns/finding_max_and_min/

Forgive me if you have already read some of the referenced documents. I have included them for the benefit of other users who may be reading this post and are new to using MapReduce in MongoDB.

It is important that the outputs from the 'emit' statements in the Map functions match the outputs of the Reduce function. If there is only one document output by the Map function, the Reduce function might not be run at all, and then your output collection will have mismatched documents.
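To make the point about matching shapes concrete, here is a minimal sketch in plain JavaScript. It does not use MongoDB at all: `simulateMapReduce` is a hypothetical in-memory stand-in written for this illustration, `emit` is passed to the map function as an argument instead of being a global, and the `tag` documents are made up. It assumes, as described above, that Reduce is skipped for a key with only one emitted value.

```javascript
// Hypothetical in-memory stand-in for mapReduce (not a MongoDB API).
function simulateMapReduce(docs, mapFn, reduceFn) {
    var groups = {};
    // Map phase: run the map function with `this` bound to each document.
    docs.forEach(function (doc) {
        mapFn.call(doc, function (key, value) {
            if (!Object.prototype.hasOwnProperty.call(groups, key)) {
                groups[key] = [];
            }
            groups[key].push(value);
        });
    });
    // Reduce phase: a key with a single emitted value is stored as-is,
    // so the reduce function never sees it.
    return Object.keys(groups).sort().map(function (key) {
        var values = groups[key];
        var value = values.length === 1 ? values[0] : reduceFn(key, values);
        return { _id: key, value: value };
    });
}

// Made-up sample documents: two with tag "a", one with tag "b".
var docs = [{ tag: "a" }, { tag: "a" }, { tag: "b" }];
var mapCount = function (emit) { emit(this.tag, { count: 1 }); };

// Mismatched: map emits {count: n}, but reduce returns a bare number.
var badReduce = function (key, values) { return values.length; };

// Matched: reduce returns the same shape that map emits.
var goodReduce = function (key, values) {
    var total = 0;
    values.forEach(function (v) { total += v.count; });
    return { count: total };
};

var mismatched = simulateMapReduce(docs, mapCount, badReduce);
var matched = simulateMapReduce(docs, mapCount, goodReduce);

console.log(JSON.stringify(mismatched));
// [{"_id":"a","value":2},{"_id":"b","value":{"count":1}}]
console.log(JSON.stringify(matched));
// [{"_id":"a","value":{"count":2}},{"_id":"b","value":{"count":1}}]
```

With the mismatched reduce, key "a" ends up as a number while key "b" keeps the raw emitted object, which is exactly the kind of inconsistent output collection described above.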

I have slightly modified your map statements to emit documents in the format of your desired output, with two separate "classes" arrays.
I have also reworked your reduce statement to add new classes to the classes_1 and classes_2 arrays, only if they do not already exist.

// Map for the details collection: put each document's classes into the array for its year.
var mapDetails = function () {
    var output = {studentid: this.studentid, classes_1: [], classes_2: [], year: this.year, overall: 0, subscore: 0};
    if (this.year == 1) {
        output.classes_1 = this.classes;
    }
    if (this.year == 2) {
        output.classes_2 = this.classes;
    }
    emit(this.studentid, output);
};

// Map for the gpas collection: year 0 marks a value that carries the scores.
var mapGpas = function () {
    emit(this.studentid, {studentid: this.studentid, classes_1: [], classes_2: [], year: 0, overall: this.overall, subscore: this.subscore});
};

// Shared reduce: merge the class arrays without duplicates and copy the scores from the gpas value.
var r = function (key, values) {
    var outs = {studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};

    values.forEach(function (v) {
        outs.studentid = v.studentid;
        // "class" is a reserved word in JavaScript, so the parameter is named "cls".
        v.classes_1.forEach(function (cls) {
            if (outs.classes_1.indexOf(cls) == -1) { outs.classes_1.push(cls); }
        });
        v.classes_2.forEach(function (cls) {
            if (outs.classes_2.indexOf(cls) == -1) { outs.classes_2.push(cls); }
        });

        if (v.year == 0) {
            outs.overall = v.overall;
            outs.subscore = v.subscore;
        }
    });
    return outs;
};

// The "reduce" output mode folds each run's results into the existing 'joined' collection.
var res = db.details.mapReduce(mapDetails, r, {out: {reduce: 'joined'}});
res = db.gpas.mapReduce(mapGpas, r, {out: {reduce: 'joined'}});

Running the two MapReduce operations results in the following collection, which matches your desired format:

> db.joined.find()
{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ 1, 17, 19, 21 ], "classes_2" : [ 32, 91, 101, 217 ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ 1, 11, 18, 22 ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ 2, 12, 19, 22 ], "classes_2" : [ 32, 99, 110, 215 ], "overall" : 85, "subscore" : 5 } }
>
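
For readers who want to trace how the two passes combine, the following is a rough sketch that replays them in plain JavaScript for one student. Everything here is an assumption made for illustration: the input documents are reconstructed from the first output row (the original collections are not shown in this post), `mapReduceInto` is a hypothetical stand-in for mapReduce with `out: {reduce: ...}`, `emit` is passed in as an argument, and the order in which the existing and new values reach the reduce function is a guess that does not affect this reducer's result.

```javascript
// Hypothetical input documents, reconstructed from the first output row.
var details = [
    { studentid: "12345a", year: 1, classes: [1, 17, 19, 21] },
    { studentid: "12345a", year: 2, classes: [32, 91, 101, 217] }
];
var gpas = [{ studentid: "12345a", overall: 97, subscore: 1 }];

// Same logic as the map functions in this answer, with emit passed in explicitly.
function mapDetails(emit) {
    var output = { studentid: this.studentid, classes_1: [], classes_2: [], year: this.year, overall: 0, subscore: 0 };
    if (this.year == 1) { output.classes_1 = this.classes; }
    if (this.year == 2) { output.classes_2 = this.classes; }
    emit(this.studentid, output);
}

function mapGpas(emit) {
    emit(this.studentid, { studentid: this.studentid, classes_1: [], classes_2: [], year: 0, overall: this.overall, subscore: this.subscore });
}

// Same logic as the reduce function in this answer.
function reduceJoin(key, values) {
    var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0 };
    values.forEach(function (v) {
        outs.studentid = v.studentid;
        v.classes_1.forEach(function (cls) {
            if (outs.classes_1.indexOf(cls) == -1) { outs.classes_1.push(cls); }
        });
        v.classes_2.forEach(function (cls) {
            if (outs.classes_2.indexOf(cls) == -1) { outs.classes_2.push(cls); }
        });
        if (v.year == 0) {
            outs.overall = v.overall;
            outs.subscore = v.subscore;
        }
    });
    return outs;
}

// Hypothetical stand-in for mapReduce with out: {reduce: 'joined'}.
function mapReduceInto(target, docs, mapFn, reduceFn) {
    var groups = {};
    docs.forEach(function (doc) {
        mapFn.call(doc, function (key, value) {
            (groups[key] = groups[key] || []).push(value);
        });
    });
    Object.keys(groups).forEach(function (key) {
        var values = groups[key];
        // A single emitted value skips the reduce function.
        var value = values.length === 1 ? values[0] : reduceFn(key, values);
        // "reduce" output mode: fold the new value into any existing one.
        target[key] = Object.prototype.hasOwnProperty.call(target, key)
            ? reduceFn(key, [target[key], value])
            : value;
    });
}

var joined = {};
mapReduceInto(joined, details, mapDetails, reduceJoin); // first pass: classes only
mapReduceInto(joined, gpas, mapGpas, reduceJoin);       // second pass: scores folded in

console.log(JSON.stringify(joined["12345a"]));
// {"studentid":"12345a","classes_1":[1,17,19,21],"classes_2":[32,91,101,217],"overall":97,"subscore":1}
```

The second pass is where the scores arrive: the gpas value carries `year: 0`, so the reduce function copies its `overall` and `subscore` onto the classes gathered in the first pass.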

MapReduce always outputs documents in the form {_id: "id", value: "value"}. More information on working with sub-documents is available in the document titled "Dot Notation (Reaching into Objects)": http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
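
As a small illustration of what dot notation means for these results: in the shell, a query on a nested field would normally take a form such as `db.joined.find({"value.overall": {$gt: 80}})`. The sketch below mimics that lookup in plain JavaScript so it can be run without a database. The `getPath` helper is hypothetical (it is not part of MongoDB), and the threshold of 80 is an arbitrary example; the documents are the three rows from the output above.

```javascript
// Hypothetical helper: resolve a dotted path such as "value.overall" inside a document.
function getPath(doc, path) {
    return path.split(".").reduce(function (current, field) {
        // Stop descending once a level is missing.
        return (current === undefined || current === null) ? undefined : current[field];
    }, doc);
}

// The three documents from the joined collection shown above.
var joined = [
    { _id: "12345a", value: { studentid: "12345a", classes_1: [1, 17, 19, 21], classes_2: [32, 91, 101, 217], overall: 97, subscore: 1 } },
    { _id: "24680a", value: { studentid: "24680a", classes_1: [1, 11, 18, 22], classes_2: [], overall: 76, subscore: 2 } },
    { _id: "98765a", value: { studentid: "98765a", classes_1: [2, 12, 19, 22], classes_2: [32, 99, 110, 215], overall: 85, subscore: 5 } }
];

// In-memory equivalent of filtering on "value.overall" greater than 80.
var highScorers = joined
    .filter(function (doc) { return getPath(doc, "value.overall") > 80; })
    .map(function (doc) { return doc._id; });

console.log(JSON.stringify(highScorers));
// ["12345a","98765a"]
```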

If you would like the output of MapReduce to appear in a different format, you will have to do that programmatically in your application.
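
One possible way to do that reshaping in application code is sketched below. It is only an example under assumptions: `flattenResult` is a hypothetical helper name, it assumes `value` is always an object (as it is in the output above), and it assumes no field inside `value` is itself called `_id`. The sample document is the second row of the output above.

```javascript
// Hypothetical helper: lift the fields of `value` to the top level of the document.
function flattenResult(doc) {
    var flat = { _id: doc._id };
    Object.keys(doc.value).forEach(function (field) {
        flat[field] = doc.value[field];
    });
    return flat;
}

// Second row of the joined collection shown earlier in this answer.
var sample = {
    _id: "24680a",
    value: { studentid: "24680a", classes_1: [1, 11, 18, 22], classes_2: [], overall: 76, subscore: 2 }
};

var flat = flattenResult(sample);
console.log(JSON.stringify(flat));
// {"_id":"24680a","studentid":"24680a","classes_1":[1,11,18,22],"classes_2":[],"overall":76,"subscore":2}
```

The same function could be applied to every document returned by a query on the joined collection before the results are handed to the rest of the application.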

Hopefully this will improve your understanding of MapReduce, and get you one step closer to producing your desired output collection. Good Luck!