且构网

Most efficient way to store nested categories (or hierarchical data) in Mongo?

Updated: 2023-11-25 16:29:46

The first thing you want to decide is exactly what kind of tree you will use.

The big thing to consider is your data and access patterns. You have already stated that 90% of your workload will be queries and, by the sounds of it (e-commerce), updates will only be run by administrators, most likely rarely.

So you want a schema that lets you quickly query the children under a path, e.g. Sports -> Basketball -> Men's or Sports -> Tennis -> Women's, and that doesn't really need to scale for updates.

As you rightly pointed out, MongoDB has a good documentation page for this: http://docs.mongodb.org/manual/tutorial/model-tree-structures/ where 10gen describes several models and schema methods for trees, along with the main pros and cons of each.

The one that should catch the eye if you are looking to query easily is materialised paths: http://docs.mongodb.org/manual/tutorial/model-tree-structures/#model-tree-structures-with-materialized-paths

This is a very interesting way to build trees, because to query the example you gave above, "Womens" under "Tennis", you could simply use a prefix regex (which can use an index: http://docs.mongodb.org/manual/reference/operator/regex/ ) like so:

db.products.find({category: /^Sports,Tennis,Womens(,|$)/})

to find all products listed under a certain path of your tree.
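A minimal plain-JavaScript sketch of that prefix match, run against an in-memory array instead of a real collection. The helper name `subtreeRegex` and the sample products are hypothetical. It escapes regex metacharacters so category names match literally, keeps the leading `^` so MongoDB can still use an index on the field, and ends with `(,|$)` so "Sports,Tennis,Womens" matches itself and its descendants but not a sibling like "Sports,Tennis,WomensShoes".

```javascript
// Hypothetical helper: build an anchored prefix regex for a materialised path.
function subtreeRegex(path) {
  // Escape regex metacharacters so category names are matched literally
  const escaped = path.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "^" anchors the prefix (index-friendly); "(,|$)" stops at a segment boundary
  return new RegExp("^" + escaped + "(,|$)");
}

// In-memory stand-in for the products collection (illustrative data only)
const products = [
  { name: "Racket A", category: "Sports,Tennis,Womens" },
  { name: "Skirt B", category: "Sports,Tennis,Womens,Clothing" },
  { name: "Shoe C", category: "Sports,Tennis,WomensShoes" },
  { name: "Ball D", category: "Sports,Basketball,Mens" },
];

const re = subtreeRegex("Sports,Tennis,Womens");
const matches = products.filter((p) => re.test(p.category)).map((p) => p.name);
console.log(matches); // [ 'Racket A', 'Skirt B' ]
```

In the mongo shell the returned RegExp can be passed straight into a query, e.g. `db.products.find({category: subtreeRegex("Sports,Tennis,Womens")})`.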

Unfortunately, this model is really bad at updates: if you move a category or change its name, you have to update all of its products, and there could be thousands of products under one category.

A better method would be to store a cat_id on each product and move the categories into their own collection with the schema:

{
    _id: ObjectId(),
    name: 'Women\'s',
    path: 'Sports,Tennis,Womens',
    normed_name: 'all_special_chars_and_spaces_and_case_sensitive_letters_taken_out_like_this'
}
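One possible way to derive `normed_name`, assuming "taken out" means lowercasing the name and stripping everything that is not an ASCII letter or digit. The function name `normName` is hypothetical, `_id` is omitted because it would come from the database, and names with non-ASCII characters would need a different rule.

```javascript
// Hypothetical normaliser for the normed_name field: lowercase the name,
// then drop every character that is not an ASCII letter or digit.
function normName(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// A category document shaped like the schema above (without _id)
const category = {
  name: "Women's",
  path: "Sports,Tennis,Womens",
  normed_name: normName("Women's"),
};

console.log(category.normed_name); // womens
console.log(normName("Men's Shoes & Socks")); // mensshoessocks
```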

So now your update queries only touch the categories collection, which is much smaller, so they should be far more performant. The exception is deleting a category: the products in it will still need updating.
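With the categories split out, listing the products under a path takes two lookups: find the matching category `_id`s by path prefix, then fetch the products whose `cat_id` is in that set. The sketch below does both steps over in-memory arrays so it runs without a server; against a real database the second step would be a `$in` query on `cat_id`. The sample data and the helper name `productsUnder` are illustrative assumptions.

```javascript
// In-memory stand-ins for the categories and products collections
const categories = [
  { _id: 1, name: "Tennis", path: "Sports,Tennis" },
  { _id: 2, name: "Women's", path: "Sports,Tennis,Womens" },
  { _id: 3, name: "Men's", path: "Sports,Basketball,Mens" },
];
const products = [
  { name: "Racket A", cat_id: 2 },
  { name: "Net B", cat_id: 1 },
  { name: "Ball C", cat_id: 3 },
];

// Hypothetical helper: all products in the subtree rooted at `path`
function productsUnder(path) {
  // Step 1: category ids whose path equals `path` or starts with `path,`
  const ids = new Set(
    categories
      .filter((c) => c.path === path || c.path.startsWith(path + ","))
      .map((c) => c._id)
  );
  // Step 2: the in-memory equivalent of a $in query on cat_id
  return products.filter((p) => ids.has(p.cat_id)).map((p) => p.name);
}

console.log(productsUnder("Sports,Tennis")); // [ 'Racket A', 'Net B' ]
```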

For example, renaming "Tennis" to "Badmin" looks like this:

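db.categories.find({path: /^Sports,Tennis(,|$)/}).forEach(function (doc) {
    // Swap the "Tennis" segment, matching only at a segment boundary
    doc.path = doc.path.replace(/^Sports,Tennis(?=,|$)/, "Sports,Badmin");
    // Write just the new path back (save() is not available in mongosh)
    db.categories.updateOne({_id: doc._id}, {$set: {path: doc.path}});
});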

Unfortunately, MongoDB currently provides no way for an update to read a document's own fields, so you have to pull the documents out client side, which is a little annoying; hopefully, though, it shouldn't result in too many categories being brought back.
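The client-side part of the rename is plain string manipulation. Below is a minimal sketch, assuming comma-separated paths and a hypothetical `renamePrefix` helper; it rewrites only the leading prefix and only at a segment boundary, so a category such as "Sports,TennisTables" is left alone. In the shell you would then write each changed `path` back with an update keyed on `_id`.

```javascript
// Hypothetical helper: replace the prefix `oldPrefix` of `path` with `newPrefix`,
// but only when the prefix ends at a segment boundary (a comma or end of string).
function renamePrefix(path, oldPrefix, newPrefix) {
  if (path === oldPrefix) return newPrefix;
  if (path.startsWith(oldPrefix + ",")) {
    return newPrefix + path.slice(oldPrefix.length);
  }
  return path; // not in the subtree: unchanged
}

const paths = [
  "Sports,Tennis",
  "Sports,Tennis,Womens",
  "Sports,TennisTables",
  "Sports,Basketball,Mens",
];

const renamed = paths.map((p) => renamePrefix(p, "Sports,Tennis", "Sports,Badmin"));
console.log(renamed);
```

The same helper covers moving a subtree, e.g. `renamePrefix(p, "Sports,Tennis", "Leisure,Tennis")`, since a move is just a different new prefix.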

And that is basically how it works. It is a bit of a pain to update, but being able to query any path instantly using an index fits your scenario better, I believe.

Of course, the added benefit is that this schema is compatible with nested set models (http://en.wikipedia.org/wiki/Nested_set_model), which I have found time and time again to be just awesome for e-commerce sites. For example, Tennis might be under both "Sports" and "Leisure", and you want multiple paths depending on where the user came from.

The materialised path schema supports this easily: just add another path. It is that simple.
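A minimal sketch of the multi-path variant, assuming the single `path` string becomes an array field (called `paths` here, a hypothetical name). In MongoDB a regex condition on an array field matches a document when any element matches, so the same prefix query keeps working; the plain-JavaScript equivalent is `Array.prototype.some`.

```javascript
// Tennis filed under two parents: each entry is one materialised path
const tennis = {
  name: "Tennis",
  paths: ["Sports,Tennis", "Leisure,Tennis"],
};

// True if any of the category's paths lies in the subtree rooted at `prefix`
function inSubtree(category, prefix) {
  return category.paths.some((p) => p === prefix || p.startsWith(prefix + ","));
}

console.log(inSubtree(tennis, "Sports"));  // true
console.log(inSubtree(tennis, "Leisure")); // true
console.log(inSubtree(tennis, "Music"));   // false
```

In the shell, something like `db.categories.find({paths: /^Leisure(,|$)/})` expresses the same check.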

Hope that makes sense; it was quite a long one.
