且构网 - Sharing all things programming and software development

Cassandra database data model: critique my schema design

Updated: 2022-11-24 22:11:10

I need to implement a database for a testing system. It is designed to store test data for future statistical analysis. It has to be Cassandra based.

I've designed a schema, but since this is my first attempt at NoSQL design, I would like to get some feedback.

I will first describe the data I wish to save, then describe two basic queries and finally present my suggested design.

I intend to use Cassandra 1.1, so I tried to use composite columns in my design; however, feel free to suggest super columns or whatever seems right.

Data:

The basic unit we are testing is an alien. Each alien has a unique ID. Each alien has a number of bodyparts. Also, each alien is part of a family of aliens. The families have unique names.

When we run a test, we run it on a few bodyparts of an alien group. For example, we take a few families and run a test on all of their eyes and mouths.

There are a few kinds of tests. We log each test with its own unique test ID.

When we run a test, we sample all relevant alien bodyparts every couple of minutes and gather some statistics.

Basic Queries:

  1. For each family, alien, or unique bodypart: which tests did it participate in?
  2. For each test ID: which families, aliens, or unique bodyparts participated in it?
  3. In the future, statistical analysis of all data...

My attempt at design:

GeneralAliensData : { // Column Family  - general data on aliens. 
    [FamilyID][AlienID][Bodypart] : { //Composite Columns as Row keys
        Race: 'Blurgons' // column
        Shoesize: 5 // column
        Favorite probe: 'fun, toy' // column
    }  
}

TestsData : { // Column Family - we sample each test every couple of minutes...
    [TestID][AlienID][Bodypart][MinutesFromTestStart]: { //Composite Columns as Rowkeys
        Temperture: 30 // column
        Size: 5 // column
    }  
}


BodypartTestParticipation : { // Column Family - all the tests a unique bodypart passed...
    [FamilyID][AlienID][Bodypart]: { //Composite Columns as Row keys
        TestID: 105 // column
        TestID: 564 // column
        ...
    }  
}

This is it. Since I'm a real beginner in databases and Cassandra in particular, I'd appreciate any input.

Thank you for your time.

How large will your dataset eventually be, in rows? We sometimes use PlayOrm to store relational data in NoSQL, which works well, and tables can grow into the millions of rows. If you are going into the billions or trillions of rows, we use PlayOrm to partition the same data so that it scales.

So, do you need the ability to scale? You may want to check out the wide-row pattern (PlayOrm makes heavy use of it). Wide rows can help you index data for very fast lookups.

I don't really follow this part of your design:

TestsData : { // Column Family - we sample each test every couple of minutes...
    [TestID][AlienID][Bodypart][MinutesFromTestStart]: { //Composite Columns as Rowkeys
        Temperture: 30 // column
        Size: 5 // column
    }  
}

Shouldn't this be more of a wide row, where the test ID is the row key and you have many composite column names for the other data? Wide rows should not be larger than 10 million columns, so make sure no test's row would go over that. A wide row might look like this:

testid -> alienId:fk23=null, alienId:fk25=null, etc. etc. temperture=30, size=5
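As a rough sketch of that wide-row idea (plain Python, not PlayOrm or Cassandra client code), one row per test ID can hold many composite column names kept in sorted order, so all samples for one alien sit next to each other and can be read as a slice. The `WideRow` class, the `MAX_COLUMNS` constant, and the column layout `(alienId, bodypart, minute, field)` are assumptions for illustration; the 10 million figure is the limit mentioned above.

```python
import bisect

MAX_COLUMNS = 10_000_000  # upper bound per wide row suggested above


class WideRow:
    """One row: composite column names kept sorted, each with a value."""

    def __init__(self):
        self._names = []   # sorted list of tuple column names
        self._values = {}  # column name -> value

    def put(self, name, value):
        """Insert or overwrite a column, refusing to grow past MAX_COLUMNS."""
        if name not in self._values:
            if len(self._names) >= MAX_COLUMNS:
                raise OverflowError("wide row would exceed MAX_COLUMNS")
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, prefix):
        """Return (name, value) pairs whose column name starts with prefix."""
        i = bisect.bisect_left(self._names, prefix)
        result = []
        while i < len(self._names) and self._names[i][:len(prefix)] == prefix:
            name = self._names[i]
            result.append((name, self._values[name]))
            i += 1
        return result


# Row key (test ID) -> wide row of samples; IDs and values are hypothetical.
tests_data = {}
row = tests_data.setdefault(105, WideRow())
row.put(("fk23", "eye", 0, "temperture"), 30)
row.put(("fk23", "eye", 0, "size"), 5)
row.put(("fk25", "mouth", 0, "temperture"), 28)

# All columns recorded for alien fk23 in test 105.
print(row.slice(("fk23",)))
```

With this layout, "which aliens took part in test 105" becomes a read of a single row rather than a scan across many row keys.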

later, Dean