且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

为什么我们将mysql表分成许多较小的表?

更新时间:2023-02-27 09:10:41

我认为您在这里混了几个字.

I think you have a few terms mixed up here.

您的所有数据都进入一个数据库(也称为架构). 在数据库中,您可以有表.

All your data goes into one database (aka schema). In a database you can have tables.

例如

table employee
   id integer
   name varchar
   address varchar
   country varchar

table office
   id integer
   employee_id integer
   address varchar

内部表中有字段(id, name, address) aka列. 并且表有一个或多个行.
表格员工的示例:

Inside tables you have fields (id, name, address) aka columns. And tables have one or more rows.
An example for table employee:

id  name        address           country
----------------------------------------------------
1   John        1 Regent Street   UK
2   James       24 Jump Street    China
3   Darth Vader 1 Death Star      Bestine, Tatooine

基本知识太多了.

为什么要分区
现在,假设我们的数据库中有很多人(行).
请记住,这是一个银河数据库,所以我们有1000亿条记录.
如果我们想快速搜索低谷,***并行进行.
因此,我们将表分区(按国家/地区划分),然后我们可以让x个服务器分别在1个国家/地区查找.
跨服务器分区称为sharding.

Why partitioning
Now suppose that we have lots and lots of people (rows) in our database.
Remember this a galactic database, so we have 100 billion records.
If we want to search trough this fast it's nice if we can do this in parallel.
So we partition the table (say by country) and then we can have x servers looking in 1 country each.
Partitioning across servers is called sharding.

或者我们可以对例如历史数据(按年份),因此我们不必仅查看全部数据即可获取最新新闻.今年我们只需要检查分区.这称为partitioning.

Or we can partition e.g. historical data by year, so we don't have to go through all the data just to get the recent news. We only have to go through the partition for this year. This is called partitioning.

sharding可以只是partitioning和最大的区别是什么?

What's the big difference between sharding can just partitioning?

共享
sharding中,您预计所有数据都是相关的,并且同样有可能被查询. (例如,谷歌可以期望查询其所有数据;将其部分数据归档对他们来说是无用的.)
在这种情况下,您希望许多计算机并行查看数据,每台计算机都在其中进行部分工作.
因此,您为每台计算机分配了不同的数据分区(分片),并为所有计算机提供了相同的查询.结果出来后,您将全部UNION一起输出,并输出结果.

Sharding
In sharding you anticipate that all your data is relevant, and equally likely to be queried. (e.g. google can expect all their data to be queried; archiving part of their data is useless for them).
In this case you want lots of machines to look though your data in parallel, where each machine does part of the work.
So you give each machine a different partition (shard) of the data and give all the machines the same query. When the results come out you UNION them all together and output the result.

基本分区
在基本的partitioning中,数据的一部分为hot,而一部分为not.典型的情况是历史数据,新数据是hot,旧数据很难被触及.
对于此用例,将旧数据放在单独的服务器中是没有意义的.这些机器只会等待,什么也不会做,因为除了那些每年检查一次的审计员之外,没有人会关心旧数据.
因此,您可以按年份对数据进行分区,服务器将自动存档旧分区,因此您的查询将仅查看一(也许2)年的数据,并且速度更快.

Basic partitioning
In basic partitioning part of your data is hot and part is not. A typical case is historical data, the new data is hot, the old data hardly gets touched.
For this use case it is pointless to put the old data in separate servers. Those machines will just wait and wait and do nothing because nobody cares about the old data except some auditors who look at it once a year.
So you partition that data by year and the server will automatically archive the old partitions so your queries will only look at one (maybe 2) years of data and be much faster.

我需要分区吗?
仅当您有大量数据时才进行分区,因为这会使您的设置复杂化.
除非您有超过一百万条记录,否则不必考虑分区. *)
如果您有超过1亿条记录,则应绝对考虑. *)

Do I need partitioning?
You only do partitioning when you have lots and lots of data, because it complicates your setup.
Unless you have more than a million records you don't have to consider partitioning.*)
If you have more than a 100 million records, you should definitely consider it.*)

有关更多信息,请参见: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
和: http ://blog.mayflower.de/archives/353-Is-MySQL-partitioning-useful-for-very-big-real-life-problems.html
另请参阅Wiki: http://en.wikipedia.org/wiki/Partition_%28database%29

For more info see: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
and: http://blog.mayflower.de/archives/353-Is-MySQL-partitioning-useful-for-very-big-real-life-problems.html
See also wiki: http://en.wikipedia.org/wiki/Partition_%28database%29

*)这些只是我个人的启发式YMMV.

*) These are just my personal heuristics YMMV.