What exactly does ZooKeeper fault tolerance mean? Simultaneous or cumulative?

Updated: 2022-05-31 03:56:13

For a ZooKeeper cluster to work, it needs a quorum, and a quorum is a majority of the servers in the cluster.

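  • With a 3-node cluster, the majority is 2 nodes, so you can tolerate only 1 node not being in sync.
  • With a 5-node cluster, the majority is 3 nodes, so you can tolerate only 2 nodes not being in sync.
  • With a 7-node cluster, the majority is 4 nodes, so you can tolerate only 3 nodes not being in sync.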
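
The arithmetic behind these three cases can be sketched in a few lines of Python. This is only an illustration of the majority rule described above; the function names `majority` and `tolerated_failures` are made up for this example and are not part of any ZooKeeper API.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that may be out of sync while a majority still remains."""
    return ensemble_size - majority(ensemble_size)


for n in (3, 5, 7):
    print(f"{n} nodes: quorum={majority(n)}, tolerated={tolerated_failures(n)}")
# 3 nodes: quorum=2, tolerated=1
# 5 nodes: quorum=3, tolerated=2
# 7 nodes: quorum=4, tolerated=3
```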

What does being in sync mean? A node is outside the quorum not only while it is not running, but also while it is still rejoining the cluster after a failure.

The nodes are hardcoded in the ZooKeeper configuration, so each node in the cluster knows that it should be part of a cluster with N nodes. A 7-node cluster with two nodes down therefore does not suddenly become a 5-node cluster in which another 2 nodes can go down. It always behaves as a 7-node cluster, and at most 3 nodes in total can be down at any one time unless you change the configuration files.
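
To make the "same time, not cumulative resets" point concrete, here is a small simulation, assuming a hypothetical 7-node ensemble. It is a toy model of the rule above, not ZooKeeper code: the quorum threshold is always computed from the configured size, never from the number of nodes currently alive.

```python
CONFIGURED_SIZE = 7  # hypothetical ensemble size, fixed by the configuration


def has_quorum(in_sync: int, configured_size: int = CONFIGURED_SIZE) -> bool:
    """A quorum needs a strict majority of the CONFIGURED servers."""
    return in_sync >= configured_size // 2 + 1


def simulate(failures_per_step):
    """Apply failures one step after another; report (in_sync, quorum) each time."""
    in_sync = CONFIGURED_SIZE
    states = []
    for failed in failures_per_step:
        in_sync -= failed
        states.append((in_sync, has_quorum(in_sync)))
    return states


# Two nodes fail, then one more, then one more, with no recovery in between.
print(simulate([2, 1, 1]))
# [(5, True), (4, True), (3, False)]
```

With 5 nodes left the threshold is still 4, not 3, so the cluster survives only one further failure before the quorum is lost. A node that recovers and finishes rejoining would count as in sync again.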

The whole discussion about even and odd numbers of nodes is basically about how many nodes can be down while the quorum is maintained. With a 4-node cluster, the majority is 3, so a 4-node cluster can still tolerate only 1 node being down. Hence it doesn't make much sense to use a 4-node cluster, which has the same fault tolerance as a 3-node cluster.
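
The same comparison can be checked for larger sizes with a short sketch. The helper names `tolerated_failures` and `extra_node_helps` are hypothetical and exist only for this illustration.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Nodes that can be down while a strict majority is still in sync."""
    return ensemble_size - (ensemble_size // 2 + 1)


def extra_node_helps(odd_size: int) -> bool:
    """Does growing an odd-sized ensemble by one node raise its fault tolerance?"""
    return tolerated_failures(odd_size + 1) > tolerated_failures(odd_size)


for odd in (3, 5, 7):
    even = odd + 1
    print(
        f"{odd} nodes tolerate {tolerated_failures(odd)}, "
        f"{even} nodes tolerate {tolerated_failures(even)}"
    )
# 3 nodes tolerate 1, 4 nodes tolerate 1
# 5 nodes tolerate 2, 6 nodes tolerate 2
# 7 nodes tolerate 3, 8 nodes tolerate 3
```

In each pair the even-sized cluster needs a larger majority but tolerates no more failures than the odd-sized one before it.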