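Updated: 2022-10-03 18:48:39
A while ago a colleague migrated our ZooKeeper ensemble (the one used by Kafka and Flume). Afterwards, every Flume producer and consumer started reporting the same errors: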
07 Jan 2014 01:19:32,571 INFO [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.startConnect:1058) - Opening socket connection to server xxx:2181
07 Jan 2014 01:19:32,572 INFO [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:947) - Socket connection established to xxx:2181, initiating session
07 Jan 2014 01:19:32,573 INFO [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.run:1183) - Unable to read additional data from server sessionid 0x142f42b91871911, likely server has closed socket, closing socket connection and attempting reconnect
07 Jan 2014 01:19:32,845 INFO [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.startConnect:1058) - Opening socket connection to server xxx:2181
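The clients kept looping: connect, fail, retry. My colleague said network connectivity had been verified long before, so I checked the server-side log and found: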
2014-01-06 23:59:59,987 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx:45282
2014-01-06 23:59:59,987 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client xxx:45282; will be dropped if server is in r-o mode
2014-01-06 23:59:59,987 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@812] - Refusing session request for client xxx:45282 as it has seen zxid 0x60fd15564 our last zxid is 0x10000000f client must try another server
2014-01-06 23:59:59,987 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client xxx:45282 (no session established for client)
2014-01-06 23:59:59,989 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from xxx:45285
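It turned out that Flume was still holding the zxid it had last seen on the old cluster (0x60fd15564), while the new cluster's zxid had started over from scratch (0x10000000f in the log above). The client's zxid is therefore larger than the server's, and the server throws an exception and refuses the session: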
if (connReq.getLastZxidSeen() > zkDb.dataTree.lastProcessedZxid) {
    String msg = "Refusing session request for client "
        + cnxn.getRemoteSocketAddress()
        + " as it has seen zxid 0x"
        + Long.toHexString(connReq.getLastZxidSeen())
        + " our last zxid is 0x"
        + Long.toHexString(getZKDatabase().getDataTreeLastProcessedZxid())
        + " client must try another server";
    LOG.info(msg);
    throw new CloseRequestException(msg);
}
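To make the numbers in the log concrete, here is a minimal Java sketch that decodes the two zxids and applies the same comparison as the check above. It relies on the ZooKeeper convention that the high 32 bits of a zxid are the leader epoch and the low 32 bits are a counter. The class `ZxidCheck` and its methods are hypothetical names used only for illustration; they are not part of ZooKeeper.

```java
// Illustrative only: ZxidCheck and its methods are hypothetical, not ZooKeeper APIs.
final class ZxidCheck {
    // High 32 bits of a zxid: the leader epoch.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of a zxid: the counter within that epoch.
    static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    // Same comparison as the server-side check: refuse a client that has
    // seen a zxid newer than anything this server has processed.
    static boolean wouldRefuse(long clientLastZxidSeen, long serverLastProcessedZxid) {
        return clientLastZxidSeen > serverLastProcessedZxid;
    }

    public static void main(String[] args) {
        long client = 0x60fd15564L; // "has seen zxid" from the server log
        long server = 0x10000000fL; // "our last zxid" from the server log

        System.out.printf("client: epoch=%d counter=0x%x%n", epoch(client), counter(client));
        System.out.printf("server: epoch=%d counter=0x%x%n", epoch(server), counter(server));
        System.out.println("refused: " + wouldRefuse(client, server));
    }
}
```

The client's zxid belongs to epoch 6 of the old cluster, while the freshly started cluster is only at epoch 1, so the comparison is true on every reconnect attempt and the session is never established.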
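Later I asked my colleague how the migration had been done. They started a brand-new cluster, shut down the old one, and ran an haproxy on one of the old cluster's IP:2181 addresses to proxy traffic to the new cluster, expecting this to make the migration transparent. In fact it triggered ZK bug 832 (ZOOKEEPER-832), which makes the client retry the connection endlessly; only restarting Flume resolves it.
The correct way to migrate is to add the new cluster's servers to the old cluster, then update the Flume configuration, wait a while for Flume to reconfigure itself automatically, and only then shut down the old cluster. Done this way, the problem is not triggered.
This article is reposted from MIKE老毕's 51CTO blog. Original link: http://blog.51cto.com/boylook/1365364. To repost it, please contact the original author.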
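The migration order described above can be sketched as follows. This is only an outline under stated assumptions: the host names (`old1`, `new1`, and so on), port 2181 for every server, and the `MigrationPlan` helper are all hypothetical. The only ZooKeeper convention it relies on is the comma-separated `host:port` connect-string format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: host names and this helper class are hypothetical.
final class MigrationPlan {
    // Builds a ZooKeeper-style connect string: "host1:port,host2:port,...".
    static String connectString(List<String> hosts, int port) {
        List<String> parts = new ArrayList<>();
        for (String host : hosts) {
            parts.add(host + ":" + port);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        List<String> oldHosts = Arrays.asList("old1", "old2", "old3");
        List<String> newHosts = Arrays.asList("new1", "new2", "new3");

        // Step 1: the new servers join the old ensemble, so they sync its
        // data and continue from its zxid instead of starting over.
        List<String> merged = new ArrayList<>(oldHosts);
        merged.addAll(newHosts);
        System.out.println("ensemble during migration: " + connectString(merged, 2181));

        // Step 2: point Flume at the new servers and wait for it to reload
        // its configuration.
        System.out.println("Flume connect string:      " + connectString(newHosts, 2181));

        // Step 3: only then shut down the old servers.
        System.out.println("final ensemble:            " + connectString(newHosts, 2181));
    }
}
```

Because the new servers inherit the old ensemble's history in step 1, a client that reconnects to them never presents a zxid newer than what the server has processed.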