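Corosync+Pacemaker+DRBD+NFS High-Availability Configuration Example

Updated: 2022-08-20 15:16:00

Original work. Reposting is permitted, provided that the original source, the author information, and this notice are indicated with a hyperlink; otherwise legal liability will be pursued. http://koumm.blog.51cto.com/703525/1738795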

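Environment:
Operating system: CentOS 6.6 x64. This article installs corosync+pacemaker+drbd+nfs from RPM packages.
This setup can be compared with the one in the previous article, which implements the same functionality. Which one is better depends on your requirements and on which solution you understand more thoroughly. Heartbeat+DRBD+NFS high-availability configuration example: http://koumm.blog.51cto.com/703525/1737702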

I. Two-Node Base Configuration

1. Configure the hosts file and the hostnames on app1 and app2.

[root@app1 soft]# vi /etc/hosts   
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4    
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6    
192.168.0.24         app1    
192.168.0.25         app2    
10.10.10.24          app1-priv    
10.10.10.25          app2-priv

Note: the 10.x network carries the heartbeat IPs, the 192.168.x network carries the service IPs, and the VIP is 192.168.0.26.
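Before continuing, it can help to confirm that every name used later resolves to the intended address. The following is a minimal sketch, not part of the original procedure; the helper name hosts_lookup is hypothetical, and it only reads a hosts-format file (by default /etc/hosts).

```sh
# hosts_lookup NAME [FILE]: print the first IP mapped to NAME in a
# hosts-format file (hypothetical helper, defaults to /etc/hosts).
hosts_lookup() {
    awk -v name="$1" '
        $1 ~ /^#/ { next }                    # skip comment lines
        { for (i = 2; i <= NF; i++)           # fields after the IP are host names
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Example use on each node:
#   for h in app1 app2 app1-priv app2-priv; do
#       echo "$h -> $(hosts_lookup "$h")"
#   done
```

An empty result for any of the four names means the corresponding line is missing from /etc/hosts on that node.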

 

2. Disable SELinux and the firewall

sed -i '/SELINUX/s/enforcing/disabled/' /etc/selinux/config 
setenforce 0 
chkconfig iptables off 
service iptables stop
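Because setenforce 0 only affects the running system, the persistent setting is worth checking as well. A small sketch, assuming the standard /etc/selinux/config layout; selinux_config_mode is a hypothetical helper name.

```sh
# selinux_config_mode [FILE]: print the value of the SELINUX= line in FILE
# (hypothetical helper, defaults to /etc/selinux/config).
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}" | head -n 1
}

# Example use:
#   selinux_config_mode      # persistent mode after the sed edit above
#   getenforce               # mode of the running system
```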

 

3. Configure SSH mutual trust between the nodes. This appears to be optional, but it makes management easier.

app1: 
[root@app1 ~]# ssh-keygen  -t rsa -f ~/.ssh/id_rsa  -P ''  
[root@app1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@app2

app2: 
[root@app2 ~]# ssh-keygen  -t rsa -f ~/.ssh/id_rsa  -P '' 
[root@app2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@app1

 

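II. DRBD Installation and Configuration

1. Configure the hosts file and prepare the disk partition on app1 and app2

app1: /dev/sdb1  ->  app2: /dev/sdb1

 

2. Install DRBD on app1 and app2

(1) Download the DRBD packages. On CentOS 6.6, only the kmod-drbd84-8.4.5-504.1 package works. 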
http://rpm.pbone.net/

drbd84-utils-8.9.1-1.el6.elrepo.x86_64.rpm 
kmod-drbd84-8.4.5-504.1.el6.x86_64.rpm

# rpm -ivh drbd84-utils-8.9.5-1.el6.elrepo.x86_64.rpm kmod-drbd84-8.4.5-504.1.el6.x86_64.rpm 
Preparing...                ########################################### [100%] 
   1:drbd84-utils           ########################################### [ 50%] 
   2:kmod-drbd84            ########################################### [100%] 
Working. This may take some time ... 
Done. 
#

(2) Load the DRBD kernel module

Run the following on both app1 and app2, and add the modprobe line to /etc/rc.local. 
modprobe drbd 
lsmod | grep drbd

 

3. Create and edit the configuration file. Node 1 and node 2 use the same configuration.

[root@app1 ~]# vi /etc/drbd.d/global_common.conf 
global { 
        usage-count no; 
}

common { 
        protocol C; 
        disk { 
                on-io-error detach; 
                no-disk-flushes; 
                no-md-flushes;  
        } 
        net { 
                sndbuf-size 512k; 
                max-buffers     8000; 
                unplug-watermark   1024; 
                max-epoch-size  8000; 
                cram-hmac-alg "sha1"; 
                shared-secret "hdhwXes23sYEhart8t"; 
                after-sb-0pri disconnect; 
                after-sb-1pri disconnect; 
                after-sb-2pri disconnect; 
                rr-conflict disconnect; 
        } 
        syncer { 
                rate 300M; 
                al-extents 517; 
        } 
}

resource data { 
      on app1 { 
               device    /dev/drbd0; 
               disk      /dev/sdb1; 
               address   10.10.10.24:7788; 
               meta-disk internal; 
      } 
      on app2 { 
               device     /dev/drbd0; 
               disk       /dev/sdb1; 
               address    10.10.10.25:7788; 
               meta-disk internal; 
      } 
}
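A missing closing brace is an easy mistake in this file. The sketch below is a quick sanity check under simple assumptions: it strips everything after a "#" (so a shared-secret containing "#" would confuse it) and only counts braces. The helper name brace_balance is hypothetical. DRBD's own parser remains the authoritative check, for example by running drbdadm dump data.

```sh
# brace_balance FILE: print (number of "{") minus (number of "}") in FILE,
# ignoring "#" comments. 0 means balanced (hypothetical helper).
brace_balance() {
    sed 's/#.*//' "$1" | awk '
        { opens += gsub(/[{]/, "{"); closes += gsub(/[}]/, "}") }
        END { print opens - closes }
    '
}

# Example use:
#   brace_balance /etc/drbd.d/global_common.conf
```

A positive result means at least one block is never closed; a negative result means there is a stray closing brace.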

 

4. Initialize the resource

Run on both app1 and app2:

# drbdadm create-md data

initializing activity log 
NOT initializing bitmap 
Writing meta data... 
New drbd meta data block successfully created.

 

5. Start the service

Run on both app1 and app2 (or use drbdadm up data instead):

# service drbd start

Starting DRBD resources: [ 
     create res: data 
   prepare disk: data 
    adjust disk: data 
     adjust net: data 

.......... 
#

 

6. Check the startup status. Both nodes should be in the Secondary state.

cat /proc/drbd       # or use the drbd-overview command directly

Node 1: 
[root@app1 drbd.d]# cat /proc/drbd  
version: 8.4.5 (api:1/proto:86-101) 
GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by root@node1.magedu.com, 2015-01-02 12:06:20
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----- 
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:20964116


Node 2: 
[root@app2 drbd.d]# cat /proc/drbd  
version: 8.4.5 (api:1/proto:86-101) 
GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by root@node1.magedu.com, 2015-01-02 12:06:20
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----- 
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:20964116
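The three fields that matter in this output are cs (connection state), ro (local/peer roles) and ds (local/peer disk states). As a rough sketch for scripting the check, assuming the DRBD 8.x /proc/drbd layout shown above and a single device, a hypothetical helper drbd_field could extract them:

```sh
# drbd_field KEY [FILE]: print the value of KEY (cs, ro or ds) from the first
# device line of a /proc/drbd style file (hypothetical helper).
drbd_field() {
    sed -n "s/^ *[0-9][0-9]*:.* $1:\([^ ]*\).*/\1/p" "${2:-/proc/drbd}" | head -n 1
}

# Example use on either node:
#   [ "$(drbd_field cs)" = "Connected" ] || echo "peer not connected"
#   drbd_field ro      # e.g. Secondary/Secondary before a primary is chosen
```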

 

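7. Set one of the nodes as the primary node

One node must be set as Primary. Run the following command on the node that is to become Primary: 
drbdadm -- --overwrite-data-of-peer primary data   

Check the synchronization status on the primary node: 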
[root@app1 drbd.d]# cat /proc/drbd  
version: 8.4.5 (api:1/proto:86-101) 
GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by root@node1.magedu.com, 2015-01-02 12:06:20
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----- 
    ns:1229428 nr:0 dw:0 dr:1230100 al:0 bm:0 lo:0 pe:2 ua:0 ap:0 ep:1 wo:d oos:19735828 
        [>...................] sync'ed:  5.9% (19272/20472)M 
        finish: 0:27:58 speed: 11,744 (11,808) K/sec 
[root@app1 drbd.d]#
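The initial synchronization is finished when the oos (out-of-sync) counter reaches 0 and ds shows UpToDate/UpToDate. A minimal polling sketch, assuming the same /proc/drbd layout as above and a single device; the helper names drbd_oos and wait_for_sync and the 10-second interval are illustrative choices, not part of the original procedure.

```sh
# drbd_oos [FILE]: print the out-of-sync counter (oos, in KiB) of the first
# device found in a /proc/drbd style file (hypothetical helper).
drbd_oos() {
    sed -n 's/.* oos:\([0-9][0-9]*\).*/\1/p' "${1:-/proc/drbd}" | head -n 1
}

# wait_for_sync [FILE]: poll every 10 seconds until nothing is out of sync.
# Note: it also returns immediately if no oos value can be read at all.
wait_for_sync() {
    while oos=$(drbd_oos "$@"); [ -n "$oos" ] && [ "$oos" -gt 0 ]; do
        echo "still syncing: $oos KiB out of sync"
        sleep 10
    done
}

# Example use on the primary node:
#   wait_for_sync && echo "initial sync finished"
```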

 

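8. Create the file system

The file system can only be mounted on the Primary node, and the DRBD device can only be formatted after a primary node has been set. Create the /data mount point on both nodes first (mkdir -p /data), then format the device and test a manual mount.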

[root@app1 ~]# mkfs.ext4 /dev/drbd0 
[root@app1 ~]# mount /dev/drbd0 /data

 

III. Installing and Configuring NFS

1. Configure the NFS export on app1 and app2

# vi /etc/exports 
/data 192.168.0.0/24(rw,no_root_squash)

 

2. Start the NFS services on app1 and app2 and enable them at boot

# service rpcbind start 
# service nfs start 
# chkconfig rpcbind on 
# chkconfig nfs on

 

IV. corosync+pacemaker

1. Install corosync and pacemaker on app1 and app2

# yum install corosync pacemaker -y

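2. Install crmsh on app1 and app2

Since RHEL 6.4, the crmsh command-line cluster configuration tool is no longer shipped, so crmsh must be installed separately in order to manage cluster resources. 
The crmsh RPM packages can be downloaded from: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/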

[root@app1 crm]# yum install python-dateutil -y   
Note: python-pssh and pssh depend on the python-dateutil package.

[root@app1 crm]# rpm -ivh pssh-2.3.1-4.2.x86_64.rpm python-pssh-2.3.1-4.2.x86_64.rpm crmsh-2.1-1.6.x86_64.rpm 
warning: pssh-2.3.1-4.2.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID 17280ddf: NOKEY 
Preparing...                ########################################### [100%] 
   1:python-pssh            ########################################### [ 33%] 
   2:pssh                   ########################################### [ 67%] 
   3:crmsh                  ########################################### [100%] 
[root@app1 crm]# 
[root@app1 crm]#

 

3. Create the corosync configuration file; it is identical on app1 and app2.

cd /etc/corosync/ 
cp corosync.conf.example corosync.conf

vi /etc/corosync/corosync.conf 
# Please read the corosync.conf.5 manual page 
compatibility: whitetank 
totem {    
        version: 2 
        secauth: on 
        threads: 0 
        interface { 
                ringnumber: 0 
                bindnetaddr: 10.10.10.0 
                mcastaddr: 226.94.8.8 
                mcastport: 5405 
                ttl: 1 
        } 
}

logging { 
        fileline: off 
        to_stderr: no 
        to_logfile: yes 
        to_syslog: no 
        logfile: /var/log/cluster/corosync.log 
        debug: off 
        timestamp: on 
        logger_subsys { 
                subsys: AMF 
                debug: off 
        } 
}

amf { 
        mode: disabled 
}

service { 
        ver:  1                   
        name: pacemaker        
}

aisexec { 
        user: root 
        group:  root 
}
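In the configuration above, bindnetaddr is the network address of the heartbeat interface (10.10.10.0), not the node's own IP. As an illustration of how that value follows from an interface address and prefix length, here is a small sketch; network_addr is a hypothetical helper, and the /24 prefix is an assumption about the heartbeat network in this article.

```sh
# network_addr IP PREFIX: print the IPv4 network address for IP/PREFIX,
# i.e. the value to use for bindnetaddr (hypothetical helper).
network_addr() {
    echo "$1" | awk -F. -v bits="$2" '{
        ip = (($1 * 256 + $2) * 256 + $3) * 256 + $4   # address as one integer
        size = 2 ^ (32 - bits)                         # addresses per network
        net = ip - (ip % size)                         # clear the host bits
        printf "%d.%d.%d.%d\n", int(net / 16777216) % 256, int(net / 65536) % 256, int(net / 256) % 256, net % 256
    }'
}

# Example use:
#   network_addr 10.10.10.24 24
```

Both nodes produce the same result for their heartbeat addresses, which is why the same corosync.conf can be used on app1 and app2.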

 

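4. Create the authentication key file; it is identical on app1 and app2

Communication between the nodes is authenticated with a shared key. Once generated, the key is saved automatically as /etc/corosync/authkey (the current directory here) with permissions 400.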

[root@app1 corosync]# corosync-keygen 
Corosync Cluster Engine Authentication key generator. 
Gathering 1024 bits for key from /dev/random. 
Press keys on your keyboard to generate entropy. 
Press keys on your keyboard to generate entropy (bits = 128). 
Press keys on your keyboard to generate entropy (bits = 192). 
Press keys on your keyboard to generate entropy (bits = 256). 
Press keys on your keyboard to generate entropy (bits = 320). 
Press keys on your keyboard to generate entropy (bits = 384). 
Press keys on your keyboard to generate entropy (bits = 448). 
Press keys on your keyboard to generate entropy (bits = 512). 
Press keys on your keyboard to generate entropy (bits = 576). 
Press keys on your keyboard to generate entropy (bits = 640). 
Press keys on your keyboard to generate entropy (bits = 704). 
Press keys on your keyboard to generate entropy (bits = 768). 
Press keys on your keyboard to generate entropy (bits = 832). 
Press keys on your keyboard to generate entropy (bits = 896). 
Press keys on your keyboard to generate entropy (bits = 960). 
Writing corosync key to /etc/corosync/authkey. 
[root@app1 corosync]#

 

5. Copy the two files just created to app2

# scp authkey corosync.conf  root@app2:/etc/corosync/  
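If the key differs between the nodes, corosync will not form a membership, so a quick comparison after copying can save debugging time. This is a sketch only: file_sum and same_on_peer are hypothetical helpers, and same_on_peer assumes the passwordless root SSH configured earlier in this article.

```sh
# file_sum FILE: print only the checksum column of cksum for FILE.
file_sum() {
    cksum "$1" | awk '{ print $1 }'
}

# same_on_peer FILE PEER: succeed if FILE has the same checksum locally and
# on PEER (uses ssh as root; hypothetical helper).
same_on_peer() {
    local_sum=$(file_sum "$1")
    peer_sum=$(ssh "root@$2" "cksum '$1'" | awk '{ print $1 }')
    [ -n "$local_sum" ] && [ "$local_sum" = "$peer_sum" ]
}

# Example use on app1:
#   same_on_peer /etc/corosync/authkey app2 && echo "authkey matches"
#   same_on_peer /etc/corosync/corosync.conf app2 && echo "corosync.conf matches"
```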

 

6. Start the corosync and pacemaker services and check that they work properly

Node 1:   
[root@app1 ~]# service corosync start    
Starting Corosync Cluster Engine (corosync):               [OK]

[root@app1 ~]# service pacemaker start 
Starting Pacemaker Cluster Manager                         [OK]

Enable the services at boot: 
chkconfig corosync on 
chkconfig pacemaker on


Node 2:   
[root@app2 ~]# service corosync start    
Starting Corosync Cluster Engine (corosync):               [OK]

[root@app2 ~]# service pacemaker start 
Starting Pacemaker Cluster Manager                         [OK]

Enable the services at boot: 
chkconfig corosync on 
chkconfig pacemaker on

 

7. Verify the corosync, pacemaker, and crmsh installation

(1) Check the node status

[root@app1 ~]# crm status 
Last updated: Tue Jan 26 13:13:19 2016 
Last change: Mon Jan 25 17:46:04 2016 via cibadmin on app1 
Stack: classic openais (with plugin) 
Current DC: app1 - partition with quorum 
Version: 1.1.10-14.el6-368c726 
2 Nodes configured, 2 expected votes 
0 Resources configured

Online: [ app1 app2 ]

 

(2) Check the listening ports

# netstat -tunlp 
Active Internet connections (only servers) 
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
udp        0      0 10.10.10.25:5404            0.0.0.0:*                               2828/corosync       
udp        0      0 10.10.10.25:5405            0.0.0.0:*                               2828/corosync       
udp        0      0 226.94.8.8:5405             0.0.0.0:*                               2828/corosync      


(3) Check the log

[root@app1 corosync]# tail -f  /var/log/cluster/corosync.log

Key messages to look for in the log: 
Jan 23 16:09:30 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service. 
Jan 23 16:09:30 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'. 
.... 
Jan 23 16:09:30 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0). 
Jan 23 16:09:31 corosync [TOTEM ] The network interface [10.10.10.24] is now up. 
Jan 23 16:09:31 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. 
Jan 23 16:09:48 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. 
[root@app1 corosync]#

 

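V. Configuring Pacemaker

1. Basic configuration

Pacemaker enables STONITH by default, but the cluster configured here has no STONITH device yet, so it must be disabled in the cluster's global properties.

# crm 
crm(live)# configure                                      ## enter configuration mode 
crm(live)configure# property stonith-enabled=false        ## disable STONITH 
crm(live)configure# property no-quorum-policy=ignore      ## action to take when quorum is lost: ignore it (a two-node cluster loses quorum as soon as one node fails) 
crm(live)configure# rsc_defaults resource-stickiness=100  ## default resource stickiness: a resource prefers to stay on the node where it is currently running 
crm(live)configure# verify                                ## validate 
crm(live)configure# commit                                ## commit only if validation reports no errors 
crm(live)configure# show                                  ## show the current configuration 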
node app1 
node app2 
property cib-bootstrap-options: \ 
        dc-version=1.1.11-97629de \ 
        cluster-infrastructure="classic openais (with plugin)" \ 
        expected-quorum-votes=2 \ 
        stonith-enabled=false \ 
        default-resource-stickiness=100 \ 
        no-quorum-policy=ignore

Or:

# crm configure property stonith-enabled=false 
# crm configure property no-quorum-policy=ignore 
# crm configure property default-resource-stickiness=100

 

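2. Resource configuration

# Usage tip: if verify reports an error, you can quit without committing, or use edit to correct the configuration until it is valid. 
# crm configure edit  opens the configuration for direct editing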

 

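(1) Add the VIP

Do not commit resources one at a time; commit once after all resources and constraints have been created. 
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.0.26 cidr_netmask=24 nic=eth0:1 op monitor interval=30s timeout=20s on-fail=restart 
crm(live)configure# verify    # check that the parameters are valid

Notes:

primitive                   : command that defines a resource 
vip                         : resource ID; any name can be chosen 
ocf:heartbeat:IPaddr        : resource agent (RA) 
params ip=192.168.0.26      : defines the VIP 
op monitor                  : monitors the resource 
interval=30s                : monitoring interval 
timeout=20s                 : timeout 
on-fail=restart             : if the service stops abnormally, restart it; if it cannot be restarted, move it to the other node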

 

(2) Add the DRBD service

crm(live)configure# primitive mydrbd ocf:linbit:drbd params drbd_resource=data op monitor role=Master interval=20 timeout=30 op monitor role=Slave interval=30 timeout=30 op start timeout=240 op stop timeout=100 
crm(live)configure# verify

Define DRBD as a master/slave resource:

crm(live)configure# ms ms_mydrbd mydrbd meta master-max=1 master-node-max=1 clone-max=2  clone-node-max=1 notify=true 
crm(live)configure# verify

 

(3) File system mount service:

crm(live)configure# primitive mystore ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/data fstype=ext4 op start timeout=60s op stop timeout=60s op monitor interval=30s timeout=40s on-fail=restart 
crm(live)configure# verify

 

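(4) Create the constraints. This step is critical: the VIP, DRBD, and the mounted directory must all run on the same node, and both the VIP and the mount depend on the DRBD master.

Create a resource group that contains vip and mystore.                    
crm(live)configure# group g_service vip mystore 
crm(live)configure# verify

Create a colocation constraint: the resource group must run on the DRBD master node. 
crm(live)configure# colocation c_g_service inf: g_service ms_mydrbd:Master 
Create a colocation constraint: the mystore mount must run on the DRBD master node. 
crm(live)configure# colocation mystore_with_drbd_master inf: mystore ms_mydrbd:Master 
Ordering constraint: start the g_service group only after DRBD has been promoted. 
crm(live)configure# order o_g_service inf: ms_mydrbd:promote g_service:start 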
crm(live)configure# verify 
crm(live)configure# commit
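Since the resources and constraints only work as a set, one option is to keep them in a file and load them in a single step instead of typing them interactively. The sketch below writes the same definitions as above in crm shell syntax; the function name write_cluster_config and the path /tmp/nfs-ha.crm are illustrative, and it assumes a crmsh version that provides crm configure load update.

```sh
# write_cluster_config FILE: write the resources and constraints from this
# section to FILE in crm shell syntax (hypothetical helper).
write_cluster_config() {
    cat > "$1" <<'EOF'
primitive vip ocf:heartbeat:IPaddr \
    params ip=192.168.0.26 cidr_netmask=24 nic=eth0:1 \
    op monitor interval=30s timeout=20s on-fail=restart
primitive mydrbd ocf:linbit:drbd \
    params drbd_resource=data \
    op monitor role=Master interval=20 timeout=30 \
    op monitor role=Slave interval=30 timeout=30 \
    op start timeout=240 op stop timeout=100
ms ms_mydrbd mydrbd \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
primitive mystore ocf:heartbeat:Filesystem \
    params device=/dev/drbd0 directory=/data fstype=ext4 \
    op start timeout=60s op stop timeout=60s \
    op monitor interval=30s timeout=40s on-fail=restart
group g_service vip mystore
colocation c_g_service inf: g_service ms_mydrbd:Master
colocation mystore_with_drbd_master inf: mystore ms_mydrbd:Master
order o_g_service inf: ms_mydrbd:promote g_service:start
EOF
}

# Example use, on one node only:
#   write_cluster_config /tmp/nfs-ha.crm
#   crm configure load update /tmp/nfs-ha.crm
```

Keeping the definitions in a file also makes it easier to rebuild the cluster after erasing the configuration.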

 

3. After configuration, check the status

[root@app1 ~]# crm status 
Last updated: Mon Jan 25 22:24:55 2016 
Last change: Mon Jan 25 22:24:46 2016 
Stack: classic openais (with plugin) 
Current DC: app2 - partition with quorum 
Version: 1.1.11-97629de 
2 Nodes configured, 2 expected votes 
4 Resources configured

Online: [ app1 app2 ]

Master/Slave Set: ms_mydrbd [mydrbd] 
     Masters: [ app1 ] 
     Slaves: [ app2 ] 
Resource Group: g_service 
     vip        (ocf::heartbeat:IPaddr):        Started app1 
     mystore    (ocf::heartbeat:Filesystem):    Started app1 
[root@app1 ~]#

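# Note: during failover tests, warnings sometimes appear and obscure the real status. They can be cleared as shown below: clean up whichever resource is reported, then run crm status again and the status is displayed normally. 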
Failed actions: 
mystore_stop_0 on app1 'unknown error' (1): call=97, status=complete, last-rc-change='Tue Jan 26 14:39:21 2016', queued=6390ms, exec=0ms

[root@app1 ~]# crm resource cleanup mystore 
Cleaning up mystore on app1 
Cleaning up mystore on app2 
Waiting for 2 replies from the CRMd.. OK 
[root@app1 ~]#
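When several resources report failures, the cleanup can be scripted. The following sketch extracts resource names from failed-action lines in the format shown above (for example mystore_stop_0 on app1 ...); failed_resources is a hypothetical helper, and other Pacemaker versions may print these lines differently.

```sh
# failed_resources: read `crm status` output on stdin and print the unique
# resource names found in failed-action lines such as
# "mystore_stop_0 on app1 'unknown error' (1): ..." (hypothetical helper).
failed_resources() {
    sed -n 's/^ *\([^ ]*\)_[a-z]*_[0-9][0-9]* on [^ ]* .*/\1/p' | sort -u
}

# Example use:
#   crm status | failed_resources | while read -r rsc; do
#       crm resource cleanup "$rsc"
#   done
```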

(1) Check the mounted DRBD directory

[root@app2 ~]# df -h 
Filesystem            Size  Used Avail Use% Mounted on 
/dev/mapper/vg_app2-lv_root 
                       35G  3.7G   30G  11% / 
tmpfs                 497M   45M  452M  10% /dev/shm 
/dev/sda1             477M   34M  418M   8% /boot 
192.168.1.26:/data     20G   44M   19G   1% /mnt 
/dev/drbd0             20G   44M   19G   1% /data 
[root@app2 ~]#

(2) Check the DRBD primary/secondary roles

[root@app2 ~]# cat /proc/drbd 
version: 8.4.5 (api:1/proto:86-101) 
GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by root@node1.magedu.com, 2015-01-02 12:06:20
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----- 
    ns:20484 nr:336 dw:468 dr:21757 al:4 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

[root@app1 ~]# cat /proc/drbd 
version: 8.4.5 (api:1/proto:86-101) 
GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by root@node1.magedu.com, 2015-01-02 12:06:20
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----- 
    ns:0 nr:20484 dw:20484 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0


(3) The NFS client mounts the share and can read and write normally

[root@vm15 ~]# df -h 
Filesystem            Size  Used Avail Use% Mounted on 
/dev/sda3              21G  4.6G   15G  24% / 
/dev/sda1              99M   23M   72M  25% /boot 
tmpfs                 7.4G     0  7.4G   0% /dev/shm 
/dev/mapper/vg-data    79G   71G  4.2G  95% /data 
192.168.0.26:/data/   5.0G  138M  4.6G   3% /mnt 
[root@vm15 ~]# 
[root@vm15 ~]# 
[root@vm15 ~]# cd /mnt 
[root@vm15 mnt]# ls 
abc.txt  lost+found 
[root@vm15 mnt]# cp abc.txt a.txt 
[root@vm15 mnt]# 
[root@vm15 mnt]# 
[root@vm15 mnt]# ls 
a.txt  abc.txt  lost+found 
[root@vm15 mnt]# 
[root@vm15 mnt]# 
[root@vm15 mnt]#

 

4. Node 1 shutdown test

(1) Shut down node app1; all resources start on node 2

[root@app2 ~]# crm status 
Last updated: Tue Jan 26 13:31:54 2016 
Last change: Tue Jan 26 13:30:21 2016 via cibadmin on app1 
Stack: classic openais (with plugin) 
Current DC: app2 - partition with quorum 
Version: 1.1.10-14.el6-368c726 
2 Nodes configured, 2 expected votes 
4 Resources configured


Online: [ app2 ] 
OFFLINE: [ app1 ]

Master/Slave Set: ms_mydrbd [mydrbd] 
     Masters: [ app2 ] 
     Stopped: [ app1 ] 
Resource Group: g_service 
     vip        (ocf::heartbeat:IPaddr):        Started app2 
     mystore    (ocf::heartbeat:Filesystem):    Started app2 
[root@app2 ~]#

(2) The disk directory is mounted successfully 
[root@app2 ~]# df -h 
Filesystem                   Size  Used Avail Use% Mounted on 
/dev/mapper/vg_app2-lv_root   36G  3.7G   30G  11% / 
tmpfs                       1004M   44M  960M   5% /dev/shm 
/dev/sda1                    485M   39M  421M   9% /boot 
/dev/drbd0                   5.0G  138M  4.6G   3% /data 
[root@app2 ~]#

(3) DRBD has also switched to primary on this node: 
[root@app2 ~]# cat /proc/drbd 
version: 8.4.3 (api:1/proto:86-101) 
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-11-29 12:28:00 
0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----- 
    ns:0 nr:144 dw:148 dr:689 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:4 
[root@app2 ~]#

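After node 1 starts again, it rejoins the cluster directly, and the resources do not need to switch back.

 

5. Node switchover test

# crm node standby app2         # put app2 into standby

Check the resources: they switch directly to app1, although the reboot test above gives the more convincing result. 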
[root@app1 ~]# crm status 
Last updated: Tue Jan 26 14:30:05 2016 
Last change: Tue Jan 26 14:29:59 2016 via crm_attribute on app2 
Stack: classic openais (with plugin) 
Current DC: app2 - partition with quorum 
Version: 1.1.10-14.el6-368c726 
2 Nodes configured, 2 expected votes 
4 Resources configured

Node app2: standby 
Online: [ app1 ]

Master/Slave Set: ms_mydrbd [mydrbd] 
     Masters: [ app1 ] 
     Stopped: [ app2 ] 
Resource Group: g_service 
     vip        (ocf::heartbeat:IPaddr):        Started app1 
     mystore    (ocf::heartbeat:Filesystem):    Started app1 
[root@app1 ~]#
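For repeated switchover tests it is convenient to read the current location of a resource from the status output rather than by eye. A minimal sketch, assuming the crm status line format shown above; resource_node is a hypothetical helper. After the test, crm node online app2 brings the standby node back.

```sh
# resource_node NAME: read `crm status` output on stdin and print the node on
# which primitive NAME is reported as Started (hypothetical helper).
resource_node() {
    awk -v name="$1" '$1 == name && $(NF - 1) == "Started" { print $NF }'
}

# Example use:
#   crm node standby app2
#   crm status | resource_node vip       # node now holding the VIP
#   crm node online app2                 # bring app2 back after the test
```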


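6. Configure STONITH. It was disabled earlier; it is added here mainly to test the feature. In a real environment, for example on IBM System x servers, a STONITH device such as IPMI can be used.

This article uses VMware ESXi 5.1 virtual machines, so STONITH uses the VMware ESXi fence agent fence_vmware_soap.
Note: while testing corosync+pacemaker, nodes sometimes could not reboot or shut down quickly. STONITH is useful when a server cannot otherwise be restarted.

The fence-agents package must be installed on app1 and app2.

# yum install fence-agents

Location of the agent after installation, and a test of the STONITH function: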
[root@app1 ~]# /usr/sbin/fence_vmware_soap -a 192.168.0.61 -z -l root -p 876543 -o list  
...
...                  
DRBD_HEARTBEAT_APP1,564d09c3-e8ee-9a01-e5f4-f1b11f03c810
DRBD_HEARTBEAT_APP2,564dddb8-f4bf-40e6-dbad-9b97b97d3d25
...
...

For example, to reboot a virtual machine:
[root@app1 ~]# /usr/sbin/fence_vmware_soap -a 192.168.0.61 -z -l root -p 876543 -n DRBD_HEARTBEAT_APP2 -o reboot


[root@app1 ~]# crm
crm(live)# configure
crm(live)configure# primitive vm-fence-app1 stonith:fence_vmware_soap params ipaddr=192.168.0.61 login=root passwd=876543 port=app1 ssl="1" pcmk_host_list="DRBD_HEARTBEAT_APP1" retry_on="10" shell_timeout="120" login_timeout="120" action="reboot" op start interval="0" timeout="120"
crm(live)configure# primitive vm-fence-app2 stonith:fence_vmware_soap params ipaddr=192.168.0.61 login=root passwd=876543 port=app2 ssl="1" pcmk_host_list="DRBD_HEARTBEAT_APP2" retry_on="10" shell_timeout="120" login_timeout="120" action="reboot" op start interval="0" timeout="120"
crm(live)configure# location l-vm-fence-app1 vm-fence-app1 -inf: app1
crm(live)configure# location l-vm-fence-app2 vm-fence-app2 -inf: app2
crm(live)configure# property stonith-enabled=true
crm(live)configure# verify
crm(live)configure# commit

[root@app1 ~]# crm status
Last updated: Tue Jan 26 16:50:53 2016
Last change: Tue Jan 26 16:50:27 2016 via crmd on app2
Stack: classic openais (with plugin)
Current DC: app2 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
6 Resources configured

Online: [ app1 app2 ]

 Master/Slave Set: ms_mydrbd [mydrbd]
     Masters: [ app2 ]
     Slaves: [ app1 ]
 Resource Group: g_service
     vip        (ocf::heartbeat:IPaddr):        Started app2 
     mystore    (ocf::heartbeat:Filesystem):    Started app2 
 vm-fence-app1  (stonith:fence_vmware_soap):    Started app2 
 vm-fence-app2  (stonith:fence_vmware_soap):    Started app1

View the complete configuration:

[root@app1 ~]# crm 
crm(live)# configure
crm(live)configure# show xml
<?xml version="1.0" ?>
<cib num_updates="4" dc-uuid="app2" update-origin="app2" crm_feature_set="3.0.7" validate-with="pacemaker-1.2" update-client="crmd" epoch="91" admin_epoch="0" cib-last-written="Tue Jan 26 16:50:27 2016" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-14.el6-368c726"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
        <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
        <nvpair name="default-resource-stickiness" value="100" id="cib-bootstrap-options-default-resource-stickiness"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1453798227"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="app2" uname="app2">
        <instance_attributes id="nodes-app2">
          <nvpair id="nodes-app2-standby" name="standby" value="off"/>
        </instance_attributes>
      </node>
      <node id="app1" uname="app1">
        <instance_attributes id="nodes-app1">
          <nvpair id="nodes-app1-standby" name="standby" value="off"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <primitive id="vm-fence-app1" class="stonith" type="fence_vmware_soap">
        <instance_attributes id="vm-fence-app1-instance_attributes">
          <nvpair name="ipaddr" value="192.168.0.61" id="vm-fence-app1-instance_attributes-ipaddr"/>
          <nvpair name="login" value="root" id="vm-fence-app1-instance_attributes-login"/>
          <nvpair name="passwd" value="xjj876543" id="vm-fence-app1-instance_attributes-passwd"/>
          <nvpair name="port" value="app1" id="vm-fence-app1-instance_attributes-port"/>
          <nvpair name="ssl" value="1" id="vm-fence-app1-instance_attributes-ssl"/>
          <nvpair name="pcmk_host_list" value="DRBD_HEARTBEAT_APP1" id="vm-fence-app1-instance_attributes-pcmk_host_list"/>
          <nvpair name="retry_on" value="10" id="vm-fence-app1-instance_attributes-retry_on"/>
          <nvpair name="shell_timeout" value="120" id="vm-fence-app1-instance_attributes-shell_timeout"/>
          <nvpair name="login_timeout" value="120" id="vm-fence-app1-instance_attributes-login_timeout"/>
          <nvpair name="action" value="reboot" id="vm-fence-app1-instance_attributes-action"/>
        </instance_attributes>
        <operations>
          <op name="start" interval="0" timeout="120" id="vm-fence-app1-start-0"/>
        </operations>
      </primitive>
      <primitive id="vm-fence-app2" class="stonith" type="fence_vmware_soap">
        <instance_attributes id="vm-fence-app2-instance_attributes">
          <nvpair name="ipaddr" value="192.168.0.61" id="vm-fence-app2-instance_attributes-ipaddr"/>
          <nvpair name="login" value="root" id="vm-fence-app2-instance_attributes-login"/>
          <nvpair name="passwd" value="xjj876543" id="vm-fence-app2-instance_attributes-passwd"/>
          <nvpair name="port" value="app2" id="vm-fence-app2-instance_attributes-port"/>
          <nvpair name="ssl" value="1" id="vm-fence-app2-instance_attributes-ssl"/>
          <nvpair name="pcmk_host_list" value="DRBD_HEARTBEAT_APP2" id="vm-fence-app2-instance_attributes-pcmk_host_list"/>
          <nvpair name="retry_on" value="10" id="vm-fence-app2-instance_attributes-retry_on"/>
          <nvpair name="shell_timeout" value="120" id="vm-fence-app2-instance_attributes-shell_timeout"/>
          <nvpair name="login_timeout" value="120" id="vm-fence-app2-instance_attributes-login_timeout"/>
          <nvpair name="action" value="reboot" id="vm-fence-app2-instance_attributes-action"/>
        </instance_attributes>
        <operations>
          <op name="start" interval="0" timeout="120" id="vm-fence-app2-start-0"/>
        </operations>
      </primitive>
      <group id="g_service">
        <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr">
          <instance_attributes id="vip-instance_attributes">
            <nvpair name="ip" value="192.168.0.26" id="vip-instance_attributes-ip"/>
            <nvpair name="cidr_netmask" value="24" id="vip-instance_attributes-cidr_netmask"/>
            <nvpair name="nic" value="eth0:1" id="vip-instance_attributes-nic"/>
          </instance_attributes>
          <operations>
            <op name="monitor" interval="30s" timeout="20s" on-fail="restart" id="vip-monitor-30s"/>
          </operations>
        </primitive>
        <primitive id="mystore" class="ocf" provider="heartbeat" type="Filesystem">
          <instance_attributes id="mystore-instance_attributes">
            <nvpair name="device" value="/dev/drbd0" id="mystore-instance_attributes-device"/>
            <nvpair name="directory" value="/data" id="mystore-instance_attributes-directory"/>
            <nvpair name="fstype" value="ext4" id="mystore-instance_attributes-fstype"/>
          </instance_attributes>
          <operations>
            <op name="start" timeout="60s" interval="0" id="mystore-start-0"/>
            <op name="stop" timeout="60s" interval="0" id="mystore-stop-0"/>
            <op name="monitor" interval="30s" timeout="40s" on-fail="restart" id="mystore-monitor-30s"/>
          </operations>
        </primitive>
      </group>
      <master id="ms_mydrbd">
        <meta_attributes id="ms_mydrbd-meta_attributes">
          <nvpair name="master-max" value="1" id="ms_mydrbd-meta_attributes-master-max"/>
          <nvpair name="master-node-max" value="1" id="ms_mydrbd-meta_attributes-master-node-max"/>
          <nvpair name="clone-max" value="2" id="ms_mydrbd-meta_attributes-clone-max"/>
          <nvpair name="clone-node-max" value="1" id="ms_mydrbd-meta_attributes-clone-node-max"/>
          <nvpair name="notify" value="true" id="ms_mydrbd-meta_attributes-notify"/>
        </meta_attributes>
        <primitive id="mydrbd" class="ocf" provider="linbit" type="drbd">
          <instance_attributes id="mydrbd-instance_attributes">
            <nvpair name="drbd_resource" value="data" id="mydrbd-instance_attributes-drbd_resource"/>
          </instance_attributes>
          <operations>
            <op name="monitor" role="Master" interval="20" timeout="30" id="mydrbd-monitor-20"/>
            <op name="monitor" role="Slave" interval="30" timeout="30" id="mydrbd-monitor-30"/>
            <op name="start" timeout="240" interval="0" id="mydrbd-start-0"/>
            <op name="stop" timeout="100" interval="0" id="mydrbd-stop-0"/>
          </operations>
        </primitive>
      </master>
    </resources>
    <constraints>
      <rsc_colocation id="c_g_service" score="INFINITY" rsc="g_service" with-rsc="ms_mydrbd" with-rsc-role="Master"/>
      <rsc_colocation id="mystore_with_drbd_master" score="INFINITY" rsc="mystore" with-rsc="ms_mydrbd" with-rsc-role="Master"/>
      <rsc_order id="o_g_service" score="INFINITY" first="ms_mydrbd" first-action="promote" then="g_service" then-action="start"/>
      <rsc_location id="l-vm-fence-app1" rsc="vm-fence-app1" score="-INFINITY" node="app1"/>
      <rsc_location id="l-vm-fence-app2" rsc="vm-fence-app2" score="-INFINITY" node="app2"/>
    </constraints>
  </configuration>
</cib>


To clear the resources and configure the cluster again:

[root@app2 ~]# crm status         
Last updated: Wed Jan 27 10:39:24 2016
Last change: Tue Jan 26 16:50:27 2016 via crmd on app2
Stack: classic openais (with plugin)
Current DC: app2 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
6 Resources configured


Online: [ app1 app2 ]

 Master/Slave Set: ms_mydrbd [mydrbd]
     Masters: [ app2 ]
     Slaves: [ app1 ]
 Resource Group: g_service
     vip        (ocf::heartbeat:IPaddr):        Started app2 
     mystore    (ocf::heartbeat:Filesystem):    Started app2 
 vm-fence-app1  (stonith:fence_vmware_soap):    Started app2 
 vm-fence-app2  (stonith:fence_vmware_soap):    Started app1
[root@app2 ~]# 
First stop the resources one by one:
[root@app2 ~]# 
[root@app2 ~]# crm resource stop vm-fence-app2
[root@app2 ~]# crm resource stop vm-fence-app1
[root@app2 ~]# crm resource stop mystore
[root@app2 ~]# crm resource stop vip
[root@app2 ~]# crm resource stop ms_mydrbd
[root@app2 ~]# crm status
Last updated: Wed Jan 27 10:40:28 2016
Last change: Wed Jan 27 10:40:23 2016 via cibadmin on app2
Stack: classic openais (with plugin)
Current DC: app2 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
6 Resources configured


Online: [ app1 app2 ]
[root@app2 ~]# 
Then erase the configuration:
[root@app2 ~]# crm configure erase
INFO: resource references in colocation:c_g_service updated
INFO: resource references in colocation:mystore_with_drbd_master updated
INFO: resource references in order:o_g_service updated
[root@app2 ~]# 
[root@app2 ~]# 
[root@app2 ~]# crm status         
Last updated: Wed Jan 27 10:40:58 2016
Last change: Wed Jan 27 10:40:52 2016 via crmd on app2
Stack: classic openais (with plugin)
Current DC: app2 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured


Online: [ app1 app2 ]
[root@app2 ~]# 
[root@app2 ~]#

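The cluster can now be configured again.

7. Configuration summary:

Earlier attempts failed mainly because of how the resources were colocated and ordered, which caused both failover and startup to fail. This is the key point to understand when configuring corosync+pacemaker. DRBD can be combined with other components in many ways; this article only provides one reference implementation.


This article comes from the "koumm的linux技术博客" blog. Please keep this attribution: http://koumm.blog.51cto.com/703525/1738795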