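The existing networks of the company and its partner, together with my design, are shown below:
Figure: load balancing and high availability with static routes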
 
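Analysis: since the goal is load balancing while both lines are healthy, the reader's first instinct is probably to reach for HSRP, because besides HA (high availability) HSRP can also balance load. On reflection, though, HSRP load balancing is not much use here: each customer network has only one machine (the application server) using HSRP, whereas HSRP load balancing works across multiple groups of servers or multiple VLANs (in effect, different groups of servers use different routers or multilayer switches as their gateway, which spreads the load across those devices). So what then? I chose to configure multiple static routes to share the load across the two lines; the configuration is covered below. HSRP is still used, of course, for its high availability.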
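A closer look at the diagram reveals a more awkward problem: both networks use the same internal address range, 172.17.1.0/24, and the company does not want to renumber, because some programs are bound to IP addresses and changing them would ripple through too many things. I decided to use NAT to get around this.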
The detailed configuration follows:
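First, I set up and debugged HSRP, load balancing, and failover without NAT.
To do this, I configured a loopback on R1 and on R3 to stand in for the application servers of customer A and customer B.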
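The loopback configuration is not listed in the original. A minimal sketch, assuming the addresses and /24 masks that appear in the show ip int bri and show ip route output in this article (lines starting with ! are comments):

! R1: loopback simulating customer A's application server
interface Loopback0
 ip address 172.17.251.1 255.255.255.0
! R3: loopback simulating customer B's application server
interface Loopback0
 ip address 172.17.252.1 255.255.255.0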
Because the HSRP configuration is fairly simple, the HSRP configuration of each router is listed below; note the HSRP interface tracking feature.
R1#sh run int f1/0
interface FastEthernet1/0
 ip address 172.17.1.2 255.255.255.0
 standby 10 ip 172.17.1.1
 standby 10 preempt
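 standby 10 track Serial0/0 20  <--- If S0/0 goes down, the priority of this HSRP group drops by 20.
                                     R1 and R3 use the default priority (100), while R2 and R4 are
                                     configured with priority 90. So once line 1 goes down,
                                     100 - 20 = 80 < 90, and the HSRP active router switches
                                     to R2 and R4 automatically.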
end
R2#sh run int f1/0
interface FastEthernet1/0
 ip address 172.17.1.3 255.255.255.0
 standby 10 ip 172.17.1.1
 standby 10 priority 90
 standby 10 preempt
 standby 10 track Serial0/0 20
end
R3#sh run int f1/0
interface FastEthernet1/0
 ip address 172.17.1.2 255.255.255.0
 standby 20 ip 172.17.1.1
 standby 20 preempt
 standby 20 track Serial0/0 20
end
R4#sh run int f1/0
interface FastEthernet1/0
 ip address 172.17.1.3 255.255.255.0
 standby 20 ip 172.17.1.1
 standby 20 priority 90
 standby 20 preempt
 standby 20 track Serial0/0 20
end
The command for verifying HSRP is show standby. Its output is not listed here.
 
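With HSRP in place, the next step is to configure the loopbacks and static routes. I chose static routes for load sharing. The reader may wonder why not use a dynamic protocol, EIGRP for example. A routing protocol such as EIGRP would certainly work, but it might require tuning the bandwidth and delay of some interfaces as well as the variance value, so static routes are simpler. Looking at the network, all that is needed is: on R1, routes to the 172.17.252.0/24 subnet with R3 and R2 as next hops; on R2, a route to 172.17.252.0/24 with R4 as next hop; on R3, routes to 172.17.251.0/24 with R1 and R4 as next hops; on R4, a route to 172.17.251.0/24 with R2 as next hop. Give every static route the same metric (the default, for instance) and the load is shared across the two lines. (An exercise for the reader: why do R2 and R4 not need two next hops for their destination subnets?)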
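The ip route commands themselves are not listed in the original. The sketch below is reconstructed from the show ip route output in this article, so the exact commands that were entered are an assumption (lines starting with ! are comments):

! R1: two equal-cost next hops, R3 over line 1 and R2 over the LAN
ip route 172.17.252.0 255.255.255.0 10.0.0.2
ip route 172.17.252.0 255.255.255.0 172.17.1.3
! R2: line 2 toward R4; the 172.17.251.0 route only reaches the test loopback on R1
ip route 172.17.252.0 255.255.255.0 10.0.1.2
ip route 172.17.251.0 255.255.255.0 172.17.1.2
! R3: two equal-cost next hops, R1 over line 1 and R4 over the LAN
ip route 172.17.251.0 255.255.255.0 10.0.0.1
ip route 172.17.251.0 255.255.255.0 172.17.1.3
! R4: line 2 toward R2; the 172.17.252.0 route only reaches the test loopback on R3
ip route 172.17.251.0 255.255.255.0 10.0.1.1
ip route 172.17.252.0 255.255.255.0 172.17.1.2

A static route is withdrawn from the routing table when its next hop is no longer reachable, so when Serial0/0 goes down R1 and R3 are left with only the route through the LAN. This is consistent with the routing tables shown in this article for the failure case.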
 
Below are the interfaces and routes on each router when both lines are up:
R1#sh ip int bri
Interface                  IP-Address      OK? Method Status                Protocol
Serial0/0                  10.0.0.1        YES manual up                    up
FastEthernet1/0            172.17.1.2      YES manual up                    up
Loopback0                  172.17.251.1    YES manual up                    up
R1#sh ip route
......
     172.17.0.0/24 is subnetted, 3 subnets
S       172.17.252.0 [1/0] via 172.17.1.3  <-- next hop R2
                               [1/0] via 10.0.0.2     <--  next hop R3
C       172.17.251.0 is directly connected, Loopback0
C       172.17.1.0 is directly connected, FastEthernet1/0
     10.0.0.0/30 is subnetted, 1 subnets
C       10.0.0.0 is directly connected, Serial0/0
 
R2#sh ip route
......
S       172.17.252.0 [1/0] via 10.0.1.2
S       172.17.251.0 [1/0] via 172.17.1.2
C       172.17.1.0 is directly connected, FastEthernet1/0
     10.0.0.0/30 is subnetted, 1 subnets
C       10.0.1.0 is directly connected, Serial0/0
 
R3#sh ip route
     172.17.0.0/24 is subnetted, 3 subnets
C       172.17.252.0 is directly connected, Loopback0
S       172.17.251.0 [1/0] via 172.17.1.3   <-- next hop R4
                              [1/0] via 10.0.0.1        <-- next hop R1
C       172.17.1.0 is directly connected, FastEthernet1/0
     10.0.0.0/30 is subnetted, 1 subnets
C       10.0.0.0 is directly connected, Serial0/0
 
R4#sh ip int bri
Interface                  IP-Address      OK? Method Status                Protocol
Serial0/0                  10.0.1.2        YES manual up                    up
FastEthernet1/0            172.17.1.3      YES manual up                    up
R4#sh ip route
     172.17.0.0/24 is subnetted, 3 subnets
S       172.17.252.0 [1/0] via 172.17.1.2
S       172.17.251.0 [1/0] via 10.0.1.1
C       172.17.1.0 is directly connected, FastEthernet1/0
     10.0.0.0/30 is subnetted, 1 subnets
C       10.0.1.0 is directly connected, Serial0/0

With the configuration done, let us verify it. On R1, run an extended ping with the record-route option:
R1#ping
Protocol [ip]:
Target IP address: 172.17.252.1
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: 172.17.251.1
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: R
Number of hops [ 9 ]:
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
......
Record route:
   (172.17.1.2)
   (10.0.1.1)
   (172.17.1.3)
   (172.17.252.1)
   (172.17.1.2)
   (10.0.1.2)
   (172.17.1.3)
   (172.17.251.1) <*>
   (0.0.0.0)
 End of list
Reply to request 1 (56 ms).  Received packet has options
 Total option bytes= 40, padded length=40
 Record route:
   (10.0.0.1)
   (172.17.252.1)
   (10.0.0.2)
   (172.17.251.1) <*>
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
 End of list
Reply to request 2 (208 ms).  Received packet has options
 Total option bytes= 40, padded length=40
 Record route:
   (172.17.1.2)
   (10.0.1.1)
......
As shown, when both lines are up, traffic runs over both lines.
 
Next, the interfaces and routes on each router after line 1 fails:
R1#sh ip int bri
Interface                  IP-Address      OK? Method Status                Protocol
Serial0/0                  10.0.0.1        YES manual administratively down down
FastEthernet1/0            172.17.1.2      YES manual up                    up
Loopback0                  172.17.251.1    YES manual up                    up
R1#sh ip route
     172.17.0.0/24 is subnetted, 3 subnets
S       172.17.252.0 [1/0] via 172.17.1.3
C       172.17.251.0 is directly connected, Loopback0
C       172.17.1.0 is directly connected, FastEthernet1/0
 
R2#sh ip route
     172.17.0.0/24 is subnetted, 3 subnets
S       172.17.252.0 [1/0] via 10.0.1.2
S       172.17.251.0 [1/0] via 172.17.1.2
C       172.17.1.0 is directly connected, FastEthernet1/0
     10.0.0.0/30 is subnetted, 1 subnets
C       10.0.1.0 is directly connected, Serial0/0
 
R3#sh ip int bri
Interface                  IP-Address      OK? Method Status                Protocol
Serial0/0                  10.0.0.2        YES manual up                    down
FastEthernet1/0            172.17.1.2      YES manual up                    up
Loopback0                  172.17.252.1    YES manual up                    up
R3#sh ip route
     172.17.0.0/24 is subnetted, 3 subnets
C       172.17.252.0 is directly connected, Loopback0
S       172.17.251.0 [1/0] via 172.17.1.3
C       172.17.1.0 is directly connected, FastEthernet1/0
 
R4#sh ip route
     172.17.0.0/24 is subnetted, 3 subnets
S       172.17.252.0 [1/0] via 172.17.1.2
S       172.17.251.0 [1/0] via 10.0.1.1
C       172.17.1.0 is directly connected, FastEthernet1/0
     10.0.0.0/30 is subnetted, 1 subnets
C       10.0.1.0 is directly connected, Serial0/0
 
Run the extended ping with record route on R1 again:
R1#ping
Protocol [ip]:
Target IP address: 172.17.252.1
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: 172.17.251.1
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: R
Number of hops [ 9 ]:
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Reply to request 0 (172 ms).
 Record route:
   (172.17.1.2)
   (10.0.1.1)
   (172.17.1.3)
   (172.17.252.1)
   (172.17.1.2)
   (10.0.1.2)
   (172.17.1.3)
   (172.17.251.1) <*>
   (0.0.0.0)
 End of list
Reply to request 1 (208 ms).  
 Record route:
   (172.17.1.2)
   (10.0.1.1)
   (172.17.1.3)
   (172.17.252.1)
   (172.17.1.2)
   (10.0.1.2)
   (172.17.1.3)
   (172.17.251.1) <*>
   (0.0.0.0)
 End of list
Reply to request 2 (128 ms).  
 Record route:
   (172.17.1.2)
   (10.0.1.1)
   (172.17.1.3)
   (172.17.252.1)
   (172.17.1.2)
   (10.0.1.2)
   (172.17.1.3)
   (172.17.251.1) <*>
   (0.0.0.0)
 End of list
Success rate is 100 percent (3/3), round-trip min/avg/max = 128/169/208 ms
R1#
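As shown, after one line fails, all traffic runs over the healthy line.
At this point, high availability, load sharing, and failover all work without NAT.
Next, we configure NAT.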
 
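The NAT configuration is fairly simple: customer A's network is translated to 172.17.251.0/24 addresses and customer B's network to 172.17.252.0/24 addresses. Assuming the application servers of customer A and customer B both have the IP address 172.17.1.100, the configuration is as follows: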
R1#sh run
interface Serial0/0
 ip address 10.0.0.1 255.255.255.252
 ip nat outside
!
interface FastEthernet1/0
 ip address 172.17.1.2 255.255.255.0
 ip nat inside
!
ip nat inside source static 172.17.1.100 172.17.251.100
 
R2#sh run
interface Serial0/0
 ip address 10.0.1.1 255.255.255.252
 ip nat outside
!
interface FastEthernet1/0
 ip address 172.17.1.3 255.255.255.0
 ip nat inside
!
ip nat inside source static 172.17.1.100 172.17.251.100
 
R3#sh run
interface Serial0/0
 ip address 10.0.0.2 255.255.255.252
 ip nat outside
!
interface FastEthernet1/0
 ip address 172.17.1.2 255.255.255.0
 ip nat inside
!
ip nat inside source static 172.17.1.100 172.17.252.100
 
R4#sh run
interface Serial0/0
 ip address 10.0.1.2 255.255.255.252
 ip nat outside
!
interface FastEthernet1/0
 ip address 172.17.1.3 255.255.255.0
 ip nat inside
!
ip nat inside source static 172.17.1.100 172.17.252.100

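Then, on customer A's application server (during the design phase I worked in an emulator, so emulated routers stand in for the application servers of customers A and B), run an extended ping with record route: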
ClientA#ping
Protocol [ip]:
Target IP address: 172.17.252.100
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Number of hops [ 9 ]:
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Reply to request 0 (292 ms). 
 Record route:
   (172.17.1.100)
   (172.17.1.2)
   (10.0.1.1)
   (172.17.1.3)
   (172.17.1.100)
   (172.17.1.100)
   (172.17.1.2)
   (10.0.1.2)
   (172.17.1.3)
   <*>
 End of list
Reply to request 1 (224 ms). 
 Record route:
   (172.17.1.100)
   (10.0.0.1)
   (172.17.1.2)
   (172.17.1.100)
   (172.17.1.100)
   (10.0.0.2)
   (172.17.1.2)
   (172.17.1.100) <*>
   (0.0.0.0)
 End of list
Reply to request 2 (240 ms).  Received packet has
 Total option bytes= 40, padded length=40
 Record route:
   (172.17.1.100)
   (172.17.1.2)
   (10.0.1.1)
   (172.17.1.3)
   (172.17.1.100)
   (172.17.1.100)
   (172.17.1.2)
   (10.0.1.2)
   (172.17.1.3)
   <*>
 End of list
......
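As shown, with NAT configured on the routers, traffic still runs over both lines when both are up.
The behavior after one line fails is also the same as before NAT was configured and matches our expectations.
When testing after configuring NAT, I first used the traceroute command. I ran it many times and it took the same line every time, so I thought static-route load balancing had stopped working once NAT was added. Only when I switched to extended ping did I get the expected result.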
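One plausible explanation, which the tests in this article do not confirm: when CEF or fast switching is enabled, transit traffic is shared per destination, so every traceroute probe between the same pair of addresses takes the same line, whereas packets carrying IP options (such as the record-route option used by the extended ping) are process-switched and alternate between the equal-cost routes packet by packet. If per-packet sharing is wanted for ordinary traffic and CEF is in use, a sketch like the following on the outgoing interfaces of R1 could be tried. Whether this suits the production traffic is an assumption to verify, since per-packet sharing can deliver packets out of order (lines starting with ! are comments):

! R1, assuming CEF (ip cef) is enabled globally
interface Serial0/0
 ip load-sharing per-packet
interface FastEthernet1/0
 ip load-sharing per-packet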
 
I will not say much about NAT here; Cisco's website has classic configuration documents on it. The attachment is one fairly good example.
 
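One more point: because this design involves the networks of two companies, strict ACLs are configured on several of the routers. The specific configuration is not listed here.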
 
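I have not carefully distinguished between "load balancing" and "load sharing"; I tend to feel that "load sharing" is sometimes the more accurate term.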