Octavia's Amphorae Failover Implementation

To keep LBaaS continuously available, Octavia implements a VM-level failover mechanism built around the Health Manager.

Health Manager

Health Manager – This subcomponent monitors individual amphorae to ensure they are up and running, and otherwise healthy. It also handles failover events if amphorae fail unexpectedly.

In short, the Health Manager monitors the health of every amphora VM and automatically carries out the failover procedure when a failure occurs, keeping the load balancer highly available. Understanding the Health Manager service therefore comes down to two questions:

  1. How is health monitored?

  2. How is failover carried out?

Heartbeat Health Monitoring

As usual, we start from the entry point, octavia/cmd/health_manager.py. When the octavia-health-manager service starts, it drives two functions: UDPStatusGetter.check() and HealthManager.health_check().
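Before diving in, it helps to see how these two functions are driven: the service entry point spawns them as separate long-running worker processes, one looping over check() to absorb heartbeats and one looping over health_check() to look for stale amphorae. The sketch below is a rough approximation of that startup logic; the names and loop bodies are paraphrased placeholders, not copied from octavia/cmd/health_manager.py.

# Rough sketch of how the octavia-health-manager entry point drives its two
# loops: one worker process absorbing heartbeats, one scanning for stale
# amphorae. Names and loop bodies are placeholders.
import multiprocessing
import time

def hm_listener():
    # Real code: loop over UDPStatusGetter.check() (recv one packet, dispatch).
    while True:
        time.sleep(1)

def hm_health_check():
    # Real code: loop over HealthManager.health_check() (stale-amphora scan).
    while True:
        time.sleep(3)

def main():
    procs = []
    for target in (hm_listener, hm_health_check):
        proc = multiprocessing.Process(target=target)
        proc.daemon = True
        proc.start()
        procs.append(proc)
    for proc in procs:
        proc.join()

if __name__ == '__main__':
    main()

With that picture in mind, here is the implementation of the first of the two, UDPStatusGetter: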

# file: /opt/rocky/octavia/octavia/amphorae/drivers/health/heartbeat_udp.py

class UDPStatusGetter(object):
    """This class defines methods that will gather heatbeats

    The heartbeats are transmitted via UDP and this class will bind to a port
    and absorb them
    """
    def __init__(self):
        self.key = cfg.CONF.health_manager.heartbeat_key
        self.ip = cfg.CONF.health_manager.bind_ip
        self.port = cfg.CONF.health_manager.bind_port
        self.sockaddr = None
        LOG.info('attempting to listen on %(ip)s port %(port)s',
                 {'ip': self.ip, 'port': self.port})
        self.sock = None
        self.update(self.key, self.ip, self.port)

        self.executor = futures.ProcessPoolExecutor(
            max_workers=cfg.CONF.health_manager.status_update_threads)
        self.repo = repositories.Repositories().amphorahealth

    def update(self, key, ip, port):
        """Update the running config for the udp socket server

        :param key: The hmac key used to verify the UDP packets. String
        :param ip: The ip address the UDP server will read from
        :param port: The port the UDP server will read from
        :return: None
        """
        self.key = key
        for addrinfo in socket.getaddrinfo(ip, port, 0, socket.SOCK_DGRAM):
            ai_family = addrinfo[0]
            self.sockaddr = addrinfo[4]
            if self.sock is not None:
                self.sock.close()
            self.sock = socket.socket(ai_family, socket.SOCK_DGRAM)
            self.sock.settimeout(1)
            self.sock.bind(self.sockaddr)
            if cfg.CONF.health_manager.sock_rlimit > 0:
                rlimit = cfg.CONF.health_manager.sock_rlimit
                LOG.info("setting sock rlimit to %s", rlimit)
                self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF,
                                     rlimit)
            break  # just used the first addr getaddrinfo finds
        if self.sock is None:
            raise exceptions.NetworkConfig("unable to find suitable socket")

Within the octavia-health-manager service, class UDPStatusGetter is responsible for receiving the heartbeats sent by the amphorae, then preparing the data carried in those heartbeats and persisting it to the database. As __init__() shows, the amphorae and the octavia-health-manager service communicate over a UDP socket bound to (CONF.health_manager.bind_ip, CONF.health_manager.bind_port).
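Conceptually, dorecv() reads one datagram, checks its integrity against the shared CONF.health_manager.heartbeat_key, and decodes the payload; packets that fail the check are dropped. The sketch below illustrates only that receive-and-verify pattern. Octavia's real envelope handling differs, and the JSON payload and key value here are assumptions made for the example.

# Illustrative only: receive one UDP heartbeat and verify an HMAC appended to
# the payload before trusting it. Octavia's real envelope format differs; this
# just shows the verify-then-decode pattern that dorecv() implements.
import hashlib
import hmac
import json
import socket

HEARTBEAT_KEY = b'insecure'                 # assumed shared heartbeat_key
DIGEST_SIZE = hashlib.sha256().digest_size  # bytes appended to the payload

def recv_heartbeat(sock):
    data, srcaddr = sock.recvfrom(10240)
    payload, digest = data[:-DIGEST_SIZE], data[-DIGEST_SIZE:]
    expected = hmac.new(HEARTBEAT_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, digest):
        raise ValueError('bad HMAC, dropping packet from %s' % (srcaddr,))
    return json.loads(payload.decode('utf-8')), srcaddr

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', 5555))            # bind_ip / bind_port in Octavia
    sock.settimeout(1)
    while True:
        try:
            heartbeat, src = recv_heartbeat(sock)
        except socket.timeout:
            continue            # expected idle cycle, like check()
        except ValueError:
            continue            # dropped packet, like InvalidHMACException
        print('heartbeat from %s: %s' % (src, heartbeat.get('id')))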

NOTE: a word about network connectivity between the amphorae and the octavia-health-manager service.

1. If Octavia is deployed with ext-net used directly as the lb-mgmt-net, then CONF.health_manager.bind_ip should be an IP address on the physical host, and the amphorae communicate with the octavia-health-manager service directly over the OpenStack management network. However, each amphora then consumes a fixed IP from ext-net, so this approach is not recommended in production.

2. If Octavia is deployed with a separately created tenant network as the lb-mgmt-net, then CONF.health_manager.bind_ip should be an address from the lb-mgmt-net IP pool, and connectivity between lb-mgmt-net and the OpenStack management network has to be provided. DevStack solves this by plugging a port from lb-mgmt-net into br-int, so that the amphorae on lb-mgmt-net can reach the octavia-health-manager service running on the physical host through that port. In production, this is something the network administrators have to configure for the local network environment.

PS: the commands DevStack uses to wire up this connectivity:

$ neutron port-create --name octavia-health-manager-standalone-listen-port \
    --security-group <lb-health-mgr-sec-grp> \
    --device-owner Octavia:health-mgr \
    --binding:host_id=<hostname> lb-mgmt-net \
    --tenant-id <octavia service>

$ ovs-vsctl --may-exist add-port br-int o-hm0 \
    -- set Interface o-hm0 type=internal \
    -- set Interface o-hm0 external-ids:iface-status=active \
    -- set Interface o-hm0 external-ids:attached-mac=<Health Manager Listen Port MAC> \
    -- set Interface o-hm0 external-ids:iface-id=<Health Manager Listen Port ID>

# /etc/octavia/dhcp/dhclient.conf
request subnet-mask,broadcast-address,interface-mtu;
do-forward-updates false;

$ ip link set dev o-hm0 address <Health Manager Listen Port MAC>
$ dhclient -v o-hm0 -cf /etc/octavia/dhcp/dhclient.conf

o-hm0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 192.168.0.4  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::f816:3eff:fef0:b9ee  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:f0:b9:ee  txqueuelen 1000  (Ethernet)
        RX packets 1240893  bytes 278415460 (265.5 MiB)
        RX errors 0  dropped 45  overruns 0  frame 0
        TX packets 417078  bytes 75842972 (72.3 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

Back to the main topic. UDPStatusGetter.check() is implemented as follows:

def check(self):
    try:
        obj, srcaddr = self.dorecv()
    except socket.timeout:
        # Pass here as this is an expected cycling of the listen socket
        pass
    except exceptions.InvalidHMACException:
        # Pass here as the packet was dropped and logged already
        pass
    except Exception as e:
        LOG.warning('Health Manager experienced an exception processing a'
                    'heartbeat packet. Ignoring this packet. '
                    'Exception: %s', e)
    else:
        self.executor.submit(update_health, obj, srcaddr)
        self.executor.submit(update_stats, obj, srcaddr)

  • self.dorecv() is called to receive the heartbeat data

  • self.executor.submit(update_health, obj, srcaddr) persists the health data to table amphora_health

  • self.executor.submit(update_stats, obj, srcaddr) persists the statistics to table listener_statistics
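To make the persistence step concrete, the snippet below is a deliberately simplified stand-in for update_health: it only records "amphora X last reported at time T". The table name amphora_health and the 'id' field of the heartbeat follow the descriptions above, but the sqlite storage and the helper itself are hypothetical and are not Octavia's repository code.

# Deliberately simplified stand-in for update_health(): record the time an
# amphora was last heard from, keyed by the amphora id carried in the
# heartbeat. Octavia uses SQLAlchemy repositories against its own schema;
# the sqlite storage and helper here are hypothetical.
import datetime
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE amphora_health '
             '(amphora_id TEXT PRIMARY KEY, last_update TEXT, busy INTEGER)')

def update_health(heartbeat, srcaddr):
    # 'id' is the amphora UUID reported in the heartbeat payload.
    now = datetime.datetime.utcnow().isoformat()
    conn.execute('INSERT OR REPLACE INTO amphora_health '
                 '(amphora_id, last_update, busy) VALUES (?, ?, 0)',
                 (heartbeat['id'], now))
    conn.commit()

update_health({'id': 'cd444019-ce8f-4f89-be6b-0edf76f41b77'},
              ('192.168.0.9', 5555))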

Next, let's look at how the amphora sends these heartbeats.

# file: /opt/rocky/octavia/octavia/cmd/agent.py

def main():
    # comment out to improve logging
    service.prepare_service(sys.argv)

    gmr.TextGuruMeditation.setup_autorun(version)

    health_sender_proc = multiproc.Process(name='HM_sender',
                                           target=health_daemon.run_sender,
                                           args=(HM_SENDER_CMD_QUEUE,))
    health_sender_proc.daemon = True
    health_sender_proc.start()

    # Initiate server class
    server_instance = server.Server()

    bind_ip_port = utils.ip_port_str(CONF.haproxy_amphora.bind_host,
                                     CONF.haproxy_amphora.bind_port)
    options = {
        'bind': bind_ip_port,
        'workers': 1,
        'timeout': CONF.amphora_agent.agent_request_read_timeout,
        'certfile': CONF.amphora_agent.agent_server_cert,
        'ca_certs': CONF.amphora_agent.agent_server_ca,
        'cert_reqs': True,
        'preload_app': True,
        'accesslog': '/var/log/amphora-agent.log',
        'errorlog': '/var/log/amphora-agent.log',
        'loglevel': 'debug',
    }
    AmphoraAgent(server_instance.app, options).run()

When the amphora-agent service process starts, it launches health_daemon.run_sender in a separate process; this is where the amphora sends heartbeats to the octavia-health-manager service.

# file: /opt/rocky/octavia/octavia/amphorae/backends/health_daemon/health_daemon.py

def run_sender(cmd_queue):
    LOG.info('Health Manager Sender starting.')
    sender = health_sender.UDPStatusSender()

    keepalived_cfg_path = util.keepalived_cfg_path()
    keepalived_pid_path = util.keepalived_pid_path()

    while True:

        try:
            # If the keepalived config file is present check
            # that it is running, otherwise don't send the health
            # heartbeat
            if os.path.isfile(keepalived_cfg_path):
                # Is there a pid file for keepalived?
                with open(keepalived_pid_path, 'r') as pid_file:
                    pid = int(pid_file.readline())
                os.kill(pid, 0)

            message = build_stats_message()
            sender.dosend(message)

        except IOError as e:
            # Missing PID file, skip health heartbeat
            if e.errno == errno.ENOENT:
                LOG.error('Missing keepalived PID file %s, skipping health '
                          'heartbeat.', keepalived_pid_path)
            else:
                LOG.error('Failed to check keepalived and haproxy status due '
                          'to exception %s, skipping health heartbeat.', e)
        except OSError as e:
            # Keepalived is not running, skip health heartbeat
            if e.errno == errno.ESRCH:
                LOG.error('Keepalived is configured but not running, '
                          'skipping health heartbeat.')
            else:
                LOG.error('Failed to check keepalived and haproxy status due '
                          'to exception %s, skipping health heartbeat.', e)
        except Exception as e:
            LOG.error('Failed to check keepalived and haproxy status due to '
                      'exception %s, skipping health heartbeat.', e)

        try:
            cmd = cmd_queue.get_nowait()
            if cmd == 'reload':
                LOG.info('Reloading configuration')
                CONF.reload_config_files()
            elif cmd == 'shutdown':
                LOG.info('Health Manager Sender shutting down.')
                break
        except queue.Empty:
            pass
        time.sleep(CONF.health_manager.heartbeat_interval)

The run_sender function builds the heartbeat with build_stats_message() and then sends it with UDPStatusSender.dosend(). Note that no heartbeat is sent while the keepalived process is not running properly, which means an amphora with an unhealthy keepalived ends up being treated as a failed amphora. The data is again sent over a UDP socket, with the destination endpoints set by CONF.health_manager.controller_ip_port_list.

# file: /etc/octavia/octavia.conf

[health_manager]
bind_port = 5555
bind_ip = 192.168.0.4
controller_ip_port_list = 192.168.0.4:5555
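As a rough illustration of the sending side, the sketch below signs a small status payload with the shared heartbeat_key and pushes it over UDP to every endpoint listed in controller_ip_port_list. It mirrors the receive-side sketch earlier and is not Octavia's actual envelope format or UDPStatusSender implementation; the payload fields are reduced to just the amphora id.

# Illustrative sketch of the sending side: sign a small status payload with
# the shared heartbeat_key and push it over UDP to every controller listed in
# controller_ip_port_list. Not Octavia's real wire format or UDPStatusSender.
import hashlib
import hmac
import json
import socket

HEARTBEAT_KEY = b'insecure'                      # must match the receiver
CONTROLLER_IP_PORT_LIST = ['192.168.0.4:5555']   # from octavia.conf above

def dosend(message):
    payload = json.dumps(message).encode('utf-8')
    digest = hmac.new(HEARTBEAT_KEY, payload, hashlib.sha256).digest()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for dest in CONTROLLER_IP_PORT_LIST:
        ip, port = dest.rsplit(':', 1)
        sock.sendto(payload + digest, (ip, int(port)))
    sock.close()

dosend({'id': 'cd444019-ce8f-4f89-be6b-0edf76f41b77', 'ver': 1})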

In short, octavia-health-manager and amphora-agent implement a periodic heartbeat protocol between them to monitor the health of the amphorae.

Failover

Failover is driven by health_manager.HealthManager.health_check(), which runs periodically and triggers a failover when the expected conditions are met.

The health_check method periodically fetches so-called stale amphora records from the amphora_health table, that is, amphorae that have not reported a heartbeat within the timeout and are therefore considered failed:

# file: /opt/rocky/octavia/octavia/db/repositories.py

def get_stale_amphora(self, session):
    """Retrieves a stale amphora from the health manager database.

    :param session: A Sql Alchemy database session.
    :returns: [octavia.common.data_model]
    """

    timeout = CONF.health_manager.heartbeat_timeout
    expired_time = datetime.datetime.utcnow() - datetime.timedelta(
        seconds=timeout)

    amp = session.query(self.model_class).with_for_update().filter_by(
        busy=False).filter(
        self.model_class.last_update < expired_time).first()

    if amp is None:
        return None

    amp.busy = True

    return amp.to_data_model()

If a stale amphora exists and the load balancer is not in PENDING_UPDATE status, the amphora failover procedure starts; the taskflow that performs it is built by self._amphora_flows.get_failover_flow.
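Putting the two halves together, the control loop amounts to: find one stale, non-busy amphora, run the failover flow for it, and sleep between rounds. The sketch below is a simplified rendering of that loop; get_stale_amphora() and failover_amphora() are placeholders standing in for the repository query above and the controller-worker call, and the interval value is illustrative.

# Simplified rendering of the HealthManager.health_check() control loop.
# get_stale_amphora() and failover_amphora() are placeholders for the
# repository query shown above and the controller-worker failover call.
import time

HEALTH_CHECK_INTERVAL = 3     # cf. CONF.health_manager.health_check_interval

def get_stale_amphora():
    """Return one non-busy amphora whose last_update is past the timeout, or None."""
    return None               # placeholder for the DB query shown above

def failover_amphora(amphora_id):
    """Kick off the failover taskflow for the given amphora (placeholder)."""
    print('failing over', amphora_id)

def health_check_loop():
    while True:
        stale = get_stale_amphora()
        # Octavia also skips amphorae whose load balancer is in PENDING_UPDATE
        # before actually triggering the failover flow.
        if stale is not None:
            failover_amphora(stale)
        time.sleep(HEALTH_CHECK_INTERVAL)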

UML diagram of the failover flow (figure not reproduced here).

Clearly, the overall failover_flow splits into two major parts: deleting the old amphora and getting a new amphora. Most of these tasks were analyzed in earlier posts in this series, so a brief description of each task is enough here.

DELETE OLD AMPHORA

  • MarkAmphoraPendingDeleteInDB

  • MarkAmphoraHealthBusy

  • ComputeDelete: delete the amphora instance

  • WaitForPortDetach: wait for the port(s) to be detached from the amphora

  • MarkAmphoraDeletedInDB

NOTE: if the failed amphora is a free (spare) amphora, it is simply deleted.

CREATE NEW AMPHORA

  • get_amphora_for_lb_subflow: obtain a usable free amphora

  • UpdateAmpFailoverDetails: copy the old amphora's details (table amphora) over to the new amphora

  • ReloadLoadBalancer & ReloadAmphora: fetch the load balancer and amphora records from the database and pass them into the flow as stores

  • GetAmphoraeNetworkConfigs & GetListenersFromLoadbalancer & GetAmphoraeFromLoadbalancer: fetch the listeners, the amphorae and their network information and pass them into the flow as stores, in preparation for rebuilding the amphora's network model

  • PlugVIPPort: set up the keepalived VIP NIC for the amphora

  • AmphoraPostVIPPlug: move the amphora's VIP NIC into its network namespace

  • update_amps_subflow / AmpListenersUpdate: update the amphora's haproxy configuration from the listener data; this subflow is of the unordered type, so with multiple listeners the updates run concurrently (see the taskflow sketch after this list)

  • CalculateAmphoraDelta: compute the delta between the NICs the amphora needs and the NICs it already has

  • HandleNetworkDelta: add or remove NICs according to that delta

  • AmphoraePostNetworkPlug: add a port connected to the subnet the members live on

  • ReloadLoadBalancer

  • MarkAmphoraMasterInDB

  • AmphoraUpdateVRRPInterface: look up and update the VRRP interface name (field vrrp_interface) in table amphora according to the amphora's role

  • CreateVRRPGroupForLB: update the group of the load balancer's MASTER/BACKUP amphorae according to the amphora's role

  • AmphoraVRRPUpdate: update the keepalived VRRP configuration according to the amphora's role

  • AmphoraVRRPStart: start the keepalived service process

  • ListenersStart: start the haproxy service process

  • DisableAmphoraHealthMonitoring: delete the corresponding amphora_health database record
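These tasks are composed with the taskflow library: an outer ordered (linear) flow for the sequential steps, plus unordered subflows, such as the listener-update subflow above, whose tasks may run in parallel. The sketch below is a generic illustration of that composition pattern only; the task names are made up and it is not the actual get_failover_flow definition.

# Generic illustration of the flow-composition pattern Octavia uses
# (linear outer flow + unordered subflow). Task names here are made up.
from taskflow import engines
from taskflow import task
from taskflow.patterns import linear_flow
from taskflow.patterns import unordered_flow

class MarkDeleted(task.Task):
    def execute(self):
        print('mark old amphora deleted')

class UpdateListenerConfig(task.Task):
    def execute(self, listener_id):
        print('rewrite haproxy config for listener', listener_id)

def build_failover_like_flow(listener_ids):
    flow = linear_flow.Flow('failover-sketch')
    flow.add(MarkDeleted('mark-deleted'))

    # Unordered subflow: one task per listener, free to run concurrently.
    update_amps = unordered_flow.Flow('update-amps-subflow')
    for lid in listener_ids:
        update_amps.add(UpdateListenerConfig(
            'update-%s' % lid, inject={'listener_id': lid}))
    flow.add(update_amps)
    return flow

engines.run(build_failover_like_flow(['listener-a', 'listener-b']),
            engine='parallel')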

To summarize the idea behind amphora failover: first delete the failed old amphora, then obtain a usable new amphora, and transfer the old amphora's associated data (e.g. its database records) and objects (e.g. its network model) over to the new one.

One caveat worth noting:

It seems intuitive to boot an amphora prior to deleting the old amphora, however this is a complicated issue. If the target host (due to anit-affinity) is resource constrained, this will fail where a post-delete will succeed. Since this is async with the API it would result in the LB ending in ERROR though the amps are still alive. Consider in the future making this a complicated try-on-failure-retry flow, or move upgrade failovers to be synchronous with the API. For now spares pool and act/stdby will mitigate most of this delay.

Although failover boils down to deleting the old amphora and then getting a new one, the process is actually complicated. For example, after the old amphora has been deleted, creating the new amphora may still fail because of resource constraints; and because the API calls are asynchronous, the new amphora may be created successfully while the load balancer has already been put into ERROR. The asynchronous-API issue may be addressed in the future by making these failovers synchronous with the API, but for now the spare amphora pool and the active/standby topology are what mitigate the delay of asynchronous creation.

Failover Test

Power off the MASTER amphora; the octavia-health-manager service then triggers an amphora failover.

Nov 22 11:22:31 control01 octavia-health-manager[29147]: INFO octavia.controller.healthmanager.health_manager [-] Stale amphora's id is: cd444019-ce8f-4f89-be6b-0edf76f41b77
Nov 22 11:22:31 control01 octavia-health-manager[29147]: INFO octavia.controller.healthmanager.health_manager [-] Waiting for 1 failovers to finish

old amphorae:

| 2ddc4ba5-b829-4962-93d8-562de91f1dab | amphora-4ff5d6fe-854c-4022-8194-0c6801a7478b | ACTIVE | lb-mgmt-net=192.168.0.23 | amphora-x64-haproxy | m1.amphora |
| b237b2b8-afe4-407b-83f2-e2e60361fa07 | amphora-bcff6f9e-4114-4d43-a403-573f1d97d27e | ACTIVE | lb-mgmt-net=192.168.0.11 | amphora-x64-haproxy | m1.amphora |
| 46eccf47-be10-47ec-89b2-0de44ea3caec | amphora-cd444019-ce8f-4f89-be6b-0edf76f41b77 | ACTIVE | lb-mgmt-net=192.168.0.9; web-server-net=192.168.1.3; lb-vip-net=172.16.1.3 | amphora-x64-haproxy | m1.amphora |
| bc043b23-d481-45c4-9410-f7b349987c98 | amphora-a1c1ba86-6f99-4f60-b469-a4a29d7384c5 | ACTIVE | lb-mgmt-net=192.168.0.3; web-server-net=192.168.1.12; lb-vip-net=172.16.1.7 | amphora-x64-haproxy | m1.amphora |

new amphorae:

| 712ff785-c082-4b53-994c-591d1ec0bf7b | amphora-caa6ba0f-1a68-4f22-9be9-8521695ac4f4 | ACTIVE | lb-mgmt-net=192.168.0.13 | amphora-x64-haproxy | m1.amphora |
| 2ddc4ba5-b829-4962-93d8-562de91f1dab | amphora-4ff5d6fe-854c-4022-8194-0c6801a7478b | ACTIVE | lb-mgmt-net=192.168.0.23; web-server-net=192.168.1.4; lb-vip-net=172.16.1.3 | amphora-x64-haproxy | m1.amphora |
| b237b2b8-afe4-407b-83f2-e2e60361fa07 | amphora-bcff6f9e-4114-4d43-a403-573f1d97d27e | ACTIVE | lb-mgmt-net=192.168.0.11 | amphora-x64-haproxy | m1.amphora |
| bc043b23-d481-45c4-9410-f7b349987c98 | amphora-a1c1ba86-6f99-4f60-b469-a4a29d7384c5 | ACTIVE | lb-mgmt-net=192.168.0.3; web-server-net=192.168.1.12; lb-vip-net=172.16.1.7 | amphora-x64-haproxy | m1.amphora |

new amphora haproxy config:

# Configuration for loadbalancer 01197be7-98d5-440d-a846-cd70f52dc503
global
    daemon
    user nobody
    log /dev/log local0
    log /dev/log local1 notice
    stats socket /var/lib/octavia/1385d3c4-615e-4a92-aea1-c4fa51a75557.sock mode 0666 level user
    maxconn 1000000
    external-check

defaults
    log global
    retries 3
    option redispatch

peers 1385d3c4615e4a92aea1c4fa51a75557_peers
    peer 3dVescsRZ-RdRBfYVLW6snVI9gI 172.16.1.3:1025
    peer l_Ustq0qE-h-_Q1dlXLXBAiWR8U 172.16.1.7:1025


frontend 1385d3c4-615e-4a92-aea1-c4fa51a75557
    option httplog
    maxconn 1000000
    bind 172.16.1.10:8080
    mode http
    acl 8d9b8b1e-83d7-44ca-a5b4-0103d5f90cb9 req.hdr(host) -i -m beg server
    use_backend 8196f752-a367-4fb4-9194-37c7eab95714 if 8d9b8b1e-83d7-44ca-a5b4-0103d5f90cb9
    acl c76f36bc-92c0-4f48-8d57-a13e3b1f09e1 req.hdr(host) -i -m beg server
    use_backend 822f78c3-ea2c-4770-bef0-e97f1ac2eba8 if c76f36bc-92c0-4f48-8d57-a13e3b1f09e1
    default_backend 8196f752-a367-4fb4-9194-37c7eab95714
    timeout client 50000

backend 8196f752-a367-4fb4-9194-37c7eab95714
    mode http
    balance roundrobin
    timeout check 10s
    option external-check
    external-check command /var/lib/octavia/ping-wrapper.sh
    fullconn 1000000
    option allbackups
    timeout connect 5000
    timeout server 50000
    server b6e464fd-dd1e-4775-90f2-4231444a0bbe 192.168.1.14:80 weight 1 check inter 5s fall 3 rise 3

backend 822f78c3-ea2c-4770-bef0-e97f1ac2eba8
    mode http
    balance roundrobin
    timeout check 10s
    option external-check
    external-check command /var/lib/octavia/ping-wrapper.sh
    fullconn 1000000
    option allbackups
    timeout connect 5000
    timeout server 50000
    server 7da6f176-36c6-479a-9d86-c892ecca6ae5 192.168.1.6:80 weight 1 check inter 5s fall 3 rise 3

new amphora keepalived config:

vrrp_script check_script {
  script /var/lib/octavia/vrrp/check_script.sh
  interval 5
  fall 2
  rise 2
}

vrrp_instance 01197be798d5440da846cd70f52dc503 {
  state MASTER
  interface eth1
  virtual_router_id 1
  priority 100
  nopreempt
  garp_master_refresh 5
  garp_master_refresh_repeat 2
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass b76d77e
  }

  unicast_src_ip 172.16.1.3
  unicast_peer {
    172.16.1.7
  }

  virtual_ipaddress {
    172.16.1.10
  }
  track_script {
    check_script
  }
}

Checking that the new amphora's configuration files and network settings match the old amphora's confirms the failover succeeded.


Related reading:

March Tech Week | Rocky Octavia Implementation and Analysis (4): Secure Communication Between Amphorae and Octavia

February Tech Week | Rocky Octavia Implementation and Analysis (3): Core Resource Creation Flows

December Tech Week | Rocky Octavia Implementation and Analysis (2): LoadBalancer Creation Flow

Tech Week | Octavia Implementation and Analysis; Whether to Upgrade OpenStack; Metadata Service Analysis; DHCP Internals...

