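BFD (Bidirectional Forwarding Detection) is a bidirectional forwarding detection mechanism that detects link failures within milliseconds. When linked with upper-layer routing protocols, BFD enables fast route convergence and keeps services running without interruption. This article uses Maipu devices as an example to explain how to troubleshoot BFD faults; the same approach can be applied to devices from other vendors. Comments and discussion are welcome.
BFD faults commonly fall into three cases: 1. the BFD session cannot be created; 2. the BFD session is DOWN; 3. the BFD session flaps frequently. The sections below analyze the causes of each fault, together with the troubleshooting approach and steps.
1. BFD session cannot be created
Fault causes: ⑴ The BFD session is disabled on the interface;
⑵ Configuration error;
⑶ The associated protocol is in an incorrect state.
Troubleshooting approach:
First check whether the BFD configuration is correct; then check whether the associated protocol is in the correct state.
Troubleshooting steps:
① Run the show running-config interface vlan vlan-id command to check the BFD configuration on the interface.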
Hostname#show running-config interface vlan 2
Building Configuration...
interface vlan2
ip address 10.0.1.6 255.255.255.252
bfd min-transmit-interval 300
bfd min-receive-interval 300
ip rip bfd disable
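If the interface is configured with bfd disable for a protocol, that protocol is not associated with BFD on this interface.
If BFD is disabled for the protocol on the interface, modify the configuration; if it is not disabled, go to step 2.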
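The check in step 1 can be scripted when many interfaces have to be reviewed. Below is a minimal sketch in Python, assuming the interface configuration has already been saved as text and that the disable line always has the form `ip <protocol> bfd disable` seen in the output above. The function name `protocols_with_bfd_disabled` is made up for this illustration.

```python
import re

def protocols_with_bfd_disabled(interface_config):
    """Return the protocols whose BFD association is disabled on an interface.

    Assumes lines of the form "ip <protocol> bfd disable", as in the
    sample output above; other syntaxes are not handled.
    """
    disabled = []
    for line in interface_config.splitlines():
        match = re.fullmatch(r"ip\s+(\S+)\s+bfd\s+disable", line.strip())
        if match:
            disabled.append(match.group(1))
    return disabled

sample = """\
interface vlan2
ip address 10.0.1.6 255.255.255.252
bfd min-transmit-interval 300
bfd min-receive-interval 300
ip rip bfd disable
"""
print(protocols_with_bfd_disabled(sample))  # ['rip']
```

An empty list means no protocol has BFD disabled on the interface, so the investigation moves on to step 2.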
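② Check whether the configuration is correct.
If BFD is associated with OSPF, run the show running-config router ospf command to check whether the OSPF-BFD association is configured correctly;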
Hostname#show running-config router ospf
router ospf 100
bfd all-interfaces
network 10.10.4.0 0.0.0.255 area 0
network 10.10.6.0 0.0.0.255 area 31
network 10.10.10.10 0.0.0.0 area 0
network 10.10.16.0 0.0.0.255 area 0
Run the show running-config interface vlan vlan-id command to check whether the interface configuration is correct.
Hostname#show running-config interface vlan 3
Building Configuration...
interface vlan3
ip address 10.0.1.10 255.255.255.252
ip ospf network point-to-point
ip ospf bfd
If BFD is associated with RIP, run the show running-config router rip command to check whether the RIP-BFD association is configured correctly;
Hostname#show running-config router rip
router rip
version 2
network 10.0.0.0
no auto-summary
bfd all-interfaces
Run the show running-config interface vlan vlan-id command to check whether the interface configuration is correct.
Hostname#show running-config interface vlan 3
Building Configuration...
interface vlan3
ip address 10.0.1.10 255.255.255.252
ip rip bfd
If BFD is associated with BGP, run the show running-config router bgp command to check whether the BGP-BFD association is configured correctly;
Hostname#show running-config router bgp
router bgp 100
no auto-summary
no synchronization
neighbor 10.20.28.1 update-source loopback0
neighbor 10.20.28.1 remote-as 100
neighbor 10.20.28.1 fall-over bfd
If BFD is associated with a static route, run the show running-config ip route command to check whether the static route-BFD association is configured correctly.
Hostname#show running-config ip route
ip route static bfd vlan2 10.0.1.5
ip route 10.0.2.0 255.255.255.0 vlan2 10.0.1.5
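If the configuration is incorrect, modify it; if the configuration is correct, go to step 3.
③ Check whether the protocol state is correct.
If BFD is associated with OSPF, run the show ip ospf neighbor command to check the OSPF neighbor state.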
Hostname#show ip ospf neighbor
ID Pri State Dead Time Address Interface
10.20.19.1 0 Full/ - 00:00:35 10.10.60.2 vlan2
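The BFD session is created only when the OSPF neighbor state is Full; otherwise no BFD session is created.
If BFD is associated with RIP, run the show ip route rip command to check whether RIP has learned routes from the peer.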
Hostname#show ip route rip
Codes: C - connected, S - static, R - RIP, O - OSPF, OE-OSPF External, M - Management
D - Redirect, E - IRMP, Ex - IRMP external, o - SNSP, B - BGP, i-ISIS
Gateway of last resort is 10.0.1.5 to network 0.0.0.0
R 10.0.0.0/30 [120/1] via 10.0.1.5, 03:58:33, vlan2
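With RIP, the BFD session is created only after routes advertised by the peer have been received; otherwise no BFD session is created.
If BFD is associated with BGP, run the show ip bgp neighbors ip-address command to check the BGP neighbor state.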
Hostname#show ip bgp neighbors
BGP neighbor is 10.1.0.253, remote AS 200, local AS 100, external link
BGP version 4, remote router ID 222.222.222.222 BGP state = Established, up for 00:00:17
Last read , hold time is 180, keepalive interval is 60 seconds
Connect retry timer is 0
Neighbor capabilities:
Route refresh: advertised and received (old and new)
Address family IPv4 Unicast: advertised and received
Received 2 messages, 1 open, 0 update, 1 keepalive_in, 0 notifications, 0 in queue
Sent 2 messages, 1 open, 0 update, 1 keepalive_in, 0 notifications, 0 in queue
Route refresh request: received 0, sent 0
Minimum time between advertisement runs is 30 seconds
Update source is loopback1
For address family: IPv4 Unicast
BGP table version 1, neighbor version 1
Index 1, Offset 0, Mask 0x2
0 accepted prefixes
0 announced prefixes
Connections established 2; dropped 1
Local host: 10.1.0.254, Local port: 1135
Foreign host: 10.1.0.253, Foreign port: 179
Nexthop: 10.1.0.254
Nexthop global: ::
Nexthop local: ::
BGP connection: non shared network
Last error code: 6 , subcode: 0
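The BFD session is created only when the BGP neighbor state is Established; otherwise no BFD session is created.
If the associated protocol is not in a normal state, refer to the fault-handling approach in the relevant chapters of the <Unicast Routing> part of the troubleshooting manual to resolve the routing protocol problem.
If the protocol state is normal, go to step 4.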
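For the OSPF case in step 3, the neighbor table can be scanned automatically. The Python sketch below assumes the column layout of the show ip ospf neighbor output above (neighbor ID first, state third, interface last). The second row of the sample is a hypothetical neighbor added purely to show a non-Full state; it is not taken from a real device.

```python
def ospf_neighbors_not_full(show_output):
    """Return (neighbor_id, state, interface) for neighbors that are not Full.

    Assumes whitespace-separated columns as in the sample output above.
    """
    not_full = []
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows start with a dotted-quad neighbor ID; skip headers and prompts.
        if len(fields) < 4 or fields[0].count(".") != 3:
            continue
        neighbor_id, state, interface = fields[0], fields[2], fields[-1]
        if not state.startswith("Full"):
            not_full.append((neighbor_id, state, interface))
    return not_full

sample = """\
ID Pri State Dead Time Address Interface
10.20.19.1 0 Full/ - 00:00:35 10.10.60.2 vlan2
10.20.19.2 1 Init/DROther 00:00:31 10.10.61.2 vlan3
"""
# The second data row is hypothetical, added only for illustration.
print(ospf_neighbors_not_full(sample))  # [('10.20.19.2', 'Init/DROther', 'vlan3')]
```

Any neighbor returned here will not have a BFD session, so the routing protocol itself should be fixed first.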
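④ If the steps above still do not resolve the fault, contact the vendor's engineers and collect the following information.
Ⅰ. The results of the steps above.
Ⅱ. The configuration files of the devices at both ends (show running-config), log information (show logging, show logging buffer), interface information (show interface interface-name), and other information captured during the fault (show cpu, show process, show pool, show ip ospf neighbor, show ip rip database, show ip sockets, show bfd session detail, show bfd client).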
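2. BFD session is DOWN
Fault causes: ⑴ The interface is DOWN;
⑵ The peer is misconfigured.
Troubleshooting approach:
First check the interface state; then check the BFD configuration on the peer.
Troubleshooting steps:
① Check whether the interface state is normal.
Run the show interface vlan vlan-id command on the devices at both ends to check whether the interface state is UP.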
Hostname#show interface vlan 2101
vlan2101:
line protocol is up
Flags: (0xc008063) BROADCAST MULTICAST ARP RUNNING
Type: ETHERNET_CSMACD
Internet address: 10.0.1.5/30
Broadcast address: 10.0.1.7
Queue strategy: FIFO , Output queue: 0/1 (current/max packets)(0)
Metric: 0, MTU: 1500, BW: 100000 Kbps, DLY: 100 usec, VRF: global
Reliability 255/255, Txload 1/255, Rxload 1/255
Ethernet address is 0001.7a5d.bee2
5 minutes input rate 0 bits/sec, 0 packets/sec
5 minutes output rate 0 bits/sec, 0 packets/sec
6450389 packets received; 656647 packets sent
243052 multicast packets received
218637 multicast packets sent
0 input errors; 0 output errors
0 collisions; 0 dropped
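If the interface at either end is DOWN, find out why the interface is down; if the interfaces at both ends are UP, go to step 2.
② Check whether the BFD configuration on the peer device is correct.
If the associated protocol is OSPF, run the show running-config router ospf and show running-config interface vlan vlan-id commands on the peer device to check the OSPF-BFD association.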
Hostname#show running-config router ospf
router ospf 100
network 10.10.4.0 0.0.0.255 area 0
network 10.10.6.0 0.0.0.255 area 31
network 10.10.10.10 0.0.0.0 area 0
network 10.10.16.0 0.0.0.255 area 0
Hostname#show running-config interface vlan 3
Building Configuration...
interface vlan3
ip address 10.0.1.10 255.255.255.252
ip ospf network point-to-point
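The output above shows that the OSPF-BFD association is not configured on the peer device.
If the associated protocol is RIP, run the show running-config router rip and show running-config interface vlan vlan-id commands on the peer device to check the RIP-BFD association.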
Hostname#show running-config router rip
router rip
version 2
network 10.0.0.0
no auto-summary
Hostname#show running-config interface vlan 3
Building Configuration...
interface vlan3
ip address 10.0.1.10 255.255.255.252
ip ospf network point-to-point
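The output above shows that the RIP-BFD association is not configured on the peer device.
If the associated protocol is BGP, run the show running-config router bgp command on the peer device to check the BGP-BFD association.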
Hostname#show running-config router bgp
router bgp 100
no auto-summary
no synchronization
neighbor 10.20.28.1 update-source loopback0
neighbor 10.20.28.1 remote-as 100
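The output above shows that the BGP-BFD association is not configured on the peer device.
If BFD is associated with a static route, run the show running-config ip route command on the peer device to check the static route-BFD association.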
Hostname#show running-config ip route
ip route 10.0.2.0 255.255.255.0 vlan2 10.0.1.5
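The output above shows that the static route-BFD association is not configured on the peer device.
If the BFD configuration on the peer is incorrect, modify the configuration.
If the BFD configuration on the peer is correct, go to step 3.
③ If the steps above still do not resolve the fault, contact the vendor's engineers and collect the following information.
Ⅰ. The results of the steps above.
Ⅱ. The configuration files of the devices at both ends (show running-config), log information (show logging, show logging buffer), interface information (show interface vlan-id), and other information captured during the fault (show cpu, show process, show pool, show ip ospf neighbor, show ip rip database, show ip sockets, show bfd session detail, show bfd client).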
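3. BFD session flaps frequently
Fault causes: ⑴ Interface flapping;
⑵ Link congestion;
⑶ Unreasonable BFD parameter settings.
Troubleshooting approach:
First check whether the interface is flapping; then check whether link congestion is causing BFD packets to be dropped; finally check the BFD parameter settings.
Troubleshooting steps:
① Check whether the interface state is flapping.
Run the show logging command to check the log information.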
Hostname#show logging
Apr 28 10:14:55.666: [tIfMgt]%LINK-INTERFACE_DOWN-3: Interface vlan2101, changed state to down.
Apr 28 10:14:55.666: [tIfMgt]%LINK-LINEPROTO_DOWN-3: Line protocol on interface vlan2101, changed state to down.
Apr 28 10:14:55.716: [tOSPF]%OSPF-ADJCHG_DOWN-3: Process 100 Nbr [vlan2101:10.0.1.5-10.1.0.254] from Full to Down,KillNbr: Interface down or detached.
Apr 28 10:14:57.183: [tNBFD]%BFD-SESSION_DOWN-4: Session [destination address:10.0.1.6,source address:10.0.1.5,interface:vlan2101,local-discriminator:359] DOWN
Apr 28 10:14:57.183: [tNBFD]%STRT-BFD_STATE-5: Receive bfd session [10.0.1.6 10.0.1.5 vrf global interface vlan2101] change to down
Apr 28 10:16:22.566: [tIfMgt]%LINK-INTERFACE_UP-5: Interface vlan2101, changed state to up.
Apr 28 10:16:22.566: [tIfMgt]%LINK-LINEPROTO_UP-5: Line protocol on interface vlan2101, changed state to up.
Apr 28 10:16:31.83: [tOSPF]%OSPF-ADJCHG_FULL-5: Process 100 Nbr [vlan2101:10.0.1.5-10.1.0.254] from Exchange to Full,ExchangeDone.
Apr 28 10:16:34.383: [tNBFD]%BFD-SESSION_UP-5: Session [destination address:10.0.1.6,source address:10.0.1.5,interface:vlan2101,local-discriminator:359] UP
Apr 28 10:16:34.383: [tNBFD]%STRT-BFD_STATE-5: Receive bfd session [10.0.1.6 10.0.1.5 vrf global interface vlan2101] change to up
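The log above shows that the interface state is flapping.
If the interface state is flapping, find out why the interface flaps; if the interface state is normal, go to step 2.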
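When the log is long, counting events per interface makes flapping easier to spot. The Python sketch below assumes the log message formats shown above (`%LINK-INTERFACE_DOWN` and `%BFD-SESSION_DOWN`); the function name `count_down_events` is made up for this illustration.

```python
import re
from collections import Counter

# Patterns follow the log lines shown above; other log formats are not handled.
LINK_DOWN = re.compile(r"%LINK-INTERFACE_DOWN-\d+: Interface (\S+), changed state to down")
BFD_DOWN = re.compile(r"%BFD-SESSION_DOWN-\d+: Session \[.*?interface:(\S+?),")

def count_down_events(log_text):
    """Count interface-down and BFD-session-down events per interface."""
    link_downs, bfd_downs = Counter(), Counter()
    for line in log_text.splitlines():
        link_match = LINK_DOWN.search(line)
        if link_match:
            link_downs[link_match.group(1)] += 1
            continue
        bfd_match = BFD_DOWN.search(line)
        if bfd_match:
            bfd_downs[bfd_match.group(1)] += 1
    return link_downs, bfd_downs

sample = """\
Apr 28 10:14:55.666: [tIfMgt]%LINK-INTERFACE_DOWN-3: Interface vlan2101, changed state to down.
Apr 28 10:14:57.183: [tNBFD]%BFD-SESSION_DOWN-4: Session [destination address:10.0.1.6,source address:10.0.1.5,interface:vlan2101,local-discriminator:359] DOWN
"""
link_downs, bfd_downs = count_down_events(sample)
print(link_downs["vlan2101"], bfd_downs["vlan2101"])  # 1 1
```

If BFD session-down events show up without matching interface-down events on the same interface, the interface itself is probably not the cause, and the next step applies.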
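② Check whether the interface is congested.
In ssp mode on the LPU, run the pw command several times to check whether the queue that carries BFD packets for the port is congested.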
Hostname#connect lpu 0
Convert name lpu 0 to slot number 0
done
LPU-0>en
password:
LPU-0#
LPU-0#debug ssp
BCM.0> pw
bcmPW.0: Status: Not Running. Mode RX. Buffering up to 0 packets.
Rate limit is 2000 (soc intvl 0).
Reporting is enabled for:
Reporting is disabled for: Count DECode Raw DMA CHannel
Dump options are enabled for:
Dump options are disabled for: Count DECode Raw DMA CHannel
RX on for channel(s): -- using default --
RX Info @ time=2846394593: started. Last fill 2846361260. Thread is running.
+verbose for more info
Pkt Size 2040. Pkts/Chain 4. All COS PPS 2000. Burst 2000. Flags 0.
Sys PPS 0. Sys tokens 0. Sys fill 18116666.
Cntrs: Pkts 4414569. Last start 4414569. Tunnel 0. Owned 4407596.
Bad Hndlr 0. No Hndlr 0. Not Running 0.
Thrd Not Running 0. DCB Errs 0.nextunit:1
Registered callbacks:
SSP RX Priority=255. Argument=0x0. COS 0xffffffffffff.
Packets handled 0, owned 0.
Discard Priority= 0. Argument=0x0. COS 0xffffffffffff.
Packets handled 0, owned 0.
Channel Info
Chan 1 is running: Chains 64. COS 0xff. DCB/pkt 1. active chains 64
rpkt 4414569. rbyte 680537578. dpkt 0. dbyte 0. mem fail 0. flags 0.
Queue Info
Queue 0: PPS 200. CurPkts 0. TotPkts 57941. Disc rate 0, qlen 6970, max dcbs 50.
Tokens 800. Fill 2826544593. Max 200. Brst 400. Head 0x0. Tail 0x0.
Queue 1: PPS 250. CurPkts 0. TotPkts 154363. Disc rate 0, qlen 0, max dcbs 50.
Tokens 500. Fill 2845244593. Max 200. Brst 200. Head 0x0. Tail 0x0.
Queue 2: PPS 500. CurPkts 0. TotPkts 1098533. Disc rate 0, qlen 0, max dcbs 100.
Tokens 1199. Fill 2846044593. Max 200. Brst 600. Head 0x0. Tail 0x0.
Queue 3: PPS 600. CurPkts 0. TotPkts 25823. Disc rate 0, qlen 0, max dcbs 256.
Tokens 2048. Fill 2845544593. Max 200. Brst 1024. Head 0x0. Tail 0x0.
Queue 4: PPS 1000. CurPkts 0. TotPkts 123150. Disc rate 0, qlen 0, max dcbs 256.
Tokens 2048. Fill 2845944593. Max 200. Brst 1024. Head 0x0. Tail 0x0.
Queue 5: PPS 400. CurPkts 0. TotPkts 98684. Disc rate 0, qlen 0, max dcbs 256.
Tokens 1200. Fill 2846361260. Max 200. Brst 600. Head 0x0. Tail 0x0.
Queue 6: PPS 300. CurPkts 0. TotPkts 260415. Disc rate 0, qlen 0, max dcbs 256.
Tokens 600. Fill 2845744593. Max 200. Brst 300. Head 0x0. Tail 0x0.
Queue 7: PPS 100. CurPkts 0. TotPkts 0. Disc rate 0, qlen 0, max dcbs 256.
Tokens 600. Fill 2341211261. Max 200. Brst 300. Head 0x0. Tail 0x0.
Queue 8: PPS 1500. CurPkts 0. TotPkts 0. Disc rate 0, qlen 0, max dcbs 256.
Tokens 3000. Fill 2341211261. Max 200. Brst 1000. Head 0x0. Tail 0x0.
Queue 9: PPS 2000. CurPkts 0. TotPkts 2588690. Disc rate 0, qlen 0, max dcbs 256.
Tokens 4000. Fill 2846044593. Max 200. Brst 1000. Head 0x0. Tail 0x0.
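Check the qlen counter of queue 6, which carries BFD packets. A non-zero count means the interface has been congested at some point; a count that keeps increasing means congestion occurred while the command was being run and caused packet loss.
If the queue is dropping packets, find out why; if the queue is not dropping packets, go to step 3.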
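Comparing qlen between two runs of pw by eye is error-prone, so the comparison can be scripted. The Python sketch below assumes the "Queue N: ... qlen X" line format shown above and that BFD uses queue 6 as stated in this article. The non-zero qlen in the second sample is a hypothetical value used only to demonstrate the comparison.

```python
import re

QLEN = re.compile(r"Queue\s+(\d+):.*?qlen\s+(\d+)")

def queue_qlens(pw_output):
    """Map queue number to its qlen counter from one run of the pw command."""
    return {int(queue): int(qlen) for queue, qlen in QLEN.findall(pw_output)}

def classify_queue(before, after, queue=6):
    """Compare qlen of one queue across two snapshots of pw output."""
    old, new = queue_qlens(before)[queue], queue_qlens(after)[queue]
    if new > old:
        return "congested during sampling"
    if new > 0:
        return "congested earlier"
    return "no congestion seen"

first = "Queue 6: PPS 300. CurPkts 0. TotPkts 260415. Disc rate 0, qlen 0, max dcbs 256."
# Hypothetical second snapshot: the TotPkts and qlen values are invented for illustration.
second = "Queue 6: PPS 300. CurPkts 0. TotPkts 260990. Disc rate 0, qlen 12, max dcbs 256."
print(classify_queue(first, second))  # congested during sampling
```

Only the "congested during sampling" result indicates ongoing loss; a stable non-zero value points to congestion in the past.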
③ Check whether the BFD parameter settings are reasonable.
Run the show bfd session detail command to check whether the BFD parameters are reasonable.
Hostname#show bfd session detail
OurAddr NeighAddr LD/RD State Holddown interface
10.10.51.1 10.10.51.2 789/49533 UP 30 vlan2101
Type:direct
Local State:UP Remote State:UP Up for: 0h:15m:11s Number of times UP:1
Send Interval:10ms Detection time:30ms(10ms*3)
Local Diag:0 Demand mode:0 Poll bit:0
MinTxInt:10 MinRxInt:10 Multiplier:3
Remote MinTxInt:10 Remote MinRxInt:10 Remote Multiplier:3
Registered protocols:OSPF
Agent session info:
Sender:slot 0 Recver:slot 0
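The output above shows that the minimum transmit and receive intervals for BFD packets are both 10 ms and that the session times out after 3 consecutive packets are missed. In a real network, a timeout this short can cause BFD flapping.
If the BFD parameters are unreasonable, modify the configuration; if the BFD parameters are reasonable, go to step 4.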
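The detection time in the output follows from the negotiated interval and the multiplier. As a small worked example in Python, following the asynchronous-mode rule in RFC 5880 (the remote multiplier times the larger of the local required minimum receive interval and the remote desired minimum transmit interval); mapping these inputs to the MinRxInt, Remote MinTxInt, and Remote Multiplier fields of show bfd session detail is an assumption of this sketch.

```python
def detection_time_ms(local_min_rx_ms, remote_min_tx_ms, remote_multiplier):
    """Detection time for an asynchronous-mode BFD session, in milliseconds."""
    # The peer transmits at the slower of what it wants to send and what we accept.
    negotiated_interval = max(local_min_rx_ms, remote_min_tx_ms)
    return negotiated_interval * remote_multiplier

print(detection_time_ms(10, 10, 3))    # 30, the aggressive setting shown above
print(detection_time_ms(300, 300, 5))  # 1500
```

With a 30 ms budget, a brief burst of queueing delay or CPU load is enough to miss three packets in a row, which is why larger intervals are usually more stable on busy links.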
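④ If the steps above still do not resolve the fault, contact the vendor's engineers and collect the following information.
Ⅰ. The results of the steps above.
Ⅱ. The configuration files of the devices at both ends (show running-config), log information (show logging, show logging buffer), interface information (show interface vlan-id), and other information captured during the fault (show cpu, show process, show pool, show ip ospf neighbor, show ip rip database, show ip sockets, show bfd session detail, show bfd client).
Fault case: misconfigured static route-BFD association
The user's network is shown in the figure below. On Device1, interface vlan2 has the IP address 10.0.1.5/30; on Device2, interface vlan2 has the IP address 10.0.1.6/30. A static route is configured between the two devices and associated with BFD, but the BFD session cannot be created.
[Figure: network topology of Device1 and Device2]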
Fault analysis: ⑴ Run the show bfd session and show bfd client commands to check whether a BFD session has been created.
Device1#show bfd session
OurAddr NeighAddr LD/RD State Holddown interface
No BFD session has been created.
Device1# show bfd client
ClientName SessionNumber
OSPF 0
BGP 0
isis 0
TRACK 0
RIP 0
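The SessionNumber statistic for STATICRT is 0, which indicates that the static route is not associated with BFD.
⑵ Run the show running-config interface vlan vlan-id and show running-config ip route commands to check the interface configuration and the BFD-static route association.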
Device1# show running-config interface vlan 2
Building Configuration...
interface vlan2
ip address 10.0.1.6 255.255.255.252
bfd min-transmit-interval 300
bfd min-receive-interval 300
Device1#show running-config ip route
ip route static bfd vlan2 10.0.1.5
ip route 10.0.2.0 255.255.255.0 10.0.1.5
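The output above shows that the static route does not specify an interface for its next hop, so the BFD association is misconfigured.
Solution: a static route associated with BFD must specify the interface. Configure it as follows: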
Device1#config terminal
Device1(config)#ip route 10.0.2.0 255.255.255.0 vlan2 10.0.1.5
Device1(config)#ip route static bfd vlan2 10.0.1.5
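The mismatch found in this case can also be caught with a simple check on the saved configuration. The Python sketch below assumes the two command forms seen in this case, `ip route static bfd <interface> <next-hop>` and `ip route <network> <mask> <interface> <next-hop>`. The pairing rule (same interface and next hop) is inferred from this case rather than taken from vendor documentation, and the function name is made up.

```python
def unmatched_static_bfd(config):
    """Return (interface, next_hop) pairs from 'ip route static bfd' lines
    that have no static route naming the same interface and next hop."""
    bfd_pairs, route_pairs = [], set()
    for line in config.splitlines():
        fields = line.split()
        if fields[:4] == ["ip", "route", "static", "bfd"] and len(fields) == 6:
            bfd_pairs.append((fields[4], fields[5]))
        elif fields[:2] == ["ip", "route"] and len(fields) == 6:
            # ip route <network> <mask> <interface> <next-hop>
            route_pairs.add((fields[4], fields[5]))
    return [pair for pair in bfd_pairs if pair not in route_pairs]

faulty = """\
ip route static bfd vlan2 10.0.1.5
ip route 10.0.2.0 255.255.255.0 10.0.1.5
"""
fixed = """\
ip route static bfd vlan2 10.0.1.5
ip route 10.0.2.0 255.255.255.0 vlan2 10.0.1.5
"""
print(unmatched_static_bfd(faulty))  # [('vlan2', '10.0.1.5')]
print(unmatched_static_bfd(fixed))   # []
```

A route written without an interface has only five fields, so it never matches a BFD entry, which mirrors the fault in this case.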
After correcting the configuration, check the BFD session again.
Device1#show bfd session detail
Total session number: 1
OurAddr NeighAddr LD/RD State Holddown interface
10.0.1.5 10.0.1.6 48/359 UP 1500 vlan2
Type:direct
Local State:UP Remote State:UP Up for: 0h:50m:41s Number of times UP:3
Send Interval:300ms Detection time:1500ms(300ms*5)
Local Diag:1 Demand mode:0 Poll bit:0
MinTxInt:300 MinRxInt:300 Multiplier:5
Remote MinTxInt:300 Remote MinRxInt:300 Remote Multiplier:5
Registered protocols:STATICRT
Agent session info:
l2info_state:0x5
VLAN:2 SMAC:0001.7adf.1524 DMAC:0001.7a58.f7d6 AG:769 AP:289 Slot:6
Sender:slot 6 Recver:slot 6
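Summary: when a BFD session is not created, the cause is a configuration problem in most cases, so check the configuration first when handling this kind of fault.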