Hands-on Guide: Troubleshooting BFD Faults on Maipu Devices

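BFD (Bidirectional Forwarding Detection) detects forwarding-path failures within milliseconds. When coupled with upper-layer routing protocols, it lets routes converge quickly and keeps services running. This article uses Maipu devices as an example to explain how to troubleshoot BFD faults. The same approach applies to other vendors' devices. Comments and discussion are welcome.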

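There are three common BFD faults: 1. the BFD session cannot be created; 2. the BFD session goes DOWN; 3. the BFD session flaps frequently. For each fault, we look at the causes, the troubleshooting approach and the troubleshooting steps.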

1. BFD session cannot be created

Causes: ⑴ BFD is disabled on the interface;

⑵ The configuration is incorrect;

⑶ The associated protocol is not in the correct state.

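Troubleshooting approach:

First check whether the BFD configuration is correct, then check whether the associated protocol is in the correct state.

Troubleshooting steps:

① Run show running-config interface vlan vlan-id to check the BFD configuration on the interface.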

Hostname#show running-config interface vlan 2

Building Configuration...

interface vlan2

ip address 10.0.1.6 255.255.255.252

bfd min-transmit-interval 300

bfd min-receive-interval 300

ip rip bfd disable

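If BFD is disabled for a protocol on the interface (for example, ip rip bfd disable above), that protocol is not associated with BFD on this interface.

If the interface has BFD disabled for the relevant protocol, correct the configuration; otherwise, go to step ②.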

② Check whether the configuration is correct.

If BFD is associated with OSPF, run show running-config router ospf to check whether OSPF is correctly associated with BFD.

Hostname#show running-config router ospf

router ospf 100

bfd all-interfaces

network 10.10.4.0 0.0.0.255 area 0

network 10.10.6.0 0.0.0.255 area 31

network 10.10.10.10 0.0.0.0 area 0

network 10.10.16.0 0.0.0.255 area 0

Run show running-config interface vlan vlan-id to check whether the interface configuration is correct.

Hostname#show running-config interface vlan 3

Building Configuration...

interface vlan3

ip address 10.0.1.10 255.255.255.252

ip ospf network point-to-point

ip ospf bfd

If BFD is associated with RIP, run show running-config router rip to check whether RIP is correctly associated with BFD.

Hostname#show running-config router rip

router rip

version 2

network 10.0.0.0

no auto-summary

bfd all-interfaces

Run show running-config interface vlan vlan-id to check whether the interface configuration is correct.

Hostname#show running-config interface vlan 3

Building Configuration...

interface vlan3

ip address 10.0.1.10 255.255.255.252

ip rip bfd

If BFD is associated with BGP, run show running-config router bgp to check whether BGP is correctly associated with BFD.

Hostname#show running-config router bgp

router bgp 100

no auto-summary

no synchronization

neighbor 10.20.28.1 update-source loopback0

neighbor 10.20.28.1 remote-as 100

neighbor 10.20.38.1 fall-over bfd

If BFD is associated with static routes, run show running-config ip route to check whether the static routes are correctly associated with BFD.

Hostname#show running-config ip route

ip route static bfd vlan2 10.0.1.5

ip route 10.0.2.0 255.255.255.0 vlan2 10.0.1.5

If the configuration is incorrect, correct it; if it is correct, go to step ③.
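The checks in step ② boil down to looking for a few keywords. The Python sketch below is a heuristic for OSPF, RIP and BGP that uses only the keywords shown in the examples above (bfd all-interfaces, ip ospf bfd, ip rip bfd, ip <protocol> bfd disable, neighbor ... fall-over bfd). Treating any one of these as sufficient, letting a per-interface disable win, and requiring fall-over bfd to name a neighbor defined with remote-as are all assumptions to confirm against the Maipu command reference; the function name is hypothetical.

```python
import re


def bfd_associated(protocol: str, router_cfg: str, interface_cfg: str = "") -> bool:
    """Heuristic: is `protocol` associated with BFD in this configuration text?"""
    router_lines = {l.strip() for l in router_cfg.splitlines()}
    intf_lines = {l.strip() for l in interface_cfg.splitlines()}
    if protocol in ("ospf", "rip"):
        if f"ip {protocol} bfd disable" in intf_lines:
            return False  # assumed: an explicit per-interface disable wins
        return "bfd all-interfaces" in router_lines or f"ip {protocol} bfd" in intf_lines
    if protocol == "bgp":
        # Assumed: fall-over bfd only matters for a neighbor that is actually defined.
        peers = set(re.findall(r"^\s*neighbor (\S+) remote-as", router_cfg, re.M))
        bfd_peers = set(re.findall(r"^\s*neighbor (\S+) fall-over bfd", router_cfg, re.M))
        return bool(peers & bfd_peers)
    raise ValueError(f"unsupported protocol: {protocol}")


ospf_router = "router ospf 100\nbfd all-interfaces\nnetwork 10.10.4.0 0.0.0.255 area 0"
print(bfd_associated("ospf", ospf_router, "interface vlan3\nip ospf bfd"))  # True
```

Run against the BGP excerpt above, this heuristic returns False, because fall-over bfd names 10.20.38.1 while the neighbor defined with remote-as is 10.20.28.1; a mismatch like that is worth confirming on the device.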

③ Check whether the protocol state is correct.

If BFD is associated with OSPF, run show ip ospf neighbor to check the OSPF neighbor state.

Hostname#show ip ospf neighbor

ID Pri State Dead Time Address Interface

10.20.19.1 0 Full/ - 00:00:35 10.10.60.2 vlan2

A BFD session is created only when the OSPF neighbor reaches the Full state; otherwise no BFD session is created.
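Since the session depends on the Full state, a quick filter over show ip ospf neighbor output shows which neighbors cannot have a BFD session yet. This Python sketch assumes the column layout of the sample output above; the second row in the usage example is made up for illustration, and the function name is hypothetical.

```python
import re

# A neighbor row such as "10.20.19.1 0 Full/ - 00:00:35 10.10.60.2 vlan2"
# (layout assumed from the sample output above).
ROW_RE = re.compile(r"^(\d+(?:\.\d+){3})\s+\d+\s+([\w-]+)/")


def ospf_neighbors_not_full(output: str) -> list[tuple[str, str, str]]:
    """Return (neighbor ID, state, interface) for every neighbor not in Full."""
    stuck = []
    for line in output.splitlines():
        m = ROW_RE.match(line.strip())
        if m and m.group(2) != "Full":
            stuck.append((m.group(1), m.group(2), line.split()[-1]))
    return stuck


sample = """ID Pri State Dead Time Address Interface
10.20.19.1 0 Full/ - 00:00:35 10.10.60.2 vlan2
10.20.19.5 1 ExStart/DR 00:00:38 10.10.61.2 vlan3"""
print(ospf_neighbors_not_full(sample))  # [('10.20.19.5', 'ExStart', 'vlan3')]
```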

If BFD is associated with RIP, run show ip route rip to check whether RIP has learned routes from the peer.

Hostname#show ip route rip

Codes: C - connected, S - static, R - RIP, O - OSPF, OE-OSPF External, M - Management

D - Redirect, E - IRMP, Ex - IRMP external, o - SNSP, B - BGP, i-ISIS

Gateway of last resort is 10.0.1.5 to network 0.0.0.0

R 10.0.0.0/30 [120/1] via 10.0.1.5, 03:58:33, vlan2

RIP creates a BFD session only after it has received routes advertised by the peer; otherwise no BFD session is created.
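To confirm that RIP has learned something from a specific peer, you can filter show ip route rip output by next hop. This Python sketch assumes the route-line format of the sample above (R prefix [distance/metric] via next-hop, ...); an empty result means no routes from that peer, so no BFD session is expected. The function name is hypothetical.

```python
import re

# A RIP route line such as "R 10.0.0.0/30 [120/1] via 10.0.1.5, 03:58:33, vlan2".
RIP_RE = re.compile(r"^R\s+(\S+)\s+\[\d+/\d+\]\s+via\s+([^,\s]+)")


def rip_prefixes_from(output: str, neighbor: str) -> list[str]:
    """Prefixes learned via RIP from `neighbor` in `show ip route rip` output."""
    prefixes = []
    for line in output.splitlines():
        m = RIP_RE.match(line.strip())
        if m and m.group(2) == neighbor:
            prefixes.append(m.group(1))
    return prefixes


sample = """Codes: C - connected, S - static, R - RIP, O - OSPF
Gateway of last resort is 10.0.1.5 to network 0.0.0.0
R 10.0.0.0/30 [120/1] via 10.0.1.5, 03:58:33, vlan2"""
print(rip_prefixes_from(sample, "10.0.1.5"))  # ['10.0.0.0/30']
```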

If BFD is associated with BGP, run show ip bgp neighbors ip-address to check the BGP neighbor state.

Hostname#show ip bgp neighbors

BGP neighbor is 10.1.0.253, remote AS 200, local AS 100, external link

BGP version 4, remote router ID 222.222.222.222 BGP state = Established, up for 00:00:17

Last read , hold time is 180, keepalive interval is 60 seconds

Connect retry timer is 0

Neighbor capabilities:

Route refresh: advertised and received (old and new)

Address family IPv4 Unicast: advertised and received

Received 2 messages, 1 open, 0 update, 1 keepalive_in, 0 notifications, 0 in queue

Sent 2 messages, 1 open, 0 update, 1 keepalive_in, 0 notifications, 0 in queue

Route refresh request: received 0, sent 0

Minimum time between advertisement runs is 30 seconds

Update source is loopback1

For address family: IPv4 Unicast

BGP table version 1, neighbor version 1

Index 1, Offset 0, Mask 0x2

0 accepted prefixes

0 announced prefixes

Connections established 2; dropped 1

Local host: 10.1.0.254, Local port: 1135

Foreign host: 10.1.0.253, Foreign port: 179

Nexthop: 10.1.0.254

Nexthop global: ::

Nexthop local: ::

BGP connection: non shared network

Last error code: 6 , subcode: 0

A BFD session is created only when the BGP neighbor is in the Established state; otherwise no BFD session is created.
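With several BGP neighbors, the long show ip bgp neighbors output is easier to scan programmatically. This Python sketch pairs each "BGP neighbor is ..." line with the following "BGP state = ..." value, assuming the output format shown above; the function name is hypothetical.

```python
import re


def bgp_neighbor_states(output: str) -> dict[str, str]:
    """Map each neighbor in `show ip bgp neighbors` output to its BGP state."""
    states = {}
    neighbor = None
    for line in output.splitlines():
        m = re.search(r"BGP neighbor is ([^,\s]+),", line)
        if m:
            neighbor = m.group(1)
        m = re.search(r"BGP state = (\w+)", line)
        if m and neighbor is not None:
            states[neighbor] = m.group(1)
    return states


sample = """BGP neighbor is 10.1.0.253, remote AS 200, local AS 100, external link
BGP version 4, remote router ID 222.222.222.222 BGP state = Established, up for 00:00:17"""
states = bgp_neighbor_states(sample)
print(states)  # {'10.1.0.253': 'Established'}
# Neighbors that cannot have a BFD session yet:
print([n for n, s in states.items() if s != "Established"])  # []
```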

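If the associated protocol is in an abnormal state, troubleshoot the routing protocol by following the relevant chapters on unicast routing in the troubleshooting manual.

If the protocol state is normal, go to step ④.

④ If the preceding steps do not resolve the fault, contact the vendor's engineers and collect the following information:

Ⅰ. The results of the preceding steps.

Ⅱ. From both devices: configuration files (show running-config), logs (show logging, show logging buffer), interface information (show interface interface-name), and other information captured while the fault is present (show cpu, show process, show pool, show ip ospf neighbor, show ip rip database, show ip sockets, show bfd session detail, show bfd client).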

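2. BFD session DOWN

Causes: ⑴ The interface is DOWN;

⑵ The BFD configuration on the peer is incorrect.

Troubleshooting approach:

First check the interface state, then check the BFD configuration on the peer.

Troubleshooting steps:

① Check whether the interface state is normal.

On both devices, run show interface vlan vlan-id and check whether the interface is UP.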

Hostname#show interface vlan 2101

vlan2101:

line protocol is up

Flags: (0xc008063) BROADCAST MULTICAST ARP RUNNING

Type: ETHERNET_CSMACD

Internet address: 10.0.1.5/30

Broadcast address: 10.0.1.7

Queue strategy: FIFO , Output queue: 0/1 (current/max packets)(0)

Metric: 0, MTU: 1500, BW: 100000 Kbps, DLY: 100 usec, VRF: global

Reliability 255/255, Txload 1/255, Rxload 1/255

Ethernet address is 0001.7a5d.bee2

5 minutes input rate 0 bits/sec, 0 packets/sec

5 minutes output rate 0 bits/sec, 0 packets/sec

6450389 packets received; 656647 packets sent

243052 multicast packets received

218637 multicast packets sent

0 input errors; 0 output errors

0 collisions; 0 dropped

If the interface on either end is DOWN, find out why it went down; if the interfaces on both ends are UP, go to step ②.
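The decision in step ① depends on one line of show interface output from each end. This Python sketch reads the "line protocol is ..." line, as in the vlan2101 output above, and picks the next action; both function names are hypothetical.

```python
import re


def line_protocol_up(show_interface_output: str) -> bool:
    """True if the 'line protocol is ...' line reports up."""
    m = re.search(r"line protocol is (\w+)", show_interface_output)
    if m is None:
        raise ValueError("no 'line protocol is' line in the output")
    return m.group(1).lower() == "up"


def next_action(local_output: str, peer_output: str) -> str:
    """Apply step 1 to the show interface output captured on each end."""
    if line_protocol_up(local_output) and line_protocol_up(peer_output):
        return "both ends up: check the peer BFD configuration (step 2)"
    return "an interface is down: find out why before looking at BFD"


print(next_action("vlan2101:\nline protocol is up", "vlan2101:\nline protocol is up"))
```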

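② Check whether the BFD configuration on the peer device is correct.

If BFD is associated with OSPF, run show running-config router ospf and show running-config interface vlan vlan-id on the peer device to check the OSPF BFD configuration.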

Hostname#show running-config router ospf

router ospf 100

network 10.10.4.0 0.0.0.255 area 0

network 10.10.6.0 0.0.0.255 area 31

network 10.10.10.10 0.0.0.0 area 0

network 10.10.16.0 0.0.0.255 area 0

Hostname#show running-config interface vlan 3

Building Configuration...

interface vlan3

ip address 10.0.1.10 255.255.255.252

ip ospf network point-to-point

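The output above shows that OSPF is not associated with BFD on the peer device.

If BFD is associated with RIP, run show running-config router rip and show running-config interface vlan vlan-id on the peer device to check the RIP BFD configuration.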

Hostname#show running-config router rip

router rip

version 2

network 10.0.0.0

no auto-summary

Hostname#show running-config interface vlan 3

Building Configuration...

interface vlan3

ip address 10.0.1.10 255.255.255.252

ip ospf network point-to-point

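The output above shows that RIP is not associated with BFD on the peer device.

If BFD is associated with BGP, run show running-config router bgp on the peer device to check the BGP BFD configuration.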

Hostname#show running-config router bgp

router bgp 100

no auto-summary

no synchronization

neighbor 10.20.28.1 update-source loopback0

neighbor 10.20.28.1 remote-as 100

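The output above shows that BGP is not associated with BFD on the peer device.

If BFD is associated with static routes, run show running-config ip route on the peer device to check the static route BFD configuration.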

Hostname#show running-config ip route

ip route 10.0.2.0 255.255.255.0 vlan2 10.0.1.5

The output above shows that the static route is not associated with BFD on the peer device.

If the peer's BFD configuration is incorrect, correct it; if it is correct, go to step ③.
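Why a missing peer configuration keeps the session down: in the BFD state machine defined in RFC 5880 (section 6.2), a session only leaves Down after it receives control packets from the peer. The Python sketch below models that diagram (the local AdminDown state is not modeled, and "Timer" stands for the detection time expiring without a packet). It describes the standard, not Maipu's implementation; the function name is hypothetical.

```python
def next_state(local: str, event: str) -> str:
    """Next local session state per the RFC 5880 section 6.2 diagram.

    `event` is the State field of a received control packet
    ("AdminDown", "Down", "Init", "Up") or "Timer" when the
    detection time expires without a packet.
    """
    if local == "Down":
        return {"Down": "Init", "Init": "Up"}.get(event, "Down")
    if local == "Init":
        if event == "Down":
            return "Init"
        return "Up" if event in ("Init", "Up") else "Down"
    if local == "Up":
        return "Up" if event in ("Init", "Up") else "Down"
    raise ValueError(f"unknown local state: {local}")


# Normal handshake: the peer reports Down, then Up.
state = "Down"
for event in ["Down", "Up"]:
    state = next_state(state, event)
print(state)  # Up

# Peer without BFD configured: no packets arrive, only timer expiries.
silent = "Down"
for _ in range(5):
    silent = next_state(silent, "Timer")
print(silent)  # Down
```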

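③ If the preceding steps do not resolve the fault, contact the vendor's engineers and collect the following information:

Ⅰ. The results of the preceding steps.

Ⅱ. From both devices: configuration files (show running-config), logs (show logging, show logging buffer), interface information (show interface vlan vlan-id), and other information captured while the fault is present (show cpu, show process, show pool, show ip ospf neighbor, show ip rip database, show ip sockets, show bfd session detail, show bfd client).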

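3. BFD session flaps frequently

Causes: ⑴ The interface is flapping;

⑵ The link is congested;

⑶ The BFD parameters are set inappropriately.

Troubleshooting approach:

First check whether the interface is flapping, then check whether link congestion is causing BFD packets to be dropped, and finally check the BFD parameters.

Troubleshooting steps:

① Check whether the interface state is flapping.

Run show logging and check the log messages.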

Hostname#show logging

Apr 28 10:14:55.666: [tIfMgt]%LINK-INTERFACE_DOWN-3: Interface vlan2101, changed state to down.

Apr 28 10:14:55.666: [tIfMgt]%LINK-LINEPROTO_DOWN-3: Line protocol on interface vlan2101, changed state to down.

Apr 28 10:14:55.716: [tOSPF]%OSPF-ADJCHG_DOWN-3: Process 100 Nbr [vlan2101:10.0.1.5-10.1.0.254] from Full to Down,KillNbr: Interface down or detached.

Apr 28 10:14:57.183: [tNBFD]%BFD-SESSION_DOWN-4: Session [destination address:10.0.1.6,source address:10.0.1.5,interface:vlan2101,local-discriminator:359] DOWN

Apr 28 10:14:57.183: [tNBFD]%STRT-BFD_STATE-5: Receive bfd session [10.0.1.6 10.0.1.5 vrf global interface vlan2101] change to down

Apr 28 10:16:22.566: [tIfMgt]%LINK-INTERFACE_UP-5: Interface vlan2101, changed state to up.

Apr 28 10:16:22.566: [tIfMgt]%LINK-LINEPROTO_UP-5: Line protocol on interface vlan2101, changed state to up.

Apr 28 10:16:31.83: [tOSPF]%OSPF-ADJCHG_FULL-5: Process 100 Nbr [vlan2101:10.0.1.5-10.1.0.254] from Exchange to Full,ExchangeDone.

Apr 28 10:16:34.383: [tNBFD]%BFD-SESSION_UP-5: Session [destination address:10.0.1.6,source address:10.0.1.5,interface:vlan2101,local-discriminator:359] UP

Apr 28 10:16:34.383: [tNBFD]%STRT-BFD_STATE-5: Receive bfd session [10.0.1.6 10.0.1.5 vrf global interface vlan2101] change to up

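These logs show interface vlan2101 going down and coming back up, with the OSPF adjacency and the BFD session following it: the interface state is flapping.

If the interface is flapping, find out why; if the interface state is normal, go to step ②.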
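On a busy device, the log can be scanned to tell whether each BFD DOWN followed an interface DOWN (pointing to step ①) or happened while the interface was up (pointing to steps ② and ③). This Python sketch assumes the message formats of the sample log above (%LINK-INTERFACE_DOWN/UP and %BFD-SESSION_DOWN/UP with an interface: field); the function name is hypothetical.

```python
import re

# Message formats assumed from the sample log above.
LINK_RE = re.compile(r"%LINK-INTERFACE_(UP|DOWN)-\d+: Interface ([^,\s]+)")
BFD_RE = re.compile(r"%BFD-SESSION_(UP|DOWN)-\d+: Session \[.*?interface:([^,\]]+)")


def bfd_down_causes(log_text: str) -> list[tuple[str, str]]:
    """For each BFD session DOWN, report whether its interface was down at the time."""
    links_down = set()
    findings = []
    for line in log_text.splitlines():
        m = LINK_RE.search(line)
        if m:
            state, intf = m.groups()
            if state == "DOWN":
                links_down.add(intf)
            else:
                links_down.discard(intf)
            continue
        m = BFD_RE.search(line)
        if m and m.group(1) == "DOWN":
            intf = m.group(2)
            findings.append((intf, "interface down" if intf in links_down else "interface up"))
    return findings


log = """Apr 28 10:14:55.666: [tIfMgt]%LINK-INTERFACE_DOWN-3: Interface vlan2101, changed state to down.
Apr 28 10:14:57.183: [tNBFD]%BFD-SESSION_DOWN-4: Session [destination address:10.0.1.6,source address:10.0.1.5,interface:vlan2101,local-discriminator:359] DOWN
Apr 28 10:16:22.566: [tIfMgt]%LINK-INTERFACE_UP-5: Interface vlan2101, changed state to up."""
print(bfd_down_causes(log))  # [('vlan2101', 'interface down')]
```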

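② Check whether the interface is congested.

In SSP mode on the LPU (line card), run the pw command several times to check whether the queue that carries BFD packets on the port is congested.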

Hostname#connect lpu 0

Convert name lpu 0 to slot number 0

done

LPU-0>en

password:

LPU-0#

LPU-0#debug ssp

BCM.0> pw

bcmPW.0: Status: Not Running. Mode RX. Buffering up to 0 packets.

Rate limit is 2000 (soc intvl 0).

Reporting is enabled for:

Reporting is disabled for: Count DECode Raw DMA CHannel

Dump options are enabled for:

Dump options are disabled for: Count DECode Raw DMA CHannel

RX on for channel(s): -- using default --

RX Info @ time=2846394593: started. Last fill 2846361260. Thread is running.

+verbose for more info

Pkt Size 2040. Pkts/Chain 4. All COS PPS 2000. Burst 2000. Flags 0.

Sys PPS 0. Sys tokens 0. Sys fill 18116666.

Cntrs: Pkts 4414569. Last start 4414569. Tunnel 0. Owned 4407596.

Bad Hndlr 0. No Hndlr 0. Not Running 0.

Thrd Not Running 0. DCB Errs 0.nextunit:1

Registered callbacks:

SSP RX Priority=255. Argument=0x0. COS 0xffffffffffff.

Packets handled 0, owned 0.

Discard Priority= 0. Argument=0x0. COS 0xffffffffffff.

Packets handled 0, owned 0.

Channel Info

Chan 1 is running: Chains 64. COS 0xff. DCB/pkt 1. active chains 64

rpkt 4414569. rbyte 680537578. dpkt 0. dbyte 0. mem fail 0. flags 0.

Queue Info

Queue 0: PPS 200. CurPkts 0. TotPkts 57941. Disc rate 0, qlen 6970, max dcbs 50.

Tokens 800. Fill 2826544593. Max 200. Brst 400. Head 0x0. Tail 0x0.

Queue 1: PPS 250. CurPkts 0. TotPkts 154363. Disc rate 0, qlen 0, max dcbs 50.

Tokens 500. Fill 2845244593. Max 200. Brst 200. Head 0x0. Tail 0x0.

Queue 2: PPS 500. CurPkts 0. TotPkts 1098533. Disc rate 0, qlen 0, max dcbs 100.

Tokens 1199. Fill 2846044593. Max 200. Brst 600. Head 0x0. Tail 0x0.

Queue 3: PPS 600. CurPkts 0. TotPkts 25823. Disc rate 0, qlen 0, max dcbs 256.

Tokens 2048. Fill 2845544593. Max 200. Brst 1024. Head 0x0. Tail 0x0.

Queue 4: PPS 1000. CurPkts 0. TotPkts 123150. Disc rate 0, qlen 0, max dcbs 256.

Tokens 2048. Fill 2845944593. Max 200. Brst 1024. Head 0x0. Tail 0x0.

Queue 5: PPS 400. CurPkts 0. TotPkts 98684. Disc rate 0, qlen 0, max dcbs 256.

Tokens 1200. Fill 2846361260. Max 200. Brst 600. Head 0x0. Tail 0x0.

Queue 6: PPS 300. CurPkts 0. TotPkts 260415. Disc rate 0, qlen 0, max dcbs 256.

Tokens 600. Fill 2845744593. Max 200. Brst 300. Head 0x0. Tail 0x0.

Queue 7: PPS 100. CurPkts 0. TotPkts 0. Disc rate 0, qlen 0, max dcbs 256.

Tokens 600. Fill 2341211261. Max 200. Brst 300. Head 0x0. Tail 0x0.

Queue 8: PPS 1500. CurPkts 0. TotPkts 0. Disc rate 0, qlen 0, max dcbs 256.

Tokens 3000. Fill 2341211261. Max 200. Brst 1000. Head 0x0. Tail 0x0.

Queue 9: PPS 2000. CurPkts 0. TotPkts 2588690. Disc rate 0, qlen 0, max dcbs 256.

Tokens 4000. Fill 2846044593. Max 200. Brst 1000. Head 0x0. Tail 0x0.

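Check the qlen counter of queue 6, the queue that carries BFD packets. A non-zero value means the port has been congested at some point; if the value keeps increasing between runs, congestion is dropping packets while you are running the command.

If the queue is dropping packets, find out why; if it is not, go to step ③.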
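The rule for reading qlen across repeated pw runs can be written down directly. This Python sketch parses the "Queue N: ... qlen X" lines of the output above and compares successive captures. Queue 6 is the BFD queue on the platform in this article; it is a parameter because the mapping may differ on other platforms (an assumption). The function names are hypothetical.

```python
import re

# A queue line such as "Queue 6: PPS 300. CurPkts 0. ... Disc rate 0, qlen 0, max dcbs 256."
QUEUE_RE = re.compile(r"Queue (\d+):.*?qlen (\d+)")


def queue_qlens(pw_output: str) -> dict[int, int]:
    """Map queue number to its qlen counter in one pw capture."""
    return {int(q): int(n) for q, n in QUEUE_RE.findall(pw_output)}


def assess_queue(captures: list[str], queue: int = 6) -> str:
    """Interpret qlen across successive pw captures, oldest first."""
    values = [queue_qlens(c)[queue] for c in captures]
    if any(later > earlier for earlier, later in zip(values, values[1:])):
        return "growing: congestion and drops while capturing"
    if any(values):
        return "non-zero: congestion occurred earlier"
    return "zero: no congestion seen"


line = "Queue 6: PPS 300. CurPkts 0. TotPkts 260415. Disc rate 0, qlen {}, max dcbs 256."
print(assess_queue([line.format(0), line.format(0)]))  # zero: no congestion seen
```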

③ Check whether the BFD parameters are reasonable.

Run show bfd session detail to check the BFD parameters.

Hostname#show bfd session detail

OurAddr NeighAddr LD/RD State Holddown interface

10.10.51.1 10.10.51.2 789/49533 UP 30 vlan2101

Type:direct

Local State:UP Remote State:UP Up for: 0h:15m:11s Number of times UP:1

Send Interval:10ms Detection time:30ms(10ms*3)

Local Diag:0 Demand mode:0 Poll bit:0

MinTxInt:10 MinRxInt:10 Multiplier:3

Remote MinTxInt:10 Remote MinRxInt:10 Remote Multiplier:3

Registered protocols:OSPF

Agent session info:

Sender:slot 0 Recver:slot 0

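The output above shows that the minimum transmit and receive intervals are both 10 ms and that the session times out after 3 consecutive missed packets, giving a detection time of only 30 ms. In a real network, a detection time this short can cause BFD to flap.

If the BFD parameters are unreasonable, modify them; otherwise, go to step ④.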
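The "Detection time:30ms(10ms*3)" figure follows from the negotiation rules in RFC 5880: in asynchronous mode, the local detection time is the remote Detect Mult multiplied by the larger of the local Required Min RX Interval and the remote Desired Min TX Interval (section 6.8.4), and the local transmit interval is the larger of the local Desired Min TX Interval and the remote Required Min RX Interval (section 6.8.7, before jitter). This Python sketch computes both; the function names are hypothetical, and Maipu's display is assumed to follow the standard.

```python
def negotiated_tx_ms(local_min_tx: int, remote_min_rx: int) -> int:
    """Interval at which the local system transmits, before jitter (RFC 5880, 6.8.7)."""
    return max(local_min_tx, remote_min_rx)


def detection_time_ms(local_min_rx: int, remote_min_tx: int, remote_multiplier: int) -> int:
    """Asynchronous-mode detection time computed on the local system (RFC 5880, 6.8.4)."""
    return remote_multiplier * max(local_min_rx, remote_min_tx)


print(detection_time_ms(10, 10, 3))    # 30, as in the session shown above
print(detection_time_ms(300, 300, 3))  # 900 with 300 ms intervals
print(detection_time_ms(300, 300, 5))  # 1500
```

Raising the intervals on both ends is usually the first thing to try, since the larger of the two sides' values wins in each direction.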

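④ If the preceding steps do not resolve the fault, contact the vendor's engineers and collect the following information:

Ⅰ. The results of the preceding steps.

Ⅱ. From both devices: configuration files (show running-config), logs (show logging, show logging buffer), interface information (show interface vlan vlan-id), and other information captured while the fault is present (show cpu, show process, show pool, show ip ospf neighbor, show ip rip database, show ip sockets, show bfd session detail, show bfd client).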

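Case study: incorrect static route BFD association

The user's network is shown in the figure below. Interface vlan2 on Device1 has IP address 10.0.1.5/30, and interface vlan2 on Device2 has IP address 10.0.1.6/30. Static routes are configured between the two devices and associated with BFD, but the BFD session cannot be created.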

[Figure: network topology, Device1 and Device2 connected through interface vlan2]

Analysis: ⑴ Run show bfd session and show bfd client to check whether a BFD session has been created.

Device1#show bfd session

OurAddr NeighAddr LD/RD State Holddown interface

No BFD session has been created.

Device1# show bfd client

ClientName SessionNumber

OSPF 0

BGP 0

isis 0

TRACK 0

RIP 0

The static-route client (STATICRT) has no BFD sessions (SessionNumber 0), which indicates that the static route is not associated with BFD.
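To check a client's session count without reading the table by eye, the show bfd client rows can be parsed into a dictionary. This Python sketch assumes each row is "<ClientName> <SessionNumber>" as in the output above, and treats a client missing from the table as having no sessions (an assumption); the function name is hypothetical.

```python
def bfd_client_sessions(output: str) -> dict[str, int]:
    """Parse show bfd client rows of the form '<ClientName> <SessionNumber>'."""
    clients = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():  # skips the header row
            clients[parts[0]] = int(parts[1])
    return clients


sample = """ClientName SessionNumber
OSPF 0
BGP 0
isis 0
TRACK 0
RIP 0"""
clients = bfd_client_sessions(sample)
print(clients.get("STATICRT", 0))  # 0: no static-route BFD sessions
```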

⑵ Run show running-config interface vlan vlan-id and show running-config ip route to check the interface configuration and how the static route is associated with BFD.

Device1# show running-config interface vlan 2

Building Configuration...

interface vlan2

ip address 10.0.1.6 255.255.255.252

bfd min-transmit-interval 300

bfd min-receive-interval 300

Device1#show running-config ip route

ip route static bfd vlan2 10.0.1.5

ip route 10.0.2.0 255.255.255.0 10.0.1.5

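The output above shows that the static route specifies only a next hop and no outgoing interface, so the BFD association is misconfigured.

Solution: a static route associated with BFD must specify the outgoing interface. Configure it as follows: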

Device1#config terminal

Device1(config)#ip route 10.0.2.0 255.255.255.0 vlan2 10.0.1.5

Device1(config)#ip route static bfd vlan2 10.0.1.5
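The mistake in this case can be caught before it is deployed: every ip route static bfd <interface> <next-hop> binding should be matched by a static route that uses the same outgoing interface and next hop. This Python sketch checks that rule against show running-config ip route text, assuming the command formats shown in this case; the function name is hypothetical.

```python
import re

BIND_RE = re.compile(r"ip route static bfd (\S+) (\S+)")
ROUTE_RE = re.compile(r"ip route \d+(?:\.\d+){3} \d+(?:\.\d+){3} (.+)")


def unmatched_static_bfd(config: str) -> list[tuple[str, str]]:
    """Return (interface, next hop) BFD bindings that no static route uses
    with both that outgoing interface and that next hop."""
    binds, route_targets = [], set()
    for raw in config.splitlines():
        line = raw.strip()
        m = BIND_RE.fullmatch(line)
        if m:
            binds.append((m.group(1), m.group(2)))
            continue
        m = ROUTE_RE.fullmatch(line)
        if m:
            tail = m.group(1).split()
            if len(tail) >= 2:  # "<interface> <next-hop> ..."; a bare next hop has one field
                route_targets.add((tail[0], tail[1]))
    return [b for b in binds if b not in route_targets]


broken = "ip route static bfd vlan2 10.0.1.5\nip route 10.0.2.0 255.255.255.0 10.0.1.5"
fixed = "ip route 10.0.2.0 255.255.255.0 vlan2 10.0.1.5\nip route static bfd vlan2 10.0.1.5"
print(unmatched_static_bfd(broken))  # [('vlan2', '10.0.1.5')]
print(unmatched_static_bfd(fixed))   # []
```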

After correcting the configuration, check the BFD session again.

Device1#show bfd session detail

Total session number: 1

OurAddr NeighAddr LD/RD State Holddown interface

10.0.1.5 10.0.1.6 48/359 UP 1500 vlan2

Type:direct

Local State:UP Remote State:UP Up for: 0h:50m:41s Number of times UP:3

Send Interval:300ms Detection time:1500ms(300ms*5)

Local Diag:1 Demand mode:0 Poll bit:0

MinTxInt:300 MinRxInt:300 Multiplier:5

Remote MinTxInt:300 Remote MinRxInt:300 Remote Multiplier:5

Registered protocols:STATICRT

Agent session info:

l2info_state:0x5

VLAN:2 SMAC:0001.7adf.1524 DMAC:0001.7a58.f7d6 AG:769 AP:289 Slot:6

Sender:slot 6 Recver:slot 6

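Summary: when a BFD session is not created, the cause is usually a configuration problem, so check the configuration first when handling this type of fault.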