kernel: TCP: time wait bucket table overflow

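On a system running Tomcat, a 504 page is occasionally returned during traffic peaks or promotional campaigns, and refreshing with F5 brings the page back. When this happens, the following kernel errors appear in the server's syslog.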

 kernel: TCP: time wait bucket table overflow
 kernel: __ratelimit: 1679 callbacks suppressed
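To see how often the warning was logged, the messages can be counted directly. This is a minimal sketch: the function name count_tw_overflow is hypothetical, and /var/log/messages is assumed to be where syslog writes kernel messages on this host.

```shell
#!/bin/sh
# count_tw_overflow LOGFILE
# Prints how many lines of LOGFILE contain the overflow warning.
# The function name is hypothetical; the log path varies by distribution.
count_tw_overflow() {
    # grep -c prints 0 (and exits non-zero) when nothing matches,
    # so the exit status is deliberately ignored here.
    grep -c 'TCP: time wait bucket table overflow' "$1" || true
}

# Only runs where the assumed log file exists and is readable.
if [ -r /var/log/messages ]; then
    count_tw_overflow /var/log/messages
fi
```

Because the kernel rate-limits this warning (the "callbacks suppressed" line), the count is a lower bound on how many times the limit was actually hit.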

Checking with netstat shows close to 6000 connections in the TIME_WAIT state.

 # netstat -tan | grep ':80 ' | awk '{print $6}' | sort | uniq -c
     324 ESTABLISHED
       4 FIN_WAIT1
       1 LISTEN
      23 SYN_RECV
    5813 TIME_WAIT
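The same per-state tally can be wrapped in a small function so it is easy to rerun during a traffic peak. This is only a sketch: the function name count_tcp_states is hypothetical, and it assumes the `netstat -tan` column layout in which the local address is column 4 and the state is column 6. Unlike the grep above, it matches the port on the local address only.

```shell
#!/bin/sh
# count_tcp_states PORT
# Reads `netstat -tan`-style lines on stdin and prints "COUNT STATE" pairs
# for sockets whose local address (column 4) ends in :PORT.
# The function name is hypothetical; it is not part of netstat.
count_tcp_states() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print n[s], s }
    ' | sort -k2
}

# Demo on a few sample lines (made-up addresses, for illustration only).
count_tcp_states 80 <<'EOF'
tcp        0      0 0.0.0.0:80        0.0.0.0:*          LISTEN
tcp        0      0 10.0.0.5:80       198.51.100.7:51234 TIME_WAIT
tcp        0      0 10.0.0.5:80       198.51.100.8:40022 TIME_WAIT
tcp        0      0 10.0.0.5:8080     198.51.100.9:40100 ESTABLISHED
EOF
```

On a live server the input would come from the real command, for example `netstat -tan | count_tcp_states 80`.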

The data in Cacti is essentially consistent with what netstat reports.


TIME_WAIT?

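  • TIME_WAIT is the state the local end always enters after it actively closes a connection
  • TIME_WAIT does not mean the socket is waiting for a response from the other end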


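For example, when the Initiator is the web server and the Receiver is the browser, the browser should already be in, or about to enter, the CLOSED state while the web server is in TIME_WAIT.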
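As a rough sizing aid, a TIME_WAIT count can be turned into an approximate connection close rate. The sketch below rests on two assumptions: that Linux keeps a socket in TIME_WAIT for 60 seconds (the kernel's TCP_TIMEWAIT_LEN constant), and that the count is in steady state. The function name tw_close_rate is hypothetical.

```shell
#!/bin/sh
# tw_close_rate COUNT [SECONDS]
# Estimates how many connections per second the local end is closing,
# given COUNT sockets in TIME_WAIT and a TIME_WAIT duration of SECONDS
# (assumed default: 60). Integer division, so the result is rounded down.
tw_close_rate() {
    echo $(( $1 / ${2:-60} ))
}

tw_close_rate 5813   # TIME_WAIT count from the netstat tally; prints 96
```

If the count is pinned at the kernel's limit, the real close rate is higher than this estimate, because sockets beyond the limit are destroyed instead of being counted.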

net.ipv4.tcp_max_tw_buckets

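net.ipv4.tcp_max_tw_buckets sets the maximum number of sockets the system allows in the TIME_WAIT state. The limit exists to guard against DoS (denial-of-service) attacks. The default is NR_FILE*2, adjusted according to the amount of memory in the system. When the limit is exceeded, the socket is closed immediately and this message is written to syslog (/var/log/messages).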

On CentOS 7, the default value of net.ipv4.tcp_max_tw_buckets is 5000 on a 2-core, 4 GB server and 6000 on a 4-core, 8 GB server.

The following is the value confirmed on a 2-core, 4 GB server.

 # sysctl -a | grep net.ipv4.tcp_max_tw_buckets
 net.ipv4.tcp_max_tw_buckets = 5000
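To judge how close the system is to the limit, the live TIME_WAIT count can be set against it. This is a sketch under stated assumptions: it reads the `tw` field from the `TCP:` line of /proc/net/sockstat, and the function name tw_usage and its output format are hypothetical.

```shell
#!/bin/sh
# tw_usage SOCKSTAT_FILE LIMIT
# Prints the TIME_WAIT count from a /proc/net/sockstat-style file next to
# the bucket limit, plus the percentage of the limit in use (rounded down).
# The function name and the output format are hypothetical.
tw_usage() {
    awk -v limit="$2" '
        $1 == "TCP:" {
            # Fields after "TCP:" come in "name value" pairs.
            for (i = 2; i < NF; i += 2)
                if ($i == "tw") tw = $(i + 1)
        }
        END { printf "tw=%d limit=%d usage=%d%%\n", tw, limit, tw * 100 / limit }
    ' "$1"
}

# On a Linux host, compare the live count against the configured limit.
if [ -r /proc/net/sockstat ] && [ -r /proc/sys/net/ipv4/tcp_max_tw_buckets ]; then
    tw_usage /proc/net/sockstat "$(cat /proc/sys/net/ipv4/tcp_max_tw_buckets)"
fi
```

A usage figure that sits near 100% during peaks suggests the overflow warning is about to appear or is already being logged.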

TIME_WAIT-related parameters

The following two TIME_WAIT-related kernel parameters need to be changed.

1) net.ipv4.tcp_tw_recycle

This enables fast recycling of TIME_WAIT sockets. The default value is 0 (disabled). Should be used with caution with loadbalancers.

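  • Quickly reclaims sockets in the TIME_WAIT state
  • Default is 0 (disabled)
  • Change this parameter with caution in environments that use a load balancer: it is known to break connections from clients behind NAT, and it was removed in Linux 4.12

2) net.ipv4.tcp_tw_reuse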

This allows reusing sockets in TIME_WAIT state for new connections when it is safe from protocol viewpoint. Default value is 0 (disabled). It is generally a safer alternative to tcp_tw_recycle. Note: The tcp_tw_reuse setting is particularly useful in environments where numerous short connections are open and left in TIME_WAIT state, such as web servers and loadbalancers. Reusing the sockets can be very effective in reducing server load.

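  • Allows sockets in the TIME_WAIT state to be reused for new TCP connections when that is safe from the protocol's point of view
  • Default is 0 (disabled)
  • Generally a safer choice than changing tcp_tw_recycle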
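Before changing anything, it helps to record the current values. The sketch below reads them from /proc/sys/net/ipv4 and reports a parameter as unavailable when the running kernel does not expose it. The function name show_tw_params and its optional directory argument are hypothetical and exist only to make the example easy to try.

```shell
#!/bin/sh
# show_tw_params [DIR]
# Prints the current value of each TIME_WAIT-related parameter found under
# DIR (default: /proc/sys/net/ipv4), or "not available" if the kernel does
# not expose it. The function name and DIR argument are hypothetical.
show_tw_params() {
    dir="${1:-/proc/sys/net/ipv4}"
    for name in tcp_max_tw_buckets tcp_tw_recycle tcp_tw_reuse; do
        if [ -r "$dir/$name" ]; then
            echo "net.ipv4.$name = $(cat "$dir/$name")"
        else
            echo "net.ipv4.$name: not available"
        fi
    done
}

show_tw_params
```

Keeping this output alongside the later netstat tally makes it easier to tell which setting was in effect when a measurement was taken.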

Changing the kernel parameters

1) Append the following parameters to the end of /etc/sysctl.conf.

 # vi /etc/sysctl.conf
 net.ipv4.tcp_tw_recycle = 1
 net.ipv4.tcp_tw_reuse = 1

2) Apply the modified kernel parameters.

 # sysctl -p 
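Editing the file by hand works, but repeating the edit can leave duplicate entries. The sketch below shows one way to make the change repeatable: it drops any existing assignment of a key and appends the new one. The function name set_sysctl_conf is hypothetical (sysctl has no such subcommand), and the demo deliberately runs against a scratch file instead of /etc/sysctl.conf.

```shell
#!/bin/sh
# set_sysctl_conf FILE KEY VALUE
# Rewrites FILE so that it contains exactly one "KEY = VALUE" line:
# existing assignments of KEY are dropped and the new line is appended.
# FILE must already exist. The function name is hypothetical.
set_sysctl_conf() {
    file="$1"
    key="$2"
    value="$3"
    tmp="$(mktemp)" || return 1
    # Keep every line that is not an assignment of KEY.
    awk -v key="$key" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, key) == 1 && substr(line, length(key) + 1) ~ /^[ \t]*=/ { next }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    echo "$key = $value" >> "$tmp"
    # Write through the original file so its owner and mode are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demo on a scratch file; nothing on the system is modified.
conf="$(mktemp)"
printf 'net.ipv4.tcp_tw_reuse = 0\n' > "$conf"
set_sysctl_conf "$conf" net.ipv4.tcp_tw_reuse 1
cat "$conf"
rm -f "$conf"
```

On a real server the function would be pointed at /etc/sysctl.conf as root, followed by `sysctl -p` as in step 2.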

Verifying the result

Checking again with netstat shows that the number of connections in TIME_WAIT has dropped to three digits.

 # netstat -tan | grep ':80 ' | awk '{print $6}' | sort | uniq -c
       1 CLOSE_WAIT
      14 ESTABLISHED
       3 FIN_WAIT1
       1 LISTEN
       1 SYN_RECV
     531 TIME_WAIT