nf_conntrack: table full, dropping packet

Today an alert in our monitoring channel reported "Timeout connection to xxx.xxx.com", which is one of our production entrance servers, and then the story began …

1. Symptoms: a disk issue?

  1. It took me over 4 seconds to open an SSH connection to this production server, while other production servers connect in less than 1 second. I also noticed 50% packet loss to the target server.
  2. This entrance server is very short on disk space, so at first I thought it was a disk issue. I deleted some files and restarted the process, but that didn't help. I started to think it could be a network issue.
  3. Then I noticed the following error in kern.log, which confirmed it was a network issue: nf_conntrack: table full, dropping packet.
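To see how often the kernel is hitting this, the message can be counted in the log. This is only a sketch: the function name count_conntrack_drops is made up for this post, and the default path /var/log/kern.log is an assumption (Debian/Ubuntu style), so pass your own log file if it lives elsewhere.

```shell
# Count "table full" messages in a kernel log file.
# Hypothetical helper; the default path is an assumption, override it with $1.
count_conntrack_drops() {
    log_file=${1:-/var/log/kern.log}
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's status clean.
    grep -c 'nf_conntrack: table full' "$log_file" 2>/dev/null || true
}
```

Run it as count_conntrack_drops /var/log/kern.log before and after a change to see whether the drops have stopped.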

2. Solve the problem

After some Googling, I learned that conntrack is the connection tracking system behind the stateful firewall.

Please read Netfilter’s connection tracking system if you are interested. It also covers the basics of the Netfilter framework.

In short, conntrack records the state of each connection so that the firewall can inspect traffic statefully and guard against attacks such as DDoS.

2.1. Just tell me how to solve it

From the error above, we know the conntrack table is full. How many entries are in the table right now? Read /proc/sys/net/netfilter/nf_conntrack_count:

root@localhost:/# cat /proc/sys/net/netfilter/nf_conntrack_count
76390

What’s the maximum size? You can get it by typing cat /proc/sys/net/netfilter/nf_conntrack_max.
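The two numbers are easier to judge side by side. Here is a small sketch that turns them into a fill percentage; conntrack_usage_pct is a name I made up, and the /proc part only runs when the conntrack files exist on the machine.

```shell
# Hypothetical helper: integer fill percentage of the conntrack table.
# $1 = current entry count, $2 = table limit.
conntrack_usage_pct() {
    [ "$2" -gt 0 ] || return 1          # avoid dividing by zero
    echo $(( $1 * 100 / $2 ))           # rounded down
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max
# Only report when the conntrack module is loaded and the files are readable.
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    echo "conntrack table is $(conntrack_usage_pct "$(cat "$count_file")" "$(cat "$max_file")")% full"
fi
```

For example, conntrack_usage_pct 76390 262144 prints 29.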

Let’s just increase it. Recommended size: CONNTRACK_MAX = RAMSIZE (in bytes) / 16384 / (ARCH / 32). E.g., I have 8 GB of RAM on an x86_64 OS, so I set it to 8*1024^3/16384/2=262144, which is of course larger than the current nf_conntrack_count.
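The same arithmetic can be done in the shell, which helps when sizing several machines. A minimal sketch of the formula above; the function name conntrack_max_for is mine, not a standard tool.

```shell
# Hypothetical helper implementing
#   CONNTRACK_MAX = RAMSIZE (in bytes) / 16384 / (ARCH / 32)
# $1 = RAM size in bytes, $2 = architecture width in bits (32 or 64).
conntrack_max_for() {
    ram_bytes=$1
    arch_bits=$2
    echo $(( ram_bytes / 16384 / (arch_bits / 32) ))
}

# 8 GB of RAM on x86_64, as in the example above.
conntrack_max_for $(( 8 * 1024 * 1024 * 1024 )) 64
```

For the 8 GB x86_64 case this prints 262144, matching the value worked out by hand.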

sysctl -w net.netfilter.nf_conntrack_max=262144
echo "net.netfilter.nf_conntrack_max=262144" >> /etc/sysctl.conf
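One caveat with the echo >> approach: running it twice leaves two entries for the same key in /etc/sysctl.conf. Below is a sketch of a helper that writes the setting exactly once. The name set_sysctl_conf is hypothetical, and it assumes the file uses plain key=value lines.

```shell
# Hypothetical helper: make sure "key=value" appears exactly once in a
# sysctl-style file. $1 = key, $2 = value, $3 = file.
set_sysctl_conf() {
    conf_key=$1
    conf_value=$2
    conf_file=$3
    conf_tmp=$(mktemp) || return 1
    # Escape the dots so grep matches the key literally.
    conf_pattern=$(printf '%s' "$conf_key" | sed 's/\./\\./g')
    if [ -f "$conf_file" ]; then
        # Keep every line except earlier assignments of the same key.
        grep -v -E "^[[:space:]]*${conf_pattern}[[:space:]]*=" "$conf_file" > "$conf_tmp" || true
    fi
    printf '%s=%s\n' "$conf_key" "$conf_value" >> "$conf_tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$conf_tmp" > "$conf_file"
    rm -f "$conf_tmp"
}
```

As root, set_sysctl_conf net.netfilter.nf_conntrack_max 262144 /etc/sysctl.conf followed by sysctl -p would persist and reload the value.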

Right after raising the limit, the network recovered: latency was back to normal and there was no more packet loss.

2.2. What if it really exceeds this max limit?

  • Option 1. We can remove the state module, but then iptables can no longer offer its full functionality (anything that relies on connection state stops working).
  • Option 2. We can use the iptables raw table to bypass conntrack for selected traffic.
    The raw table only applies to the PREROUTING and OUTPUT chains. It has the highest priority (raw-->mangle-->nat-->filter), so it sees a packet before connection tracking does. Packets marked NOTRACK in the raw table skip the nat table and the ip_conntrack handler.

2.3. How to do it without tracking state?

  1. A quick review of iptables, which has 4 tables and 5 chains:

    1. Tables: categorized by different operations to data packets.
      • raw: highest priority, only applied to the PREROUTING and OUTPUT chains. When we don’t need NAT, we can use the raw table to increase performance.
      • mangle: modifies certain data packets
      • nat: NAT, port mapping, address mapping
      • filter: filter
    2. Chains: categorized by different hooks.
      • PREROUTING: packets before the routing decision
      • INPUT: packets after the routing decision whose destination is the current machine
      • FORWARD: packets after the routing decision whose destination is not the current machine
      • OUTPUT: packets generated by the current machine and heading out
      • POSTROUTING: packets just before they leave through the network interface
  2. Accept connections in the UNTRACKED state:
    CentOS: edit /etc/sysconfig/iptables and add UNTRACKED to the state list on the RH-Firewall-1-INPUT line,
    so that it reads -A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED,UNTRACKED -j ACCEPT
    Other Linux:

    $ sudo iptables -A FORWARD -m state --state UNTRACKED -j ACCEPT
  3. Add raw table rules for the ports that should not be tracked.

    # mark traffic to and from these ports as NOTRACK, in both directions
    $ sudo iptables -t raw -A PREROUTING -p tcp -m multiport --dports 80,81,82 -j NOTRACK
    $ sudo iptables -t raw -A PREROUTING -p tcp -m multiport --sports 80,81,82 -j NOTRACK
    $ sudo iptables -t raw -A OUTPUT -p tcp -m multiport --dports 80,81,82 -j NOTRACK
    $ sudo iptables -t raw -A OUTPUT -p tcp -m multiport --sports 80,81,82 -j NOTRACK

    If you have only one port, use

    $ iptables -t raw -A PREROUTING -p tcp -m tcp --dport 80 -j NOTRACK
    $ iptables -t raw -A OUTPUT -p tcp -m tcp --sport 80 -j NOTRACK
    $ iptables -t raw -A PREROUTING -p tcp -m tcp --sport 80 -j NOTRACK
    $ iptables -t raw -A OUTPUT -p tcp -m tcp --dport 80 -j NOTRACK
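Since all four rules are needed for every port set, it is easy to forget one direction. Here is a sketch that prints the full set for a list of ports so it can be reviewed before anything is applied. The function name notrack_rules is made up, and it only echoes the commands, it does not run iptables.

```shell
# Hypothetical helper: print (not run) the four raw-table rules that exempt
# the given TCP ports from connection tracking in both directions.
# $1 = comma-separated port list, e.g. "80,81,82" or just "80".
notrack_rules() {
    ports=$1
    for chain in PREROUTING OUTPUT; do
        for flag in --dports --sports; do
            echo "iptables -t raw -A $chain -p tcp -m multiport $flag $ports -j NOTRACK"
        done
    done
}

notrack_rules 80,81,82
```

Once the printed rules look right, they can be pasted into a root shell one by one.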

3. Conclusion

  1. A connection timeout doesn’t point to a disk issue; a disk issue would show up as Server Internal Error from the monitoring probe.
  2. iptables has 4 tables and 5 chains. Tables: raw-->mangle-->nat-->filter. Chains: PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING.
  3. When we don’t need NAT, we can use the raw table to increase performance (e.g. on web ports), but we then need another DDoS protection method. Remember that NOTRACK must be set up in both directions in the raw table.
  4. Use sysctl -w net.netfilter.nf_conntrack_max=262144 to fix it immediately. For the size calculation, please refer to the equation above.

Ref:

  1. http://www.pc-freak.net/blog/resolving-nf_conntrack-table-full-dropping-packet-flood-message-in-dmesg-linux-kernel-log/
  2. http://people.netfilter.org/pablo/docs/login.pdf
  3. https://wiki.mikejung.biz/Sysctl_tweaks#net.core.netdev_max_backlog
  4. http://blog.51cto.com/wushank/1171768
  5. http://www.361way.com/%E5%86%8D%E7%9C%8Bnf_conntrack-table-full%E9%97%AE%E9%A2%98/2404.html