
Linux kernel 2.6.9 and bind 9.2.4

After upgrading my Linode to Linux kernel 2.6.9 to get file-system access control lists (ACLs), my bind 9.2.4 DNS server has become very unstable. The daemon runs perfectly for a few hours, then suddenly begins consuming all available CPU. The strange thing is that the process's memory usage remains stable and none of the other bind child processes seem to be affected. My Linode does stop responding to DNS requests, though, so I have to forcibly kill the server processes and restart the daemon.

I've tried upgrading to bind 9.3.0 to no avail, and running bind at the highest debug level has yielded very little information. According to the logs I collected over a three-day debugging period, the failure is triggered by an ordinary DNS request. My best guess is that loading uncached DNS entries into memory is killing bind, but I'm not sure. To make matters worse, bind is refusing to let me strace it.

Help?

UPDATE - 12/14/2004 - Well, after upgrading to a newer revision of kernel 2.6.9 and disabling some of the filtering processes on the server, my bind processes no longer go crazy. Although I still don't know for sure what caused the problem, I suspect that bind doesn't like being paged out to swap.

Linode uses a custom I/O token distribution mechanism that rate-limits the amount of I/O a single account can use, in order to prevent a DoS attack on other clients sharing the same host. However, such a mechanism really hurts Linodes that swap heavily. With my memory-intensive filtering processes nearly always active, my Linode's kernel was forced to swap out the mostly idle bind process. Whenever a DNS request arrived, the kernel had to page the bind process back into memory. Unfortunately, my Linode was usually low on I/O tokens at that precise moment, which left the kernel unable to page the process back in until more tokens became available. I'm pretty sure this is what caused my problems.
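To make the stall easier to picture, here is a minimal sketch of a generic token-bucket I/O limiter. This is an assumption for illustration only: Linode's actual token scheduler isn't documented here, and the names `IoTokenBucket` and `ticks_to_page_in`, as well as the capacity, refill rate, and page counts, are all hypothetical. Each page read during a page-in costs one token; when the bucket is empty, the read has to wait for refills.

```python
class IoTokenBucket:
    """Hypothetical generic token bucket; not Linode's real implementation."""

    def __init__(self, capacity: int, refill_per_tick: int, tokens: int) -> None:
        if refill_per_tick <= 0:
            raise ValueError("refill_per_tick must be positive")
        if not 0 <= tokens <= capacity:
            raise ValueError("tokens must be between 0 and capacity")
        self.capacity = capacity              # most tokens the account can bank
        self.refill_per_tick = refill_per_tick  # tokens granted each tick
        self.tokens = tokens                  # tokens currently available

    def tick(self) -> None:
        # Grant this tick's tokens, never exceeding the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + self.refill_per_tick)


def ticks_to_page_in(bucket: IoTokenBucket, pages: int) -> int:
    """Count the ticks a page-in of `pages` one-token reads must wait."""
    remaining = pages
    ticks = 0
    while True:
        # Spend whatever tokens are available on outstanding page reads.
        n = min(remaining, bucket.tokens)
        bucket.tokens -= n
        remaining -= n
        if remaining == 0:
            return ticks
        # Out of tokens: the page-in stalls until the next refill.
        bucket.tick()
        ticks += 1


if __name__ == "__main__":
    # Hypothetical numbers: a full bucket serves the page-in immediately,
    # while an empty one makes the same page-in wait several ticks.
    print(ticks_to_page_in(IoTokenBucket(100, 2, 100), 10))
    print(ticks_to_page_in(IoTokenBucket(100, 2, 0), 10))
```

The point of the model is that the delay depends on the token balance at the moment the request arrives, not on the request itself, which fits the observation that an ordinary DNS query set things off.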
