Bug 24212

Summary: named using 100% cpu and then locks up
Product: Mageia
Reporter: Tom Cox <tomc>
Component: Release (media or process)
Assignee: Guillaume Rousse <guillomovitch>
Status: RESOLVED WONTFIX
QA Contact:
Severity: critical
Priority: Normal
CC: sysadmin-bugs, tmb
Version: 6
Target Milestone: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Source RPM: bind-9.10.8.P1-1.mga6
CVE:
Status comment:

Description Tom Cox 2019-01-20 16:59:20 CET
A little more than a week ago, the named daemon started locking up after a couple of hours of running. Nothing has changed on my system other than some unrelated updates. I have figured out and corrected the problem. It is in the Mageia 6 rpm package.  Here's more information and what I had to fix.

I noticed that prior to the lockup, named was using 100% of one of the CPU cores.  It functions fine until the lockup.  Shortly after the lockup, it does not respond to anything: kill -TERM, rndc stop, etc.  There are no warning/error messages in the system log.  Turning on trace did not show any warning/error messages either.  To use trace, I had to run named in the foreground because the lock/unlock messages came so fast that just a couple of seconds would create a 50M file.

I did notice in the logs that there were some indications of possible configuration issues: 

named[10926]: the working directory is not writable
named[29103]: checkhints: b.root-servers.net/A (199.9.14.201) missing from hint

But those have been there from the beginning, so I didn't see them as something to chase down immediately.  Besides, named works fine up until it locks up.

After a lot of strace and debugging, I wasn't getting anywhere. I kept coming back to the fact that nothing had changed on my system.  Along the way to figuring out the real problem, I also found and resolved some other issues.  All of these are in the rpm package Mageia 6 contains. I started by running named-checkconf on the original named.conf from the rpm.

named-checkconf /etc/named.conf.orig
/etc/named.conf.orig:22: dnssec-lookaside 'auto' is no longer supported
/etc/named.root.key:1: managed-key for root from 2010 without updated managed-key from 2017
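
For anyone who wants to script this check, the two diagnostics above share a "file:line: message" shape. Below is a minimal Python sketch that splits such lines into fields. It assumes other named-checkconf messages follow the same shape, which is not verified here, and parse_checkconf is a hypothetical helper name, not part of BIND.

```python
import re

# Matches lines shaped like "<file>:<line>: <message>", as in the output above.
# Assumption: other named-checkconf diagnostics use the same layout.
DIAG = re.compile(r"^(?P<path>[^:\s]+):(?P<line>\d+): (?P<message>.+)$")


def parse_checkconf(output):
    """Return (path, line_number, message) tuples for each diagnostic line."""
    findings = []
    for raw in output.splitlines():
        m = DIAG.match(raw.strip())
        if m:
            findings.append((m["path"], int(m["line"]), m["message"]))
    return findings


# The two lines reported in this bug, copied verbatim.
sample = """\
/etc/named.conf.orig:22: dnssec-lookaside 'auto' is no longer supported
/etc/named.root.key:1: managed-key for root from 2010 without updated managed-key from 2017
"""

for path, line, message in parse_checkconf(sample):
    print(f"{path} line {line}: {message}")
```

Lines that do not match the pattern are silently skipped, so the helper is safe to feed mixed output.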

1. The named.conf file contains 'include "/etc/named.root.key";'  This was what was causing named to spin.  The file contains an old root key.  According to the ISC, this key started being phased out in 2017 with final revocation Jan 11, 2019.  Gee, that's when my problem started.  See  https://www.isc.org/downloads/bind/bind-keys/.

I noticed that named.conf contained bindkeys-file "/etc/named.iscdlv.key"; that file contains both the old key and the new one.  The named documentation also notes that the keys are built in to named.  Commenting out the include "/etc/named.root.key" solved my problem.  named stopped using 100% CPU and hasn't locked up since.

2. For dnssec-lookaside, the auto option is no longer supported.  I changed it to no.

3. The working directory message was about /var/named.  The owner.group was root.root.  I changed it to named.named.

4. The root hints file, named.ca, included in Mageia 6 is really old.  I used dig to get the current root servers and replaced named.ca with that.  There are no more checkhints messages in the log and there are quite a few more root servers.
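
Fixes 1 and 2 above are plain text edits to named.conf, so they can be sketched as a small Python function. This is only an illustration: apply_fixes is a hypothetical name, and the sample text is a made-up fragment built from the directives quoted in this report, not the actual file shipped in the Mageia 6 package. Review the result and run named-checkconf on it before using it.

```python
import re


def apply_fixes(conf_text):
    """Comment out the named.root.key include and set dnssec-lookaside to no."""
    fixed = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if re.match(r'include\s+"/etc/named\.root\.key"\s*;', stripped):
            # Fix 1: disable the include of the outdated root key file.
            # named.conf accepts C++-style // comments.
            line = "// " + line
        elif re.match(r"dnssec-lookaside\s+auto\s*;", stripped):
            # Fix 2: 'auto' is no longer supported, so switch it to 'no'.
            line = re.sub(r"(dnssec-lookaside\s+)auto", r"\1no", line)
        fixed.append(line)
    return "\n".join(fixed) + "\n"


# Hypothetical named.conf fragment for illustration only.
sample = """options {
    directory "/var/named";
    dnssec-lookaside auto;
    bindkeys-file "/etc/named.iscdlv.key";
};
include "/etc/named.root.key";
"""

print(apply_fixes(sample), end="")
```

The function works on text rather than on the file in place, so the original named.conf stays untouched until the output has been checked. Fixes 3 and 4 (ownership of /var/named and the refreshed named.ca) are not covered by this sketch.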

As I noted, everything is fine now.  I just wanted to let you know about the issue.

Comment 1 Thomas Backlund 2019-01-20 20:04:49 CET
Thanks for the detailed report.

Assignee: bugsquad => guillomovitch
CC: (none) => tmb

Comment 2 Guillaume Rousse 2019-01-30 22:23:24 CET
I just fixed all those issues (except the use of the deprecated dnssec-lookaside directive, which was already fixed) in the latest cauldron package, bind-9.11.5.P1-6.mga7.

Regarding Mageia 6, I'm a bit more reluctant to provide a package update, because none of those issues are actually security issues, and all of them occur in configuration files, meaning they can be corrected manually. A package update is as likely to hurt running configurations as to bring enhancements. Opinions welcome here.

Status: NEW => ASSIGNED

Comment 3 Tom Cox 2019-01-30 23:06:19 CET
I'm OK with that.  Anyone looking for answers to this issue will probably come across this bug page and note the fix for named.root.key, which caused the original problem that got me here.

I will also note that technically, trying to use a no longer valid DNSKEY could be considered a security issue.  However, I think it is a very minor one.  It's up to you to decide whether or not to address it in Mageia 6.

Comment 4 Guillaume Rousse 2019-03-02 10:18:25 CET
Fixed in cauldron, not in mageia 6.

Status: ASSIGNED => RESOLVED
Resolution: (none) => WONTFIX