Bug 4056 - Installer stage1 cannot resolve host names
Summary: Installer stage1 cannot resolve host names
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: i586 Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords: NEEDINFO, PATCH
Depends on:
Blocks:
 
Reported: 2012-01-07 15:46 CET by Luca Olivetti
Modified: 2012-06-26 08:54 CEST
CC List: 9 users

See Also:
Source RPM: drakx-installer-binaries
CVE:
Status comment:


Attachments
capture file of tcpdump (3.29 KB, application/octet-stream)
2012-01-08 18:31 CET, Luca Olivetti
use getaddrinfo() (2.56 KB, patch)
2012-03-16 13:55 CET, Thierry Vignaud

Description Luca Olivetti 2012-01-07 15:46:38 CET
Trying a network install, the installer cannot download a list of mirrors.
The network is configured correctly and working, since wget on the tty2 console can download the url with the mirror list.
Also, entering the name of a mirror manually doesn't work.
In both cases, the error message in the log is "unknown host mirrors.mageia.org" or "unknown host <name of the mirror>".

If I enter the IP address of a mirror, installation goes on.

The error message "unknown host" comes from dns.c in stage-1, in the part conditionally compiled with dietlibc (#if defined(__dietlibc__))

http://svnweb.mageia.org/soft/drakx/trunk/mdk-stage1/dns.c?revision=451&view=markup

so I suspect the function "mygethostbyname" doesn't work.
Comment 1 Luca Olivetti 2012-01-07 15:47:34 CET
Note: tried both the x86_64 and the i586 installer, both do the same.
Comment 2 Manuel Hiebel 2012-01-07 17:30:20 CET
Hello, thanks for the bug report.
Have you tried with cauldron?

Source RPM: (none) => drakx-installer-images

Comment 3 Thierry Vignaud 2012-01-07 17:45:08 CET
You mean you are still in the stage1, not the graphical stage2?
Comment 4 Thomas Backlund 2012-01-07 20:07:50 CET
Luca: this was with a "special pxe setup", and not an ordinary Mageia install, was it not?

Please be _specific_ about what you do, as normal netinstalls do not have dns problems...

CC: (none) => tmb

Comment 5 Luca Olivetti 2012-01-07 20:45:31 CET
Ok, trying to be more specific.
I downloaded the network install boot.iso here
http://www.mageia.org/en/downloads/dl.php?product=mageia-1-netboot-i586

I loop mounted the iso to see if it had some special requirement (e.g., like the loop mounted squashfs for the live cd) and, seeing that it only has files in isolinux (i.e. kernel images and initrd), I copied those to boot via pxe.
So, yes, it's a special install, but, no, there should be no difference between what I did and booting from the same image on a cd (after all it just amounts to loading a kernel and an initrd).

@Manuel Hiebel, no, I didn't try with cauldron, but dns.c hasn't changed in 10 months.
Comment 6 Luca Olivetti 2012-01-07 20:46:28 CET
Actually I first tried with the x86_64 iso
http://www.mageia.org/en/downloads/dl.php?product=mageia-1-netboot-x86_64
Comment 7 Luca Olivetti 2012-01-07 20:50:15 CET
@Thierry, yes, still in stage 1.
I only manage to download (and enter) stage 2 if I specify the ip address of a mirror (it's currently installing).
Comment 8 AL13N 2012-01-07 21:17:43 CET
could this be some kind of dietlibc issue?

i don't see how pxe could make a difference, except that with dhcp the lease is copied somehow; maybe the dns settings are affected by that?

but wget works, so...

CC: (none) => alien

Comment 9 Luca Olivetti 2012-01-07 23:08:05 CET
(In reply to comment #8)

 
> but wget works, so...

Don't know how significant it is but wget (busybox) uses getaddrinfo, not gethostbyname.
Comment 10 AL13N 2012-01-07 23:19:41 CET
afaik getaddrinfo can handle a lot more, and more complex things, like ipv6 and multiple ip addresses...

i wonder if your network has some special things that most networks don't have. not that this shouldn't be fixed, though
Comment 11 Luca Olivetti 2012-01-07 23:23:30 CET
Maybe gethostbyname is buggy in dietlibc?
The only "special" thing in my network is that I use maradns as a dns server instead of bind, however I tried editing /etc/resolv.conf (in the tty2 console) to point to 8.8.8.8, the result is the same (wget works, stage-1 says "unknown host").
Comment 12 AL13N 2012-01-07 23:36:16 CET
do you have ipv6 routing?

ipv6 should be the preferred setup, so maybe it tries ipv6 first or something,

perhaps you could tcpdump your network and see what kind of DNS calls it does make... and how they differ...
Comment 13 Thomas Backlund 2012-01-07 23:36:57 CET
we should really switch to getaddrinfo()
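
For reference, a minimal sketch of what a getaddrinfo()-based lookup can look like. This is not the patch attached to this bug; the helper name resolve_ipv4 and its signature are made up for illustration, and it assumes a libc that provides getaddrinfo() and inet_ntop().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not the function from dns.c): resolve `name` to its
 * first IPv4 address and write it as dotted-quad text into `out`.
 * Returns 0 on success, or a non-zero getaddrinfo() error code. */
int resolve_ipv4(const char *name, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    assert(name != NULL && out != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, i.e. A records */
    hints.ai_socktype = SOCK_STREAM; /* avoid one result per socket type */

    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0)
        return rc;                   /* gai_strerror(rc) gives a message */

    /* With AF_INET hints, every result carries a struct sockaddr_in. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen) == NULL)
        rc = EAI_SYSTEM;             /* e.g. buffer too small; errno is set */

    freeaddrinfo(res);
    return rc;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes and report gai_strerror() of the return value on failure. CNAME handling, retries and the choice of nameserver are then left to the libc resolver instead of custom code.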
Comment 14 AL13N 2012-01-08 01:54:28 CET
yes, but not enough time now...
Comment 15 Luca Olivetti 2012-01-08 12:18:49 CET
(In reply to comment #12)

> perhaps you could tcpdump your network and see what kind of DNS calls it does
> make... and how they differ...

Good idea.
It turns out that mirrors.mageia.org is a CNAME, and mygethostbyname isn't able to resolve it (apparently when it receives a CNAME reply it tries again, then retries with the local domain appended):

12:12:07.792316 IP 192.168.10.26.41258 > lspro.ventoso.local.domain:  50152+ A? mirrors.mageia.org. (36)
12:12:07.793615 IP lspro.ventoso.local.domain > 192.168.10.26.41258:  50152- 2/0/0 CNAME[|domain]       
12:12:12.786617 arp who-has 192.168.10.26 tell lspro.ventoso.local                                      
12:12:12.787473 arp reply 192.168.10.26 is-at dc:0e:a1:4e:bf:36 (oui Unknown)                           
12:12:18.809065 IP 192.168.10.26.41258 > lspro.ventoso.local.domain:  50152+ A? mirrors.mageia.org. (36)
12:12:18.810307 IP lspro.ventoso.local.domain > 192.168.10.26.41258:  50152- 2/0/0 CNAME[|domain]       
12:12:23.816691 arp who-has lspro.ventoso.local tell 192.168.10.26                                      
12:12:23.816759 arp reply lspro.ventoso.local is-at 00:16:01:41:ad:18 (oui Unknown)                     
12:12:25.820668 IP 192.168.10.26.41258 > lspro.ventoso.local.domain:  32618+ A? mirrors.mageia.org.ventoso.local. (50)
12:12:25.822116 IP lspro.ventoso.local.domain > 192.168.10.26.41258:  32618 NXDomain*- 0/1/0 (97)
12:12:36.837611 IP 192.168.10.26.41258 > lspro.ventoso.local.domain:  32618+ A? mirrors.mageia.org.ventoso.local. (50)
12:12:36.839068 IP lspro.ventoso.local.domain > 192.168.10.26.41258:  32618 NXDomain*- 0/1/0 (97)
12:12:41.836626 arp who-has 192.168.10.26 tell lspro.ventoso.local
12:12:41.837487 arp reply 192.168.10.26 is-at dc:0e:a1:4e:bf:36 (oui Unknown)
12:12:43.849549 IP 192.168.10.26.41258 > lspro.ventoso.local.domain:  9037+ A? mirrors.mageia.org.ventoso.local. (50)
12:12:43.850990 IP lspro.ventoso.local.domain > 192.168.10.26.41258:  9037 NXDomain*- 0/1/0 (97)
12:12:54.866563 IP 192.168.10.26.41258 > lspro.ventoso.local.domain:  9037+ A? mirrors.mageia.org.ventoso.local. (50)
12:12:54.868215 IP lspro.ventoso.local.domain > 192.168.10.26.41258:  9037 NXDomain*- 0/1/0 (97)


While busybox (using ping here) resolves it correctly:

12:14:40.620419 IP 192.168.10.26.37396 > lspro.ventoso.local.domain:  2+ AAAA? mirrors.mageia.org. (36)
12:14:40.621493 IP lspro.ventoso.local.domain > 192.168.10.26.37396:  2- 1/0/0 CNAME[|domain]
12:14:40.622963 IP 192.168.10.26.47306 > lspro.ventoso.local.domain:  3+ AAAA? alamut.mageia.org. (35)
12:14:40.624247 IP lspro.ventoso.local.domain > 192.168.10.26.47306:  3- 1/0/0 AAAA[|domain]
Thierry Vignaud 2012-01-08 17:54:35 CET

CC: (none) => thierry.vignaud
Source RPM: drakx-installer-images => drakx-installer-binaries

Comment 16 Luca Olivetti 2012-01-08 18:31:42 CET
Created attachment 1348
capture file of tcpdump

This is a capture made by tcpdump, so it's easier to analyze with wireshark.
In the last trace (name resolution made by busybox/ping), I forgot to add the last packet (query for the A record instead of AAAA), which is present in this capture.
Note that in all the replies for the A record, there's both the CNAME and the A record for alamut.mageia.org.
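
To illustrate why a reply carrying both records is enough to resolve the name, here is a rough sketch of following the CNAME inside a single answer section. It works on already-parsed records; struct rr, find_a_record and max_hops are hypothetical names, not code from stage1, dietlibc or busybox.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp */

enum rr_type { RR_A = 1, RR_CNAME = 5 };  /* type codes from RFC 1035 */

/* Hypothetical, already-parsed resource record: `data` is the dotted-quad
 * address for RR_A or the target name for RR_CNAME. */
struct rr {
    const char *name;
    enum rr_type type;
    const char *data;
};

/* Follow CNAMEs inside one answer section until an A record is found.
 * Returns the address text, or NULL if the chain ends without an A record
 * or needs more than `max_hops` CNAME steps (guards against loops). */
const char *find_a_record(const struct rr *answers, size_t n,
                          const char *qname, int max_hops)
{
    assert(qname != NULL && (answers != NULL || n == 0));
    const char *want = qname;

    for (int hop = 0; hop <= max_hops; hop++) {
        const char *next = NULL;
        for (size_t i = 0; i < n; i++) {
            /* DNS names compare case-insensitively. */
            if (strcasecmp(answers[i].name, want) != 0)
                continue;
            if (answers[i].type == RR_A)
                return answers[i].data;
            if (answers[i].type == RR_CNAME && next == NULL)
                next = answers[i].data;
        }
        if (next == NULL)
            return NULL;   /* neither A nor CNAME for this name */
        want = next;       /* continue with the canonical name */
    }
    return NULL;           /* too many CNAME hops */
}
```

With the two records from the capture (mirrors.mageia.org as a CNAME for alamut.mageia.org, plus the A record for alamut.mageia.org), one pass over the answer section yields the address without a second query.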
Thierry Vignaud 2012-01-08 19:28:20 CET

CC: (none) => mageia
Summary: Installer cannot resolve host names => Installer stage1 cannot resolve host names

Comment 17 Pascal Terjan 2012-01-09 02:29:03 CET
The only thing I noticed in your dns answer is the flags: the request asked for recursion, and your dns replies that recursion was not desired and that it is not supported.

dietlibc has some code checking it, and it is too late for me to think clearly enough to know if this is the reason, but it may ignore the answer because of it:

if ((inpkg[2]&0xf9) != (_res.options&RES_RECURSE?0x81:0x80)) continue;        /* not answer */

Anyway, I think this is a bug in your server as it should not give the answer if it says it doesn't do recursion.
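
To make the effect of that line concrete, here is a small standalone sketch of the same comparison. The bit positions are those of the DNS header in RFC 1035; the function name reply_passes_check and the `recurse` parameter (standing in for _res.options&RES_RECURSE) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the DNS header byte at offset 2, per RFC 1035. */
#define DNS_QR     0x80  /* 1 = response */
#define DNS_OPCODE 0x78  /* 4-bit opcode, 0 = standard query */
#define DNS_AA     0x04  /* authoritative answer */
#define DNS_TC     0x02  /* truncated */
#define DNS_RD     0x01  /* recursion desired, echoed from the query */

/* Hypothetical stand-in for the quoted test: returns 1 if a reply whose
 * header byte at offset 2 is `flags_hi` would be accepted, 0 if it would
 * be skipped. `recurse` plays the role of (_res.options & RES_RECURSE). */
int reply_passes_check(uint8_t flags_hi, int recurse)
{
    uint8_t expected = recurse ? 0x81 : 0x80;

    /* The 0xf9 mask keeps QR, opcode and RD, and ignores AA and TC. */
    assert((DNS_QR | DNS_OPCODE | DNS_RD) == 0xf9);

    return (flags_hi & 0xf9) == expected;
}
```

With recursion requested, a reply byte of 0x81 or 0x85 (AA set) passes, while 0x80 or 0x84 (RD not echoed back) is skipped as "not answer", which would match a server that clears RD in its replies.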

CC: (none) => pterjan

Comment 18 Luca Olivetti 2012-01-09 06:09:09 CET
It's possible that the server is buggy but

1) busybox manages to resolve those hosts anyway and
2) I modified /etc/resolv.conf to point to 8.8.8.8 with the same result

Truth be told, I also removed the "search ventoso.local" from /etc/resolv.conf but it still tried to resolve mirrors.mageia.org.ventoso.local
Comment 19 Luca Olivetti 2012-01-09 06:37:12 CET
I'll have to do some more tests, but it's possible that it's a configuration error on my part. I specify both my local server and 8.8.8.8 as nameservers, so it's possible that mygethostbyname only contacts the first server, sees the "no recursion available" (though neither dig nor nslookup complain about it) and gives up, while busybox contacts the second server (8.8.8.8) and gets an answer.
Comment 20 Pascal Terjan 2012-01-09 09:53:08 CET
The answer comes from 192.168.10.5

I found http://osdir.com/ml/linux.lib.dietlibc/2003-11/msg00003.html which is consistent with what I saw.

We can probably remove this line, as it enforces some consistency in the answer which other tools don't seem to care about, so it is probably safe.
Comment 21 Pascal Terjan 2012-01-09 13:36:51 CET
RFC 1035 reads:

RD              Recursion Desired - this bit may be set in a query and
                is copied into the response.  If RD is set, it directs
                the name server to pursue the query recursively.
                Recursive query support is optional.

So not having the bit set is definitely a maradns bug.

That said, I'd vote for removing the check in dietlibc.
Comment 22 Pascal Terjan 2012-01-09 13:40:37 CET
maradns-1.3.11:
Bugfix: RD value is now correctly echoed to client again
Bugfix: RA bit has (generally) a reasonably sensible value, since some embedded devices actually check this bit.
(2008.03.23)

So either you have a very old version or they broke it again :)
Comment 23 Luca Olivetti 2012-01-09 18:00:25 CET
Yes, I have a very old version (long story) with this bug *but* I also specify an alternate nameserver (8.8.8.8).
A working resolver should either ignore the bug or try the next server, e.g.:

$ nslookup mirrors.mageia.org
;; Got recursion not available from 192.168.10.5, trying next server
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
mirrors.mageia.org      canonical name = alamut.mageia.org.
Name:   alamut.mageia.org
Address: 212.85.158.146

So there's a bug in maradns, but there's also a bug in stage-1 (and it's confusing, since busybox does the "right thing").
Comment 24 Luca Olivetti 2012-01-09 18:10:21 CET
Additional test: with only 192.168.10.5 as a nameserver, stage-1 fails and busybox works, with only 8.8.8.8 both work.
So apparently busybox is ignoring the RD and RA bits.
Comment 25 Mike Leitner 2012-01-26 13:32:45 CET
I have tried this with the latest release of the boot.iso and also boot_unfree.iso, but always with the same result. When I choose static it works.

CC: (none) => Tecardo

Comment 26 Mike Leitner 2012-01-26 13:35:56 CET
Wait, I have changed dns and it worked.
Comment 27 Thierry Vignaud 2012-01-26 14:10:17 CET
(In reply to comment #13)
> we should really switch to getaddrinfo()

Patch welcome :-)
Comment 28 Thierry Vignaud 2012-02-22 18:59:50 CET
@tmb: ping :-)
Comment 29 Thierry Vignaud 2012-03-16 13:55:55 CET
Created attachment 1778
use getaddrinfo()

Can you try with this patch?
Thierry Vignaud 2012-03-16 13:56:06 CET

Keywords: (none) => NEEDINFO, PATCH

Comment 30 Luca Olivetti 2012-03-16 16:43:05 CET
In the meantime I upgraded maradns, but I can revert to the old version.
Where may I download a stage1 built with this patch?
Comment 31 Thierry Vignaud 2012-03-16 17:10:31 CET
You'll need to:
1) rebuild drakx-installer-binaries with that patch
2) install the patched package
3) rebuild drakx-installer-images
4) use boot.iso from the rebuilt drakx-installer-images
Comment 32 Thierry Vignaud 2012-03-19 19:07:02 CET
Have you tried?
Comment 33 Luca Olivetti 2012-03-20 00:20:33 CET
No, sorry, tight work schedule leaves me no time for anything else.
Comment 34 AL13N 2012-04-09 09:21:30 CEST
so, i assume the patch has been included and the images have been rebuilt by now...

perhaps the original user with broken DNS implementation can try again?
Comment 35 Mike Leitner 2012-04-09 12:57:14 CEST
that was me. I can try it. but I don't have time until next week.
Comment 36 AL13N 2012-04-09 13:00:03 CEST
it would be nice to have it tested before release freeze (i think 14th april)
Comment 37 Thierry Vignaud 2012-04-09 13:52:55 CEST
No, it hasn't been merged since I got no feedback.
Comment 38 AL13N 2012-04-09 14:16:52 CEST
ah.

can we merge it and, if it should fail, revert it? i'm guessing it's likely easier to test for most people.
Comment 39 Luca Olivetti 2012-04-09 15:07:30 CEST
Sorry, too busy with work, with a prebuilt image I could probably test it.
Thierry Vignaud 2012-04-23 11:58:29 CEST

Version: 1 => Cauldron

Thierry Vignaud 2012-04-23 11:59:34 CEST

Severity: major => normal

Comment 40 Marja Van Waes 2012-05-09 14:59:13 CEST
(In reply to comment #29)
> Created attachment 1778
> use getaddrinfo()
> 
> Can you try with this patch?

(In reply to comment #31)
> You'll need to:
> 1) rebuild drakx-installer-binaries with that patch
> 2) install the patched package
> 3) rebuild drakx-installer-images
> 4) use boot.iso from the rebuilt drakx-installer-images

Has anyone tried this, now?

CC: (none) => marja11

Comment 41 Marja Van Waes 2012-05-26 13:05:10 CEST
Hi,

This bug was filed against cauldron, but we do not have cauldron at the moment.

Please report whether this bug is still valid for Mageia 2.

Thanks :)

Cheers,
marja
Comment 42 Thierry Vignaud 2012-06-01 19:50:07 CEST
Just fixed in cauldron.
Just wait for next boot.iso

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 43 Thierry Vignaud 2012-06-05 12:19:19 CEST
Can you confirm if it works better for you with cauldron stage1?
Comment 44 Thierry Vignaud 2012-06-20 10:37:06 CEST
ping?
Comment 45 Luca Olivetti 2012-06-25 19:00:21 CEST
I tried this boot.iso

http://ftp.fi.muni.cz/pub/linux/mageia/distrib/cauldron/i586/install/images/boot-nonfree.iso

which says "June 5", so I suppose it is after the fix,
and I see no change (i.e. it doesn't resolve with the buggy maradns as primary and google as secondary, but it does with google as primary).
I also tried the boot.iso from mageia 2, same problem.
Comment 46 Luca Olivetti 2012-06-25 19:01:35 CEST
It says "June 5" in the banner, the file date on the mirror is June 20.
Comment 47 Dave Hodgins 2012-06-26 02:02:06 CEST
(In reply to comment #45)
> I tried this boot.iso
> 
> http://ftp.fi.muni.cz/pub/linux/mageia/distrib/cauldron/i586/install/images/boot-nonfree.iso
> 
> which says "June 5" so I suppose is after the fix.
> and I see no change (i.e. doesn't resolve with buggy maradns as primary and
> google as secondary, it does with google as primary).
> I also tried the boot.iso from mageia 2, same problem.

That's normal.  When more than one nameserver is listed in /etc/resolv.conf,
lookups stop with the first name server that responds, provided the response
is not servfail.

If the maradns server is running, but for some reason responding with not
found, lookups stop there.
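
The rule described above can be sketched as a small simulation. This only models the behaviour as stated in this comment, not the code of any particular resolver; enum outcome and deciding_server are hypothetical names, and treating a timeout as "try the next server" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated outcome of querying one nameserver. The RCODE values are the
 * ones defined in RFC 1035; OUTCOME_TIMEOUT is a hypothetical marker for
 * "no reply at all". */
enum outcome {
    OUTCOME_TIMEOUT  = -1,
    OUTCOME_NOERROR  = 0,
    OUTCOME_SERVFAIL = 2,
    OUTCOME_NXDOMAIN = 3
};

/* Returns the index of the server whose reply ends the lookup, or -1 if
 * every server either timed out or answered SERVFAIL. */
int deciding_server(const enum outcome *servers, size_t n)
{
    assert(servers != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (servers[i] == OUTCOME_TIMEOUT || servers[i] == OUTCOME_SERVFAIL)
            continue;      /* unusable reply: try the next nameserver */
        return (int)i;     /* any other reply, even NXDOMAIN, is final */
    }
    return -1;
}
```

Under this rule a first server answering "not found" hides a working second server, whereas a first server answering SERVFAIL does not.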

Where is the "buggy" maradns server running, and what's wrong with it?

CC: (none) => davidwhodgins

Remco Rijnders 2012-06-26 07:30:17 CEST

CC: (none) => remco

Comment 48 AL13N 2012-06-26 08:04:10 CEST
that's the original problem. you can read about it above.
Comment 49 Remco Rijnders 2012-06-26 08:11:23 CEST
As maradns maintainer, I'm interested in hearing what version of maradns is giving these problems.
Comment 50 Luca Olivetti 2012-06-26 08:54:37 CEST
@Remco, don't worry, it's a very old version that I had on my hacked linkstation lspro, anyway the problem is described in comment #17 and in comment #22

@Dave, see the original report and comment #15, under the same conditions stage-1 doesn't resolve the name but busybox does.
