During a network install, the installer cannot download the list of mirrors. The network is configured correctly and working, since wget on the tty2 console can download the URL with the mirror list. Manually entering the name of a mirror doesn't work either. In both cases, the error message in the log is "unknown host mirrors.mageia.org" or "unknown host <name of the mirror>". If I enter the IP address of a mirror, the installation proceeds. The error message "unknown host" comes from dns.c in stage-1, in the part conditionally compiled with dietlibc (#if defined(__dietlibc__)) http://svnweb.mageia.org/soft/drakx/trunk/mdk-stage1/dns.c?revision=451&view=markup so I suspect the function "mygethostbyname" doesn't work.
Note: tried both the x86_64 and the i586 installer, both do the same.
Hello thanks for the bug report. Have you tried with cauldron?
Source RPM: (none) => drakx-installer-images
You mean you are still in the stage1, not the graphical stage2?
Luca: this was with a "special pxe setup", and not an ordinary Mageia install, was it not? Please be _specific_ about what you do, as normal netinstalls do not have DNS problems...
CC: (none) => tmb
Ok, trying to be more specific. I downloaded the network install boot.iso from http://www.mageia.org/en/downloads/dl.php?product=mageia-1-netboot-i586 and loop mounted the iso to see if it had some special requirement (e.g., like the loop mounted squashfs for the live cd). Seeing that it only has files in isolinux (i.e. kernel images and initrd), I copied those to boot via pxe. So, yes, it's a special install, but, no, there should be no difference between what I did and booting from the same image on a cd (after all it just amounts to loading a kernel and an initrd). @Manuel Hiebel, no, I didn't try with cauldron, but dns.c hasn't changed in 10 months.
Actually I first tried with the x86_64 iso http://www.mageia.org/en/downloads/dl.php?product=mageia-1-netboot-x86_64
@Thierry, yes, still in stage 1. I only manage to download (and enter) stage 2 if I specify the ip address of a mirror (it's currently installing).
could this be some kind of dietlibc issue? i don't see how pxe could be different to it, except that with dhcp, the lease is somehow copied maybe dns regarding that? but wget works, so...
CC: (none) => alien
(In reply to comment #8) > but wget works, so... Don't know how significant it is but wget (busybox) uses getaddrinfo, not gethostbyname.
afaik getaddrinfo can handle a lot more and more complex things, like ipv6 and multiple ip addresses... i wonder if your network has some special things that most networks don't have. not that this shouldn't be fixed, though
Maybe gethostbyname is buggy in dietlibc? The only "special" thing in my network is that I use maradns as a dns server instead of bind, however I tried editing /etc/resolv.conf (in the tty2 console) to point to 8.8.8.8, the result is the same (wget works, stage-1 says "unknown host").
do you have ipv6 routing? ipv6 should be the preferred setup, so maybe it tries ipv6 first or something, perhaps you could tcpdump your network and see what kind of DNS calls it does make... and how they differ...
we should really switch to getaddrinfo()
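For reference, a minimal sketch of what a getaddrinfo()-based lookup could look like. This is not the stage1 code and not a proposed patch: the function name resolve_ipv4 is hypothetical, and the sketch assumes a libc that provides getaddrinfo() and inet_ntop().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical helper: resolve `name` to its first IPv4 address and
 * write it as dotted-quad text into buf. Returns 0 on success or a
 * non-zero EAI_* code on failure. */
int resolve_ipv4(const char *name, char *buf, size_t buflen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, like the A? queries in this bug */
    hints.ai_socktype = SOCK_STREAM; /* avoid one result per socket type */

    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* With ai_family = AF_INET, ai_addr points to a struct sockaddr_in. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    const char *text = inet_ntop(AF_INET, &sin->sin_addr, buf, (socklen_t)buflen);

    freeaddrinfo(res);
    return text != NULL ? 0 : EAI_SYSTEM;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes, e.g. resolve_ipv4("mirrors.mageia.org", buf, sizeof buf), and can turn a non-zero return value into text with gai_strerror().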
yes, but not enough time now...
(In reply to comment #12)
> perhaps you could tcpdump your network and see what kind of DNS calls it does
> make... and how they differ...

Good idea. It turns out that mirrors.mageia.org is a CNAME, and mygethostbyname isn't able to resolve that (apparently when it receives a CNAME reply it tries again, then it retries appending the local domain):

12:12:07.792316 IP 192.168.10.26.41258 > lspro.ventoso.local.domain: 50152+ A? mirrors.mageia.org. (36)
12:12:07.793615 IP lspro.ventoso.local.domain > 192.168.10.26.41258: 50152- 2/0/0 CNAME[|domain]
12:12:12.786617 arp who-has 192.168.10.26 tell lspro.ventoso.local
12:12:12.787473 arp reply 192.168.10.26 is-at dc:0e:a1:4e:bf:36 (oui Unknown)
12:12:18.809065 IP 192.168.10.26.41258 > lspro.ventoso.local.domain: 50152+ A? mirrors.mageia.org. (36)
12:12:18.810307 IP lspro.ventoso.local.domain > 192.168.10.26.41258: 50152- 2/0/0 CNAME[|domain]
12:12:23.816691 arp who-has lspro.ventoso.local tell 192.168.10.26
12:12:23.816759 arp reply lspro.ventoso.local is-at 00:16:01:41:ad:18 (oui Unknown)
12:12:25.820668 IP 192.168.10.26.41258 > lspro.ventoso.local.domain: 32618+ A? mirrors.mageia.org.ventoso.local. (50)
12:12:25.822116 IP lspro.ventoso.local.domain > 192.168.10.26.41258: 32618 NXDomain*- 0/1/0 (97)
12:12:36.837611 IP 192.168.10.26.41258 > lspro.ventoso.local.domain: 32618+ A? mirrors.mageia.org.ventoso.local. (50)
12:12:36.839068 IP lspro.ventoso.local.domain > 192.168.10.26.41258: 32618 NXDomain*- 0/1/0 (97)
12:12:41.836626 arp who-has 192.168.10.26 tell lspro.ventoso.local
12:12:41.837487 arp reply 192.168.10.26 is-at dc:0e:a1:4e:bf:36 (oui Unknown)
12:12:43.849549 IP 192.168.10.26.41258 > lspro.ventoso.local.domain: 9037+ A? mirrors.mageia.org.ventoso.local. (50)
12:12:43.850990 IP lspro.ventoso.local.domain > 192.168.10.26.41258: 9037 NXDomain*- 0/1/0 (97)
12:12:54.866563 IP 192.168.10.26.41258 > lspro.ventoso.local.domain: 9037+ A? mirrors.mageia.org.ventoso.local. (50)
12:12:54.868215 IP lspro.ventoso.local.domain > 192.168.10.26.41258: 9037 NXDomain*- 0/1/0 (97)

While busybox (using ping here) resolves it correctly:

12:14:40.620419 IP 192.168.10.26.37396 > lspro.ventoso.local.domain: 2+ AAAA? mirrors.mageia.org. (36)
12:14:40.621493 IP lspro.ventoso.local.domain > 192.168.10.26.37396: 2- 1/0/0 CNAME[|domain]
12:14:40.622963 IP 192.168.10.26.47306 > lspro.ventoso.local.domain: 3+ AAAA? alamut.mageia.org. (35)
12:14:40.624247 IP lspro.ventoso.local.domain > 192.168.10.26.47306: 3- 1/0/0 AAAA[|domain]
CC: (none) => thierry.vignaud
Source RPM: drakx-installer-images => drakx-installer-binaries
Created attachment 1348 [details]
capture file of tcpdump

This is a capture made by tcpdump, so it's easier to analyze with wireshark. In the last trace (name resolution made by busybox/ping), I forgot to add the last packet (query for the A record instead of AAAA), which is present in this capture. Note that in all the replies for the A record, there's both the CNAME and the A record for alamut.mageia.org.
CC: (none) => mageia
Summary: Installer cannot resolve host names => Installer stage1 cannot resolve host names
The only thing I noticed in your dns answer is the flags: the request asked for recursion, and your dns replies that recursion was not desired and that it is not supported. dietlibc has some code checking it, and it is too late for me to think clearly enough to know if this is the reason, but it may ignore the answer because of it:

    if ((inpkg[2]&0xf9) != (_res.options&RES_RECURSE?0x81:0x80)) continue; /* not answer */

Anyway, I think this is a bug in your server, as it should not give the answer if it says it doesn't do recursion.
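To make the effect of that check concrete, here is a small standalone sketch of the same comparison. It assumes the RFC 1035 header layout, where byte 2 of the message carries QR (0x80), the 4-bit opcode (0x78), AA (0x04), TC (0x02) and RD (0x01). The function name strict_flags_match and the macro names are made up for illustration; only the mask 0xf9 and the expected values 0x81/0x80 come from the dietlibc line quoted in this comment.

```c
#include <stdbool.h>

/* Bits of DNS header byte 2 (RFC 1035, section 4.1.1). */
#define DNS_QR     0x80 /* 1 = response */
#define DNS_OPCODE 0x78 /* 0 = standard query */
#define DNS_AA     0x04 /* authoritative answer */
#define DNS_TC     0x02 /* truncated */
#define DNS_RD     0x01 /* recursion desired, copied from the query */

/* Hypothetical re-statement of the quoted dietlibc test.
 * The mask 0xf9 keeps QR, opcode and RD and ignores AA and TC, so a
 * reply is accepted only if it is a response to a standard query AND
 * its RD bit equals what the resolver asked for. */
bool strict_flags_match(unsigned char flags_hi, bool recursion_requested)
{
    unsigned char expected = recursion_requested ? (DNS_QR | DNS_RD) : DNS_QR;
    return (flags_hi & (DNS_QR | DNS_OPCODE | DNS_RD)) == expected;
}
```

Under this reading, a server that answers a recursive query but clears RD in the reply produces flags_hi & 0xf9 == 0x80 instead of 0x81, so the reply is skipped as "not answer" even though it contains valid records.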
CC: (none) => pterjan
It's possible that the server is buggy, but 1) busybox manages to resolve those hosts anyway and 2) I modified /etc/resolv.conf to point to 8.8.8.8 with the same result. Truth be told, I also removed the "search ventoso.local" line from /etc/resolv.conf, but it still tried to resolve mirrors.mageia.org.ventoso.local.
I'll have to do some more tests, but it's possible that's a configuration error on my part: I specify both my local server and 8.8.8.8 as nameservers, it's possible that mygethostbyname only contacts the first server, sees the "no recursion available" (though neither dig nor nslookup complain about it) and gives up, while busybox contacts the second server (8.8.8.8) and gets an answer.
The answer comes from 192.168.10.5. I found http://osdir.com/ml/linux.lib.dietlibc/2003-11/msg00003.html which is consistent with what I saw. We can probably remove this line, as it enforces some consistency in the answer which other tools don't seem to care about, so it is probably safe.
RFC 1035 reads:

    RD Recursion Desired - this bit may be set in a query and is copied into the response. If RD is set, it directs the name server to pursue the query recursively. Recursive query support is optional.

So not having the bit set is definitely a maradns bug. That said, I'd vote for removing the check in dietlibc.
maradns-1.3.11:
    Bugfix: RD value is now correctly echoed to client again
    Bugfix: RA bit has (generally) a reasonably sensible value, since some embedded devices actually check this bit.
    (2008.03.23)

So either you have a very old version or they broke it again :)
Yes, I have a very old version (long story) with this bug *but* I also specify an alternate nameserver (8.8.8.8). A working resolver should either ignore the bug or try the next server, e.g.:

$ nslookup mirrors.mageia.org
;; Got recursion not available from 192.168.10.5, trying next server
Server: 8.8.8.8
Address: 8.8.8.8#53

Non-authoritative answer:
mirrors.mageia.org canonical name = alamut.mageia.org.
Name: alamut.mageia.org
Address: 212.85.158.146

So there's a bug in maradns, but there's also a bug in stage-1 (and a confusing one, since busybox does the "right thing").
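The "try the next server" behaviour shown by nslookup can be sketched as follows. This is only an illustration of the selection logic, not code from stage1, dietlibc, busybox or nslookup; struct dns_reply and first_usable_server are hypothetical names, and the replies are assumed to have been collected already, in the order the nameservers are listed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what one nameserver sent back. */
struct dns_reply {
    bool answered;            /* false if the query timed out */
    bool recursion_available; /* RA bit of the reply */
};

/* Return the index of the first server whose reply is usable, or -1
 * if none is. A reply without RA is skipped, mirroring the nslookup
 * message "Got recursion not available ..., trying next server". */
int first_usable_server(const struct dns_reply *replies, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!replies[i].answered)
            continue; /* no reply at all: try the next server */
        if (!replies[i].recursion_available)
            continue; /* server refuses recursion: try the next server */
        return (int)i;
    }
    return -1;
}
```

With the setup described here (old maradns at 192.168.10.5 listed first, 8.8.8.8 second), such a loop would skip index 0 and pick index 1, which matches what nslookup reports.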
Additional test: with only 192.168.10.5 as a nameserver, stage-1 fails and busybox works, with only 8.8.8.8 both work. So apparently busybox is ignoring the RD and RA bits.
I have tried this with the latest release of boot.iso and also boot_unfree.iso, but always with the same result. When I choose static it works.
CC: (none) => Tecardo
Wait, I have changed the DNS server and it worked.
(In reply to comment #13)
> we should really switch to getaddrinfo()

Patch welcome :-)
@tmb: ping :-)
Created attachment 1778 [details]
use getaddrinfo()

Can you try with this patch?
Keywords: (none) => NEEDINFO, PATCH
In the meantime I upgraded maradns, but I can revert to the old version. Where may I download a stage1 built with this patch?
You'll need to:
1) rebuild drakx-installer-binaries with that patch
2) install the patched package
3) rebuild drakx-installer-images
4) use boot.iso from the rebuilt drakx-installer-images
Have you tried?
No, sorry, tight work schedule leaves me no time for anything else.
so, i assume the patch has been included and the images have been rebuilt by now... perhaps the original user with the broken DNS implementation can try again?
that was me. I can try it. but I don't have time until next week.
it would be nice to have it tested before release freeze (i think 14th april)
No, it hasn't been merged since I got no feedback.
ah. can we merge it and, if it should fail, revert it? i'm guessing it's likely easier to test for most people.
Sorry, too busy with work, with a prebuilt image I could probably test it.
Version: 1 => Cauldron
Severity: major => normal
(In reply to comment #29)
> Created attachment 1778 [details]
> use getaddrinfo()
>
> Can you try with this patch?

(In reply to comment #31)
> You'll need to:
> 1) rebuild drakx-installer-binaries with that patch
> 2) install the patched package
> 3) rebuild drakx-installer-images
> 4) use boot.iso from the rebuild drakx-installer-images

Has anyone tried this, now?
CC: (none) => marja11
Hi, This bug was filed against cauldron, but we do not have cauldron at the moment. Please report whether this bug is still valid for Mageia 2. Thanks :) Cheers, marja
Just fixed in cauldron. Just wait for next boot.iso
Status: NEW => RESOLVED
Resolution: (none) => FIXED
Can you confirm if it works better for you with cauldron stage1?
ping?
I tried this boot.iso http://ftp.fi.muni.cz/pub/linux/mageia/distrib/cauldron/i586/install/images/boot-nonfree.iso which says "June 5", so I suppose it is after the fix, and I see no change (i.e. it doesn't resolve with buggy maradns as primary and google as secondary, it does with google as primary). I also tried the boot.iso from mageia 2, same problem.
It says "June 5" in the banner, the file date on the mirror is June 20.
(In reply to comment #45)
> I tried this boot.iso
>
> http://ftp.fi.muni.cz/pub/linux/mageia/distrib/cauldron/i586/install/images/boot-nonfree.iso
>
> which says "June 5" so I suppose is after the fix.
> and I see no change (i.e. doesn't resolve with buggy maradns as primary and
> google as secondary, it does with google as primary).
> I also tried the boot.iso from mageia 2, same problem.

That's normal. When more than one nameserver is listed in /etc/resolv.conf, lookups stop with the first name server that responds, provided the response is not servfail. If the maradns server is running, but for some reason responding with not found, lookups stop there. Where is the "buggy" maradns server running, and what's wrong with it?
CC: (none) => davidwhodgins
CC: (none) => remco
that's the original problem. you can read about it above.
As maradns maintainer, I'm interested in hearing what version of maradns is giving these problems.
@Remco, don't worry, it's a very old version that I had on my hacked linkstation lspro. Anyway, the problem is described in comment #17 and in comment #22.
@Dave, see the original report and comment #15: under the same conditions stage-1 doesn't resolve the name but busybox does.