Bug 28539 - Full network install fails on both MGA8 and Cauldron with custom local repository
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard:
Keywords: IN_ERRATA9
Depends on:
Blocks:
 
Reported: 2021-03-06 03:45 CET by Frank Griffin
Modified: 2023-06-20 11:30 CEST
CC List: 5 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
New real report.bug.xz for cauldron (432.27 KB, application/octet-stream)
2021-03-06 03:50 CET, Frank Griffin
report.bug.xz for princeton MGA8 mirror (535.59 KB, application/octet-stream)
2021-03-06 16:27 CET, Frank Griffin
ethernet report.bug (31.93 KB, application/octet-stream)
2021-03-06 20:54 CET, Frank Griffin
Enable to alter default downloaded (1.38 KB, patch)
2022-01-16 23:24 CET, Thierry Vignaud
Enable to alter default downloaded (1.90 KB, patch)
2022-01-17 00:20 CET, Thierry Vignaud
Enable to alter default downloaded (1.91 KB, patch)
2022-01-17 00:33 CET, Thierry Vignaud
allow to retry downloading packages up to 3 times (1.29 KB, patch)
2022-01-19 01:20 CET, Thierry Vignaud

Description Frank Griffin 2021-03-06 03:45:03 CET
If you do a full nonfree network install from the nonfree ISO it fails due to missing packages or dependencies.  The error for the first package gives a popup for that package and the option to continue and not see the message again.  If you choose this, the install continues, and then terminates with the same popup but a variable number of packages.  At that point, if you click OK (which is misleading in itself), the install backtracks to the repo selection panel.

This happens both in today's cauldron and the released MGA8 as taken from math.princeton.edu.

To reproduce, boot from the nonfree network ISO, select your repository, select Custom desktop install, and then select every category in Desktop/Server/Workstation and run the install.

I could swear I entered a bug for this recently, but I can't find it.

I will attach the bug report for today's cauldron case.  But this is pretty easy to reproduce, and the list of problem packages changes every time.  I can understand this in cauldron, but I was disappointed to see this behavior in the released MGA8.

I don't think this is an installer issue, but rather a dependency issue between packages.
Comment 1 Frank Griffin 2021-03-06 03:50:22 CET
Created attachment 12425
New real report.bug.xz for cauldron
Comment 2 Frank Griffin 2021-03-06 03:53:22 CET
I think the previous report found that curl was failing for these packages (or at least some of them) for a bogus reason, something like a DNS failure to resolve a repository hostname that had been resolved successfully throughout the install up to that point.
Comment 3 Frank Griffin 2021-03-06 04:58:07 CET
Now I'm starting to recall.  The fresh install failed on real hardware but worked in a VBox VM.  It was an HTTP network install against my local copy of the math.princeton repository, but the fact that it now fails against the real math.princeton repository puts it in a different light.
Comment 4 Dave Hodgins 2021-03-06 05:40:14 CET
The local mirror in comment 1 was synced from cauldron, not Mageia 8 ...
selecting kernel-desktop-latest-5.10.20-2.mga9.x86_64

CC: (none) => davidwhodgins

Comment 5 Dave Hodgins 2021-03-06 05:44:06 CET
Also, the mirror was synced while firefox was in the middle of building ...
* adding a reason to already rejected package firefox-en_GB-78.8.0-1.mga9.noarch: unsatisfied firefox[== 0:78.8.0]
Comment 6 Frank Griffin 2021-03-06 05:53:57 CET
(In reply to Dave Hodgins from comment #4)
> The local mirror in comment 1 was synced from cauldron, not Mageia 8 ...
> selecting kernel-desktop-latest-5.10.20-2.mga9.x86_64

No, I tried it twice, the first time using the "Mageia 8" selection in stage 1 and selecting math.princeton.edu from the list, and the second time selecting my own repository sync'd from princeton cauldron.  Both failed, but with a different set of failing packages.

As an aside, why do we give the first error the option to proceed if we are eventually going to prevent the install from completing anyway ?

I can retry and provide the report.bug from the MGA8 princeton attempt, but I don't see the point; the error is the same even if the package list is different.  And it's highly unlikely that this has anything to do with the kernel anyway.
Comment 7 Frank Griffin 2021-03-06 05:57:13 CET
The attachment I provided was for cauldron (as I said), but the error occurs as well on the supposedly stable princeton MGA8 without my local repository involved.
Comment 8 Dave Hodgins 2021-03-06 06:03:24 CET
The local mirror shown in the bug report was synced from a mirror that had synced from mageia cauldron, after mga9 firefox-l10n had been built (in the
release repo), but before mga9 firefox itself had been built.

I'd have to see the bug report from the Mageia 8 install that failed to debug
it.

With Mageia 8 the firefox problem wouldn't happen unless the testing repos were
also enabled. Don't enable the testing repos during an install (with rare
exceptions).
Comment 9 Frank Griffin 2021-03-06 06:11:41 CET
OK, I'll retry the install and make sure the testing repos aren't enabled, but I don't recall stage2 giving the option to activate them (I could be wrong, as I just automatically enable everything in the repository list anyway, but I'd be surprised if the official install gave the option to enable testing repos at all).  After all, that isn't something you'd want to allow casual users to do.

Stay tuned...
Comment 10 Dave Hodgins 2021-03-06 06:16:07 CET
(In reply to Frank Griffin from comment #6)
> As an aside, why do we give the first error the option to proceed if we are
> eventually going to prevent the install from completing anyway ?

Not all errors prevent getting to a working installation where the errors
can then be fixed more easily. It depends on what packages happened to be
in the same transaction (when an error occurs none of the packages in the
same transaction are installed). That's why a file conflict during an
upgrade may or may not cause the upgrade to completely fail, so all
file conflicts must be fixed, just in case a critical package happened to
be in the same transaction.
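
As a rough illustration of the all-or-nothing transaction behaviour described above, here is a minimal Python sketch. It is a toy model, not urpmi or rpm code: the `install_in_transactions` helper, the package list, and the transaction size are all hypothetical, chosen only to show how one failing package takes its whole transaction with it.

```python
from typing import Iterable, List, Tuple


def install_in_transactions(
    packages: List[str],
    broken: Iterable[str],
    transaction_size: int,
) -> Tuple[List[str], List[str]]:
    """Toy model: split packages into fixed-size transactions.

    A transaction that contains any broken package installs nothing,
    so its healthy packages are skipped as well.
    """
    broken_set = set(broken)
    installed: List[str] = []
    skipped: List[str] = []
    for start in range(0, len(packages), transaction_size):
        transaction = packages[start:start + transaction_size]
        if any(pkg in broken_set for pkg in transaction):
            skipped.extend(transaction)  # the whole transaction is dropped
        else:
            installed.extend(transaction)
    return installed, skipped


# Hypothetical package list; only "locales-fr" fails to download.
packages = ["glibc", "locales-fr", "firefox", "firefox-en_GB"]
installed, skipped = install_in_transactions(
    packages, broken=["locales-fr"], transaction_size=2
)
print("installed:", installed)
print("skipped:", skipped)
```

In this model "glibc" is lost only because it happened to share a transaction with the failing package, which is why the outcome depends on how packages are grouped.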
Comment 11 Dave Hodgins 2021-03-06 06:19:36 CET
Also, just fyi, I did a Mageia 8 install using the nonfree iso with the
princeton mirror while testing another problem, with no errors.
Aurelien Oudelet 2021-03-06 12:03:25 CET

Summary: Full network install fails on both MGA8 and Cauldron => Full network install fails on both MGA8 and Cauldron with custom local repository
Source RPM: all => (none)
Assignee: bugsquad => mageiatools
Component: RPM Packages => Installer
CC: (none) => ouaurelien

Comment 12 Morgan Leijström 2021-03-06 12:32:44 CET
(In reply to Frank Griffin from comment #3)
> The fresh install failed on real hardware but
> worked in a VBox VM.

Makes me wonder if it is the default downloader that fails.  Which one is the netinstaller using?

I have found that I need to switch to wget on one laptop, but not the others, meaning it seems to depend on the wifi hardware, at least in my case...

Could you try that failing setup again, but using ethernet?

Bug 24362 - Change default package downloader to wget

CC: (none) => fri

Comment 13 Frank Griffin 2021-03-06 16:19:41 CET
I was right, stage 2 doesn't allow you to use testing repositories.

Morgan, I think you're onto something with wifi vs ethernet.  In my previous testing, I found that the install worked fine into a VM, and the difference there is that the VM uses ethernet.

I'll try an ethernet install later today.
Comment 14 Frank Griffin 2021-03-06 16:27:58 CET
Created attachment 12429
report.bug.xz for princeton MGA8 mirror
Comment 15 Dave Hodgins 2021-03-06 18:25:14 CET
* Installation failed, some files are missing:
 http://mirror.math.princeton.edu/pub/mageia/distrib/8/x86_64/media/core/release/locales-fr-2.32-2.mga8.x86_64.rpm

However, the file is there
locales-fr-2.32-2.mga8.x86_64.rpm	2020-11-26 11:54 	510K

So the command
'/usr/bin/curl' '-q' '-R' '-f' '--disable-epsv' '--connect-timeout' '60' '--anyauth' '--stderr' '-' '-O'
is failing which makes this bug report a dupe of bug 24362

Agreed?
Comment 16 Frank Griffin 2021-03-06 20:51:42 CET
I'm not sure.  bug#24362 describes scenarios that get a lot further than mine, but you're gonna love this....

I tried an install using ethernet (an ethernet USB adapter).  Going against the same Princeton MGA8 repository that with wireless runs all the way through until it aborts with package errors, it fails immediately saying it can't find media.cfg.  The install gives the choice of the wireless and ethernet NICs, so it recognizes both, and it retrieves stage 2 correctly with either.  But with ethernet, it fails directly after formatting the root partition.

I'll attach the ethernet report.bug.

This is very deja vu.  Last year, I bought an Asus Vivobook S15 and had these same problems (both of them).  I've been trying to remember how I got around them, but I did.  I've been happy with the machine, so I just bought another identical to it, and the same problems are back.  I suspect that what I ended up doing was running a wireless install, letting it fail, and then repeatedly running update installs on top of that until one finally completed.  The network works perfectly on the older system now.
Comment 17 Frank Griffin 2021-03-06 20:54:54 CET
Created attachment 12430
ethernet report.bug
Comment 18 Dave Hodgins 2021-03-06 22:05:53 CET
* '/usr/bin/curl' '-q' '-R' '-f' '--disable-epsv' '--connect-timeout' '60' '--anyauth' '-s' '--stderr' '-' '-O' 'http://mirror.math.princeton.edu/pub/mageia/distrib/8/x86_64/install/stage2/VERSION'
* error: curl failed: exited with 6

From man curl ...
6      Couldn't resolve host. The given remote host was not resolved.

But ...
* HTTP: trying to retrieve /pub/mageia/distrib/8/x86_64/install/stage2/mdkinst.sqfs from mirror.math.princeton.edu
* HTTP: connecting to server mirror.math.princeton.edu:80 (no proxy)
* is-at: 128.112.18.21

* HTTP: GET http://mirror.math.princeton.edu//pub/mageia/distrib/8/x86_64/install/stage2/mdkinst.sqfs)

* HTTP: server response '200'

$ wget http://mirror.math.princeton.edu/pub/mageia/distrib/8/x86_64/install/stage2/VERSION -O -
--2021-03-06 16:03:07--  http://mirror.math.princeton.edu/pub/mageia/distrib/8/x86_64/install/stage2/VERSION
Resolving mirror.math.princeton.edu (mirror.math.princeton.edu)... 128.112.18.21
Connecting to mirror.math.princeton.edu (mirror.math.princeton.edu)|128.112.18.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6
Saving to: ‘STDOUT’

-                                          0%[                                                                                   ]       0  --.-KB/s               18.45
-                                        100%[==================================================================================>]       6  --.-KB/s    in 0s      

2021-03-06 16:03:10 (902 KB/s) - written to stdout [6/6]

So it still looks like a curl problem even though
$ curl http://mirror.math.princeton.edu/pub/mageia/distrib/8/x86_64/install/stage2/VERSION -o -
18.45
it's working here.
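
When going through a report.bug by hand, it can help to tally the curl failures by exit code. The following Python sketch does that for log text containing lines like the "curl failed: exited with 6" line quoted above. The helper names are hypothetical, the regular expression is based only on that one quoted line, and the exit-code meanings are the ones given in the curl man page (only a few are listed here).

```python
import re

# Meanings from the EXIT CODES section of the curl man page (partial list).
CURL_EXIT_CODES = {
    6: "Couldn't resolve host",
    7: "Failed to connect to host",
    22: "HTTP page not retrieved (HTTP status 400 or above, with -f)",
    28: "Operation timeout",
}

# Matches the installer log line quoted in this comment.
CURL_FAILURE = re.compile(r"error: curl failed: exited with (\d+)")


def summarize_curl_failures(log_text):
    """Return a {exit_code: count} mapping for curl failures found in a log."""
    counts = {}
    for match in CURL_FAILURE.finditer(log_text):
        code = int(match.group(1))
        counts[code] = counts.get(code, 0) + 1
    return counts


def describe(code):
    """Translate a curl exit code into a short description."""
    return CURL_EXIT_CODES.get(code, "see 'man curl' for this exit code")


sample = "* error: curl failed: exited with 6\n"
for code, count in summarize_curl_failures(sample).items():
    print(f"{count} x exit {code}: {describe(code)}")
```

A log dominated by exit code 6 would point at name resolution rather than at missing files on the mirror.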
Comment 19 Dave Hodgins 2021-03-06 23:41:53 CET
Given that no one else has encountered this problem, I'm wondering if it's
a flaky internet connection that curl is very sensitive to.

Does /sbin/ifconfig -a show any RX or TX errors for the active interface?
Comment 20 Frank Griffin 2021-03-07 00:34:28 CET
Is there a way to get the installer to use wget ?

I don't think ifconfig works from busybox, and I have yet to get a bootable system from the install.  It's unlikely to be internet though, because every other host in the house uses the same single internet connection, including the identical machine from last year, and none of them have problems.

Doing an update on top of the aborted install doesn't install anything else, but neither does it allow setting a root password.

What seemed to work was starting a fresh install but deselecting the format for the root partition.  This caused the installation of several hundred packages without error, allowed setting of a root password and defining the default user, and went through the final configuration dialog, but the resulting system still hangs during boot complaining about being unable to bind the codec for hdaudio (and yes, all repos including tainted were included).
Comment 21 Frank Griffin 2021-03-07 05:30:20 CET
I'm thinking that this has to be an installer bug in network handling.  How else would a NIC that had DNS resolution in the earlier part of the install suddenly lose it (curl RC 6 with ethernet) or suddenly lose the ability to see parts of the file tree that are really there (wifi) ?

I'll try a few more iterations of install/noformat to see if this evens itself out.
Comment 22 Frank Griffin 2021-03-08 03:54:10 CET
Update:

After doing the aforementioned fresh install without allowing the root partition to be formatted (which lets already installed packages remain), the install completed without errors, as I said, although it installed far fewer packages than a real full install would have (~3500).

That got me a system which would not boot, as described above.

Re-running the install from cauldron using the "update" rather than "install" option resulted in a bootable system which appears to be working.

Comments welcome; I'm nowhere with this right now.
Comment 23 Frank Griffin 2021-03-09 03:04:52 CET
I have extra unused root partitions on both of the identical machines that saw these errors, so I'm going to try starting from scratch to see if they still reproduce with cauldron.
Comment 24 Frank Griffin 2021-08-22 23:05:17 CEST
This is still happening in current cauldron.  Is there any way to force the install to use wget ?
Morgan Leijström 2021-12-07 14:47:36 CET

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=24362

Comment 25 Thierry Vignaud 2022-01-16 23:24:38 CET
Created attachment 13087
Enable to alter default downloaded

This untested patch allows you to use the downloader=wget option (like urpmi supports --downloader=wget)

CC: (none) => thierry.vignaud

Comment 26 Frank Griffin 2022-01-17 00:09:41 CET
Good timing, Thierry, I'm just about due for two fresh cauldron installs.
Comment 27 Thierry Vignaud 2022-01-17 00:20:58 CET
Created attachment 13088
Enable to alter default downloaded

You would need to actually include wget in the installer for that to work (+ a BR in drakx-installer-stage2.spec)

Attachment 13087 is obsolete: 0 => 1

Comment 28 Thierry Vignaud 2022-01-17 00:33:14 CET
Created attachment 13089
Enable to alter default downloaded

Attachment 13088 is obsolete: 0 => 1

Comment 29 Frank Griffin 2022-01-17 02:22:11 CET
OK, so what do I actually do to enable this ?  I recall posts in the distant past which apply patches during install, but my memory is dim on how they were applied and where they had to be to be referenced during the install.

And will this work before you include wget in the installer ?
Comment 30 Thierry Vignaud 2022-01-17 19:17:28 CET
Nope, you need to rebuild drakx-installer-stage2 and then overwrite x86_64/install/stage2/mdkinst.sqfs on your local mirror.
Anyway I've just submitted drakx-installer-stage2-18.48-1.mga9 to BS
Comment 31 Frank Griffin 2022-01-17 19:32:35 CET
OK, I'll wait for that to hit and try an install.
Comment 32 Frank Griffin 2022-01-17 22:00:07 CET
Tried the install, but I get the same errors and tty3 shows that it is still using curl.
Comment 33 Frank Griffin 2022-01-17 22:08:17 CET
Oh, and this probably has nothing to do with your change, but during stage1 HTTP host selection failed with a hostname but worked OK with the IP address.  According to tty3, DHCP is returning the correct value for the DNS server and /etc/resolv.conf shows that address correctly, but hostname resolution still fails.

If you don't see an immediate reason for this relative to this bug, I'll open another.
Comment 34 Thierry Vignaud 2022-01-18 13:02:57 CET
It worked for me using "linux downloader=wget"
Comment 35 Morgan Leijström 2022-01-18 14:06:37 CET
Where are options like this documented?
Comment 36 Frank Griffin 2022-01-18 16:41:45 CET
(In reply to Thierry Vignaud from comment #34)
> It worked for me using "linux downloader=wget"

Aggh, it's always something.  The laptop I'm using (Asus VivoBook S15) will only boot the UEFI install image, not the original one that would allow you to type "linux downloader=wget".  When I boot the UEFI image, I get a black rectangle in the center which persists for several seconds and then gets replaced by the first ncurses screen saying "Checking for USB devices".  There is no opportunity I can find for supplying command-line parameters.

If memory serves, that black rectangle used to contain an image of the legacy boot screen where parameters could be entered.  Should it still ?

I'll try this later on a desktop system that (as far as I know) should be able to use the legacy boot.
Comment 37 Frank Griffin 2022-01-18 20:52:46 CET
Well, I'd like to say "good news, bad news", but it's all bad.

To start with, on the desktop system which uses ethernet, a curl install runs to completion with no errors.

When I tried to enter "linux downloader=wget" in the legacy ncurses panel on the desktop, the input field would not accept an "=".  I ended up using "linux downloader wget", and it used curl anyway.

So unless someone can tell me how to shoehorn "linux downloader=wget" into a UEFI image boot as I described above, I'm stuck testing this.  curl only seems to fail for wireless, but the wireless is accessing an in-house mirror copy on the same subnet, so there is no question of internet blips to one of the MGA mirrors.
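
A possible explanation for "linux downloader wget" falling back to curl is sketched below in Python. It assumes that boot options are read as whitespace-separated key=value tokens, as kernel command lines conventionally are; this is an assumption about the installer, not a reading of its code, and both helper functions are hypothetical.

```python
def parse_boot_options(cmdline):
    """Split a boot command line into {key: value}; bare words map to None."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def chosen_downloader(cmdline, default="curl"):
    """Hypothetical helper: use the default unless downloader=<name> is given."""
    return parse_boot_options(cmdline).get("downloader") or default


print(chosen_downloader("linux downloader=wget"))  # wget
print(chosen_downloader("linux downloader wget"))  # curl
```

Under that assumption, "downloader" and "wget" without the "=" are two unrelated bare words, so the default downloader stays in effect.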
Comment 38 Frank Griffin 2022-01-18 21:03:08 CET
Could we possibly change the install to use --retry with curl ?  That shouldn't be too intrusive.
Comment 39 Thierry Vignaud 2022-01-19 00:58:54 CET
Drakx just leverages urpmi, so one just has to alter urpm::download::sync_curl().
See http://gitweb.mageia.org/software/rpm/urpmi/tree/urpm/download.pm#n403
Comment 40 Thierry Vignaud 2022-01-19 01:14:04 CET
You can build your own boot.iso
Just rebuild drakx-installer-images with %debug set and just abuse BOOT_AUTOMATIC_METHOD=
See http://svnweb.mageia.org/packages/cauldron/drakx-installer-images/current/SPECS/drakx-installer-images.spec?view=markup&pathrev=1768827#l140

Something like 'BOOT_AUTOMATIC_METHOD=" downloader=wget"' should just work.

You don't have to actually enable debug; you can just set the above variable right before:
THEME=%_vendor-%{theme} make -C images KERNELS="%{kernels}"

then "rpm2cpio drakx-installer-images*rpm | cpio -id '*.iso'" and voilà, you have your own customized boot.iso
Comment 41 Thierry Vignaud 2022-01-19 01:14:41 CET
(last comment was in reply to comment #37 vs UEFI)
Comment 42 Thierry Vignaud 2022-01-19 01:17:00 CET
(In reply to Thierry Vignaud from comment #39)
> Drakx just leverages urpmi, so one just has to alter
> urpm::download::sync_curl().
> See http://gitweb.mageia.org/software/rpm/urpmi/tree/urpm/download.pm#n403

Actually if you can reproduce the bug with just urpmi, then you can try "urpmi --retry=3" as a workaround.
It'll just pass the option to curl
Comment 43 Thierry Vignaud 2022-01-19 01:20:53 CET
Created attachment 13091
allow to retry downloading packages up to 3 times

If your testing with urpmi works with your flaky connection, then this patch is all that is needed for drakx to adopt the same behavior
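
The general idea of retrying a download up to three times can be sketched in Python as follows. This is not the code of attachment 13091 or of urpmi; `download_with_retries`, the simulated `flaky_fetch`, and the `mirror.example` URL are hypothetical and only illustrate the retry pattern with a downloader that fails twice before succeeding.

```python
import time


def download_with_retries(fetch, url, attempts=3, delay=0.0):
    """Call fetch(url) until it succeeds or the attempts are exhausted.

    fetch is any callable that raises OSError when a download fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # brief pause before the next try
    raise last_error


# Simulated flaky downloader: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("could not resolve host")
    return "fetched " + url


result = download_with_retries(flaky_fetch, "http://mirror.example/pkg.rpm")
print(result, "after", calls["count"], "attempts")
```

A transient failure such as a momentary name-resolution error is absorbed by the loop, while a persistent one still surfaces after the last attempt.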
Comment 44 Morgan Leijström 2023-03-02 15:59:08 CET
Has this been fixed so that "linux downloader=wget" can be entered with UEFI boot?

Keywords: (none) => FOR_ERRATA9

Felix Miata 2023-06-11 00:55:38 CEST

CC: (none) => mrmazda

Comment 45 Morgan Leijström 2023-06-20 11:30:01 CEST
For BIOS boot mode, the solution of manually setting "linux downloader=wget" is confirmed in Comment 34 and https://bugs.mageia.org/show_bug.cgi?id=24362#c43

Entered in https://wiki.mageia.org/en/Mageia_9_Errata#NetInstaller

Do we have a method for UEFI mode boot?
(i.e., can the user press [E] and add that boot option?)

When we have a solution for UEFI, update errata and close as fixed.

Keywords: FOR_ERRATA9 => IN_ERRATA9

