| Summary: | Identical fresh install from the same repository works in a VBox VM but fails on real hardware | | |
|---|---|---|---|
| Product: | Mageia | Reporter: | Frank Griffin <ftg> |
| Component: | Installer | Assignee: | Mageia tools maintainers <mageiatools> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | normal | | |
| Priority: | Normal | CC: | davidwhodgins, mageia |
| Version: | Cauldron | | |
| Target Milestone: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Source RPM: | | CVE: | |
| Status comment: | | | |
| Attachments: | bzip of /root/drakx directory; report.bug.xz from successful VM install; full report.bug.xz from failure; Real hardware report.bug.xz; VM report.bug.xz; curl command that failed; New real report.bug.xz | | |

Description
Frank Griffin
2021-01-10 01:30:33 CET
Created attachment 12205
bzip of /root/drakx directory

I should point out that the install selects Custom desktop and checks every package category under Workstation, Server, and Graphical. Most users probably don't select this much, but it has worked for me for years. The conflicts start at line 850 of install.log. It appears that some of the packages to be installed require the deletion of other packages which are required by still other packages.

Thanks for the report, Frank. Would it be worth also attaching the same files from the successful VirtualBox installation, to look for differences?

Created attachment 12207
report.bug.xz from successful VM install
Thanks for the last, but there is some confusion regarding the attachments. Sorted!

bug.jcf (real h/w failure): bug/install.log starts at the beginning and shows all the failures from l.824. The first failure is for lib64gtk+3_0: retrieved at l.816, installing at l.821, failed at l.825 for want of libxkbcommon.so.0 (lib64xkbcommon0-1.0.3-1.mga8.x86_64.rpm), as do loads of other things. However, lib64xkbcommon0-1.0.3-1.mga8.x86_64.rpm is retrieved at l.755 but *does not seem to get installed* in the next batch at l.760. I suspect the same sort of thing for all the other failures.

report.bug.xz (VB success, for comparison): report.bug includes a host of information; package installation starts at l.52250.

There is obviously something very basic going wrong. Assigning to the MGAtools group and CC'ing Martin, who will probably spot it.

As Lewis points out, lib64xkbcommon0 was retrieved but not scheduled to be installed. Unfortunately your ddebug.log stops at the end of package selection (I guess the install didn't complete, so it wasn't updated after that), so we can't see whether there are any errors there. If this error is repeatable, can you capture a report.bug while the installer is still running? Use Ctrl-Alt-F2 to switch to the debug console, insert a formatted USB stick, and enter the command 'bug' (which should write report.bug to the USB stick).

Created attachment 12213
full report.bug.xz from failure

* Installation failed, some files are missing:
http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/texlive-20200406-9.mga8.x86_64.rpm
http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64vamp-plugin-sdk2-2.10-1.mga8.x86_64.rpm
http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/python3-dulwich-0.20.14-1.mga8.x86_64.rpm
You may need to update your urpmi database.
At the time of the real hardware install, it looks like the local cauldron mirror was last synced from a mirror that was in the middle of syncing.

Unfortunately not. While it is a network install, the repository is a machine I control, and I was careful not to update it between the VM install and the real one. Moreover, each of those files exists:

    [ftg@ftgme2 ~]$ cd /mnt/cauldron/x86_64/media/core/release
    [ftg@ftgme2 /mnt/cauldron/x86_64/media/core/release]$ ls texlive*
    texlive-20200406-9.mga8.x86_64.rpm
    texlive-collection-basic-20200406-5.mga8.noarch.rpm
    texlive-context-20200406-5.mga8.noarch.rpm
    texlive-dist-20200406-5.mga8.noarch.rpm
    texlive-doc-20200406-5.mga8.noarch.rpm
    texlive-fonts-asian-20200406-5.mga8.noarch.rpm
    texlive-fontsextra-20200406-5.mga8.noarch.rpm
    texlive-fonts-sources-20200406-5.mga8.noarch.rpm
    texlive-pythontex-0.17-1.mga8.noarch.rpm
    texlive-texmf-20200406-5.mga8.noarch.rpm
    [ftg@ftgme2 /mnt/cauldron/x86_64/media/core/release]$ ls lib64vamp*
    lib64vamp-plugin-sdk2-2.10-1.mga8.x86_64.rpm
    lib64vamp-plugin-sdk-devel-2.10-1.mga8.x86_64.rpm
    lib64vamp-plugin-sdk-static-devel-2.10-1.mga8.x86_64.rpm
    [ftg@ftgme2 /mnt/cauldron/x86_64/media/core/release]$ ls python3-dulwich*
    python3-dulwich-0.20.14-1.mga8.x86_64.rpm

I suspect the mirror had those files but had not yet synced the repo data that indexes what packages are available. Please try the test again with a freshly synced local mirror, after ensuring that the mirror being synced from is up to date (see https://mirrors.mageia.org/status). Also, can you check whether texlive was installed in the VB test?

(In reply to Frank Griffin from comment #9)
> Unfortunately not. While it is a network install, the repository is a
> machine I control, and I was careful not to update it between the VM install
> and the real one.

However, you have updated it between when you reported the failure and when you captured the latest report.bug.xz, and the package it was originally failing on is now installed successfully. I suspect Dave is right. It's a problem I frequently hit with the mirror I sync my local repository to, which is a tier 3 mirror.
That gives several chances of being unlucky :-(

OK, I'll resync and rerun both the VM and real installs, but I should point out that the mirror does not sync automatically, only when I do it manually. So I'm virtually certain that the repository contents were identical between the two tests. I sync another system to math.princeton.edu and then do a manual rsync to the repository used for the tests.

Created attachment 12214
Real hardware report.bug.xz

Created attachment 12215
VM report.bug.xz
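The stale-index theory raised in this thread (RPMs already on disk, but the medium's index not yet regenerated) can be checked directly on the repository host. The sketch below is a hypothetical helper, not part of urpmi or the installer. It assumes the usual Mageia medium layout with `media_info/synthesis.hdlist.cz`, and that the sync preserved upstream modification times (`rsync -a` does). An RPM newer than the index is only a candidate for being unindexed, not proof.

```shell
#!/usr/bin/env bash
# Hypothetical check: list RPMs in one medium directory whose mtime is newer
# than that medium's synthesis index, i.e. files that may not be indexed yet.
# Assumes <medium>/media_info/synthesis.hdlist.cz (standard Mageia layout)
# and that the mirror sync preserved upstream modification times.
unindexed_rpms() {
  local medium="$1"
  local index="$medium/media_info/synthesis.hdlist.cz"
  if [ ! -f "$index" ]; then
    echo "no index at $index" >&2
    return 1
  fi
  # -newer compares each RPM's modification time with the index file's
  find "$medium" -maxdepth 1 -type f -name '*.rpm' -newer "$index" -print | sort
}
```

For example, `unindexed_rpms /mnt/cauldron/x86_64/media/core/release` on the repository host; empty output means no RPM in that medium is newer than its index.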
Different missing package this time:
* Installation failed, some files are missing:
http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64mcpp0-2.7.2-14.mga8.x86_64.rpm
You may need to update your urpmi database.
I am absolutely sure that the repository was unaltered between the VM install and the real one. And again, the file is there:

    [root@ftgfiles1 ~]# ls -l /mnt/cauldron/x86_64/media/core/release/lib64mcpp0-2.7.2-14.mga8.x86_64.rpm
    -rw-r--r-- 1 ftg ftg 77454 Feb 13 2020 /mnt/cauldron/x86_64/media/core/release/lib64mcpp0-2.7.2-14.mga8.x86_64.rpm

I think if the repo was out of sync, it would affect the VM install as well.
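To run the same existence check for every file the installer reports as missing, rather than one `ls` at a time, the URLs can be pulled out of the log and mapped back to paths on the server. The helper below is hypothetical. It assumes, as the URLs in this report suggest, that `http://ftgfiles1/PATH` is served from `/PATH` on the repository host; the optional second argument is a prefix for a repository copy stored elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every URL that an installer log lists under
# "some files are missing:", report whether the file exists on this host.
# Assumes the HTTP server maps http://HOST/PATH straight onto /PATH.
check_missing() {
  local log="$1" root="${2:-}" url path
  # Collect the URL lines that follow each "some files are missing:" line
  awk '/some files are missing:/ { grab = 1; next }
       grab && /^[ \t]*https?:\/\// { print $1; next }
       { grab = 0 }' "$log" |
  sort -u |
  while read -r url; do
    # Strip "scheme://host/" to get the server-side path
    path="$root/${url#*://*/}"
    if [ -f "$path" ]; then
      echo "present $path"
    else
      echo "absent  $path"
    fi
  done
}
```

For example, `check_missing report.bug` run on ftgfiles1 prints one `present` or `absent` line per reported file. If every line says `present`, the files are on disk and the failure points elsewhere (index, transfer, or sync).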
How much space is available for /var in both places?

The installer executed the command below. curl successfully fetched all files but the first. It looks like an intermittent network problem to me. Ctrl-Alt-F4 lets you see the kernel log (or Ctrl-Alt-F2 to the debug console and use 'dmesg'); does anything show up there? Any clues in the HTTP server logs?

    '/usr/bin/curl' '-q' '-R' '-f' '--disable-epsv' '--connect-timeout' '60' '--anyauth' '--stderr' '-' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64mcpp0-2.7.2-14.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64xml2-devel-2.9.10-6.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/default-windowmaker-desktop-0.95.9-3.mga8.noarch.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xsetroot-1.1.2-3.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/kernel-userspace-headers-5.10.6-1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64ncurses-devel-6.2-20201205.1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xmessage-1.0.5-3.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/gcc-10.2.1-0.20210109.1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/iceauth-1.0.8-3.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xdpyinfo-1.3.2-4.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/x11-data-bitmaps-1.1.2-3.mga8.noarch.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/rootfiles-11.0-16.mga8.noarch.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/isl-0.18-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/windowmaker-0.95.9-3.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xli-20061110-15.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/libx11-common-1.7.0-1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xmodmap-1.0.10-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/sessreg-1.1.2-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64ffi-devel-3.3-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64xfont2_2-2.0.4-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/x11-server-xorg-1.20.10-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64dmx1-1.1.4-3.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64unistring-devel-0.9.10-4.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xinit-1.4.1-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64xxf86dga1-1.1.5-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64xcursor1-1.2.0-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64lzma-devel-5.2.5-1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/glibc-devel-2.32-9.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/glib2.0-common-2.66.4-1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64isl15-0.18-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64acl-devel-2.2.53-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64uuid-devel-2.36.1-5.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/binutils-2.35.1-6.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64xxf86misc1-1.0.4-3.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64xaw7-1.0.13-3.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64zlib-devel-1.2.11-9.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/task-x11-1-10.mga8.noarch.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/gcc-cpp-10.2.1-0.20210109.1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/sound-theme-freedesktop-0.8-8.mga8.noarch.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64mpc3-1.2.1-1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/glib-gettextize-2.66.4-1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64xcrypt-devel-4.4.17-1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xrdb-1.2.0-2.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/telepathy-filesystem-0.0.2-8.mga8.noarch.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/windowmaker-theme-mageia-0.2-11.mga8.noarch.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/mcpp-2.7.2-14.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/multiarch-utils-1.0.14-3.mga8.noarch.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/libstdc++-devel-10.2.1-0.20210109.1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xauth-1.1-1.mga8.x86_64.rpm' \
      '-O' 'http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/xinitrc-2.4.21-30.mga8.noarch.rpm'

Created attachment 12216
curl command that failed

That didn't work so well. Let's try it as an attachment.
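Because the installer fetches the whole batch with a single curl invocation and then reports only the first file, replaying the attached command one URL at a time is one way to get a curl exit code for each file. A minimal sketch, assuming the command line has been saved to a text file. `extract_urls` and `refetch_each` are hypothetical names, and the curl options are a subset of the ones the installer used (`-R`, `--disable-epsv` and `--stderr -` are dropped, and the downloaded data is discarded).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for replaying the installer's multi-file curl run one
# URL at a time, so that each curl exit code belongs to exactly one file.

# Print the package URLs found in a saved copy of the curl command line.
extract_urls() {
  grep -oE "https?://[^' ]+\.rpm" "$1"
}

# Fetch each URL separately and print "<exit code> <url>".
# Common curl exit codes: 6 = could not resolve host, 7 = could not connect,
# 22 = HTTP error 400 or above (only reported with -f), 28 = timeout.
refetch_each() {
  local url rc
  extract_urls "$1" | while read -r url; do
    curl -q -f -sS --connect-timeout 60 --anyauth -o /dev/null "$url" < /dev/null
    rc=$?
    echo "$rc $url"
  done
}
```

For example, `refetch_each curl-command.txt` run on the machine being installed (or any host on the same network); a non-zero code identifies the failing URL and the kind of failure. Running the URLs one by one trades speed for an unambiguous result per file.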
(In reply to Dave Hodgins from comment #17)
> How much space is available for /var in both places?

The root partition is 40GB for the VM and 60GB for the real hardware.

(In reply to Martin Whitaker from comment #19)
> Created attachment 12216
> curl command that failed
>
> That didn't work so well. Let's try it as an attachment.

In early tries of fresh installs, which produced the cauldron system I'm using at present, the install failed well into the processing because of a curl error. That error eventually went away, but it was odd because the error code was -6, which says that curl couldn't resolve the hostname of the repository (ftgfiles1), something it had been doing successfully all along. This stopped when I used the IP address instead. I'll try another test with that.

The test using the IP address for the repository fails in the same way, but on a different package in the original "missing" message, so it isn't a case of my DNS not answering the phone (or not fast enough). Anyway, no other application on any of my other internally networked machines (or, for that matter, on this machine running a cauldron from a few weeks ago) has any problem with my DNS. I would try to target the video difference between the VM and the real hardware, except that the "missing" packages, by and large, have nothing to do with graphics. Is there a way to get the install to use wget rather than curl?

Not that I'm aware of. The question reminded me that curl tries to use a number of connections in parallel. I suspect the problem may be that while it's OK under VirtualBox, on real hardware it's too fast, so some connections are being refused by the server. In my opinion, the netinstall ISO should be changed to use wget, not curl.
In the meantime, two options that might work around the problem:
- Slow down the machine by temporarily lowering the speed in the BIOS settings.
- Do just a minimal install (no X or desktop, just minimal with urpmi) and, assuming that can be done without the problem showing up, change the urpmi config to use wget after installation.

Third option :-) Since it's a local server you control, increase the number of connections allowed from a single IP address, though I have no idea where that is set in Apache or other HTTP servers.

The thing is, this arrangement has been working just fine for years. This problem is something new.

Noticing much reference to curl, I wonder whether switching to aria2 or wget would change anything. BTAIM, Frank is in the best possible hands; I can contribute nothing more.

I notice that the bug file I posted just contains the curl command followed by an install message saying a file (always the first file in the curl retrieval list) can't be found, but it doesn't include the actual curl error message and return code. I'm going to rerun the test, but flip to tty3 when the first error occurs to see if the missing info is there. After that, since my server exports the repository via NFS, I'm going to try an NFS install, which will take curl out of the equation. (Years ago, I had changed from NFS to HTTP because of bug#2577.)

No luck on either front. There is no more information on tty3 than in the report.bug. I find it interesting that it's always the first -O file on the command line that appears in the error message. I'm starting to suspect that whatever is going wrong results in an install error message that names the first file in the retrieval set no matter which file is really involved, and that the "file not found" part may be bogus and hiding the real error. It would be great if tty3 contained more actual detail from the curl command and not just what the installer code *thinks* is happening.

bug#2577 not only hasn't been fixed, but has gotten worse. Originally, you got to stage 2 and just got errors about the 32-bit repositories not being found. Now, the mount of the NFS-exported directory fails. Again, we don't get the text of the actual mount command or its actual output, just the installer code's assumption about what went on. So an NFS install is impossible at this point.

Created attachment 12339
New real report.bug.xz
Still occurring. One more thing of interest: the same real hardware installed cauldron just fine when I first got the machine. This was back on Dec 26 (according to the timestamp on /boot/dracut). So it's not the hardware that causes the problem; it's in the install itself, and it appears to have broken between Dec 26 and Jan 10.

The first error encountered is
* Installation failed, some files are missing:
http://ftgfiles1/mnt/cauldron/x86_64/media/tainted/release/lib64opencore-amr0-0.1.5-3.mga8.tainted.x86_64.rpm
You may need to update your urpmi database.
https://mirror.math.princeton.edu/pub/mageia/distrib/8/x86_64/media/tainted/release/lib64opencore-amr0-0.1.5-3.mga8.tainted.x86_64.rpm exists, dated a year ago.
There's also
* Installation failed, some files are missing:
http://ftgfiles1/mnt/cauldron/x86_64/media/core/release/lib64xerces-c3.2-3.2.3-5.mga8.x86_64.rpm
https://mirror.math.princeton.edu/pub/mageia/distrib/8/x86_64/media/core/release/lib64xerces-c3.2-3.2.3-5.mga8.x86_64.rpm is dated 2 months ago.
So the first step is figuring out why the packages are not in the local repo.

When each of the above errors is detected, it doesn't just cause the package that requires those packages to fail to install; it stops all of the other packages that happened to be in the same transaction. As different installs will group transactions differently (even with the same repos and hardware), the results of not having the packages in the same transaction can vary too.

Any subsequent package install that requires one of the packages in one of the failed transactions will also fail. A cascade effect.
I rsync from mirror.math.princeton.edu, and my repo has both of those files and would have had them for a while. I don't rsync automatically, so if I don't do it manually I'm absolutely sure that nothing sneaks in that wasn't there before. The other thing is that every time I attempt this install, the errors involve a different number and set of packages. I do the manual rsync immediately before running the install, so there's no mystery there either.

I've tried to reproduce this, doing an all-DE, all-package-group install from my private mirror, first with FTP, then with HTTP. Both installs completed without error. Something is causing HTTP transactions to fail on your system, Frank, but I don't have the expertise to help you debug that.

Does "smartctl -a /dev/sda" or "dmesg | grep sda" (replacing sda with the drive holding the repos) show any errors?

Have you tried installing directly from the Princeton mirror rather than from your local mirror?

My thanks to everyone who tried to help, and my apologies for the noise as well. I finally figured this out. There are actually two copies of the repository: one that syncs from Princeton and a working copy that syncs from the first. Both sync operations are manual, and I perform one after the other, so they should be identical. Somehow, I introduced an NFS loop between the two hosts for the /mnt/cauldron directory, which houses the repository on each system. So the working host, which thought it was syncing from the primary, was actually syncing from itself *through* the primary. This wasn't obvious, because the sync log showed rsync actually transferring files. I eliminated the loop, and my next test had only one package failure rather than 70 or 54 or 30; I'm guessing this one is just metadata mirror lag from Princeton. I still don't understand why the VM install worked, but it must have been hardware differences.
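Since the root cause here was an NFS loop on /mnt/cauldron, one way to catch that kind of loop before a manual rsync is to confirm that the tree being synced is not itself an NFS mount. A minimal sketch using util-linux `findmnt`; `is_nfs_mount` is a hypothetical name, and the guard assumes both repository copies are meant to live on local disks, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-sync guard: succeed only if the given path lives on an
# NFS mount. Intended to be run on both hosts before the manual rsync; if the
# source or destination tree is NFS-mounted, the "copy" may be reading back
# its own files through the other host.
is_nfs_mount() {
  local fstype
  # findmnt --target reports the filesystem that contains the given path
  fstype=$(findmnt -n -o FSTYPE --target "$1" 2>/dev/null) || return 1
  case "$fstype" in
    nfs|nfs4) return 0 ;;
    *) return 1 ;;
  esac
}

# Example guard (path taken from this report; adjust as needed):
# if is_nfs_mount /mnt/cauldron; then
#   echo "/mnt/cauldron is NFS-mounted; refusing to rsync into it" >&2
#   exit 1
# fi
```

Running the check on both hosts matters, because in a loop the NFS mount may sit on either side of the sync.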
(none) =>
INVALID |