Description of problem:
This applies, so far, to a Mate session on an Intel CPU with nvidia graphics hardware and a high-bandwidth wired connection to the internet. Plasma, Xfce and Cinnamon are also installed.

An upgrade from Mageia 8 to 9 via mgaapplet looks fine at first but stops without any apparent reason after dealing with about two thirds of the available packages. In this particular case the process stopped at the installation of dkms-nvidia-current. There have been other similar reports of this happening where dkms has been involved.

Some of the logs from the most recent session are available but further investigation is needed, perhaps by way of htop monitoring in a background terminal session. gkrellm shows that the computer is continuing to work: half the cores are running at 10% or more.

Version-Release number of selected component (if applicable):
Mageia 9 x86_64

How reproducible:
Twice, once with the nouveau driver, followed by another with the proprietary nvidia driver.

Steps to Reproduce:
1. Start mgaapplet afresh in a normal Mageia 8 desktop session
$ killall mgaapplet
$ mgaapplet --session
2. Using the applet, choose to upgrade to the latest Mageia 9 distribution
3. Watch the packages being downloaded and installed until nothing more seems to be happening.
4. A monitor such as gkrellm will show what is happening with the CPUs and disk drives.

Some idea of how far the installation has proceeded can be gained by
$ rpm -qa|grep mga8|wc -l
$ rpm -qa|grep mga9|wc -l
Those numbers should not change after the stall point.
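The two counting commands above can be wrapped so that they are sampled repeatedly; if consecutive samples show the same numbers, the installation has probably stalled. This is only a sketch: the function names count_release and upgrade_progress are invented for illustration and are not part of any Mageia tool.

```shell
#!/bin/sh
# count_release TAG: read package names on stdin, print how many contain TAG.
# "|| true" keeps the exit status at 0 when the count is zero.
count_release() {
    grep -c -- "$1" || true
}

# upgrade_progress: print one timestamped line with the mga8 and mga9 counts.
upgrade_progress() {
    pkgs=$(rpm -qa)
    old=$(printf '%s\n' "$pkgs" | count_release mga8)
    new=$(printf '%s\n' "$pkgs" | count_release mga9)
    printf '%s mga8=%s mga9=%s\n' "$(date +%T)" "$old" "$new"
}

# Usage, sampling once a minute in a spare terminal:
#   while sleep 60; do upgrade_progress; done
```

During a healthy upgrade the mga8 figure should fall while the mga9 figure rises.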
Darn. Half asleep - should have been filed against mgaapplet not Mageia9.
The description should be amended to reference mgaonline-3.31-3.mga9.src.rpm. Sorry about that.
How long did you wait? Because this sounds like you reached the point where the kernel modules and initrd get built. And depending on your hardware this can take some time... While the kernel modules and initrd are being built, there is no graphical progress indication. You can see the progress only when you use the CLI way of upgrading...
Per comment 0 this also happened with nouveau; no dkms module should be built for that, should it?
--
Another tester reported a stall when it seemed to be doing dkms for wifi: https://ml.mageia.org/l/arc/qa-discuss/2023-05/msg00347.html
--
For the next round of testing, a tip from Dave H in the same thread:
"When testing upgrades on real hardware, before starting the upgrade I make sure htop and strace are installed. Open a second terminal (alt+ctrl+f2 etc.), and start htop running there. If things stall, I can just switch terminals to see what's going on, including running strace from within htop if needed."
And I can add: maybe also have another terminal up with journalctl -f, i.e. dkms build progress should be visible there.
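A small filter can make the journalctl tip easier to follow on a busy system. This is a sketch only: dkms_filter is a made-up helper name, and it assumes that dkms activity really does reach the journal with the string "dkms" somewhere in the line.

```shell
#!/bin/sh
# dkms_filter: pass through only the lines that mention dkms (any case).
# --line-buffered (GNU grep) makes each match appear as soon as it is logged.
dkms_filter() {
    grep -i --line-buffered dkms
}

# Usage on a second console (Ctrl+Alt+F2), as root, before starting the upgrade:
#   journalctl -f | dkms_filter
```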
CC: (none) => fri
In reply to comment 3. In both cases the wait was several hours - probably 6 or more. Shall retry with Morgan's journalctl tip.
Speaking of mgaapplet upgrades, do you use Wayland? I never get a reply when asking whether this is still relevant for mga9... Bug 29182 - mgaapplet-upgrade-helper crashed under Wayland session
I know nothing about Wayland, so I only log in to Wayland sessions for testing purposes. I use Mate exclusively for everything else, but it looks like we need to cover Wayland when testing upgrades.
Restarted mgaonline upgrade on a fully updated Mageia8 system with nvidia proprietary driver. Chose download all at once, storing the RPMs in the default urpmi location. That took about 90 minutes. Installation started. Some four minutes to install 1734 packages and then the dkms build of nvidia-current started. make seems to be the busiest process, at about 9% CPU usage. Leaving this to run for as long as it takes.
More than 3 hours later, make has used 22 minutes of CPU time, preload 3 minutes, top, scheduler and pipewire and a few other processes a minute each. dkms has used about 3 seconds. Does this all seem reasonable? Still going.
Checked to see how many kernels were installed - just one. Only kernel-userspace-headers has been updated to mga9 so far.
"time urpmi dkms-nvidia-current" in an m9 x86_64 vb guest shows ...
real 4m32.621s
user 5m33.009s
sys 0m48.870s
also with just one kernel.

Check in /var/lib/dkms/nvidia-current/*/build/ for a make.log file. It's created during the make, but appears to be deleted once it finishes ok. If the make.log is there, are there any obvious errors? What are the last dozen or so lines?
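The make.log check can be done in one go for every version directory. The helper below is a sketch; show_make_logs is an invented name, and the default path is simply the one mentioned above.

```shell
#!/bin/sh
# show_make_logs [ROOT] [N]: print the last N lines (default 12) of every
# ROOT/*/build/make.log that exists; say so if none is found.
show_make_logs() {
    root=${1:-/var/lib/dkms/nvidia-current}
    lines=${2:-12}
    found=0
    for log in "$root"/*/build/make.log; do
        [ -f "$log" ] || continue
        found=1
        printf '== %s ==\n' "$log"
        tail -n "$lines" "$log"
    done
    [ "$found" -eq 1 ] || echo "no make.log under $root"
}

# Usage, as root, while the build appears to be stalled:
#   show_make_logs
```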
CC: (none) => davidwhodgins
The nvidia 470 branch has a build directory but no make.log. The 525 branch has an empty build directory. It looks like these directories have never been visited since the original installation. The other thing that puzzles me is what is the dkms build aimed at? There are no mga9 kernels installed at this stage AFAICS.
Tried brute force, in caveman fashion; used urpmi directly on the cached kernel-desktop-latest rpm but that immediately raised a conflict with the resident dkms-nvidia-current and I don't know how to resolve that safely. Time to crash out of this.
What does the following show for the order? grep -e dkms -e nvidia -e kernel /root/.MgaOnline/*.log
$ su -
# cd .MgaOnline
# grep -e dkms -e nvidia -e kernel *.log > mgaonline_log_search
# ll
total 1652
-rw-r--r-- 1 root root 1665612 Jun 1 23:39 gurpmi_upgrade_to_9_M708E2ns.log
-rw-r--r-- 1 root root 10022 Jun 2 06:27 mgaonline_log_search
-rw-r--r-- 1 root root 5905 Jun 1 15:21 urpmi.cfg.backup.68698

Thanks Dave. Results attached.
Created attachment 13861 [details] Results of log search for mgaapplet upgrade
Looks like it would be better to compress a copy of gurpmi_upgrade_to_9_M708E2ns.log and attach that. The results of the search don't show what is going wrong, but they do show that there are desktop, server, and linus kernels to build for. The first dkms build is for the running m8 kernel. I don't think the dkms build for the other kernels will happen until they are booted.
Also, what does df -h show?
$ df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                  16G     0   16G   0% /dev
tmpfs                     16G     0   16G   0% /dev/shm
tmpfs                     16G  2.1M   16G   1% /run
/dev/sda3                 53G   16G   36G  30% /
tmpfs                     16G   48K   16G   1% /tmp
/dev/sda1                2.4G  324K  2.4G   1% /boot/EFI
/dev/nvme0n1p2            11G  4.5G  6.0G  43% /localrepo
/dev/nvme0n1p1           905G  501G  405G  56% /home
/dev/sdb2                3.6T  2.5T  984G  72% /data
tmpfs                    3.2G  292K  3.2G   1% /run/user/1000
/dev/sdc1                916G  608G  263G  70% /run/media/lcl/gemma
gomeisa:/home/lcl/topaz  705G  110G  559G  17% /home/lcl/pad
gomeisa:/home/lcl/ruby   705G  110G  559G  17% /home/lcl/quinckler
tmpfs                    2.0G   98M  2.0G   5% /home/lcl/.cache

Shall attach the gurpmi log.
Created attachment 13862 [details] gurpmi_upgrade_to_9_M708E2ns.log As it says on the can.
This happened to me as well.

# less gurpmi_upgrade_to_9_pTWTmNir.log

Here is the last output from when it hung:

Preparing kernel 5.15.110-desktop-2.mga8 for module build:
(This is not compiling a kernel, just preparing kernel symbols)
Storing current .config to be restored when complete
Running Generic preparation routine
make mrproper....
using /proc/config.gz
make oldconfig....
make prepare........ [the progress dots continue for several hundred characters, with no further output]
CC: (none) => brtians1
To put some of Len's comments about 'long times' into perspective, he has very powerful hardware. Things like dkms builds should happen quickly. There is clearly a problem with upgrades involving dkms (terra incognita for me, so I can say nothing constructive about that). The compressed log file attached in comment 20 should tell the cognoscenti something.
Source RPM: (none) => mgaonline-3.31-3.mga9.src.rpm
CC: (none) => lewyssmith
The thing I noticed, from my amateur perspective, is that it is linking dkms to the wrong kernel: the old mga8 kernel rather than the new mga9 one.
(In reply to Brian Rockwell from comment #21)
> This happened to me as well.
>
> # less gurpmi_upgrade_to_9_pTWTmNir.log
>
> here is from the last output when it hung up
>
> Preparing kernel 5.15.110-desktop-2.mga8 for module build:
> (This is not compiling a kernel, just preparing kernel symbols)
> Storing current .config to be restored when complete
> Running Generic preparation routine
> make mrproper....
> using /proc/config.gz
> make oldconfig....
> make prepare........ [long run of progress dots]

This reminds me of my bug https://bugs.mageia.org/show_bug.cgi?id=31621. Perhaps, in an upgrade by mgaonline, it would be good if the dkms build could be skipped until the system reboots? BTW, upgrading with urpmi has never bitten me with this issue.
Tried to recreate the issue by downloading dkms-nvidia-current-525.116.04-1.mga9.nonfree.x86_64.rpm and installing it on an m8 system. It installed cleanly for the running kernel-desktop-5.15.110-2.mga8
(In reply to Dave Hodgins from comment #25)
> Tried to recreate the issue by downloading
> dkms-nvidia-current-525.116.04-1.mga9.nonfree.x86_64.rpm
> and installing it on an m8 system. It installed cleanly for the running
> kernel-desktop-5.15.110-2.mga8

That is because it is an mga8 system. I think that at the moment dkms rebuilds during the upgrade, the system is a mix of mga9 and mga8. If you run the mga8 kernel on an mga9 system when the dkms build is triggered, you will hit the issue; at least that is what happened to me in my report, although I upgraded with the classic installer, and the reason the system booted the mga8 kernel after the reboot has since been fixed. The main point is that the system is still running the mga8 kernel and tries to build the module with mga9 "tools".
Assigning this to the 'tools' people for the upgrade process.
Assignee: bugsquad => mageiatools
CC: lewyssmith => (none)
Setting for errata for now.
Keywords: (none) => FOR_ERRATA9
This really is severe https://ml.mageia.org/l/arc/qa-discuss/2023-07/msg00100.html
Priority: Normal => release_blocker
I have reproduced the root cause as follows:

1. In VirtualBox, install a minimal system from the Mageia-9-rc1-x86_64 ISO. I selected the Xfce DE plus the Configuration and Console Tools categories, but this should be reproducible on any Mageia 9 system.

2. Install a dkms module package. I chose dkms-broadcom-wl, but this should be reproducible with any other dkms module. This should build and install the dkms module for the 6.3.9 kernel without any problem.

3. Add a full set of Mageia 8 urpmi media by
urpmi.addmedia --distrib <mirror-url>/distrib/8/x86_64

4. Install the current Mageia 8 kernel plus its development package by
urpmi kernel-desktop-5.15.117-2.mga8-1-1.mga8
urpmi kernel-desktop-devel-5.15.117-2.mga8-1-1.mga8

5. Attempt to build the dkms module for that kernel by e.g.
dkms build -m broadcom-wl -v 6.30.223.271-66.mga9.nonfree -k 5.15.117-desktop-2.mga8

This will hang at the 'make prepare' step.

Adding tmb to CC.
CC: (none) => mageia, tmb
Summary: In some cases the upgrade process started by mgaapplet stalls, for no obvious reason. => In some cases the upgrade process started by mgaapplet stalls, due to dkms hanging when building modules for a Mageia 8 kernel
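When repeating step 5 it may be convenient to put a time limit on the build so that a hang is reported rather than waited out. The wrapper below is a sketch using timeout(1) from coreutils, which exits with status 124 when the limit is reached; the name run_with_deadline and the 600-second limit are arbitrary choices for illustration.

```shell
#!/bin/sh
# run_with_deadline SECONDS CMD...: run CMD under timeout(1) and report
# whether it finished by itself or was stopped at the deadline.
run_with_deadline() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "hung: no completion within ${limit}s"
    else
        echo "finished with exit status $status"
    fi
}

# Usage, as root, with the module, version and kernel from step 5:
#   run_with_deadline 600 dkms build -m broadcom-wl \
#       -v 6.30.223.271-66.mga9.nonfree -k 5.15.117-desktop-2.mga8
```

After a timeout it is worth checking with htop or ps for leftover make processes before trying again.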
Downgrading 'make' to make-4.3-2.mga8 allows the dkms build to complete. Adding make to /etc/urpmi/skip.list before performing the online upgrade allows the upgrade to complete without error.
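The skip.list workaround can be applied with a small helper that avoids adding the same line twice. This is a sketch: add_skip_entry is an invented name, and writing the bare package name "make" on its own line is my reading of the workaround described above.

```shell
#!/bin/sh
# add_skip_entry FILE ENTRY: append ENTRY to FILE unless an identical
# line is already present.
add_skip_entry() {
    file=$1
    entry=$2
    if [ -f "$file" ] && grep -qxF -- "$entry" "$file"; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$file"
}

# Usage, as root, before starting the online upgrade:
#   add_skip_entry /etc/urpmi/skip.list make
```

While the entry is in place make stays at its Mageia 8 version, so the line needs to be removed again once the upgrade has finished if make is to be updated.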
Source RPM: mgaonline-3.31-3.mga9.src.rpm => make-4.4.1-1.mga9
For now I have made a helpful temporary entry *to be reverted for release* (assuming it gets fixed), pointing to comment 31: https://wiki.mageia.org/mw-en/index.php?title=Mageia_9_Release_Notes&action=historysubmit&type=revision&diff=58987&oldid=58981
Keywords: (none) => IN_RELEASENOTES9
I remember Thomas had fixed the kernel Makefile for make 4.4 last year, but that will not help for older kernels and we need a workaround :(
CC: (none) => pterjan
Heh, good memory :) So technically I could make next mga8 kernel update "make 4.4" compliant as we require fully updated mga8 before running distro upgrades and that would take care of this...
thumbs up :)
I am not sure this is enough, as I believe new versions of dkms packages will be rebuilt for older kernels that are still installed.
IIRC dkms packages only get built at boot time if you boot the older kernel, or if you are running the older kernel when installing something for dkms. We can add to the upgrade instructions that the user needs to be running the latest kernel from updates or backports when starting an online upgrade.
(In reply to Pascal Terjan from comment #36)
> I am not sure if this is enough as new versions of dkms packages will be
> rebuilt for older still installed kernel I believe

hm, it should only build for running kernel, not for older ones...
A little out of my element, but as I recall the new versions for installed-but-not-running kernels are not built until those kernels are booted. Even so, having a fully-updated kernel in place before an upgrade attempt doesn't mean that the user is actually USING that updated kernel. We can warn users that they MUST be using the latest kernel, but how many actually read those warnings? In my observations, the more experience people have, the less likely they are to read documentation.
Can mgaapplet be improved to check both that the system is fully updated *and* that it is running the latest installed kernel, before proposing the upgrade? Users advanced enough to choose urpmi probably know where to search for information and help if they fail to read the release notes.
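The "running latest installed kernel" half of such a check could look roughly like the sketch below. It is not how mgaapplet works: the function names are invented, it compares `uname -r` with the directory names under /lib/modules using GNU sort -V, and it assumes only one kernel flavour (e.g. desktop) is installed, since mixed flavours would not sort meaningfully.

```shell
#!/bin/sh
# newest_version V...: print the highest of the given version strings
# according to GNU sort -V.
newest_version() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

# running_latest_kernel [MODULES_DIR]: succeed if the running kernel is the
# newest entry under MODULES_DIR (default /lib/modules).
running_latest_kernel() {
    dir=${1:-/lib/modules}
    newest=$(ls "$dir" | sort -V | tail -n 1)
    [ "$(uname -r)" = "$newest" ]
}

# Usage before proposing an upgrade:
#   running_latest_kernel || echo "Reboot into the newest kernel first"
```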
Would it still hang compiling dkms for older, still installed kernels?
The fix for this is now in kernel and kernel-linus 5.15.120-2.mga8, currently building.
The ideas of comment #38 and comment #40 don't work, because if you have a fully updated Mageia 8 system (which is what the upgrade process recommends) you usually have the latest kernel for Mageia 8, so the upgrade will be triggered, and if the user has any dkms module the issue will occur.
Possible solutions:
1. Fix the kernel makefiles to be compatible with make 4.4, as comment #34 suggests, or
2. In an upgrade by mgaonline, block the dkms build until the reboot.
Depends on: (none) => 32093
Depends on: (none) => 32094
What about the mga8 backport kernels?
(In reply to Morgan Leijström from comment #44)
> What about the mga8 backport kernels?

already fixed since 6.0.8-3
Mageia 8 -> 9, x86_64, Mate, Intel Core i9 and GTX1080 with nvidia driver. With the .120 kernel in place the mgaapplet upgrade worked without a hitch. Download all at once; installation finished within two hours and the system rebooted smoothly to the desktop with the nvidia and virtualbox drivers rebuilt. All desktop functions seem to be working.
Amendment to comment 46 - starting with kernel desktop 5.15.120-desktop-2.mga8. Rebooted to 6.3.9-1.mga9.
Updated rel notes under https://wiki.mageia.org/en/Mageia_9_Release_Notes#Online-Upgrade
Added note in Errata under https://wiki.mageia.org/en/Mageia_9_Errata#If_upgrade_failed
Keywords: FOR_ERRATA9 => IN_ERRATA9
An update for this issue has been pushed to the Mageia Updates repository. https://advisories.mageia.org/MGASA-2023-0237.html https://advisories.mageia.org/MGASA-2023-0238.html
Resolution: (none) => FIXED
Status: NEW => RESOLVED