| Summary: | After an upgrade from Mageia5->6, the first boot may fail, either with a black screen or a kernel panic message, depending on hardware | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Thomas Andrews <andrewsfarm> |
| Component: | Release (media or process) | Assignee: | ISO building group <isobuild> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | normal | ||
| Priority: | High | CC: | alejandro.anv, lovaren, mageia, mageia, marja11, sysadmin-bugs, tmb |
| Version: | Cauldron | ||
| Target Milestone: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | CVE: | ||
| Status comment: | |||
| Attachments: |
journal.log of a failed boot
Xorg log
ddebug log from the upgrade install, compressed so Bugzilla will accept it
Journal of failed vbox login
vbox xorg.log
vbox ddebug.log
vbox install.log
vbox output of ls /boot
install.log from real hardware |
||
|
Description
Thomas Andrews
2017-05-09 03:56:44 CEST
Thomas Andrews
2017-05-09 03:58:04 CEST
Priority:
Normal =>
release_blocker
Forgot to mention, the Mageia 5 install on real hardware was using Grub 2 for booting, while the VirtualBox guest was using grub-legacy.
Also forgot to mention, on real hardware I can not establish a wifi connection from the installer, so I would have been unable to get updates from the repositories until after the first boot. This is normal for this hardware, and expected. In VirtualBox, I could have gotten the updates, but chose not to.
On real H/W, does Ctrl-Alt-F1 get you to a tty login prompt? If so, can you provide the usual logs:
journalctl -b > journal.log
/var/log/Xorg.0.log
/root/drakx/ddebug.log
If not, try booting to run level 3 (add 3 to the boot options in the grub2 menu), which should take you straight to a tty login prompt. Same as above, but
journalctl -b -1 > journal.log
to get the log from the failed graphical boot.
CC:
(none) =>
mageia
The keyboard command failed, but booting to run level 3 worked. I was able to get the requested logs. There were other copies of the last two, but evidence suggests they were left from the Mageia 5 installation. I can provide those too if you wish.
Further information, in the hope that it will help...
The Mageia 5 install was fully updated before beginning the upgrade to Mageia 6.
Being more specific on the hardware, the motherboard is an Intel DQ45CB, and I am using the DVI-D port of the on-board graphics. The Processor is an Intel Core 2 Duo E8400. The board's BIOS/firmware is not the latest available version, but it is not the oldest, either.
A clean, non-upgrade install of Cauldron from an earlier Classical RC test iso is working very well on this hardware. In fact, it is getting the latest round of updates at this moment.
Created attachment 9287 [details]
journal.log of a failed boot
Created attachment 9288 [details]
Xorg log
Created attachment 9289 [details]
ddebug log from the upgrade install, compressed so Bugzilla will accept it
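The logs requested in this report (the journal of the current or previous boot, /var/log/Xorg.0.log, and /root/drakx/ddebug.log) can be gathered in one step. The following is a minimal sketch, assuming a root shell on the affected system. The function name `collect_boot_logs`, the output directory, and the optional root-prefix argument are illustrative and are not part of any Mageia tool.

```shell
#!/bin/sh
# Hypothetical helper: gather the logs requested in this report into one directory.
# Usage: collect_boot_logs OUTDIR [BOOT_OFFSET] [ROOT_PREFIX]
#   BOOT_OFFSET: 0 for the current boot, -1 for the previous (failed) boot.
#   ROOT_PREFIX: optional prefix for the log paths (an assumption, handy for testing).
collect_boot_logs() {
    outdir=$1
    boot=${2:-0}
    root=${3:-}
    mkdir -p "$outdir" || return 1

    # journalctl -b N selects a boot; skip quietly if journalctl is unavailable.
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -b "$boot" > "$outdir/journal.log" 2>/dev/null || true
    fi

    # Copy the Xorg and installer logs named in the thread if they exist.
    for f in /var/log/Xorg.0.log /root/drakx/ddebug.log /root/drakx/install.log; do
        if [ -r "$root$f" ]; then
            cp "$root$f" "$outdir/"
        fi
    done
}
```

After booting to run level 3 following a failed graphical boot, something like `collect_boot_logs /root/bug-logs -1` would place the files in one directory ready to attach.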
Marja Van Waes
2017-05-09 22:59:46 CEST
Assignee:
bugsquad =>
isobuild I tried reproducing this in VirtualBox (mga6 x86_64 host). I first did a clean install from the Mageia-5.1-x86_64-DVD ISO, choosing a default KDE install. I cloned that and did an upgrade from the 8th May Mageia-6-rc-x86_64-DVD ISO. This rebooted to a working Plasma desktop.
I then repeated the upgrade with a fresh clone. This time I got the same blank screen TJ reports. Right-Ctrl-F2 (Vbox equivalent to Ctrl-Alt-F2) got me to a login prompt, from which I could log in as root and look at the journal. The problem was that the sddm user had not been created, which led to a segfault in sddm-helper:
sddm[1981]: Greeter starting...
sddm[1981]: Adding cookie to "/var/run/sddm/{50051ce7-ac3c-41ee-93b6-5c8c9b53ec0a}"
sddm[1981]: Failed to find the sddm user. Owner of the auth file will not be changed.
sddm-helper[2012]: [PAM] Starting...
sddm-helper[2012]: [PAM] Authenticating...
sddm-helper[2012]: [PAM] returning.
sddm-helper[2012]: pam_unix(sddm-greeter:session): session opened for user sddm by (uid=0)
sddm-helper[2012]: pam_systemd(sddm-greeter:session): Failed to get user data.
sddm-helper[2012]: pam_systemd(sddm-greeter:session): Failed to get user data.
kernel: sddm-helper[2019]: segfault at 14 ip 0000000000426c7c sp 00007ffda54b96d0 error 4 in sddm-helper[400000+38000]
sddm[1981]: Greeter session started successfully
sddm-helper[2012]: pam_unix(sddm-greeter:session): session closed for user sddm
sddm-helper[2012]: [PAM] Closing session
sddm-helper[2012]: [PAM] Ended.
sddm[1981]: Auth: sddm-helper exited with 11
sddm[1981]: Greeter stopped.
Looking in report.bug, I find during the package upgrade:
qtdeclarative5-5.6.2-10.mga6.x86_64
useradd: existing lock file /etc/shadow.lock without a PID
useradd: cannot lock /etc/shadow
user sddm does not exist - using root
group sddm does not exist - using root
sddm-0.14.0-13.mga6.x86_64
From my root login, I ran
adduser sddm
then killed the /usr/bin/sddm process. This then respawned, and got me a working graphical login, from which I could log in to a working Plasma desktop.
I strongly suspect that this is a different bug to the one TJ is seeing with real H/W, which looks much more like a graphics driver problem.
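The diagnosis in the comment above rests on two log signatures: sddm reporting that it cannot find the sddm user, and useradd failing to lock /etc/shadow during the package upgrade. The following is a minimal sketch of checking for both, assuming the journal and report.bug text are fed in as plain text on stdin. The function names are illustrative only.

```shell
#!/bin/sh
# Hypothetical checks for the two signatures quoted above.
# The first two functions read log text on stdin.

# Succeeds if the journal shows sddm could not find its own user account.
journal_shows_missing_sddm_user() {
    grep -q 'Failed to find the sddm user'
}

# Succeeds if the install report shows useradd was unable to lock /etc/shadow.
report_shows_shadow_lock_failure() {
    grep -q 'useradd: cannot lock /etc/shadow'
}

# Succeeds if the sddm account is absent from the user database.
sddm_user_missing() {
    ! id -u sddm >/dev/null 2>&1
}
```

For example, `journalctl -b | journal_shows_missing_sddm_user && sddm_user_missing` succeeding would point at the failure described here, for which the workaround reported in this thread was `adduser sddm` followed by restarting sddm.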
This morning I was able to use "Advanced" in Grub 2 to boot into the last Mageia 5 kernel on this system, and complete the upgrade by selecting the kernel.org mirror, and get the 463 (!) updates (including a new Mageia 6 kernel).
One big change, for various reasons, mostly just evolving personal preference, I have moved the Mageia 6 install on this hardware to be my production install. As such, there has been some hardware shuffling. The wifi card has been removed, and a wired connection is now used. The optical drive has been removed, to be put into a different set of hardware, and will eventually be replaced. And a hot plug drive bay has been added. Also, the production install's bootloader is the one currently being used, not the one installed by the upgrade.
When I rebooted into the upgrade test install, I had neglected to update the production install's Grub 2 menu, so it booted into the original Mageia 6 kernel. That boot showed the three question marks, followed by an out-of-proportion Mageia 6 splash screen, but the boot did go on to completion. Updating the production install's grub 2 menu resulted in a normal-looking boot into the new kernel in the upgrade test install.
I'm hoping this information helps, and doesn't just confuse things even more.
I think at least some of your problems stem from not having an initrd. Looking in ddebug.log, I find
Generating grub configuration file ...
Found theme: /boot/grub2/themes/maggy/theme.txt
Found linux image: /boot/vmlinuz-4.9.26-desktop-1.mga6
Found linux image: /boot/vmlinuz-4.4.59-desktop-1.mga5
Found initrd image: /boot/initrd-4.4.59-desktop-1.mga5.img
Found linux image: /boot/vmlinuz-4.4.30-desktop-2.mga5
Found initrd image: /boot/initrd-4.4.30-desktop-2.mga5.img
Found linux image: /boot/vmlinuz-desktop
so it clearly wasn't created during the upgrade. Not having an initrd means that the video driver isn't being loaded early in the boot process.
And looking in journal.log, it appears not to be being loaded until after SDDM has started. This could well be the cause of the black screen. I'm at a bit of a loss here. CC'ing tmb, in the hope he might have some ideas. CC:
(none) =>
tmb
(In reply to Martin Whitaker from comment #10)
>
> I'm at a bit of a loss here. CC'ing tmb, in the hope he might have some
> ideas.
If you are at a loss, I certainly am. Most of this is far over my head, but my instincts say that it doesn't sound like a hardware-specific issue. Yet apparently I'm the only one to have seen it, and that DOES sound like it's hardware-specific. Or at least something specific to MY system.
I noticed I failed to remove the M5 4.4.30 kernel before doing the upgrade, but that shouldn't have affected anything, should it? In any case, it's something a user might do, so if it IS an issue it needs to be found.
I can try to recreate the problem, but of course I would need to recreate my M5 install first. That will take a while, as I have other commitments that would prevent me from getting to it today. But, I suppose it would be nice to know if it happens every time, and if it still happens with the newly-minted M6 iso.
Also, I do have a 32-bit install on another partition of the same hard drive, and I could try an upgrade on that. I didn't bother before, because it failed on the 64-bit install. That one also still has the 4.4.30 kernel (server because of the amount of RAM) on it, which I could remove or keep before the upgrade, whichever you believe would be best.
Please advise on what I can do to help track this down.
The fact that you are seeing what looks like the same problem in VirtualBox suggests it's something specific to your Mageia 5 installs. Were these old installs you have been using for a while or freshly created to test upgrades? Did you clone the Vbox system to preserve a copy of the pre-upgrade state?
Could you attach the same logs from the Vbox system, so I can check it really is the same problem. Also /root/drakx/install.log might be helpful (from both systems), and the output from 'ls -l /boot'.
(In reply to Thomas Andrews from comment #11)
> I noticed I failed to remove the M5 4.4.30 kernel before doing the upgrade,
> but that shouldn't have affected anything, should it? In any case, it's
> something a user might do, so if it IS an issue it needs to be found.
Forgot to answer this bit. It shouldn't affect anything, and in my (failed) attempts to reproduce this bug I haven't removed it, and it hasn't caused any problems.
I can't reproduce it either. I just finished updating 10 real machines and everything went smoothly.
CC:
(none) =>
mageia
It's morning here, and a beautiful, sunny day, and I am behind in my outdoor work, so much will have to wait until the end of my work day. But, I can answer a couple of questions.
The real hardware installs (64 and 32 bit) were created about a month ago, with the idea of using them for testing upgrades. Some of my more commonly-used apps were added in addition to the "standard" KDE install, to simulate an actual user's situation. However, they were essentially left alone after that. I did use the 32-bit install to check out a kernel update.
The Vbox install is an old one, and has been used mostly to test Vbox upgrades. It was cloned before the upgrade. The one difference from what might be a "standard" KDE install is that it does share a folder with the test system. Again, I would expect this to be common among real-world Vbox Mageia users.
This morning I was able to use the keyboard command Martin used in comment 8 to get to a keyboard prompt, but I haven't gone farther than that. The keyboard command didn't work on real hardware, leading me to suspect I am indeed seeing the same bug Martin saw in his test.
There is another bug that seems to be unique to my M5 KDE systems - bug 20242. I see it with any system where I add one of my HP printers. M6 doesn't have it, which is why I switched to that as my production install several months ago. I have no idea if there's a relationship or not. I don't recall if I had installed my printers on the upgraded real hardware system while attempting to track down that bug. I may have.
Also, I don't remember any more now whether the Mageia 5.1 iso that I have been using is the Official release or the last one I synced during testing. There shouldn't be a difference, but I suppose there might be.
Created attachment 9315 [details]
Journal of failed vbox login
Created attachment 9316 [details]
vbox xorg.log
Created attachment 9317 [details]
vbox ddebug.log
Created attachment 9318 [details]
vbox install.log
Created attachment 9319 [details]
vbox output of ls /boot
Thanks TJ. At a quick glance, your vbox problem looks like the same problem I reported in comment 8, which seems to be quite intermittent. To confirm this, can you log in as root and run the following command:
adduser sddm
then reboot. I believe that will fix the problem.
Could you also attach the install.log and 'ls /boot' output from the real H/W system.
I KNEW I was forgetting something! I will attach it forthwith. Be aware that I was able to complete the upgrade by booting into the M5 kernel. I do not know if the install.log reflects that.
Adding sddm did fix the vbox problem. Getting the first round of updates as I type. A little apprehensive about using the already-set Mirrorlist to do this, as I have had several bad experiences with it in the past, but I shall soldier on.
I still had the three question marks, but that was most likely because I forgot the "nomodeset" kernel option.
Created attachment 9321 [details]
install.log from real hardware
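The grub configuration output quoted earlier in this report lists /boot/vmlinuz-4.9.26-desktop-1.mga6 with no matching "Found initrd image" line. The following is a minimal sketch of listing installed kernels that lack an initrd, assuming the /boot naming seen in that output (vmlinuz-VERSION paired with initrd-VERSION.img). The function name and the directory argument are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the version of every kernel image in a boot
# directory that has no initrd-VERSION.img next to it.
# Usage: kernels_without_initrd [BOOTDIR]   (BOOTDIR defaults to /boot)
kernels_without_initrd() {
    bootdir=${1:-/boot}
    for kernel in "$bootdir"/vmlinuz-*; do
        # Skip the unexpanded pattern and any symlinks
        # (vmlinuz-desktop is assumed to be one).
        [ -f "$kernel" ] && [ ! -L "$kernel" ] || continue
        version=${kernel##*/vmlinuz-}
        if [ ! -e "$bootdir/initrd-$version.img" ]; then
            printf '%s\n' "$version"
        fi
    done
}
```

If this prints a version, the initrd for that kernel is missing. The remedy proposed in this report is to boot that kernel, run `dracut -f` as root, and then regenerate the grub2 menu.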
(In reply to Thomas Andrews from comment #22)
> I KNEW I was forgetting something! I will attach it forthwith. Be aware that
> I was able to complete the upgrade by booting into the M5 kernel. I do not
> know if the install.log reflects that.
No, the install.log just shows the original upgrade run from the CI ISO. Sadly it doesn't show any helpful error messages, so I'm none the wiser.
Let's test if my theory about what's wrong is correct. Can you boot into the original Mageia-6 kernel and as root run
dracut -f |& tee dracut.log
If this reports success, run 'update-grub2' in your production install (I assume this is how you are updating your grub2 menu), and then reboot into the original Mageia-6 kernel. If I'm right, it should now boot properly.
If it doesn't work, please attach the dracut.log file that should have been created by running the above command.
> I still had the three question marks, but that was most likely because I
> forgot the "nomodeset" kernel option.
You shouldn't need 'nomodeset' with Vbox, and indeed, I'd expect it to make matters worse. This is another mystery.
Severe thunderstorms took out my Internet service yesterday afternoon, and it only came back around 10:00 this morning, less than an hour ago. I will see what I can do to try these suggestions this evening.
"nomodeset" did indeed make things worse. I removed it, and the vbox boot is successful. Still has the three question marks, though. Something for another bug.
Can you boot the kernel in question in text mode?
Have you considered adding $vt_handoff nomodeset noacpi to the kernel command line?
I had a problem with booting the kernel to a working desktop and that's when I decided to add those parameters and it worked.
CC:
(none) =>
hamnisdude
(In reply to Martin Whitaker from comment #24)
>
> Let's test if my theory about what's wrong is correct. Can you boot into the
> original Mageia-6 kernel and as root run
>
> dracut -f |& tee dracut.log
>
> If this reports success, run 'update-grub2' in your production install (I
> assume this is how you are updating your grub2 menu), and then reboot into
> the original Mageia-6 kernel. If I'm right, it should now boot properly.
>
> If it doesn't work, please attach the dracut.log file that should have been
> created by running the above command.
>
The boot was normal after the requested commands. However, I don't know how much help that would be, as the boot into the original kernel to perform the dracut command was also normal.
From where I'm sitting, it appears that completing the install by booting into the old M5 kernel, then getting updates and booting into the new M6 kernel "fixed" it somewhere during the process. I'm thinking that the only way to re-create the problem for which I filed the bug would be to re-create an M5 install and attempt the upgrade all over again. Even then it might not happen again. If there's an intermittent problem with the vbox upgrades, it's not much of a stretch to think there could be one with real hardware, too. The fact that no one else is seeing it, and I really only saw it once, makes me more and more suspicious that may be the case.
Perhaps this bug should be downgraded from Release blocker to something else.
(In reply to Kristoffer Grundström from comment #26)
> Can you boot the kernel in question in text mode?
>
> Have you considered adding $vt_handoff nomodeset noacpi to the kernel
> command line?
>
> I had a problem with booting the kernel to a working desktop and that's when
> I decided to add those parameters and it worked.
I hadn't considered it. My knowledge of the various kernel options is VERY limited, so I usually stay away from experimenting with them unless I get some expert advice.
But thank you for the idea. If it hadn't started working more or less on its own, I would have tried it.
A thought on this: At the time, this hardware had only a wifi Internet connection, and as such could not be connected from the DVD(usb stick), and so I could not get the updates from the online repositories before attempting that first boot. By the time I thought to try booting into the old M5 kernel, I had moved the hardware to having an Ethernet connection, and was able to get those updates, complete the upgrade, and boot into the M6 kernel.
Could that explain the whole thing? Might I have been able to complete that first boot if I had had those updates installed?
If the upgrade had needed some packages that weren't on the DVD, I'd have expected to find an error message in the install logs, but even after a second look, I can't see anything. The only anomaly I see is that the initrd file for the upgraded kernel wasn't created, and I can see no explanation for that.
Another upgrade attempt, on different hardware, using the first Official test iso, has failed on the first boot.
Hardware: ASRock motherboard, Athlon X2 7750, 8GB RAM, Geforce 9800GT video (nvidia340 driver), Atheros wifi card, no Ethernet connection.
Everything prior to the first boot attempt went smoothly, almost TOO smoothly. Because I was on a wifi-only system, I did not attempt to make an Internet connection or get updates, as many wifi-only users would have to do.
On the first boot I saw the new Grub 2 menu, and went with the default "Mageia" option. The boot failed a few seconds later with a message about a kernel panic. A second boot attempt failed in the same fashion.
On the third attempt, I used the "Advanced" option of the Grub 2 menu to boot using the newest Mageia 5 kernel. That boot proceeded more or less normally. I saw a message that the nvidia driver was being built and installed, after which I could log in.
The wifi connection came up automatically upon logging in, and after changing a few System Settings (disable screen locker and power saver), I set up the kernel.org repositories and got the 260+ updates that were waiting, including the 4.9.32 kernel.
Once that was complete, I rebooted again, this time using the "Mageia" menu option. While I saw the three question marks instead of the plymouth animation, the boot was otherwise successful.
It is my opinion that this is the same bug as described above, with symptoms differing because of the different hardware.
One potential commonality in the two M5 systems, though I can't confirm it any more: If I recall correctly, VirtualBox was installed on the original hardware. If that is so, it seems to me that dkms would have attempted to build a new vbox kernel module during that first boot, and if it didn't have everything it needed to do so, the boot could fail. And if somehow the second system didn't have everything it needed to build the nvidia module for the new kernel, it too would fail. Just speculation on my part, as I know too little about the nuts and bolts of the process to do anything else.
I will change the title description to something better reflecting the new range of symptoms.
Because there is a relatively easy workaround to a successful upgrade install, I am going to reduce the status from release blocker to "High." I would have put it to "Normal," had I not seen the failure with the second system. However, it really ought to be fixed if we can before the Official release.
Severity:
critical =>
normal
If this is worth an entry in the Errata, please add it.
Is this the same bug?
I upgraded from console (removed any i586-devel packages, removemedia, addmedia, urpmi --replacefiles --auto-update --auto) and after reboot I get no login page. X seems to be working (intel graphics card) but I get these errors:
[root@anv anv]# service dm status
Redirecting to /bin/systemctl status dm.service
● prefdm.service - Display Manager
Loaded: loaded (/usr/lib/systemd/system/prefdm.service; static; vendor preset: enabled)
Active: active (running) since lun 2017-07-31 08:59:34 CEST; 10min ago
Main PID: 2836 (sddm)
CGroup: /system.slice/prefdm.service
├─2836 /usr/bin/sddm -nodaemon
└─2849 /usr/libexec/Xorg -nolisten tcp -auth /var/run/sddm/{d99c0b9d-701e-4264-9d0c-e22cc529b6ad} -background none -noreset -displayfd 18
jul 31 08:59:38 anv.localdomain sddm-helper[2868]: [PAM] Authenticating...
jul 31 08:59:38 anv.localdomain sddm-helper[2868]: [PAM] returning.
jul 31 08:59:38 anv.localdomain sddm-helper[2868]: pam_unix(sddm-greeter:session): session opened for user sddm by (uid=0)
jul 31 08:59:38 anv.localdomain sddm-helper[2868]: pam_systemd(sddm-greeter:session): Failed to get user data.
jul 31 08:59:38 anv.localdomain sddm-helper[2868]: pam_systemd(sddm-greeter:session): Failed to get user data.
jul 31 08:59:38 anv.localdomain sddm[2836]: Greeter session started successfully
jul 31 08:59:38 anv.localdomain sddm-helper[2868]: [PAM] Closing session
jul 31 08:59:38 anv.localdomain sddm-helper[2868]: pam_unix(sddm-greeter:session): session closed for user sddm
jul 31 08:59:38 anv.localdomain sddm[2836]: Auth: sddm-helper exited with 11
jul 31 08:59:38 anv.localdomain sddm[2836]: Greeter stopped.
lines 1-18/18 (END)
CC:
(none) =>
alejandro.anv
(In reply to Alejandro Vargas from comment #33)
> Is this the same bug?
>
> I upgraded from console (removed any i586-devel packages, removemedia,
> addmedia, urpmi --replacefiles --auto-update --auto) and after reboot a get
> no login page. X seems to be working (intel graphics card) but I get this
> errors:
It looks like a different bug to me. I suggest opening a new bug report. Attaching the output from 'journalctl -ab' may help.
As so much has happened since this bug has been addressed, and since Mageia 5 is long into EOL, I question the present validity of this bug. Changing its designation to reflect that.
Status:
NEW =>
RESOLVED |