Bug 11292 - Kernel 3.11.1 (either -linus or -desktop) fails to boot on a Core i3 x86-64 machine.
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal (severity: critical)
Target Milestone: ---
Assignee: Colin Guthrie
QA Contact:
URL: https://ml.mageia.org/l/arc/dev/2013-...
Whiteboard:
Keywords:
Duplicates: 11366
Depends on:
Blocks:
 
Reported: 2013-09-26 12:54 CEST by Shlomi Fish
Modified: 2013-10-28 16:26 CET
CC List: 4 users

See Also:
Source RPM: dracut-032-1.mga4.src.rpm
CVE:
Status comment:


Attachments
Screenshot of the kernel OOPS at boot time. (63.99 KB, image/jpeg)
2013-09-28 17:29 CEST, Shlomi Fish
from rd.break=pre-pivot (53.77 KB, text/plain)
2013-10-15 18:59 CEST, Chris Denice

Description Shlomi Fish 2013-09-26 12:54:39 CEST
Description of problem:

Kernel 3.11.1 fails to boot on a Core i3 x86-64 machine, whether it is kernel-linus-latest or kernel-desktop-latest. While booting, it displays an error that it cannot find /etc/fstab, plus some other errors, and then I get a malfunctioning prompt that reports kernel 3.10.10.

I'll supply a screenshot (taken by a camera) soon.

Reproducible: 

Steps to Reproduce:
Comment 1 Bit Twister 2013-09-26 13:17:34 CEST
(In reply to Shlomi Fish from comment #0)
>  and then I get a
> malfunctioning prompt 

If it has "Cannot add dev-disk-by\x2dpartlabel-*.device", this would be a duplicate of bug 11290.

CC: (none) => junknospam

Comment 2 Shlomi Fish 2013-09-27 14:13:27 CEST
(In reply to Bit Twister from comment #1)
> (In reply to Shlomi Fish from comment #0)
> >  and then I get a
> > malfunctioning prompt 
> 
> If it has Cannot add dev-disk-by\x2dpartlabel-*.device this would be a
> duplicate of bug 11290

I don't think it has that.
Comment 3 Shlomi Fish 2013-09-27 14:17:05 CEST
Anyway, I have now tried building a new kernel from the 3.11.2 sources from kernel.org. After installing it, booting with it resulted in a malfunctioning system where /tmp could not be accessed, and kernel 3.10.10 was reported as the kernel above the login prompt on the virtual terminals, while `uname -r` showed 3.11.2 (I can't really explain that).

Regards,

-- Shlomi Fish
Manuel Hiebel 2013-09-27 21:19:37 CEST

CC: (none) => tmb
Assignee: bugsquad => mageia

Comment 4 Shlomi Fish 2013-09-28 06:52:54 CEST
(In reply to Shlomi Fish from comment #3)
> Anyway, I now tried to build a new kernel from the kernel 3.11.2 sources
> from kernel.org and after it was installed, booting with it resulted in a
> mal-functioning system where /tmp could not be accessed, and kernel 3.10.10
> was reported as the kernel above the login prompt on the virtual terminals,
> while `uname -r` showed 3.11.2 (can't really explain that.).
> 
> Regards,
> 
> -- Shlomi Fish

With the 3.11.2 kernels (both -linus and -desktop), I am getting a different symptom: they OOPS.

Regards,

-- Shlomi Fish
Comment 5 Bit Twister 2013-09-28 09:55:58 CEST
(In reply to Shlomi Fish from comment #4)
> 
> With the 3.11.2 kernels (both -linus and -desktop), I am getting a different
> symptom: they OOPS.

Guessing it is going to be system specific.
I rsync'ed in a complete, up-to-date 3.11.2-desktop-1.mga4 system, changed the UUIDs for swap and / to match the rsync source system, and was finally able to get an operational boot.

Anyone formatting / and doing a full restore from backups is going to be talking to $DEITY. There is no activity while it looks for the old UUID for /; it finally times out 1+ minutes later and you are left sitting at a dead dracut prompt.  :(
Comment 6 Colin Guthrie 2013-09-28 14:21:41 CEST
See the messages on the systemd list. Seems the newer dracut defaults to systemd-in-initrd which is currently causing some problems.

Latest dracut build should solve this.

Please report whether rebuilding the initrd for the new kernel (after booting an old, working one) works for you (with the new dracut installed, obviously!).
Comment 7 Shlomi Fish 2013-09-28 16:02:55 CEST
Hi Colin,

(In reply to Colin Guthrie from comment #6)
> See the messages on the systemd list. Seems the newer dracut defaults to
> systemd-in-initrd which is currently causing some problems.
> 

Which ones? Do you have a link?

> Latest dracut build should solve this.

Which one is it? What version and -rel?

> 
> Please report if rebuilding the initrd for the new kernel (after booting an
> old, working one) works for you? (with new dracut installed obviously!)

OK, I will, but you need to answer my questions first.

Regards,

-- Shlomi Fish
Comment 8 Bit Twister 2013-09-28 16:54:42 CEST
(In reply to Shlomi Fish from comment #7)
> Hi Colin,
> 
> Which ones? Do you have a link?
> 
> > Latest dracut build should solve this.
> 
> Which one is it? What version and -rel?

dracut-033-1.mga4

Here is a handy link to see what is coming in the next update:
http://pkgsubmit.mageia.org/
Comment 9 Shlomi Fish 2013-09-28 17:29:45 CEST
Created attachment 4384 [details]
Screenshot of the kernel OOPS at boot time.

This is the screenshot of the kernel OOPS I get at boot time, after I ran 
«dracut ./new-initrd-3.11.2-desktop-1.mga4.img 3.11.2-desktop-1.mga4» and copied it on top of the initrd.
Comment 10 Shlomi Fish 2013-09-28 17:30:49 CEST
(In reply to Bit Twister from comment #8)
> (In reply to Shlomi Fish from comment #7)
> > Hi Colin,
> > 
> > Which ones? Do you have a link?
> > 
> > > Latest dracut build should solve this.
> > 
> > > Which one is it? What version and -rel?
> 
> dracut-033-1.mga4
> 
> Here is a handy link to see what in coming in the next update
> http://pkgsubmit.mageia.org/

Thanks! I'm still getting the kernel OOPS with this dracut installed.

Regards,

-- Shlomi Fish
Comment 11 Colin Guthrie 2013-09-28 17:47:23 CEST
(In reply to Shlomi Fish from comment #7)
> Hi Colin,
> 
> (In reply to Colin Guthrie from comment #6)
> > See the messages on the systemd list. Seems the newer dracut defaults to
> > systemd-in-initrd which is currently causing some problems.
> > 
> 
> Which ones? Do you have a link?

Sorry, thinko - meant cauldron list. Should be pretty obvious which ones.
 
> > Latest dracut build should solve this.
> 
> Which one is it? What version and -rel?

As Bit Twister already said, it's a good idea to check the pkgsubmit URL to see what's latest and greatest.

As you're still getting the oops, it's likely a different issue than the dracut one.
Comment 12 Chris Denice 2013-10-02 17:15:26 CEST
Hi guys,
I got exactly the same kernel panic message (as in attachment 4384) on all my old computers with the latest kernel 3.11.2-desktop-2mga.

one has this:
Model name:            Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
Card:Intel 810 and later: Intel Corporation|82Q963/Q965 Integrated Graphics Controller [DISPLAY_VGA] (rev: 02)


and the others:
Model name:            Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
Card:NVIDIA GeForce 400 series and later: NVIDIA Corporation|GK107 [GeForce GTX 650] [DISPLAY_VGA] (rev: a1)

cheers

CC: (none) => dirteat

Comment 13 Chris Denice 2013-10-15 18:12:31 CEST
Oopaa, new kernel 3.12.0-desktop-rc5.1, new bug:

dracut: Mounted root filesystem /dev/sda1
dracut: Switching root 
switch root: cannot access /sbin/init

I have no clue where to put that bug: here, in 11366, or in a new one. It sounds dracut-related?

cheers,
chris.
Comment 14 Colin Guthrie 2013-10-15 18:26:41 CEST
(In reply to Chris Denice from comment #13)
> Oopaa, new kernel 3.12.0-desktop-rc5.1, new bug:

Please put "rd.break=pre-pivot" on the kernel command line and verify that /sysroot/sbin/init exists and is a valid init binary. Note that any symlinks will be relative to /sysroot, so if it looks like a broken symlink, be sure to check whether it would work if /sysroot were /.
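
A minimal sketch of that check, meant for the rd.break=pre-pivot shell and assuming `readlink` and `dirname` are available in the initrd. The helper name `resolve_in_root` is hypothetical, and it only follows the symlink chain of the final path component, treating absolute link targets as if the given root were /:

```shell
# Hypothetical helper: resolve PATH as it would be seen if ROOT were "/".
# Prints the resolved in-root path and returns 0 if the target exists,
# returns 1 for a dangling link or a symlink loop.
resolve_in_root() {
    root=$1
    path=$2
    hops=0
    while [ -L "$root$path" ]; do
        hops=$((hops + 1))
        if [ "$hops" -gt 40 ]; then
            return 1    # give up on symlink loops
        fi
        target=$(readlink "$root$path")
        case $target in
            /*) path=$target ;;                       # absolute: re-root it
            *)  path=$(dirname "$path")/$target ;;    # relative to the link's directory
        esac
    done
    if [ -e "$root$path" ]; then
        printf '%s\n' "$path"
    else
        return 1
    fi
}

# Example use from the dracut shell:
#   resolve_in_root /sysroot /sbin/init || echo "init does not resolve under /sysroot"
```

If the helper fails, the link target is missing as seen from the real root, which can happen when a filesystem the target lives on has not been mounted under /sysroot yet.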

> I have not clue where to put that bug, here or 11366 or a new one; sounds
> like dracut related?

It's virtually impossible to say at this stage. It really needs to be triaged first to find out where the problem is.
Comment 15 Chris Denice 2013-10-15 18:58:24 CEST
Thanks Colin for the feedback. Here you go:

cd /sbin
ls -l init
lrwxrwxrwx 1 root root 22 Oct 10 16:22 init -> ../lib/systemd/systemd
ls -l ../lib/systemd/systemd
-rwxr-xr-x 1 root root 1070568 Oct  8 23:00 ../lib/systemd/systemd*

Looks good!

Adding rd.break=pre-pivot to the kernel command line gives me a shell; I have attached the rdsosreport.txt I was able to get.

Then, I typed "exit" twice and the system went on booting with the following messages:

switch_root:/# exit:
switch_root: cannot access /sbin/init: No such file or directory
switch_root: failed to execute /sbin/init: No such file or directory

and then the kernel panic logs of bug 11366 appear again (they were not logged previously and appear only with rd.break=pre-pivot).

[] Kernel panic - not syncing blabla
blablabal

[] drm_kms_helper: panic occurred, switching back to text console
-
Comment 16 Chris Denice 2013-10-15 18:59:17 CEST
Created attachment 4436 [details]
from rd.break=pre-pivot
Comment 17 Chris Denice 2013-10-21 09:37:29 CEST
Hi guys,
after the mass rebuild, all my initrds were regenerated, including the old ones, and my previously working kernel 3.10.10-3 failed to boot with exactly the message in Shlomi's screenshot.

So, I guess there is no doubt anymore that it comes from dracut.

If someone has an idea on how to fix this, that would be utterly cool, as all my initrds are now unbootable :) [I nuked the .old ones while trying to play with dracut from a rescue boot.iso...]

Cheers,
chris.

PS: I remember we had this discussion on the ML a while ago, but it is pretty crazy, and a fortiori ultra dangerous, to have a package rebuild all the initrds during an update, no? Is that the plymouth package?
Comment 18 Chris Denice 2013-10-21 09:39:49 CEST
*** Bug 11366 has been marked as a duplicate of this bug. ***
Comment 19 Colin Guthrie 2013-10-21 12:33:02 CEST
I'm not 100% sure this is actually the same issue, as the problem encountered by Chris *may* be due to a new module I added to dracut for use in stage1 of the installer... It hacks about with the switch_root binary and could be getting in the way here.

You would see this problem if the initrd included the "mgainstaller" module when it was rebuilt. I don't *think* this is actually what's happening, I'm just paranoid :)
Thomas Bigot 2013-10-21 14:39:55 CEST

CC: (none) => thomas.bigot

Comment 20 Chris Denice 2013-10-21 21:34:39 CEST
At last....

it is this *$@#!^@ "plymouth" module, and/or its required dependency on "drm".

If I regenerate an initrd with

dracut -f --omit "plymouth" initrd-3.12.0-desktop-0.rc5.2.mga4.img 3.12.0-desktop-0.rc5.2.mga4

(which also removes the "drm" modules from the initrd)

I can boot all kernels, and KMS works fine too: the drm modules get loaded fine afterwards.

That does not explain why loading drm/plymouth in the initrd produces a kernel panic and a "cannot switch root" though. I can do more tests if needed, let me know guys.

Cheers,
Chris.
Comment 21 Shlomi Fish 2013-10-22 07:23:43 CEST
Hi all,

(In reply to Chris Denice from comment #20)
> At last....
> 
> this is this *$@#!^@ of "plymouth" module, and/or its required dependency to
> "drm".
> 
> If I regenerate an initrd with
> 
> dracut -f --omit "plymouth" initrd-3.12.0-desktop-0.rc5.2.mga4.img
> 3.12.0-desktop-0.rc5.2.mga4
> 
> (which removes also the "drm" modules from the initrd)
> 

Doing that allows me to boot, but:

1. The / filesystem gets mounted read-only after boot.

2. The kernel version reported before the login prompt (on Ctrl+Alt+F1) is 3.10.10. The "uname -a" kernel version is fine.

Can anyone help with that?

Regards,

-- Shlomi Fish
Comment 22 Chris Denice 2013-10-22 09:07:21 CEST
Hi Shlomi,
are you sure you did not mix up the initrd name and the kernel version in the dracut command?

dracut -f --omit "plymouth" INITRDNAME KERNELVERSIONNAME

Or maybe check your /boot/grub/menu.lst to be sure that the initrd and kernel version match there as well.
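
A small sketch of that sanity check, assuming the initrd-<kernel version>.img naming used by the commands in this thread. The function name `initrd_matches_kernel` is made up for illustration:

```shell
# Hypothetical helper: succeed only if the initrd file name matches the
# kernel version, following the initrd-<kernel version>.img pattern.
initrd_matches_kernel() {
    img=$(basename -- "$1")
    kver=$2
    [ "$img" = "initrd-$kver.img" ]
}

# Example:
#   initrd_matches_kernel /boot/initrd-3.12.0-desktop-0.rc5.2.mga4.img \
#       3.12.0-desktop-0.rc5.2.mga4 || echo "name/version mismatch"
```

Running it before invoking dracut catches a mistyped version string, which would otherwise make dracut look for kernel modules under the wrong version directory.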

Otherwise I don't know!
Comment 23 Colin Guthrie 2013-10-22 11:03:49 CEST
Shlomi,

1) Check your fstab. Does it have a definition for / in there? The initrd will typically mount it ro and then userspace should remount it rw later. "systemctl status systemd-remount-fs.service" should be the bit that does this. If it's "Active" and looks OK, try running the binary /usr/lib/systemd/systemd-remount-fs manually and see if it remounts the fs rw properly. Someone recently reported a problem upstream (totally different distro) whereby the unit had run OK, but the fs was still ro. Running it manually worked fine too. If you've got the same problem, then it would certainly be interesting (I couldn't really work out what's doing this for the other guy, but we can probably dig further), though it's probably best to do it on a different bug.


2) The kernel version shown on the prompt is actually updated via some scripts and is unrelated to the currently booted kernel. I can't recall which script it is that does it... it should be easy enough to fix tho'. Again, a separate bug.
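
For point 1, a rough sketch of how to see whether / really is mounted read-only, by reading the mount table. The function names are hypothetical, and the optional file argument exists only so the logic can be tried on a saved copy of /proc/mounts:

```shell
# Print the mount options of the last entry whose mount point is "/".
# Reads /proc/mounts unless another file is given.
root_mount_opts() {
    awk '$2 == "/" { opts = $4 } END { if (opts != "") print opts }' "${1:-/proc/mounts}"
}

# Succeed if those options contain the standalone flag "ro".
root_is_readonly() {
    case ",$(root_mount_opts "$1")," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example, on the affected system:
#   root_is_readonly && systemctl status systemd-remount-fs.service
#   # as a manual workaround (as root): mount -o remount,rw /
```

The comma padding keeps an option such as errors=remount-ro from being mistaken for the ro flag.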
Comment 24 Colin Guthrie 2013-10-22 11:07:21 CEST
Chris,

As a further test, try --omit ' plymouth ' --add ' drm '

Hopefully this will break too and we can narrow it down to a sole drm issue that "just happens" without any plymouth poking.
Comment 25 Chris Denice 2013-10-22 11:24:21 CEST
Hi Colin,
done!

It works fine with only drm included (I checked with lsinitrd that it was really in)!

cheers.
Comment 26 Chris Denice 2013-10-23 20:30:13 CEST
Hi Colin, I have seen you have now banned systemd from initrd modules.

That triggers a new kernel panic on all my machines; if I add systemd by hand, I am recovering bootable kernels.

But now, more interestingly, if I generate a new initrd with plymouth + drm + systemd included, the system boots fine.

So, I am just completely lost with this mess...

To summarize on 3.12.0-desktop-0.rc6.1:

This works:

dracut-034 with dracut modules:
bash
dash
i18n
ifcfg
drm
plymouth
kernel-modules
resume
rootfs-block
terminfo
udev-rules
systemd
base
fs-lib
shutdown

This fails:
dracut-034 with dracut modules:
bash
dash
i18n
ifcfg
drm
plymouth
kernel-modules
resume
rootfs-block
terminfo
udev-rules
base
fs-lib
shutdown


This fails:
dracut-034 with dracut modules:
bash
dash
i18n
ifcfg
kernel-modules
resume
rootfs-block
terminfo
udev-rules
base
fs-lib
shutdown


So now it looks like it is only due to the missing systemd module????
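
One way to spot such differences mechanically is to compare the module lists line by line. This is only a sketch: it assumes each list has been saved to a file with one module name per line (for example copied from lsinitrd output), and the function name `modules_only_in` is made up:

```shell
# Print the lines of the first file that do not appear in the second,
# comparing whole lines as fixed strings.
modules_only_in() {
    grep -vxF -f "$2" "$1"
}

# Example, with hypothetical file names:
#   modules_only_in working.txt failing.txt
```

Applied to the first (working) and second (failing) lists above, the only module it reports is systemd.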
Comment 27 Colin Guthrie 2013-10-24 10:47:56 CEST
Unfortunately I won't be able to look until next week as I'll be away, but if you're chatting with others on IRC, please do point them here so they can help debug if they are similarly affected!

So are you saying now that the only way you can boot is with the systemd module?

I'll have to try and replicate your disk layout to test. Sadly the systemd module causes other issues on some h/w (namely mine) and there are some situations where it doesn't actually work well (networking being one - which is admittedly a very corner case).

Anyway, I won't have much time until next week sadly to poke properly. It also smells a bit kernely too which is not my area and I'm still to boot my regular machine on a new kernel due to other compatibility issues!
Comment 28 Chris Denice 2013-10-24 13:26:21 CEST
> So are you saying now that the only way you can boot is with the systemd module?

yes, exactly.

No problem, I see Master Tmb is in CC of these posts, so he can bounce back.

My fstab is below. On all the problematic machines I always have / and /usr on separate partitions; I'll try to test with everything on /.

# Entry for /dev/sda1 :
UUID=5243cfa3-260f-4ebc-ae89-e4f1a4713f5d / ext4 relatime,acl 1 1
# Entry for /dev/sda6 :
UUID=d3e6cc32-78d7-11db-9e3b-bb4798278896 /home ext3 acl,relatime 1 2
none /proc proc defaults 0 0
# Entry for /dev/sdb1 :
UUID=c78cc77e-d432-400a-ba86-5a94a2437f16 /share xfs defaults 1 2
# Entry for /dev/sda7 :
UUID=06a67d1a-8d4d-4127-b775-a648873d57ec /usr ext4 relatime,acl 1 2
# Entry for /dev/sda5 :
UUID=0dd316ce-72f3-47e4-9782-e5291418a9e7 swap swap defaults 0 0
Comment 29 Chris Denice 2013-10-24 17:13:51 CEST
FOUND!!
The dracut module usrmount is missing for people who have partitioned their disk with separate
/
/usr

For some reason, systemd takes care of this. But if we remove it, then we have to add usrmount. I would suggest adding it in /etc/dracut.conf.d/50-mageia.conf?



Works fine:

lsinitrd initrd-3.12.0-desktop-0.rc6.1.mga4.img

dracut-034 with dracut modules:
bash
dash
i18n
ifcfg
drm
plymouth
kernel-modules
resume
rootfs-block
terminfo
udev-rules
usrmount <----------------------------------
base
fs-lib
shutdown
Comment 30 Chris Denice 2013-10-24 17:25:33 CEST
which is weird, as my /etc/fstab clearly specifies that I have separate / and /usr partitions.

So generating the initrd should automatically include usrmount, I guess? A dracut bug?
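
To check whether a machine is in the affected configuration, something like the following sketch could be used. It only looks for a non-comment fstab entry whose mount point is /usr; the function name `has_separate_usr` is hypothetical, and this is not how dracut itself decides which modules to include:

```shell
# Succeed if the given fstab (default /etc/fstab) has an entry mounted on /usr.
has_separate_usr() {
    awk '$1 !~ /^#/ && $2 == "/usr" { found = 1 } END { exit !found }' "${1:-/etc/fstab}"
}

# Example:
#   has_separate_usr && echo "separate /usr: the initrd needs usrmount (or systemd)"
#
# As a stopgap, a dracut.conf.d snippet could force the module in
# (the file name here is an assumption):
#   # /etc/dracut.conf.d/51-usrmount.conf
#   add_dracutmodules+=" usrmount "
```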
Comment 31 Shlomi Fish 2013-10-24 19:49:08 CEST
Hi Chris,

(In reply to Chris Denice from comment #29)
> FOUND!!
> missing dracut module usrmount for people having partitioned their disk under
> /
> /usr
> 
> for some reason, systemd is taking care of this. But if we remove it, then
> we have to add usrmount. I would suggest to add it in
> /etc/dracut.conf.d/50-mageia.conf ?
> 
> 
> 

Can you share the dracut command line invocation?

Regards,

-- Shlomi Fish

> Works fine:
> 
> lsinitrd initrd-3.12.0-desktop-0.rc6.1.mga4.img
> 
> dracut-034 with dracut modules:
> bash
> dash
> i18n
> ifcfg
> drm
> plymouth
> kernel-modules
> resume
> rootfs-block
> terminfo
> udev-rules
> usrmount <----------------------------------
> base
> fs-lib
> shutdown
Comment 32 Chris Denice 2013-10-24 19:58:55 CEST
yep:

within /boot:

dracut -f --add "usrmount" initrd-3.12.0-desktop-0.rc6.1.mga4.img 3.12.0-desktop-0.rc6.1.mga4

This overwrites your old initrd-3.12.0-desktop-0.rc6.1.mga4.img, so you may want to be sure you have a backup of a working one somewhere else.

That's it.

then
lsinitrd initrd-3.12.0-desktop-0.rc6.1.mga4.img

to check that you have really done what you wanted. Some modules are banned in
/etc/dracut.conf.d/ (systemd, for instance), so you shouldn't see them in the initrd.

cheers.
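
A sketch of the backup step mentioned above, to run before regenerating. The function name `backup_initrd` and the .bak suffix are arbitrary choices for illustration:

```shell
# Copy an initrd image to <image>.bak, preserving mode and timestamps,
# and print the backup path. Fails if the image does not exist.
backup_initrd() {
    img=$1
    if [ ! -f "$img" ]; then
        echo "no such image: $img" >&2
        return 1
    fi
    cp -p -- "$img" "$img.bak" && printf '%s\n' "$img.bak"
}

# Example, within /boot (as root):
#   backup_initrd initrd-3.12.0-desktop-0.rc6.1.mga4.img &&
#       dracut -f --add "usrmount" initrd-3.12.0-desktop-0.rc6.1.mga4.img 3.12.0-desktop-0.rc6.1.mga4
```

Because the dracut call is chained with &&, the image is only regenerated once the backup copy has succeeded.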
Bit Twister 2013-10-24 20:13:58 CEST

CC: junknospam => (none)

Comment 33 Shlomi Fish 2013-10-24 20:19:46 CEST
(In reply to Chris Denice from comment #32)
> yep:
> 
> within /boot:
> 
> dracut -f --add "usrmount" initrd-3.12.0-desktop-0.rc6.1.mga4.img
> 3.12.0-desktop-0.rc6.1.mga4

Thanks, I'll look into it later.

Regards,

-- Shlomi Fish
Comment 34 Shlomi Fish 2013-10-25 08:03:28 CEST
Hi Chris,

(In reply to Chris Denice from comment #32)
> yep:
> 
> within /boot:
> 
> dracut -f --add "usrmount" initrd-3.12.0-desktop-0.rc6.1.mga4.img
> 3.12.0-desktop-0.rc6.1.mga4
> 

Many thanks! This worked like a charm and I can now boot the up-to-date kernels with seemingly no glitches.

Regards,

-- Shlomi Fish
Comment 35 Chris Denice 2013-10-25 18:45:59 CEST
Reported upstream,
that's indeed a bug in dracut: a typo in modules.d/98usrmount/module-setup.sh prevented usrmount from being added if /sbin/init is on a separate /usr partition.

They have already provided a patch that I pushed into dracut.

Cheers,
chris.
Comment 36 Colin Guthrie 2013-10-27 19:27:24 CET
Seeing as Shlomi confirmed the fix, I guess we can close this one? Please reopen if I've missed something while I've been away :)

Thanks very much for working on this Chris!!

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 37 claire robinson 2013-10-28 08:15:02 CET
Is this likely to affect Mageia 3/2 also? We have a kernel update in testing at the moment.
claire robinson 2013-10-28 08:17:02 CET

CC: (none) => eeeemail

Comment 38 Chris Denice 2013-10-28 10:06:55 CET
No problem Colin, that was easier since I was affected.

To Claire: I don't think so. dracut is a completely different version there, and I used mga2 and mga3 on those very same computers with separate / and /usr partitions without trouble (Colin can confirm or correct me).

Cheers,
chris,
Comment 39 Colin Guthrie 2013-10-28 11:19:36 CET
Yeah as Chris said, the version of dracut in mga3 is older and should pre-date the introduction of this bug AFAICT.
Comment 40 claire robinson 2013-10-28 12:07:25 CET
Thank you both
Comment 41 Chris Denice 2013-10-28 16:03:55 CET
For the record, my upstream report started some discussions on potentially related bugs. I am posting the link here in case we are affected later (we are not right now):

http://www.mail-archive.com/initramfs@vger.kernel.org/msg03390.html
Comment 42 Colin Guthrie 2013-10-28 16:26:08 CET
It shouldn't really apply to us, but thanks for the heads-up :)
