Bug 11636 - Preparing MGA2 system for MGA3 upgrade results in an unstable system if booted via sysvinit.
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 2
Hardware: All Linux
Priority: Normal critical
Target Milestone: ---
Assignee: QA Team
QA Contact:
URL:
Whiteboard: advisory MGA2-64-OK MGA2-32-OK
Keywords: validated_update
Depends on:
Blocks:
 
Reported: 2013-11-10 18:49 CET by Christian Müller
Modified: 2013-11-22 20:35 CET

See Also:
Source RPM: mageia-prepare-upgrade-2-2.mga2.noarch.rpm
CVE:
Status comment:


Attachments
Screenshot of x86_64 vb guest (52.43 KB, image/png)
2013-11-22 16:54 CET, Dave Hodgins

Description Christian Müller 2013-11-10 18:49:49 CET
Description of problem:

After installing mageia-prepare-upgrade-2-2.mga2.noarch.rpm and choosing 'prepare upgrade' at boot, the system no longer boots.

Version-Release number of selected component (if applicable):

mageia-prepare-upgrade-2-2.mga2.noarch.rpm

How reproducible:
See above. I experienced this to times now, on totally different hardware, once on i586 and once on x64, both up-to-date Mageia 2. Both systems were previously upgraded from Mageia 1 and had a separate, clean /boot partition on sda1. When booting the upgrade preparation, the kernel boots up and then the startup process runs into many errors: files in /var/run and /var/lock could not be touched or found. The system boots to console mode, with no network and the keyboard not working correctly. When rebooting with ctrl-alt-del and choosing the normal Mageia 2 startup, it runs into the same errors.

I then decided to do an upgrade via DVD and ran into this error:
https://bugs.mageia.org/show_bug.cgi?id=11635

Please let me know if you need some further information, and I could try to attach some more detailed error messages. I could leave the broken system untouched for a while.

Reproducible: 

Steps to Reproduce:
Comment 1 Christian Müller 2013-11-10 18:51:13 CET
It should read: I experienced this *two* times now ;-)
claire robinson 2013-11-13 17:44:11 CET

CC: (none) => mageia

Comment 2 Colin Guthrie 2013-11-13 18:27:39 CET
So as you can imagine, this process is somewhat hard to debug without additional information :)

It's also a process that was heavily tested and has worked well on a huge number of installs now so whatever problems remain should be fairly easy to fix.

First of all have you read all the information about this in the wiki?

https://wiki.mageia.org/en/Mageia_3_Errata#Upgrade_Issues

In particular do you have any third party packages installed?


Secondly, can you describe your partition layout? Do you have a separate /usr or /var partition? What filesystems are you using for /, /usr and /var (depending on partition layout, there may only be one!).


It looks like the conversion actually worked OK, BUT something is not quite right that prevents it from working 100%. As the conversion simply boots your system after it has done its work, the fact you get the same errors rebooting into the normal boot option is actually quite understandable. Perhaps it's something really simple like the /run folder not existing in your root fs? If this is the case, perhaps simply mkdir'ing it will fix things? That said, I cannot really think how this could occur as /run is part of the filesystem package on mga2, so this is a bit of a long shot.


Finally can you check the root filesystem to see if:

1. You are using systemd for boot under Mageia 2. If you are using sysvinit, things will likely break somewhat.
2. /var/run and /var/lock are symlinks to /run and /run/lock respectively.
3. /bin, /sbin, /lib and /lib64 are symlinks to the same folders in /usr
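
The three checks above can be sketched as a small shell helper. This is only an illustration and not part of the prepare-upgrade package: the function names report_link and report_init are invented here, ROOT is assumed to be the mount point of the installed system (for example /mnt from a rescue boot, or empty on the running system), and the init detection assumes that /sbin/init is a symlink pointing at the systemd binary when systemd is the init system.

```shell
#!/bin/sh
# Hypothetical helpers; the names are made up for this sketch.

# Print "<path> -> <target>" if <path> under <root> is a symlink,
# otherwise say that it is not.
report_link() {
    root=$1
    path=$2
    if [ -L "$root$path" ]; then
        printf '%s -> %s\n' "$path" "$(readlink "$root$path")"
    else
        printf '%s is not a symlink\n' "$path"
    fi
}

# Guess the init system from where /sbin/init points.
# Assumption: with systemd as init, /sbin/init is a symlink whose
# target mentions "systemd". If /sbin itself is an absolute symlink,
# run this inside a chroot with an empty root instead.
report_init() {
    root=$1
    case "$(readlink "$root/sbin/init" 2>/dev/null)" in
        *systemd*) echo "init: systemd" ;;
        *)         echo "init: not systemd (possibly sysvinit)" ;;
    esac
}

# Example, with the installed system mounted on /mnt:
#   report_init /mnt
#   for p in /var/run /var/lock /bin /sbin /lib /lib64; do
#       report_link /mnt "$p"
#   done
```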

Hopefully with the above info we'll get a bit closer to knowing what went wrong.

Cheers.
Comment 3 Christian Müller 2013-11-17 14:17:08 CET
(In reply to Colin Guthrie from comment #2)
> So as you can imagine, this process is somewhat hard to debug without
> additional information :)

I'm aware of that ;)

> First of all have you read all the information about this in the wiki?
> 
> https://wiki.mageia.org/en/Mageia_3_Errata#Upgrade_Issues

yes, I did.
 
> In particular do you have any third party packages installed?

nope, only Google Earth.

> Secondly, can you describe your partition layout? Do you have a separate
> /usr or /var partition? What filesystems are you using for /, /usr and /var
> (depending on partition layout, there may only be one!).

sda1 is /boot
sda5 is swap
sda6 is /

> It looks like the conversion actually worked OK, BUT something is not quite
> right that prevents it from working 100%. As the conversion simply boots
> your system after it's done its work, the fact you get the same errors
> rebooting into the normal boot option is actually quite understandable.
> Perhaps it's something really simple like /run folder not existing in your
> root fs? If this is the case, perhaps simply mkdir'ing it will fix things?
> That said, I cannot really think how this could occur as /run is part of the
> filesystem package on mga2, so this is a bit of a long shot.
> 
> 
> Finally can you check the root filesystem to see if:
> 
> 1. You are using systemd for boot under Mageia 2. If you are using sysvinit,
> things will likely break somewhat.
> 2. /var/run and /var/lock are symlinks to /run and /run/lock respectively.
> 3. /bin, /sbin, /lib and /lib64 are symlinks to the same folders in /usr

2. Both symlinks don't exist. I created /var/run, but /run/lock doesn't exist at all. /run is empty, only a folder 'plymouth' in there. That looks like the main problem to me.
3. This looks all ok.
Comment 4 Colin Guthrie 2013-11-17 15:00:26 CET
(In reply to Christian Müller from comment #3)

Thanks for the info. Nothing strange or weird there to get in the way so all should be OK.

> > Finally can you check the root filesystem to see if:
> > 
> > 1. You are using systemd for boot under Mageia 2. If you are using sysvinit,
> > things will likely break somewhat.

You didn't answer this question.... Are you definitely using systemd?

> > 2. /var/run and /var/lock are symlinks to /run and /run/lock respectively.

> 
> 2. Both symlinks don't exist. I created /var/run, but /run/lock doesn't
> exist at all. 

Interesting. The fact that these have gone but the /bin, /lib etc. dirs are OK, suggests that they *were* created OK, but then on boot they were cleared out by a rogue run of the mandriva script that clears out /var/run and /var/lock. The default shipped on mga2 would also delete the top level symlinks too which is handy. The conversion script should have dropped a little fix onto the filesystem to prevent this from being run again under systemd by removing the /lib/systemd/system/sysinit.target.wants/mandriva-clean-var-run-lock.service symlink that triggers it.

As they have been cleared out, it looks like you are actually running sysvinit and thus a systemd fix for this would have no effect...

You might also find that the symlinks disappear again after creating them on next boot if this theory is correct.
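
A quick way to test one half of this theory is to look for the symlink named above. The sketch below is only an illustration: the function name cleanup_unit_present is invented, and ROOT is assumed to be the mount point of the installed system. It only covers the systemd trigger; on a sysvinit boot the cleanup would come from the init scripts, so a missing symlink here does not rule the theory out.

```shell
#!/bin/sh
# Path of the triggering symlink, as given in the comment above.
UNIT=/lib/systemd/system/sysinit.target.wants/mandriva-clean-var-run-lock.service

# Succeeds (exit status 0) if the symlink still exists under the
# root directory passed as $1 (hypothetical helper name).
cleanup_unit_present() {
    [ -L "$1$UNIT" ]
}

# Example, with the installed system mounted on /mnt:
#   if cleanup_unit_present /mnt; then
#       echo "cleanup unit is still enabled under systemd"
#   fi
```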


> /run is empty, only a folder 'plymouth' in there. That looks
> like the main problem to me.

This is indeed very interesting. /run is meant to be a tmpfs and should be mounted by your dracut initrd and it at very least creates a /run/initramfs folder. Are you inspecting the filesystem here from a working system or looking at it after one of the weirdly broken boots? If the latter can you double check that /run is indeed a mountpoint (type: mountpoint /run)?

The fact that /run/lock doesn't exist yet is expected. The /run/lock folder is created early on after the tmpfs is mounted.
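
Besides "mountpoint /run", the mount table can be checked directly. The following is a minimal sketch with an invented function name (run_is_tmpfs); it reads /proc/mounts by default, so it only says something useful on the booted system itself, not when the disk is inspected from the rescue DVD.

```shell
#!/bin/sh
# Succeeds if the mount table lists a tmpfs mounted on /run.
# The table is read from the file given as $1 (default: /proc/mounts).
# Fields per line: device, mount point, filesystem type, options, ...
run_is_tmpfs() {
    awk '$2 == "/run" && $3 == "tmpfs" { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}

# Example:
#   run_is_tmpfs && echo "/run is a tmpfs" || echo "/run is NOT a tmpfs"
```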

> > 3. /bin, /sbin, /lib and /lib64 are symlinks to the same folders in /usr
> 3. This looks all ok.

OK, so the conversion looks fine. It's just something is busted afterwards.

I suspect that you actually have a mostly sysvinit+mkinitrd based system and this is where things broke down. This is likely not as heavily tested a scenario and thus it hasn't cropped up until now.

If it's not that.... then the mystery deepens!! ;)
Comment 5 Christian Müller 2013-11-17 15:24:35 CET
(In reply to Colin Guthrie from comment #2)
> You didn't answer this question.... Are you definitely using systemd?

looks like sysvinit.

> As they have been cleared out, it looks like you are actually running
> sysvinit and thus a systemd fix for this would have no effect...
> 
> You might also find that the symlinks disappear again after creating them on
> next boot if this theory is correct.
> 
> 
> > /run is empty, only a folder 'plymouth' in there. That looks
> > like the main problem to me.
> 
> This is indeed very interesting. /run is meant to be a tmpfs and should be
> mounted by your dracut initrd and it at very least creates a /run/initramfs
> folder. Are you inspecting the filesystem here from a working system or
> looking at it after one of the weirdly broken boots? If the latter can you
> double check that /run is indeed a mountpoint (type: mountpoint /run)?

I used the installer DVD and its rescue system to inspect the filesystem. I can't use the native system, because when the broken boot-up has finished, neither keyboard nor network is working, so I can't log on. Could it be the initrd is broken? Does the prepare-upgrade package create a new initrd?
Comment 6 Colin Guthrie 2013-11-17 19:09:01 CET
(In reply to Christian Müller from comment #5)
> (In reply to Colin Guthrie from comment #2)
> > You didn't answer this question.... Are you definitely using systemd?
> 
> looks like sysvinit.

OK, this is indeed a problem. I guess the upgrade preparation package should have required systemd as an init system too and not just dracut.

> > As they have been cleared out, it looks like you are actually running
> > sysvinit and thus a systemd fix for this would have no effect...
> > 
> > You might also find that the symlinks disappear again after creating them on
> > next boot if this theory is correct.
> > 
> > 
> > > /run is empty, only a folder 'plymouth' in there. That looks
> > > like the main problem to me.
> > 
> > This is indeed very interesting. /run is meant to be a tmpfs and should be
> > mounted by your dracut initrd and it at very least creates a /run/initramfs
> > folder. Are you inspecting the filesystem here from a working system or
> > looking at it after one of the weirdly broken boots? If the latter can you
> > double check that /run is indeed a mountpoint (type: mountpoint /run)?
> 
> I used the installer DVD and its rescue system to inspect the filesystem. I
> can't use the native system, because when the broken boot-up has finished,
> neither keyboard nor network is working, so I can't log on. Could it be the
> initrd is broken? Does the prepare-upgrade package create a new initrd?

So yes, the prepare-upgrade system does indeed create a new initrd. It should be a dracut-based initrd (as it requires dracut, which will then be used directly rather than mkinitrd, the older tool).

You should be able to do the following to get a working system:

1. Boot with the rescue system.
2. Ensure the network is working.
3. Mount your partitions to /mnt (and /mnt/boot for the boot partition, although this is not strictly needed).
4. mount -o bind /etc/resolv.conf /mnt/etc/resolv.conf
5. mount -o bind /proc /mnt/proc
6. mount -o bind /sys /mnt/sys
7. mount -o bind /dev /mnt/dev
8. chroot /mnt /bin/bash
9. You are now in a chrooted version of your install. You should be able to run urpmi from here with network support.
10. urpmi systemd-sysvinit
11. It should warn you that some other pkg has to be removed, but then should install OK.
12. Recreate the /var/run -> /run and /var/lock -> /run/lock symlinks (the latter will look broken but that's OK).
13. exit
14. umount all the bind mounts
15. umount the partitions.
16. Reboot.
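
For reference, the steps above can be written out as a command list. The sketch below only prints the commands for review and executes nothing. The function name rescue_plan is invented, and the default device names are assumptions taken from the reporter's layout in comment 3 (sda6 for /, sda1 for /boot); adjust them for any other system.

```shell
#!/bin/sh
# Print the rescue steps as commands; nothing is executed.
# $1 = root partition, $2 = boot partition (defaults are assumptions
# based on the layout described in comment 3).
rescue_plan() {
    root_dev=${1:-/dev/sda6}
    boot_dev=${2:-/dev/sda1}
    cat <<EOF
mount $root_dev /mnt
mount $boot_dev /mnt/boot
mount -o bind /etc/resolv.conf /mnt/etc/resolv.conf
mount -o bind /proc /mnt/proc
mount -o bind /sys /mnt/sys
mount -o bind /dev /mnt/dev
chroot /mnt /bin/bash
# inside the chroot:
urpmi systemd-sysvinit
# if /var/run or /var/lock still exists as a directory, move it aside first
ln -s /run /var/run
ln -s /run/lock /var/lock
exit
# back in the rescue system:
umount /mnt/dev
umount /mnt/sys
umount /mnt/proc
umount /mnt/etc/resolv.conf
umount /mnt/boot
umount /mnt
EOF
}

# Example: rescue_plan /dev/sda6 /dev/sda1
```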

You should now be OK again, both as a working system and able to upgrade via urpmi to mga3 (although the umask issue when going via the DVD installer in bug #11635 does appear to be separate).

HTHs.
Comment 7 Christian Müller 2013-11-17 22:07:22 CET
> HTHs.

that helped a lot indeed :)
Mageia 2 boots up again, and I'm now running the online-upgrade.
Thanks a lot :)
Comment 8 Colin Guthrie 2013-11-17 22:36:30 CET
Great!!

OK, so the fix is simple too so others don't get tripped up. Just add a Requires to the prepare-upgrade pkg. Will do that tomorrow. :)
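
Whether an installed copy of the package carries such a dependency could be checked roughly as follows. This is a sketch under assumptions: the thread does not state the exact name of the added Requires, so the helper (requires_systemd, a made-up name) simply looks for any requirement that mentions systemd.

```shell
#!/bin/sh
# Reads a dependency list (one requirement per line) on stdin and
# succeeds if any requirement mentions systemd.
# Assumption: the added Requires has "systemd" somewhere in its name.
requires_systemd() {
    grep -q 'systemd'
}

# Example on an installed system:
#   rpm -q --requires mageia-prepare-upgrade | requires_systemd \
#       && echo "package requires systemd"
```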
Comment 9 Colin Guthrie 2013-11-22 15:45:58 CET
==== Advisory Text ====

The preparation utility for ensuring a Mageia 2 system was ready to upgrade to Mageia 3 only works if your Mageia 2 system was running systemd.

This update ensures that you are booting with systemd prior to upgrading to Mageia 3 (which is systemd-only anyway) and avoids the instability of your system during this small window when upgrading.

Assignee: bugsquad => qa-bugs
Summary: mageia upgrade preparation breaks system => Preparing MGA2 system for MGA3 upgrade results in an unstable system if booted via sysvinit.

Comment 10 Colin Guthrie 2013-11-22 15:47:27 CET
SRPM: mageia-prepare-upgrade-2-3.mga2.src.rpm
RPMS: mageia-prepare-upgrade-2-3.mga2.noarch.rpm
Comment 11 claire robinson 2013-11-22 16:50:02 CET
Advisory uploaded.

Whiteboard: (none) => advisory

Comment 12 Dave Hodgins 2013-11-22 16:54:52 CET
Created attachment 4517 [details]
Screenshot of x86_64 vb guest

Worked ok on my Mageia 2 i586 vb guest, but failed on my Mageia 2 x86_64 guest.
After it failed, with several messages about being unable to access files in
/var/run, and /var/lock, I rebooted the guest to run level 1.

As shown in the screenshot, /var/run has been renamed to /var/run.runmove~,
and /var/lock to /var/lock.lockmove~.

CC: (none) => davidwhodgins

Comment 13 Colin Guthrie 2013-11-22 17:03:20 CET
Is your x86_64 system still running sysvinit perchance? These are the symptoms you'd see there. The renaming to runmove~ and lockmove~ is expected. It does that so as not to delete things that exist, but it then creates symlinks afterwards. The symlinks are then probably deleted on the next boot by sysvinit when "cleaning" up those folders. This is ultimately what happened in this bug report.

The newer version of the upgrade prep package should have ensured that you got a systemd boot.
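
The states described here can be told apart with a small check. This is a minimal sketch: the function name var_run_state and its three labels are invented for illustration, and ROOT is assumed to be the mount point of the installed system (empty for the running system). The same logic would apply to /var/lock and /var/lock.lockmove~.

```shell
#!/bin/sh
# Classify /var/run under the root directory given as $1, based on the
# behaviour described above: the conversion renames the old directory
# to /var/run.runmove~ and then creates a symlink in its place.
var_run_state() {
    root=$1
    if [ -L "$root/var/run" ]; then
        echo "converted"        # symlink is in place
    elif [ -e "$root/var/run.runmove~" ]; then
        echo "symlink lost"     # converted, but the symlink was removed later
    else
        echo "not converted"
    fi
}

# Example: var_run_state /mnt
```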
Comment 14 Dave Hodgins 2013-11-22 17:14:09 CET
Yes. On the i586, I rebooted, after installing the prepare upgrade, but didn't
notice the upgrade stanza, so rebooted normally, into systemd, then rebooted,
selecting the upgrade.

On x86_64, I installed the prepare upgrade, then selected the upgrade stanza on
the first reboot.

So it looks like two reboots are needed, after installing the prepare upgrade
package. One to handle the sysvinit to systemd change, and then one to handle
the move.
Comment 15 Dave Hodgins 2013-11-22 17:15:50 CET
Hmm. Claire said it worked for her. Anything you'd like me to check, before I
roll back to a snapshot, and test again?
Comment 16 Dave Hodgins 2013-11-22 17:18:28 CET
Found it. Hadn't done urpmi.update, so it installed the older version.
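
To avoid testing the old package by accident, the installed release can be compared against the fixed one (2-3.mga2, per comment 10). The helper below is a rough sketch with an invented name (at_least); it relies on GNU sort -V, which approximates but is not the same as rpm's own version comparison.

```shell
#!/bin/sh
# Succeeds if version string $1 sorts at or after $2.
# Uses GNU "sort -V" as an approximation of rpm's version ordering.
at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example on the test system:
#   urpmi.update -a
#   urpmi mageia-prepare-upgrade
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' mageia-prepare-upgrade)
#   at_least "$installed" 2-3.mga2 && echo "fixed version installed"
```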
Comment 17 claire robinson 2013-11-22 17:25:12 CET
Seems ok to me mga2 64
Comment 18 Dave Hodgins 2013-11-22 17:39:43 CET
Validating the update.

Someone from the sysadmin team please push 11636.adv to updates.

Keywords: (none) => validated_update
Whiteboard: advisory => advisory MGA2-64-OK MGA2-32-OK
CC: (none) => sysadmin-bugs

Comment 19 Colin Guthrie 2013-11-22 18:42:08 CET
Cool. Glad it ended up being that simple :)
Comment 20 Thomas Backlund 2013-11-22 20:35:39 CET
Update pushed:
http://advisories.mageia.org/MGAA-2013-0124.html

Status: NEW => RESOLVED
CC: (none) => tmb
Resolution: (none) => FIXED

