Bug 22059 - Incorrect UUID for /boot partition may be used when creating the initrd, leading to an unbootable system ( _only_ with stage2's diskdrake; diskdrake from drakxtools works fine)
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard: (MGA6)
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2017-11-21 16:38 CET by Martin Whitaker
Modified: 2019-12-23 19:08 CET
CC List: 3 users

See Also:
Source RPM: drakx-installer-stage2
CVE:
Status comment:


Attachments
Proposed fix (1.99 KB, text/plain)
2017-11-27 16:20 CET, Martin Whitaker
Proposed fix (updated) (1.96 KB, text/plain)
2017-12-08 10:09 CET, Martin Whitaker

Description Martin Whitaker 2017-11-21 16:38:39 CET
Steps to reproduce:

1. Start with a GPT partitioned disk with 4 FAT32 partitions.
2. Boot from the 64-bit Classic Installer ISO in UEFI mode.
3. Choose Custom Partitioning.
4. Clear all existing partitions (use the "Clear all" button).
5. Create a new EFI system partition (/boot/EFI).
6. Create a new ext4 partition (/boot).
7. Create a new swap partition.
8. Create a new ext4 partition (/).
9. Continue with installation.

(I don't think UEFI mode is important here - that's just the steps I used when I found this bug)

When you reboot after the installation is complete, the system will eventually drop into the debug shell with the message

dracut Warning: Could not boot.
dracut Warning: /dev/disk/by-uuid/XXXX-XXXX does not exist

where XXXX-XXXX is the UUID of the second pre-existing FAT32 partition.

The cause of the problem is that udevd does not automatically rerun its rules when the new partition table is written, so (for example) the links in /dev/disk/by-uuid are not updated.
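
One way to confirm this symptom is to compare the links in /dev/disk/by-uuid with the UUIDs the partitions actually carry. The sketch below is illustrative only and is not part of the installer; the function names are hypothetical, and the mapping of device names to real UUIDs is assumed to come from some other source (for example, blkid output).

```python
import os


def read_by_uuid_links(by_uuid_dir="/dev/disk/by-uuid"):
    """Return {uuid: device name} for every symlink found in by_uuid_dir."""
    links = {}
    for uuid in os.listdir(by_uuid_dir):
        path = os.path.join(by_uuid_dir, uuid)
        if os.path.islink(path):
            # Links look like ../../sda2; keep only the device name.
            links[uuid] = os.path.basename(os.readlink(path))
    return links


def stale_links(links, actual_uuid_by_device):
    """Return UUIDs whose link points at a device that no longer has that UUID.

    actual_uuid_by_device is an assumed input: {device name: current UUID}.
    """
    return sorted(
        uuid
        for uuid, device in links.items()
        if actual_uuid_by_device.get(device) != uuid
    )
```

In the scenario above, the link named after the old FAT32 UUID would still point at the second partition, so it would be reported as stale once that partition has been reformatted as ext4.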

This appears to only affect the installer - when I make the same changes using diskdrake in a running system, the links in /dev/disk/by-uuid do get updated.

Comment 1 Marja Van Waes 2017-11-21 18:15:03 CET
Thanks for the report, Martin.

Setting version to cauldron, as we always do with installer bugs, since already released ISOs cannot be fixed.

Putting "(MGA6)" on the whiteboard to indicate that this issue was last seen with a Mageia 6 ISO.

CC: (none) => marja11, pterjan
Assignee: bugsquad => mageiatools
Whiteboard: (none) => (MGA6)
Source RPM: (none) => drakx-installer-stage2
Version: 6 => Cauldron
Summary: Incorrect UUID for /boot partition may be used when creating the initrd, leading to an unbootable system => Incorrect UUID for /boot partition may be used when creating the initrd, leading to an unbootable system ( _only_ with stage2's diskdrake; diskdrake from drakxtools works fine)

Comment 2 Martin Whitaker 2017-11-27 16:20:09 CET
Created attachment 9805
Proposed fix

On experiment, it seems that it is the udev 60-blocks.rule that causes the soft links in /dev/disk/by-uuid to get updated after the partition table is written and partitions are formatted. Unfortunately we know from bug 20074 that adding that rule to stage 2 causes other problems.

The attached patch works round this problem by calling 'udevadm trigger' after all partitions are formatted. This brute force solution is ugly, but it works.
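
The general shape of this workaround can be sketched as follows. This is not the attached patch (drakx is written in Perl, and the patch is not reproduced here); the function name, the restriction to the block subsystem, and the follow-up 'udevadm settle' call are assumptions made for illustration.

```python
import subprocess


def refresh_block_links(run=subprocess.check_call, settle_timeout=30):
    """Replay block-device events so udev rebuilds the /dev/disk/by-* links.

    'run' is injectable so the command sequence can be inspected without
    actually invoking udevadm.
    """
    commands = [
        # Ask the kernel to re-emit 'change' events for block devices
        # (assumption: the real patch may trigger all subsystems).
        ["udevadm", "trigger", "--action=change", "--subsystem-match=block"],
        # Wait for udevd to finish processing the events it has received.
        ["udevadm", "settle", f"--timeout={settle_timeout}"],
    ]
    for cmd in commands:
        run(cmd)
    return commands
```

Such a call would have to run after all partitions are formatted and before the initrd is created, since that is when the by-uuid links are consulted.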

Martin Whitaker 2017-11-27 16:20:21 CET

Keywords: (none) => PATCH

Comment 3 Martin Whitaker 2017-12-08 10:09:47 CET
Created attachment 9821
Proposed fix (updated)

Updated with an improvement to the comment describing the fix. No functional changes.

Attachment 9805 is obsolete: 0 => 1

Comment 4 Mageia Robot 2018-01-09 22:18:27 CET
commit 13d0e32733b8c1827335a1551dedbbf88daf369f
Author: Martin Whitaker <mageia@...>
Date:   Mon Nov 27 15:06:16 2017 +0000

    installer: force update of /dev/disk/by-uuid after partitioning (mga#22059)
    
    Because stage2 does not include the udev 60-blocks.rule, udev does not
    automatically update the soft links in /dev/disk/by-uuid after we write
    the partition table and format the partitions. We need these links to
    be updated before we create the initrd. It would be cleaner to fix this
    with a udev rule, but for now, use brute force.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=13d0e32733b8c1827335a1551dedbbf88daf369f

Comment 5 Thierry Vignaud 2018-01-10 17:50:08 CET
(In reply to Martin Whitaker from comment #2)
> On experiment, it seems that it is the udev 60-blocks.rule that causes the
> soft links in /dev/disk/by-uuid to get updated after the partition table is
> written and partitions are formatted. Unfortunately we know from bug 20074
> that adding that rule to stage 2 causes other problems.

We would really need to fix that eventually in order to have the same code path in standalone diskdrake & in drakx.
I'm not comfortable with that :-)
Maybe a udevadm settle call, or temporarily disabling udev, or whatever.
Maybe we're missing yet another rule?

Viva el udev ... :-)

CC: (none) => thierry.vignaud

Comment 6 Thierry Vignaud 2018-01-10 17:55:37 CET
Or maybe some missing hwdb file or something like that.
We should also check that /etc/udev/hwdb.bin does get generated in the installer (permissions vs squashfs and the like)... I fear we're missing a link to tmpfs there...

Comment 7 Thierry Vignaud 2018-01-10 17:57:35 CET
/usr/lib/udev/rules.d/99-systemd.rules also has block rules. Nothing that pops up in my mind but including it may be worth a try.

Comment 8 Martin Whitaker 2018-01-11 00:50:59 CET
(In reply to Thierry Vignaud from comment #5)
> We would really need to fix that eventually in order to have the same code
> path in standalone diskdrake & in drakx.
> I'm not comfortable with that :-)

If I describe something as a brute force fix, you can be sure I'm not happy with it either...and also that I've tried and failed to find a cleaner solution.

> Maybe a udevadm settle call, or temporarily disabling udev, or whatever.

The problem with 'udevadm settle' is that it only checks that udevd has finished processing any events it has received. It doesn't check whether there are any events still to come from the kernel. The only reason it seems to help is because it adds a bit of delay. Running usleep would do just as well ;-)
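
A toy model of this point, not taken from udev's source: the class and names below are hypothetical, and the model only captures the idea that settling drains events already received and knows nothing about events the kernel has yet to deliver.

```python
from collections import deque


class ToyUdevd:
    """Toy stand-in for udevd: an event queue plus a table of by-uuid links."""

    def __init__(self):
        self.queue = deque()
        self.uuid_by_device = {}

    def receive(self, device, uuid):
        # An event delivered by the kernel.
        self.queue.append((device, uuid))

    def settle(self):
        # Like 'udevadm settle' in this model: process what has already
        # been received, then return. Future events are invisible here.
        while self.queue:
            device, uuid = self.queue.popleft()
            self.uuid_by_device[device] = uuid


def demo():
    udevd = ToyUdevd()
    udevd.uuid_by_device["sda2"] = "OLD-UUID"
    udevd.settle()  # queue is empty, so this returns at once
    before = dict(udevd.uuid_by_device)
    udevd.receive("sda2", "NEW-UUID")  # the kernel event arrives late
    udevd.settle()
    after = dict(udevd.uuid_by_device)
    return before, after
```

In this model the first settle call returns with the old UUID still recorded, which is the situation described above: settling helps only if the event happens to have arrived already.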

> Maybe we're missing yet another rule?

Well, as I said, the 60-blocks.rule does fix this bug, so I didn't try any others.  But if we enable that rule in stage 2, udevd will send BLKRRPART to the kernel when it sees a write to the raw device, causing a race when we are writing the partition table and informing the kernel ourselves, and there doesn't seem to be any way to prevent udevd doing that.
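
For readers unfamiliar with BLKRRPART: it is the Linux ioctl that asks the kernel to re-read a disk's partition table. The sketch below shows how a process issues it; the function name is hypothetical and this is not code from udev or drakx. It needs root and a real whole-disk device, and the request can fail (for example, if a partition on the disk is in use).

```python
import fcntl
import os

# _IO(0x12, 95) from <linux/fs.h>: no direction bits, no argument size.
BLKRRPART = (0x12 << 8) | 95


def reread_partition_table(disk_path):
    """Ask the kernel to re-read the partition table of disk_path."""
    fd = os.open(disk_path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
    finally:
        os.close(fd)
```

The race described above arises when two parties, the installer and udevd, both cause the kernel's view of the partitions to be rebuilt at about the same time.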

If someone else can come up with a udev rule that only does the actions we want, that would be great - but I won't hold my breath...

Comment 9 Martin Whitaker 2019-12-23 19:08:44 CET
No better fix has been proposed, so closing

Resolution: (none) => FIXED
Status: NEW => RESOLVED

