Bug 32097

Summary: Plasma Install with encrypted root, grub2-mkconfig fails
Product: Mageia
Reporter: Ulrich Beckmann <bequimao.de>
Component: Installer
Assignee: Mageia Bug Squad <bugsquad>
Status: RESOLVED WORKSFORME
QA Contact:
Severity: normal
Priority: Normal
CC: davidwhodgins, lewyssmith, mageia
Version: Cauldron
Target Milestone: ---
Hardware: All
OS: Linux
Whiteboard:
Source RPM:
CVE:
Status comment:
Attachments: Installation configuration
File ddebug.log
File report.bug

Description Ulrich Beckmann 2023-07-11 22:00:24 CEST
UEFI-Install on Sony Vaio E Series with encrypted root
The Classical ISOs from 07/03 (x86_64) fail.

An error occurred: grub2-mkconfig failed.
The installer falls back to the configuration screen.
UEFI boot order is unchanged.

Tried later to rescue the installation in chroot:

(chroot) grub2-install /dev/sda
Installing for x86_64-efi platform.
EFI variables are not supported on this system.
EFI variables are not supported on this system.
grub2-install: error: efibootmgr failed to register the boot entry: No such file or directory.
(chroot) 

Ulrich
Comment 1 Ulrich Beckmann 2023-07-11 22:06:29 CEST
Created attachment 13920
Installation configuration
Comment 2 Ulrich Beckmann 2023-07-11 22:12:06 CEST
Created attachment 13921
File ddebug.log
Comment 3 sturmvogel 2023-07-12 07:48:39 CEST
Isn't this documented?

Quote:
"If you wish to use encryption on your / partition you must ensure that you have a separate /boot partition. The encryption option for the /boot partition must NOT be set, otherwise your system will be unbootable."

https://doc.mageia.org/installer/8/en/content/diskPartitioning.html
Comment 4 Ulrich Beckmann 2023-07-12 16:59:35 CEST
(In reply to sturmvogel from comment #3)
> Isn't this documented?
> 

Yes, I know. It is a testcase I tested in Mageia 8, 7, ...
The installer enforces a separate /boot partition. Otherwise you can't continue. Not tested this time.

Something peculiar: the disk has two ESP partitions. Every Linux installer so far has found the right one, /dev/sda3. I did a custom install in expert mode and checked the partitions and mount points.
Comment 5 Dave Hodgins 2023-07-12 19:54:04 CEST
(In reply to sturmvogel from comment #3)
> Isn't this documented?
> 
> Quote:
> "If you wish to use encryption on your / partition you must ensure that you
> have a separate /boot partition. The encryption option for the /boot
> partition must NOT be set, otherwise your system will be unbootable."
> 
> https://doc.mageia.org/installer/8/en/content/diskPartitioning.html

It does have a non-encrypted /boot on sda8.

$ grep /boot ddebug.log 
* mount_part: device=sda8 mntpoint=/boot isMounted= real_mntpoint= device_UUID=3611d90f-cc5d-4212-8c7e-98eb77802439
* mounting UUID=3611d90f-cc5d-4212-8c7e-98eb77802439 on /mnt/boot as type ext4, options noatime,acl
* running: mount -t ext4 UUID=3611d90f-cc5d-4212-8c7e-98eb77802439 /mnt/boot -o noatime,acl
* mount_part: device=sda3 mntpoint=/boot/EFI isMounted= real_mntpoint= device_UUID=FE7B-CEC0
* mounting /dev/sda3 on /mnt/boot/EFI as type vfat, options iocharset=utf8
* running: mount -t vfat /dev/sda3 /mnt/boot/EFI -o check=relaxed
* mount_part: device=sda8 mntpoint=/boot isMounted=1 real_mntpoint= device_UUID=3611d90f-cc5d-4212-8c7e-98eb77802439
* trans: scheduling update of bootloader-utils-1.16-10.mga9.noarch (id=4073, file=/tmp/image/media/core/bootloader-utils-1.16-10.mga9.noarch.rpm)
* trans: scheduling update of bootsplash-3.3.11-9.mga9.noarch (id=3660, file=/tmp/image/media/core/bootsplash-3.3.11-9.mga9.noarch.rpm)
* adding /boot/vmlinuz-6.3.9-desktop-2.mga9
* running: mkinitrd -v -f /boot/initrd-6.3.9-desktop-2.mga9.img 6.3.9-desktop-2.mga9 with root /mnt
* running: /usr/share/bootsplash/scripts/make-boot-splash /boot/initrd-6.3.9-desktop-2.mga9.img 1024 with root /mnt
remove-boot-splash: Format of /boot/initrd-6.3.9-desktop-2.mga9.img not recognized
* adding /boot/vmlinuz-6.3.9-desktop-2.mga9
* adding /boot/vmlinuz-6.3.9-desktop-2.mga9

CC: (none) => davidwhodgins

Comment 6 Dave Hodgins 2023-07-12 20:10:19 CEST
What I find strange is that there are no errors indicated in the ddebug.log, that
I can see.

For the rescue ...

According to https://unix.stackexchange.com/questions/91620/efi-variables-are-not-supported-on-this-system
the message "EFI variables are not supported on this system." occurs when
the module efivars has not been loaded.

On my m8 uefi laptop, efivars is not loaded, but efivarfs is loaded. I'm
no expert on uefi.

My experience is that for some things chroot works, but other things require
using systemd-nspawn instead. I can't remember for sure, but I think
grub2-install requires using systemd-nspawn.

See https://wiki.mageia.org/en/Systemd-nspawn#First_container

Note that unlike chroot, /dev, /proc, /run, and /sys should not be bind
mounted in the container.
Comment 7 Dave Hodgins 2023-07-12 20:10:54 CEST
Note the rescue system must be booted in uefi mode too.
Comment 8 Martin Whitaker 2023-07-13 00:03:53 CEST
(In reply to Dave Hodgins from comment #6)
> What I find strange is that there are no errors indicated in the ddebug.log,
> that I can see.

That's because ddebug.log is incomplete, suggesting the installer was terminated abnormally and failed to sync the log files. Check if report.bug.xz is similarly affected.

For the rescue, the efivars module was deprecated and removed from the Linux kernel last year. You now need to mount the efivarfs pseudo-filesystem to gain access to the EFI variables. The Mageia rescue system does this automatically when booted in EFI mode.
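
A minimal sketch of how to check this by hand, in case the automatic mount did not happen. The function name efivarfs_mounted is only illustrative, and the suggested mount command assumes a root shell on a system booted in UEFI mode:

```shell
#!/bin/sh
# Report whether an efivarfs filesystem appears in a mounts table.
# $1: path to a mounts table (defaults to /proc/mounts); the parameter
#     exists only so the check can be tried against a sample file.
efivarfs_mounted() {
    awk '$3 == "efivarfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

if efivarfs_mounted; then
    echo "efivarfs is mounted"
else
    echo "efivarfs is not mounted; as root, in UEFI mode, try:"
    echo "  mount -t efivarfs efivarfs /sys/firmware/efi/efivars"
fi
```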

CC: (none) => mageia

Comment 9 Ulrich Beckmann 2023-07-13 20:26:14 CEST
@ Dave Hodgins: Thanks for pointing to systemd-nspawn. I will look into it later.

@ Martin Whitaker: 
I have forgotten the procedure for report.bug. As it must be created at runtime, I'll have to repeat the whole test.

The rescue worked with chroot and your hint. I'll need the installation just for another test. I will clone it and run another test with the newer rc.

mount -o bind /sys/firmware/efi/efivars /mnt/sys/firmware/efi/efivars
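
For context, a sketch of where that bind mount sits in a typical chroot rescue. It only prints the commands and runs nothing. The function name rescue_mount_plan is hypothetical, and the sketch assumes the installed root, /boot and /boot/EFI are already mounted under /mnt and that the rescue system was booted in UEFI mode:

```shell
#!/bin/sh
# Print (do not run) the bind mounts for a chroot rescue of a UEFI install.
# $1: mount point of the installed root; /mnt matches the command above.
rescue_mount_plan() {
    target=${1:-/mnt}
    # /sys must come before the efivars directory that lives beneath it,
    # because a plain bind mount of /sys does not carry its submounts along.
    for dir in /dev /proc /sys /sys/firmware/efi/efivars; do
        echo "mount -o bind $dir $target$dir"
    done
    echo "chroot $target"
}

rescue_mount_plan /mnt
```

Running the printed commands as root and then calling grub2-install inside the chroot corresponds to the rescue described here.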

Thanks,
Ulrich
Comment 10 Lewis Smith 2023-07-13 21:37:42 CEST
Thanks to all learned contributors. That was mostly about making Rescue work.

But we still have the original complaint (comment 0).
> An error occurred, grub2-mkconfig failed
Given that most people these days install on EFI systems, that in itself cannot be the issue. It looks as if the rule for encrypted '/' with separate /boot partition is honoured.
So where is the catch?

CC: (none) => lewyssmith

Comment 11 Martin Whitaker 2023-07-14 10:15:13 CEST
@Ulrich, after an error occurs in the installer, use Ctrl-Alt-F2 to switch to the debug console, insert a formatted USB drive into a spare USB socket, and enter the command "bug". That will write report.bug to the USB stick. It will be quite large, so you'll need to compress it before attaching to this bug report.

If the installer exits normally it stores a compressed copy (report.bug.xz) in /root/drakx of the installed system, but I'm guessing you don't have that.
Comment 12 Ulrich Beckmann 2023-07-14 20:37:37 CEST
Created attachment 13922
File report.bug

@ Martin: I had forgotten the procedure, as the installer has not failed for a long time.

I could reproduce the issue with RC1 from 07/11, see the attached file. The other files from /root/drakx are also available.

Best regards,
Ulrich

Attachment 13921 is obsolete: 0 => 1

Comment 13 Dave Hodgins 2023-07-14 20:57:52 CEST
ERROR: killing runaway process (process=grub2-mkconfig, pid=11190, args=-o /boot/grub2/grub.cfg

So it's hitting the 10-minute hard-coded timeout, but with no indication why.

Most likely it's due to os-prober and the large number of file systems.

As per bug 44, you could try adding the kernel parameter divider=10 when
booting the installer.
Comment 14 Lewis Smith 2023-07-14 21:41:55 CEST
Good detective work.

This very long wait for os-prober only affects a few systems, and the deciding factor is not known. I suffered it in the past. I am unsure whether it is related to the number of partitions or to other OSes. I never had many of either.

bug 18538 has lots about this, with pointers to other bugs.

@Ulrich
Can you say simply what partitions you have, and which other operating systems (if any)? The output of one of the following, if you can run them:
 $ lsblk
 # fdisk -l /dev/sdx
 # gdisk -l /dev/sdx
even if that info is buried somewhere here already.
Comment 15 Ulrich Beckmann 2023-07-14 21:42:45 CEST
That is evil ...

First the message "Be patient, it might take some time",
then killing it hardcoded.

The system is really complex, multiboot with LVM and btrfs. Normally grub2-mkconfig takes some minutes. I have some partitions excluded with GRUB_OS_PROBER_SKIP_LIST. So everything was fine!
Comment 16 Dave Hodgins 2023-07-14 22:16:27 CEST
Lewis, it's 15 physical partitions on 3 drives. Five of them are LVM physical
volumes containing 14 logical volumes, many of which are btrfs, which supports
subvolumes.

It's probably the most complicated partitioning I've seen in a bug report.

Ulrich, while I tried to get the limit increased or provide a way to
override it due to bug 44 (I was using an old i586 system), there is a
point where it is fair to assume something has gotten into a loop.

As per bug 44, use the server kernel when possible. It only checks for mouse
or keyboard input 100 times per second instead of 1000 times per second. It
leaves more time for the cpu to do actual work.

Some people report the mouse is jerky, but I see no difference in the mouse
handling.

With the installer, the kernel used can't be chosen, so add the kernel option
divider=10. It's not as good as switching to the server kernel, but with the
option, the desktop kernel is closer in total throughput.

Another option is to disable os-prober during the install, and then re-enable
it post install.

Please try the divider=10 option first, to help confirm it's not actually
stuck in a loop.
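
For the os-prober option, a sketch assuming the stock GRUB 2 setting on an already installed system (for example from the rescue chroot). How to switch os-prober off inside the installer itself is not covered in this report:

```shell
# /etc/default/grub (fragment)
# Skip os-prober entirely when grub2-mkconfig runs.
GRUB_DISABLE_OS_PROBER=true
```

Regenerating the configuration afterwards with grub2-mkconfig -o /boot/grub2/grub.cfg applies the change; setting the value back to false re-enables probing.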
Comment 17 Lewis Smith 2023-07-15 20:28:14 CEST
I second this suggestion to get on with things, but it remains ridiculous that with modern hardware anything like this takes 10m, even 10s. Complex setup notwithstanding.
When I had the problem (installs took 2*10min for bootloader stuff, kernel updates 10m), I also had other Linux distributions, and their os-probers took a few seconds where ours took forever. Ulrich, do you have another distribution installed to make the comparison?
Comment 18 Ulrich Beckmann 2023-07-16 21:22:57 CEST
Thanks all,

It is not modern hardware, it is a Sony Vaio E Series Notebook about 10 years old. There is one hard disk, two volume groups, one of these closed at installation time (LUKS-encrypted container sda7).

I think it is a bug and bad programming. At least the message should read: "grub2-mkconfig aborted after xxx min". 
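
To illustrate the kind of message meant here, a small shell sketch. It is not the installer's code; it assumes GNU coreutils timeout is available, and the function name run_with_limit is made up for the example:

```shell
#!/bin/sh
# Run a command under a time limit and say so explicitly if the limit is hit.
# $1: limit in seconds; remaining arguments: the command to run.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when it had to stop the command.
        echo "$1 aborted after ${limit} s" >&2
    fi
    return "$status"
}

# Example with the 10-minute limit mentioned in comment 13 (needs root):
# run_with_limit 600 grub2-mkconfig -o /boot/grub2/grub.cfg
```

With a message like this, a timeout can be told apart from a genuine grub2-mkconfig error.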

Os-prober is slow, but always worked in Cauldron and works after the rescue on this system. I think the problem is I/O and not the kernel. I wonder why it or parts of it are written in shell scripts. Probably because it should work on all systems. And there is no need to scan all btrfs subvolumes when the search for the root subvolume starts at the default subvolume. Anaconda (or blivet-gui) from Fedora once took more than 40 min until the screen with the partitioning scheme was shown.

The status of the installation is "Works for me". I am confident that the installation is fit for other tests. 

Ulrich
Comment 19 Dave Hodgins 2023-07-17 03:37:39 CEST
Closing as works for me. Thanks for the feedback.

Resolution: (none) => WORKSFORME
Status: NEW => RESOLVED