Bug 27838 - draklive-install crashed due to failure to write to EFI NVRAM
Status: RESOLVED WORKSFORME
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal major
Target Milestone: Mageia 9
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard:
Keywords: IN_ERRATA8
Depends on:
Blocks:
 
Reported: 2020-12-15 21:49 CET by papoteur
Modified: 2021-03-03 21:39 CET

See Also:
Source RPM: draklive-install-2.26-1.mga8
CVE:
Status comment:


Attachments
Journal taken just after the crash (483.54 KB, text/plain)
2020-12-16 21:02 CET, papoteur
Draklive-install.log after installation of refind (9.20 KB, text/plain)
2020-12-18 17:56 CET, papoteur
The journal of the same installation (423.06 KB, text/plain)
2020-12-18 17:57 CET, papoteur

Description papoteur 2020-12-15 21:49:14 CET
The "draklive-install" program crashed. Drakbug-18.37 caught it.

Installation of grub2 at end of live installation

grub2-install failed: Installing for x86_64-efi platform.
Could not prepare Boot variable: No such file or directory
grub2-install: error: efibootmgr failed to register the boot entry: Input/output error.
	...propagated at /usr/lib/libDrakX/any.pm line 278.
Perl's trace:
drakbug::bug_handler() called from /usr/lib/libDrakX/any.pm:278
any::installBootloader() called from /usr/sbin/draklive-install:397
main::setup_bootloader() called from /usr/sbin/draklive-install:102
main::install_live() called from /usr/sbin/draklive-install:72

Theme name: Adwaita
Kernel version = 5.9.12-desktop-1.mga8
Distribution=Mageia release 8 (Cauldron) for x86_64
CPU=Intel(R) Celeron(R) CPU N3450 @ 1.10GHz
Comment 1 Martin Whitaker 2020-12-15 22:52:52 CET
Please boot to the live desktop and provide the output from running

  mount

in a terminal window. Also attach the log file created by running

  journalctl -b > journal.log

as the root user.

CC: (none) => mageia

Comment 2 papoteur 2020-12-16 20:59:56 CET
Hi Martin,
mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,noexec,size=1903688k,nr_inodes=475922,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,inode64)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,mode=755,inode64)
/dev/loop0 on /run/mgalive/ovlsize type squashfs (ro,relatime)
overlay on / type overlay (rw,noatime,lowerdir=/live/distrib,upperdir=/live/overlay/memory,workdir=/live/overlay/work)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15527)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime,mode=755)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=1926224k,nr_inodes=409600,inode64)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=385244k,nr_inodes=96311,mode=700,uid=1000,gid=1000,inode64)
/dev/sda5 on /mnt/install type ext4 (rw,relatime)
/dev/sda1 on /mnt/install/boot/EFI type vfat (rw,relatime,fmask=0002,dmask=0002,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,check=r,utf8,errors=remount-ro)
/dev/sda7 on /mnt/install/home type ext4 (rw,relatime)
devtmpfs on /mnt/install/dev type devtmpfs (rw,nosuid,noexec,size=1903688k,nr_inodes=475922,mode=755,inode64)
tmpfs on /mnt/install/run type tmpfs (rw,nosuid,nodev,noexec,mode=755,inode64)
none on /mnt/install/proc type proc (rw,relatime)
none on /mnt/install/sys type sysfs (rw,relatime)
tmpfs on /mnt/install/tmp type tmpfs (rw,nosuid,nodev,size=1926224k,nr_inodes=409600,inode64)
Comment 3 papoteur 2020-12-16 21:02:48 CET
Created attachment 12094
Journal taken just after the crash

I have redone the installation. Same crash.
I chose GRUB2 with os-prober.
This is the first time I have tried to install Mageia on this laptop, alongside Windows 10.
Comment 4 Dave Hodgins 2020-12-16 21:16:04 CET
Looks like the trigger is ...
Dec 16 20:58:45 localhost kernel: [Firmware Bug]: Page fault caused by firmware at PA: 0x65fc01d0

Check for firmware updates.

CC: (none) => davidwhodgins

Comment 5 papoteur 2020-12-16 21:18:29 CET
Trying to install grub2 in a chroot:
LC_ALL=C grub2-install /dev/sda
Installing for x86_64-efi platform.
show_order(): Input/output error
Skipping unreadable variable "Boot0000": Input/output error
Skipping unreadable variable "Boot0001": Input/output error
Skipping unreadable variable "Boot0004": Input/output error
Skipping unreadable variable "Boot000C": Input/output error
Skipping unreadable variable "Boot000D": Input/output error
Skipping unreadable variable "Boot000E": Input/output error
Skipping unreadable variable "Boot000F": Input/output error
Skipping unreadable variable "Boot0011": Input/output error
Skipping unreadable variable "Boot2001": Input/output error
Skipping unreadable variable "Boot2002": Input/output error
Skipping unreadable variable "Boot2003": Input/output error
Could not prepare Boot variable: Input/output error
grub2-install: error: efibootmgr failed to register the boot entry: Input/output error.
Comment 6 Dave Hodgins 2020-12-16 21:36:22 CET
My guess is that either the uefi firmware is corrupt, or the nvram being used
to store the uefi boot entries has failed.
Comment 7 papoteur 2020-12-16 22:38:30 CET
I updated the firmware, but the result is the same.
Comment 8 Martin Whitaker 2020-12-17 00:15:39 CET
That's a machine that's known to have a seriously buggy UEFI BIOS. See bug 23180 for some past history, also Google turns up many problems.

If you need to keep Windows on the machine, this should work:

1. After booting to the Live desktop, mount the ESP (/dev/sda1).

2. In the ESP, move the file /EFI/Microsoft/Boot/bootmgfw.efi up one level to /EFI/Microsoft/bootmgfw.efi.

3. In the ESP, delete /EFI/mageia (left over from previous install attempts).

4. Run draklive-install. When you reach the bootloader selection dialogue, select rEFInd instead of GRUB2. At the next dialogue, select the option to install in /EFI/BOOT. That will install rEFInd as the EFI fallback bootloader.

It might fail at the end of installing the bootloader because it will attempt to write to the NVRAM variable that records which boot menu entry was last used, but you can ignore that (if it crashes the installer, let me know, and I'll try to make it ignore errors from that step).

I've been using this procedure on my HP laptop for years. The only annoyance is that sometimes a Windows update will restore the bootmgfw.efi file to its original location. On my laptop, that means it will default to booting straight into Windows. You should be able to temporarily override that in the BIOS, and once you've booted into Linux, move the file back again.
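
Steps 2 and 3 of the procedure above can be sketched in Python as follows. This is only an illustration, not part of the installer: the function name apply_esp_workaround and the mount point in the usage note are hypothetical, and the ESP is assumed to be already mounted (step 1) and writable.

```python
import shutil
from pathlib import Path


def apply_esp_workaround(esp_root: Path) -> None:
    """Apply steps 2 and 3 to an ESP that is already mounted at esp_root.

    Hypothetical helper for illustration; draklive-install does not ship it.
    """
    microsoft_dir = esp_root / "EFI" / "Microsoft"
    original = microsoft_dir / "Boot" / "bootmgfw.efi"
    relocated = microsoft_dir / "bootmgfw.efi"

    # Step 2: move the Windows boot manager up one level, so the firmware
    # no longer finds it at its default location.
    if original.exists():
        shutil.move(str(original), str(relocated))

    # Step 3: delete the directory left over from previous install attempts.
    stale = esp_root / "EFI" / "mageia"
    if stale.is_dir():
        shutil.rmtree(stale)
```

For example, with the ESP mounted on an assumed mount point /mnt/esp, calling apply_esp_workaround(Path("/mnt/esp")) as root would perform both steps. Calling it again after a Windows update has restored bootmgfw.efi repeats step 2.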
Comment 9 papoteur 2020-12-17 08:16:40 CET
Thanks Martin,
This workaround worked.
However, during installation the Firmware Bug still occurs and draklive-install gets stuck.
I think we have to avoid a crash in the first case and a stall in the second case.
I saw in the Arch wiki [1] that it can sometimes help to pass efi_no_storage_paranoia as a kernel parameter. I haven't tried that. Is there any chance it would help?
[1] https://wiki.archlinux.org/index.php/Unified_Extensible_Firmware_Interface#Requirements_for_UEFI_variable_support
Comment 10 Martin Whitaker 2020-12-17 09:41:50 CET
That kernel option disables some checks that prevent the machine being bricked if the NVRAM is full. I wouldn't recommend using it.

Did you look for any /sys/firmware/efi/efivars/dump-* files, as they suggest?

Could you rerun the install, selecting rEFInd as above, and again capture the journal.log after the install is done, and also the file /tmp/draklive-install.log? If it's failing where I think it is, I should be able to make the installer handle the error and continue.
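
The check for dump-* files asked about above can be sketched as a short Python helper. The function name find_efi_dump_files is hypothetical; the only assumption taken from the thread is that efivarfs is mounted at /sys/firmware/efi/efivars, as the mount output in comment 2 shows.

```python
from pathlib import Path
from typing import List


def find_efi_dump_files(
    efivars_dir: Path = Path("/sys/firmware/efi/efivars"),
) -> List[Path]:
    """Return any dump-* entries in the efivars directory, sorted by name.

    An empty list means either that there are no dump files or that the
    directory does not exist (for example on a non-UEFI boot).
    """
    if not efivars_dir.is_dir():
        return []
    return sorted(efivars_dir.glob("dump-*"))
```

The helper only lists the files; it does not delete anything.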
Comment 11 papoteur 2020-12-17 14:49:32 CET
(In reply to Martin Whitaker from comment #10)
> That kernel option disables some checks that prevent the machine being
> bricked if the NVRAM is full. I wouldn't recommend using it.
OK, I'll forget that.
> 
> Did you look for any /sys/firmware/efi/efivars/dump-* files, as they suggest?
Yes, but there are none.
> 
> Could you rerun the install, selecting rEFInd as above, and again capture
> the journal.log after the install is done, also the file
> /tmp/draklive-install.log. If  it's failing where I think it is, I should be
> able to make the installer handle the error and continue.
OK, I will, but it will take a day or two.
Comment 12 papoteur 2020-12-18 17:56:25 CET
Created attachment 12108
Draklive-install.log after installation of refind

The installer was stuck for at least 10 minutes.
Comment 13 papoteur 2020-12-18 17:57:47 CET
Created attachment 12109
The journal of the same installation
Comment 14 Martin Whitaker 2020-12-18 19:45:18 CET
If you boot to the live desktop and enter the command

  od /sys/firmware/efi/efivars/BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c || echo failed

does it hang, and if not what is the output?

(I'm looking for a simple test for whether the efivars are working)
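
The same read test can be sketched in Python, decoding the variable instead of dumping it in octal. It relies on two facts about efivarfs and UEFI: each efivarfs file starts with a 4-byte attributes field, and the BootOrder payload is an array of little-endian 16-bit entry numbers. The function names are hypothetical. Like od, this only exercises reads, so a clean result does not show that writes to the NVRAM work.

```python
import struct
from pathlib import Path
from typing import List, Optional

BOOT_ORDER = Path(
    "/sys/firmware/efi/efivars/BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def decode_boot_order(raw: bytes) -> List[str]:
    """Decode the raw contents of the efivarfs BootOrder file.

    The first 4 bytes are the variable attributes; the rest is a list of
    little-endian 16-bit boot entry numbers.
    """
    if len(raw) < 4 or (len(raw) - 4) % 2:
        raise ValueError("unexpected BootOrder length: %d bytes" % len(raw))
    count = (len(raw) - 4) // 2
    entries = struct.unpack_from("<%dH" % count, raw, 4)
    return ["Boot%04X" % entry for entry in entries]


def read_boot_order(path: Path = BOOT_ORDER) -> Optional[List[str]]:
    """Return the boot order, or None if the variable cannot be read."""
    try:
        return decode_boot_order(path.read_bytes())
    except OSError as exc:
        # Mirrors the "|| echo failed" part of the shell command.
        print("failed: %s" % exc)
        return None
```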
Comment 15 Martin Whitaker 2020-12-18 20:47:40 CET
Oh, and does it cause the "Firmware Bug" message to appear in journal?

Summary: draklive-install crashed => draklive-install crashed due to failure to write to EFI NVRAM

Comment 16 papoteur 2020-12-18 21:29:22 CET
Hi Martin,
With the od command, I get a list of 9 entries, each of six digits. There is no Firmware Bug message in the journal.
Comment 17 Martin Whitaker 2020-12-18 22:11:29 CET
OK, thanks for testing. I can't think of another way to automatically determine that writing to the NVRAM is broken, so my only option is to add another checkbox to the GUI to enable/disable it.

You could try switching to GRUB2, enabling the option to install it in \EFI\BOOT, to see if that works any better, but I don't know whether it will find the Windows bootloader in its moved location.
Comment 18 Aurelien Oudelet 2020-12-18 22:21:57 CET
(In reply to Martin Whitaker from comment #17)
> OK, thanks for testing. I can't think of another way to automatically
> determine that writing to the NVRAM is broken, so my only option is to add
> another checkbox to the GUI to enable/disable it.
> 
> You could try switching to GRUB2, enabling the option to install it in
> \EFI\BOOT, to see if that works any better, but I don't know whether it will
> find the Windows bootloader in its moved location.

os-prober is our friend. Also, the UEFI firmware does not care about the /boot partition or that the ESP is mounted on /boot/EFI/.

os-prober looks for EFI loaders on FAT32 partitions and for other foreign OSes elsewhere. Even if the Windows loader is moved one directory up or down, it will find it.

CC: (none) => ouaurelien

Comment 19 papoteur 2020-12-21 19:42:23 CET
(In reply to Martin Whitaker from comment #17)
> You could try switching to GRUB2, enabling the option to install it in
> \EFI\BOOT, to see if that works any better, but I don't know whether it will
> find the Windows bootloader in its moved location.
In this case, the installation runs to the end, but GRUB doesn't see Windows. There is no entry for it at boot.
Comment 20 Martin Whitaker 2020-12-21 20:31:58 CET
os-prober is not our friend :-(

Anticipating this, I've added support to the installer and drakboot to configure rEFInd to store its variables on disk (in the ESP) instead of in NVRAM. The default is to use the NVRAM, so you need to uncheck the option when configuring the bootloader.
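
As a rough illustration of what unchecking that option might amount to, the sketch below sets a use_nvram token in the text of a refind.conf file. This is an assumption, not a description of drakboot: the thread does not say how the option is implemented, and the sketch presumes the installed rEFInd honours the use_nvram token that upstream rEFInd documents. The function name set_use_nvram is hypothetical.

```python
import re

# Matches an active (uncommented) use_nvram line; commented samples are left alone.
_USE_NVRAM_LINE = re.compile(r"^[ \t]*use_nvram\b.*$", re.MULTILINE)


def set_use_nvram(conf_text: str, enabled: bool) -> str:
    """Return conf_text with the use_nvram token set to true or false.

    Assumption: rEFInd stores its variables on disk (in the ESP) when the
    token is false, and in NVRAM when it is true.
    """
    new_line = "use_nvram " + ("true" if enabled else "false")
    if _USE_NVRAM_LINE.search(conf_text):
        # Rewrite every active occurrence so they cannot contradict each other.
        return _USE_NVRAM_LINE.sub(new_line, conf_text)
    # No active token yet: append one, keeping the file newline-terminated.
    separator = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
    return conf_text + separator + new_line + "\n"
```

The function works on text only, so the caller decides which refind.conf to read and write back.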

This will be in the next release of drakxtools. I'll make sure it is released before the next ISO build.
Comment 21 Lewis Smith 2021-02-03 15:25:38 CET
ping papoteur
Any change on this?
-------------------
(In reply to Martin Whitaker from comment #8)
> That's a machine that's known to have a seriously buggy UEFI BIOS. See bug
> 23180 for some past history, also Google turns up many problems.
I could not see where you identified the machine!
Does this warrant a mention in ERRATA? or RELEASENOTES?

CC: (none) => lewyssmith

Comment 22 papoteur 2021-02-03 19:21:50 CET
Hi Lewis,
The hardware is an HP ProBook x360 11 G1 EE. Really a bad one.
Some keys on the keyboard no longer work. The webcam is dead. The battery lasts one hour. And this is only three-year-old hardware...
Comment 23 Martin Whitaker 2021-02-03 20:21:01 CET
@Lewis, I found the machine identity in the attached system log:

Dec 16 21:37:13 localhost kernel: DMI: HP HP ProBook x360 11 G1 EE/82EE, BIOS 01.09 04/10/2017

@Papoteur, if you have access to the 8-rc ISOs currently in QA, you can try unchecking the option for rEFInd to store its variables in NVRAM, which I hope will work around the problem with your BIOS.
Comment 24 Lewis Smith 2021-02-03 20:58:31 CET
If that helps, it looks ideal for RELEASENOTES.
Comment 25 papoteur 2021-02-06 08:06:31 CET
(In reply to Martin Whitaker from comment #23)
> @Lewis, I found the machine identity in the attached system log:
> 
> Dec 16 21:37:13 localhost kernel: DMI: HP HP ProBook x360 11 G1 EE/82EE,
> BIOS 01.09 04/10/2017
> 
> @Papoteur, if you have access to the 8-rc ISOs currently in QA, you can try
> unchecking the option for rEFInd to store its variables in NVRAM, which I
> hope will work around the problem with your BIOS.
Thanks Martin,
I just tried with the RC Live Plasma ISO.
If I uncheck the option, all is fine.
If I keep it, draklive-install becomes unresponsive, using a lot of CPU.
Aurelien Oudelet 2021-02-19 15:15:24 CET

Keywords: (none) => FOR_ERRATA8

Aurelien Oudelet 2021-02-19 15:15:43 CET

CC: (none) => fri

Comment 26 Morgan Leijström 2021-02-19 15:44:04 CET
Text suggestion for the errata; please check that I have understood this correctly:

=== Crash at end of install ===

{{bug|27838}} Some computers have buggy firmware, causing writes to EFI NVRAM to fail at the end of Mageia installation.

Workaround: When configuring the bootloader at the end of the install, uncheck the option to store to NVRAM, and it will store to disk instead. (New for Mageia 8)



Question: does this also affect upgrades from mga7, and if so, how?
Comment 27 papoteur 2021-02-19 19:23:09 CET
Hi Morgan,
The workaround also requires using rEFInd instead of GRUB and checking "Install in /EFI/BOOT".

I have tested only a new installation.
Comment 28 Morgan Leijström 2021-02-20 16:22:04 CET
OK thanks.

How does the user select rEFInd?
Comment 29 Morgan Leijström 2021-02-20 16:22:33 CET
Do we have a screenshot of the dialogue in question?
Comment 30 Morgan Leijström 2021-02-20 18:53:25 CET
I don't have time to try it myself now.
I entered the text below; please check that it is OK:

https://wiki.mageia.org/en/Mageia_8_Errata#Crash_at_end_of_install

{{bug|27838}} Some computers have buggy firmware, causing writes to EFI NVRAM to fail at the end of Mageia installation.

Workaround: When configuring the bootloader at the end of the install, select rEFInd instead of GRUB, check "Install in /EFI/BOOT", and uncheck the option to store to NVRAM ''(New for Mageia 8)''

Keywords: FOR_ERRATA8 => IN_ERRATA8

Comment 31 Lewis Smith 2021-02-20 21:27:19 CET
Thanks once again Morgan for the ERRATA note; it looks fine to me if I understood correctly comments 23, 25, 27.

Component: RPM Packages => Installer
Assignee: bugsquad => mageiatools
CC: lewyssmith => (none)

Comment 32 Morgan Leijström 2021-03-03 10:03:53 CET
For Mageia 9 it would be nice if the installer handled this failure more gracefully, i.e. with a popup describing the problem and a return to the dialogue with the workaround options preselected.

Target Milestone: --- => Mageia 9

Comment 33 Martin Whitaker 2021-03-03 21:39:18 CET
The installer doesn't know what the problem is. All it knows is that grub2-install failed. There are many things that can cause grub2-install to fail - it is very fragile.

Nice though it would be, adding code to the installer to diagnose BIOS and GRUB bugs would be a large and never-ending task.

Closing this as works-for-me, because it is a BIOS bug, not an installer one, and we have a workaround.

Resolution: (none) => WORKSFORME
Status: NEW => RESOLVED

