Bug 27013 - Why do I have to “rescue” the boot sometimes?
Status: RESOLVED WORKSFORME
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All
OS: Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assignee: Martin Whitaker
Keywords: NEEDINFO
Reported: 2020-07-28 22:09 CEST by William Kenney
Modified: 2020-08-27 21:50 CEST
Attachments
report.bug.xz (176.29 KB, application/x-xz)
2020-07-30 18:23 CEST, William Kenney
fit32_ls_boot.efi (54.84 KB, image/jpeg)
2020-07-31 20:12 CEST, William Kenney
fit32_efibootmgr (77.81 KB, image/jpeg)
2020-07-31 20:13 CEST, William Kenney
fit128_ls_boot.efi (88.82 KB, image/jpeg)
2020-07-31 20:14 CEST, William Kenney
fit128_efibootmgr (87.04 KB, image/jpeg)
2020-07-31 20:15 CEST, William Kenney
fit128_m7_ls_boot (192.10 KB, application/efi)
2020-08-01 00:04 CEST, William Kenney
fit128_m7_efibootmgr (45.08 KB, image/jpeg)
2020-08-01 00:04 CEST, William Kenney

Description William Kenney 2020-07-28 22:09:32 CEST
Description of problem:

The exact cause of this is still being researched. Assume you are only using removable USB media to boot from, and you are swapping back and forth between different releases of Mageia and/or drives, let's say between M7.1 and Cauldron (M8). If done in the right sequence (the sequence is still being researched), part or all of the /boot/EFI directory gets erased, changed, or corrupted. The result is that the next time you try to boot the USB drive it is unbootable and the computer can't find a bootable drive.

Potential sources:

/boot/EFI/EFI/refind/refind.conf

/boot//EFI/REFIND/REFIND_X64.EFI 

/boot/EFI/mageia/grubx64.efi

It does not appear to occur on Legacy boot systems. If you're always booting from the same internal or external drive, you'll probably never see the problem.

The "rescue" function of the CI restores the necessary files and the drive will reboot properly. I believe I saw this when we were transitioning from M6 to M7.

When you swap drives, even if the partition layout is the same, the UUID will be different, and the boot entry will no longer be valid, causing the no-boot condition.
Comment 1 William Kenney 2020-07-29 01:41:28 CEST
test procedure:

Start with a bootable M8 x86_64 USB drive
Boots to a usable desktop
Existing directories /ETC & /etc
Both directories contain files
Power off system

Boot a previously bootable M7.1 x86_64 USB drive
Computer complains that it cannot find a bootable drive
Mount the non-bootable USB drive on the working M8 x86_64 USB drive
Existing directory /boot/ETC ( empty )
No /boot/etc

Using the M7.1 x86_64 CI, rescue option, the files in M7.1 x86_64 USB drive
/boot/ETC are restored and the USB drive is again bootable and operates correctly

It seems that going back and forth between the two releases causes something
to zero out some files, and directories, in the /boot directory

I've seen this happen on two completely different computers. Both were Intel based.
Comment 2 William Kenney 2020-07-29 01:41:49 CEST
I have a selection of USB drives with either bootable M7.1 or Cauldron (M8) installed on them. For example: boot an M8 USB drive, which boots and runs just fine, then power it off. Boot a previously bootable M7.1 USB drive and the boot is gone and it won’t boot. Booting the M7.1 CI and invoking the rescue option recovers the boot, and after a reboot all is fine.

It seems to happen more when I go between releases than when I use two different USB drives that both have M8 on them. I’ve seen this on two different platforms now. I think I saw this as far back as going between M6 & M7.
Comment 3 William Kenney 2020-07-29 06:28:39 CEST
Martin Whitaker  said:

The MBR is not used for EFI boot. The location of the bootloader (or
rather, a list of bootloaders) is stored in the EFI non-volatile RAM. To
see that list, run '/sbin/efibootmgr -v' on an EFI-booted system. For
example, I get:

% /sbin/efibootmgr -v
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0003,0001,0002,0004
Boot0000* rEFInd Boot Manager   HD(2,GPT,8e198e34-c707-4ceb-8950-1c7ba0d3bd7f,0x1000800,0x95822)/File(\EFI\REFIND\REFIND_X64.EFI)
Boot0001* UEFI:CD/DVD Drive     BBS(129,,0x0)
Boot0002* UEFI:Removable Device BBS(130,,0x0)
Boot0003* Hard Drive    BBS(HD,,0x0)..GO..NO........q.S.a.m.s.u.n.g. .S.S.D. .9.7.0. .E.V.O. .P.l.u.s. .5.0.0.G.B....................A...........................%8Q.Q;......4..Gd-.;.A..MQ..L.S.4.E.V.N.G.0.M.1.4.6.8.0.0.J........BO
Boot0004* UEFI:Network Device   BBS(131,,0x0)

This shows my preferred bootloader is rEFInd, located in
/EFI/REFIND/REFIND_X64.EFI on partition 2 of the GPT partitioned hard
disk with the partition UUID 8e198e34-c707-4ceb-8950-1c7ba0d3bd7f.

The remaining entries show the drives which will be searched for a
fallback bootloader (which is assumed to be stored in
/EFI/BOOT/BOOTX64.EFI in an EFI system partition).

So if your bootloader is being stored in /EFI/mageia/grubx64.efi,
efibootmgr should show an entry for that. But when you swap drives, even
if the partition layout is the same, the UUID will be different, and the
boot entry will no longer be valid.
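
To compare a boot entry against the drive that is actually plugged in, the partition UUID can be pulled out of the efibootmgr listing. The following is only a minimal sketch, assuming the `HD(n,GPT,uuid,...)` format shown in the output above; the function name `boot_entry_uuids` is made up for illustration and is not part of efibootmgr.

```shell
# Read `efibootmgr -v` output on stdin and print one "BootNNNN UUID" pair
# for every entry that points at a GPT partition. Entries without an
# HD(...,GPT,...) device path (the BBS fallback entries) are skipped.
# boot_entry_uuids is a hypothetical helper name.
boot_entry_uuids() {
    sed -n 's/^\(Boot[0-9A-Fa-f]\{4\}\)\*\{0,1\}.*HD([0-9]*,GPT,\([0-9A-Fa-f-]*\),.*/\1 \2/p'
}
```

For example, `/sbin/efibootmgr -v | boot_entry_uuids` lists the partition UUIDs the firmware expects, and `lsblk -no PARTUUID /dev/sdX1` (the device name is a placeholder) prints the UUID of a real partition. If the two differ, that entry cannot boot that drive.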
Comment 4 Martin Whitaker 2020-07-29 10:00:28 CEST
As I tried to explain (maybe badly) in comment 3, what is being changed is the boot information stored in the EFI non-volatile RAM, not anything on the disk itself.

Following your bug 26761, I modified the installer so that it would install the bootloader in /EFI/BOOT if it detected you were installing on a removable device, which should avoid this problem. It seems that is not happening. Please attach the /root/drakx/report.bug.xz file from your most recent cauldron (M8) install so we can investigate that.

Assignee: bugsquad => mageia
CC: (none) => mageia

Martin Whitaker 2020-07-29 10:00:52 CEST

Keywords: (none) => NEEDINFO

Comment 5 William Kenney 2020-07-30 18:23:58 CEST
Created attachment 11776
report.bug.xz
Comment 6 William Kenney 2020-07-30 18:25:05 CEST
(In reply to Martin Whitaker from comment #4)
> Please attach the /root/drakx/report.bug.xz file from your most
> recent cauldron (M8) install so we can investigate that.

Attached.
Tell me if this works for you.
Comment 7 Martin Whitaker 2020-07-30 21:45:22 CEST
OK Bill, from the installer log it appears the installer did identify your disk as being removable. You can confirm that by looking in the /boot/grub2/install.sh file, which (according to the log) contains:

  grub2-install --removable

That should mean that the bootloader is installed in the ESP on that disk in /EFI/BOOT and that no entry was added to the EFI NVRAM. So, could you

1. Check that /EFI/BOOT/BOOTX64.EFI really does exist in the ESP.
2. With that disk plugged in, reboot and get into the BIOS boot menu^. Does your removable disk appear in the menu?

^ I think (courtesy of Google) that is done on your Dell by hitting F12 when booting. Unfortunately the key you need to press is different for each brand of BIOS, so it's hard to be sure. On my machines it's variously F8, F9, and ESC, but I don't have any Dell machines.

Note that if you didn't reformat the ESP, you could have a bootloader from an older install still left in /EFI/mageia. That should do no harm, but if all else fails, try deleting it.
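
Check 1 above can be scripted. This is a rough sketch, assuming the ESP is mounted at /boot/EFI as in this report; `find_fallback_loader` is a hypothetical helper name, and the match is case-insensitive because the ESP is a FAT filesystem and the directory names may show up in either case. It relies on the `-maxdepth` and `-ipath` options of GNU find.

```shell
# Print the path of the fallback bootloader below the given ESP mount point,
# matching EFI/BOOT/BOOTX64.EFI case-insensitively. Returns non-zero when
# no such file exists. find_fallback_loader is a hypothetical helper name.
find_fallback_loader() {
    esp=${1%/}    # drop a trailing slash so the pattern below lines up
    found=$(find "$esp" -maxdepth 3 -type f \
                -ipath "$esp/efi/boot/bootx64.efi" 2>/dev/null | head -n 1)
    [ -n "$found" ] && printf '%s\n' "$found"
}
```

Running `find_fallback_loader /boot/EFI` on the installed system should print the loader's path if the installer treated the disk as removable, and print nothing otherwise.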
Comment 8 William Kenney 2020-07-30 22:48:35 CEST
(In reply to Martin Whitaker from comment #7)

> Note that if you didn't reformat the ESP, you could have a bootloader from
> an older install still left in /EFI/mageia. That should do no harm, but if
> all else fails, try deleting it.

OK, I think I'm going to move on to the next installer. Since I'm still using beta ISOs, is the new method in the netinstall file(s) now?

Thanks for working with me.
Comment 9 William Kenney 2020-07-30 22:51:08 CEST
The latest netinstaller I have here is dated 7/16/20
Comment 10 William Kenney 2020-07-30 22:59:21 CEST
(In reply to Martin Whitaker from comment #7)
> Note that if you didn't reformat the ESP, you could have a bootloader from
> an older install still left in /EFI/mageia. That should do no harm, but if
> all else fails, try deleting it.

As a rule, during this kind of testing I clean the target drive, be it a USB, SSD, or rotating disk, to absolute zero. Using gparted, I open the drive and delete all the partitions so there is nothing left at all. Only then do I start the install to that drive.
Comment 11 Martin Whitaker 2020-07-30 23:19:31 CEST
The netinstall ISO loads the stage2 installer from the repos, so you get the latest version. But that's the same version as is on the beta1 ISOs, so don't expect it to behave any differently.
Comment 12 William Kenney 2020-07-31 04:01:18 CEST
(In reply to Martin Whitaker from comment #7) 
> ^ I think (courtesy of Google) that is done on your Dell by hitting F12 when
> booting. Unfortunately the key you need to press is different for each brand
> of BIOS, so it's hard to be sure. On my machines it's variously F8, F9, and
> ESC, but I don't have any Dell machines.

I have two Dell laptops. A Vostro circa 2011, and an Inspiron circa 2020.
During boot if you repeatedly press the F12 key that will present you with
a boot menu listing the various drives that the BIOS sees as bootable.

In order to prevent any damage to the internal 256GB M.2 PCIe NVMe SSD,
I have turned off the internal hard disc controller. So far so good.
With all the fussing around with removable USB boot drives, I have not
yet damaged or written anything to that drive. It contains the
legitimate Win10 Home Edition that came with the laptop, which I'd like to preserve. I can turn off the internal HD controller in both laptops.

If on boot the BIOS does not see a bootable device, the laptop SCREAMS

BBBBBEEEEEEEEPPPPPPPP!!!!!!!!

and throws up a warning page that it can't find a bootable device.
Even though I know it's coming it still makes me jump.
Comment 13 Martin Whitaker 2020-07-31 09:22:54 CEST
So, as I understand it, if you boot the installer in UEFI mode and do a clean install onto the removable drive, then reboot, all is well. At that point, without having made any changes, what is the output from

  ls -lR /boot/EFI

and

  /sbin/efibootmgr -v

If you now power down the laptop, does it still boot on power up?
Comment 14 William Kenney 2020-07-31 20:12:47 CEST
Created attachment 11777
fit32_ls_boot.efi
Comment 15 William Kenney 2020-07-31 20:13:22 CEST
Created attachment 11778
fit32_efibootmgr
Comment 16 William Kenney 2020-07-31 20:14:13 CEST
Created attachment 11779
fit128_ls_boot.efi
Comment 17 William Kenney 2020-07-31 20:15:29 CEST
Created attachment 11780
fit128_efibootmgr
Comment 18 William Kenney 2020-07-31 20:17:16 CEST
Two USB FIT drives. One 128GB the other 32GB

The 128GB drive has been in use for a few weeks now and I can't remember
ever having to have its boot restored.

I completely wiped the 32GB drive to zero using gparted, then,
using netinstall and an up-to-date-today repo, reinstalled
M8 from the ground up. I have attached four screenshots.

As of today I can go back and forth between these two drives,
they both boot properly, and I have not had to reinstall the boot.
I believe in the past I've had to recover the boot on the FIT 32 drive
several times.
Comment 19 Martin Whitaker 2020-07-31 21:33:36 CEST
OK, that all looks to be working as intended. I think the "mageia" entry shown by efibootmgr is left over from one of your earlier installs (or possibly from use of the rescue system - I haven't checked whether the rescue system has been updated to handle removable drives).

This will of course only work if you are using the latest installer code - so not with alpha1 ISOs or with Mageia 7. If you want to put Mageia 7 on a removable drive, you should select the rEFInd bootloader and check the option to install it in /EFI/BOOT.
Comment 20 William Kenney 2020-07-31 22:58:35 CEST
So help me understand something.
Has all this effort resulted in something positive?
I'm probably one of the few people that use removable media this way.
IMO platforms with USB 3.x ports and the new ultrafast USB drives kinda
negate a reason for having an internal HD.
The performance on this new laptop is more than adequate for most everyone
even though the drive is this little bitty USB drive.
Comment 21 William Kenney 2020-08-01 00:04:07 CEST
Created attachment 11781
fit128_m7_ls_boot
Comment 22 William Kenney 2020-08-01 00:04:45 CEST
Created attachment 11782
fit128_m7_efibootmgr
Comment 23 William Kenney 2020-08-01 00:05:05 CEST
Here's M7 on a FIT128. It had to be rescued before it would boot.
Comment 24 Martin Whitaker 2020-08-01 08:46:19 CEST
(In reply to William Kenney from comment #23)
> Here's M7 on a FIT128. It had to be rescued before it would boot.

As expected - see comment 19.

When you install or rescue a bootloader in /EFI/mageia, a "mageia" entry is added in the EFI NVRAM, as shown in the output from efibootmgr. When you install or rescue a bootloader in /EFI/mageia on a different disk, the "mageia" entry is overwritten with the new bootloader details. As the entry includes the partition UUID, it will no longer work with the previous disk.

A UEFI BIOS does not know to look in /EFI/mageia for a bootloader unless there is a valid NVRAM entry that tells it to look there. Without that, the only place it will look is /EFI/BOOT. That is why you have to install the bootloader in /EFI/BOOT on a removable disk.
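
The reasoning in this comment can be written down as a toy decision function. It is a deliberate simplification for illustration only: real firmware walks the whole BootOrder list and behaviour varies by vendor, and `boot_outcome` and its arguments are invented names rather than anything from the installer or the firmware.

```shell
# Toy model of the boot decision described in this comment. Arguments:
#   $1  partition UUID stored in the NVRAM "mageia" entry ("" if no entry)
#   $2  partition UUID of the ESP on the drive that is plugged in
#   $3  "yes" if that ESP holds /EFI/BOOT/BOOTX64.EFI, otherwise "no"
# boot_outcome is a hypothetical name; this is not firmware code.
boot_outcome() {
    nvram_uuid=$1
    disk_uuid=$2
    has_fallback=$3
    if [ -n "$nvram_uuid" ] && [ "$nvram_uuid" = "$disk_uuid" ]; then
        echo "boots via NVRAM entry"
    elif [ "$has_fallback" = yes ]; then
        echo "boots via /EFI/BOOT fallback"
    else
        echo "no bootable device"
    fi
}
```

An M7.1 drive whose entry was overwritten by a rescue on another drive is the first case below, and a drive installed with the bootloader in /EFI/BOOT is the second:

    boot_outcome uuid-of-other-disk uuid-of-this-disk no    # no bootable device
    boot_outcome uuid-of-other-disk uuid-of-this-disk yes   # boots via /EFI/BOOT fallback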
Comment 25 Martin Whitaker 2020-08-01 08:59:06 CEST
(In reply to William Kenney from comment #20)
> So help me understand something.
> Has all this effort resulted in something positive?
> I'm probably one of the few people that use removable media this way.

I suspect you are the only person who will disable their internal disk before doing so.

If I wanted to use removable media, I wouldn't have any of this trouble, because I use rEFInd as my boot manager. rEFInd is installed on my internal disk. Once installed, it is never reinstalled or updated when I install Mageia on different partitions or on different disks. That is because it automatically scans for bootable images each time it starts, and generates the boot menu on the fly. So for example, if I insert a USB stick containing a CI or Live ISO, that will automatically appear in the boot menu.
Comment 26 William Kenney 2020-08-04 19:25:08 CEST
(In reply to Martin Whitaker)

Four M8 Media:

Samsung FIT Plus USB 3.1 128GB ( /boot/EFI & /boot/efi, M8 x86_64 Plasma )
Samsung FIT Plus USB 3.1  32GB ( /boot/EFI & /boot/efi, M8 x86_64 Plasma )
Generic SSD USB 3.1 128GB ( /boot/EFI & /boot/efi, M8 x86_64 Plasma )
SanDisk Ultra Fit USB 3.1 16GB, Mageia-8-beta1-Live-Plasma-x86_64.iso ( /boot/ )
  /boot/
  /boot/dracut/ ( empty )
  /boot/config-5.7.9-desktop-1.mga8
  /boot/initrd-5.7.9-desktop-1.mga8.img
  /boot/symvers-5.7.9-desktop-1.mga8.xz
  /boot/System.map-5.7.9-desktop-1.mga8
  /boot/vmlinuz-5.7.9-desktop-1.mga8


I can now loop through all four of these media, in any sequence, and not need to rescue the BOOT on any of them.
Comment 27 Aurelien Oudelet 2020-08-27 17:52:59 CEST
This should be closed.

(In reply to William Kenney from comment #26)
> 
> 
> I can now loop through all four of these media, in any sequence, and not
> need to rescue the BOOT on any of them.

Closing as WORKSFORME.

There seems to be a good explanation here:
(In reply to Martin Whitaker from comment #24)
> 
> When you install or rescue a bootloader in /EFI/mageia, a "mageia" entry is
> added in the EFI NVRAM, as shown in the output from efibootmgr. When you
> install or rescue a bootloader in /EFI/mageia on a different disk, the
> "mageia" entry is overwritten with the new bootloader details. As the entry
> includes the partition UUID, it will no longer work with the previous disk.
> 
> A UEFI BIOS does not know to look in /EFI/mageia for a bootloader unless
> there is a valid NVRAM entry that tells it to look there. Without that, the
> only place it will look is /EFI/BOOT. That is why you have to install the
> bootloader in /EFI/BOOT on a removable disk.

Status: NEW => RESOLVED
Resolution: (none) => WORKSFORME

Comment 28 William Kenney 2020-08-27 19:14:27 CEST
(In reply to Aurelien Oudelet from comment #27)
> This should be closed.

I agree, but.....

I've worked this pretty hard in the last weeks and I am totally convinced that, so long as I am working with an M8 install that was created after Martin made the code change to the installer, the change that created:

/boot
 /EFI
   /EFI/BOOT/BOOTX64.EFI

there is no problem at all. Any and all removable media I have used can be booted in any order and never have to be rescued. So if the sequence is:

Internal SSD Win10 -> 32GB USB M8 -> 128GB USB M8 -> 128GB USB SSD M8 -> 16GB SD chip M8 and on and on in any random circle there's never a need to rescue the M8 boot.

But introduce any M7.1 USB media, be that a USB drive or a USB SSD drive, and the boot of that media will have to be rescued every time. Once rescued, the M7.1 media will work just fine, but if I power off and go to another M7.1 USB media then the boot of that new media will need to be rescued.

At any time while using M7.1 USB media, if I go back to any of the M8 media there is never a need to rescue that media. So the installer fix that Martin applied, as described in Comment 19:

/boot
  /EFI
    /EFI/BOOT/BOOTX64.EFI

(I think it's linked)

appears to have completely eliminated this issue. And I agree that this bug should be closed.

But is there any way to fix the situation in M7.1, or is the installer there cast in stone?
And/or, if there is an existing install, can it somehow be modified with the new boot code?

Thanks
Comment 29 Martin Whitaker 2020-08-27 20:55:00 CEST
Certainly the 7.1 ISOs are set in stone. The stage2 installer used by the netinstall ISOs is frozen, although it would be updated for critical bugs. I don't think this counts - you are doing something fairly unusual.

If you want to fix your 7.1 installs, select rEFInd instead of GRUB2 as your bootloader, and when you do, select the option "Install in \EFI\BOOT".
Comment 30 William Kenney 2020-08-27 21:50:08 CEST
(In reply to Martin Whitaker from comment #29)

> If you want to fix your 7.1 installs, select rEFInd instead of GRUB2 as your
> bootloader, and when you do, select the option "Install in \EFI\BOOT".

Thanks Martin.

I only swap boot drives on one platform that being my Dell laptop.
But that is where I do all my testing and I'm swapping drives in and out
all the time.

I'll give this a try.
