Bug 32653 - Surface Pro 9 stuck on error after grub with kernel 6.5.13-6
Summary: Surface Pro 9 stuck on error after grub with kernel 6.5.13-6
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 9
Hardware: All Linux
Priority: Normal major
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-27 15:10 CET by christian barranco
Modified: 2024-03-29 21:21 CET

See Also:
Source RPM: kernel-6.5.13-6.mga9.src.rpm
CVE:
Status comment:


Attachments
full journal of the failed start (256.44 KB, text/plain)
2023-12-27 15:11 CET, christian barranco
diff between Arch config file and MGA9 config file (104.01 KB, text/plain)
2024-02-24 19:25 CET, christian barranco

Description christian barranco 2023-12-27 15:10:22 CET
Description of problem:
My Surface Pro 9 doesn't finish booting after upgrading to kernel 6.5.13-6.
After the GRUB screen, it stays stuck on a black screen showing: ACPI Error: Could not disable RealTimeClock events (20230331/evxfevnt-243)
It works well with kernel 6.5.11.

Version-Release number of selected component (if applicable): 6.5.13-6


How reproducible: always


Steps to Reproduce: power on the machine; GRUB loads kernel 6.5.13-6 and the boot gets stuck on the black screen with an ACPI error.

The full journalctl output is attached.
Comment 1 christian barranco 2023-12-27 15:11:41 CET
Created attachment 14233
full journal of the failed start
Comment 2 Dave Hodgins 2023-12-27 20:24:36 CET
I think it's due to the message
Activation via systemd failed for unit 'dbus-org.bluez.service': Refusing activation, D-Bus is shutting down.

I haven't seen that before, but a search finds
https://bbs.archlinux.org/viewtopic.php?id=155714

What does "systemctl status dbus-org.bluez.service? show?

CC: (none) => davidwhodgins

Comment 3 Dave Hodgins 2023-12-27 20:25:15 CET
Typo. What does "systemctl status dbus-org.bluez.service" show?
Comment 4 christian barranco 2023-12-28 08:52:45 CET
Thanks Dave for your support.
The thing is I cannot do anything when this message shows up. I am stuck with a black screen. I don't get any login prompt / screen.
Fn keys don't bring me to any login prompt either.

So, I cannot run this systemctl check. Would you have any idea on how to be able to do so?
Comment 5 Dave Hodgins 2023-12-28 14:05:15 CET
Boot to run level 3 (aka multi-user.target) by appending " 3" without those
quotes to the kernel parameters.
Comment 6 Pascal Terjan 2023-12-28 16:46:07 CET
The journal is of a successful boot with kernel 6.5.11-desktop-5.mga9, and that bluetooth error message is during shutdown and seems harmless.

I guess the boot doesn't get far enough for the journal to start.

Note that the ACPI error is also there when booting successfully, so it is probably not related either:
Dec 27 10:03:26 kernel: ACPI Error: Could not disable RealTimeClock events (20230331/evxfevnt-243)

CC: (none) => pterjan

Comment 7 christian barranco 2023-12-28 22:47:45 CET
Sorry @Pascal and @Dave, I have confused you with my journal...
How is it possible to have no journal at all (I checked again) when the same error message appears? Since this message shows up in the middle of the 6.5.11 boot, why would it be different with 6.5.13?

Anyway, I got a firmware update through Windows (I dual boot).
None of the kernels boot anymore... I get the GRUB menu and then I am stuck on:

Loading Linux 6.5.11-desktop-5.mga9 ...
Loading initial ramdisk ...

and stuck forever... marvelous...
Comment 8 Pascal Terjan 2023-12-28 23:33:53 CET
At the beginning of the boot everything is in memory.

Then disks are mounted, journal starts and dumps logs from early boot (which is why all the early messages have the same timestamp).

If things don't go as far as mounting your / you will not have anything recorded.
Comment 9 Giuseppe Ghibò 2023-12-29 00:28:00 CET
(In reply to christian squidf from comment #7)

> Sorry @Pascal and @Dave, I have confused you with my journal...
> How is it possible not to have a journal at all (I checked again indeed)
> with the same error message? As this message is in the middle of the 6.5.11
> boot, why would it be different with 6.5.13.
> 
> Anyway, I got a firmware update by Windows (I am on dual boot).
> None of the kernel boot anymore... I get the grub and then I am stuck on:
> 
> Loading Linux 6.5.11-desktop-5.mga9 ...
> Loading initial ramdisk ...
> 
> and stuck for ever... marvelous...

a) what was the latest working kernel beyond 6.4.x?

b) check your initrd images and, if needed, regenerate them with dracut for the working kernel (keeping the older one for debugging purposes)

c) is it possible that the latest BIOS firmware update altered something in the boot configuration? E.g. maybe it re-enabled secure boot, in which case it clearly won't boot non-signed images.

d) there is kernel-desktop-6.5.13-7.mga9 in backports/updates_testing; it has a small update for the Surface (but nothing specific to a particular bug).

CC: (none) => ghibomgx

Comment 10 Dave Hodgins 2023-12-29 01:29:02 CET
Double check the UEFI settings to see if the firmware update turned on
restricted (aka secure) boot.
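
One way to double check the Secure Boot state from a running Linux session (for example a live USB that still boots) is to read the SecureBoot EFI variable directly. The sketch below assumes efivarfs is mounted at /sys/firmware/efi/efivars; the function name secure_boot_state is made up for this example and is not part of any tool.

```shell
# Print "enabled", "disabled" or "unknown" from a SecureBoot efivars file.
# In efivarfs the first four bytes are the variable attributes and the
# fifth byte is the value (1 = Secure Boot on, 0 = off).
secure_boot_state() {
    # Default path assumes efivarfs is mounted in the usual place.
    var_file=${1:-/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c}
    [ -r "$var_file" ] || { echo unknown; return 0; }
    value=$(od -An -t u1 -j 4 -N 1 "$var_file" | tr -d ' \n')
    case $value in
        1) echo enabled ;;
        0) echo disabled ;;
        *) echo unknown ;;
    esac
}
```

Calling secure_boot_state with no argument reads the live variable; "unknown" usually means the session was not booted in UEFI mode or the variable is not readable.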
Comment 11 Dave Hodgins 2023-12-29 01:31:27 CET
Also make sure that the hard drive controller is in AHCI mode.
Comment 12 christian barranco 2023-12-29 12:36:21 CET
Hi. 
To try to summarize:
-Last kernel version which was working: 6.5.11-desktop-5

-The issue is that now nothing works. I can boot from a USB flash drive with Mageia on it, but it won't load anything listed in the menu.

-I think the issue is now linked to an update I got from Windows Update yesterday, preventing anything from loading into memory before Windows 11 starts. It might be related to the cumulative update KB5033375; not sure though.

-I cannot roll it back, at least not fully enough to get back to the previous state.

-UEFI BIOS options on a Surface 9 are very scarce. The only thing I can do, besides setting the boot options, is to disable secure boot (which it is) and secured core (which it is as well).

So, right now, dead end...

Thanks all for trying to help with this messy nightmare!
Comment 13 christian barranco 2023-12-29 15:23:52 CET
Hi. More news.

I can boot on a Manjaro live ISO, written with ISO dumper.
Neither a Mageia 9 dvd install nor a Mageia 9 live iso can boot.

It looks like a Microsoft update has messed things up, and a solution might lie in a kernel that contains NX support. According to
https://github.com/linux-surface/linux-surface/issues/1162
kernel 6.6 might have these patches.
Not sure whether it is a coincidence, but Manjaro live uses kernel 6.6.8.

Anyway, even if a kernel 6.6 becomes available for MGA9, I currently cannot boot Mageia, so doing the update will be a bit tricky.

Any hint?
Comment 14 Giuseppe Ghibò 2023-12-29 15:54:59 CET
(In reply to christian squidf from comment #13)
> Hi. More news.
> 
> I can boot on a Manjaro live ISO, written with ISO dumper.
> Neither a Mageia 9 dvd install nor a Mageia 9 live iso can boot.
> 
> It looks like a Microsoft update has messed things and a solution might
> reside in kernel that contains NX support. According to 
> https://github.com/linux-surface/linux-surface/issues/1162
> kernel 6.6 might have these patches.
> Not sure whether it is a coincidence, but Manjaro live uses kernel 6.6.8.
> 
> Anyway, even if a kernel 6.6 becomes available for MGA9, currently, I cannot
> boot on Mageia. It will be then a bit tricky to do the update.
> 
> Any hint?

Weird, because the x86_64 kernel 6.5.13-desktop-6.mga9 has the NX bit set. If you try:

journalctl -b | grep -i 'execute disable'

you get
<timestamp> localhost kernel: NX (Execute Disable) protection: active

so it should be something else, unless you booted with the kernel parameter "noexec=off", which is what controls the NX bit.

Anyway, you might try kernel 6.5.13-desktop-7 (just install kernel and kernel-devel) from backports testing. To install it, boot Manjaro as you did, mount the Mageia installation for a chroot (including /dev, /sys, /proc and /dev/pts), chroot into it and install the kernel.
Comment 15 christian barranco 2023-12-29 16:00:20 CET
(In reply to Giuseppe Ghibò from comment #14)
> Anyway you might try with kernel 6.5.13-desktop-7 (just install kernel and
> kernel-devel) from backports testing. To install it, just boot with manjaro
> as you did, then mount mageia as chroot (including /dev/, /sys/, /proc
> /dev/pts), chroot to it and install the kernel.

Thanks Giuseppe for the feedback. The thing is, though, I cannot boot *at all* on MGA anymore. So, I can't try what you are suggesting, unfortunately.
Comment 16 Giuseppe Ghibò 2023-12-29 16:17:39 CET
(In reply to christian squidf from comment #15)

> > Anyway you might try with kernel 6.5.13-desktop-7 (just install kernel and
> > kernel-devel) from backports testing. To install it, just boot with manjaro
> > as you did, then mount mageia as chroot (including /dev/, /sys/, /proc
> > /dev/pts), chroot to it and install the kernel.
> 
> Thanks Giuseppe for the feedback. The thing is, though, I cannot boot *at
> all* on MGA anymore. So, I can't try what you are suggesting, unfortunately.

So you don't know how to mount a Mageia chroot (which is not the same as booting Mageia) from another booted distro? I am trying to find whether there is some doc ready about this on the wiki...
Comment 17 christian barranco 2023-12-29 16:59:26 CET
(In reply to Giuseppe Ghibò from comment #16)
> So you don't know how to mount a mageia chroot (which is not booting a
> mageia) from another booting distro? Trying to find if there is some doc
> ready about this on the wiki...

Thanks for the hints. No, I don't. I will have a look as well. Can I do that from the Manjaro live session? 
But, right now, anyway, I am restoring Windows to its initial state, hoping the culprit update will vanish and will allow me to boot again on my already installed MGA configuration. Let us see...
Comment 18 Morgan Leijström 2023-12-29 17:03:29 CET
> Anyway you might try with kernel 6.5.13-desktop-7 (just install kernel and
> kernel-devel) from backports testing.

Just for testing, you can boot a Mageia 9 Live with persistence on another computer, update it, and then try to boot it on the problematic system.

The Live boot menu lets you choose between the original and the last updated kernel.
https://wiki.mageia.org/en/Persistent_live_systems#Original_kernel
(and you can also edit the boot command line if desired, the usual way)

CC: (none) => fri

Comment 19 Giuseppe Ghibò 2023-12-29 17:09:53 CET
(In reply to christian squidf from comment #17)
> Thanks for the hints. No, I don't. I will have a look as well. Can I do that
> from the Manjaro live session? 
> But, right now, anyway, I am restoring Windows to its initial state, hoping
> the culprit update will vanish and will allow me to boot again on my already
> installed MGA configuration. Let us see...

Yes, from that live session too. I haven't found anything ready on our wiki.
Try something like this on the booting host (the one you said was able to
boot). Suppose /dev/sdb is the Mageia disk, /dev/sdb2 is the boot partition, /dev/sdb3 the root partition (with /usr), and /dev/sdb1 the EFI vfat partition. If your partitions differ, adapt the commands accordingly:

mkdir -p /mnt/disk
# root first, then /boot, then the EFI system partition
mount /dev/sdb3 /mnt/disk
mount /dev/sdb2 /mnt/disk/boot
mount /dev/sdb1 /mnt/disk/boot/EFI
# --rbind also carries submounts such as /dev/pts, so no separate bind is needed
mount --rbind /dev /mnt/disk/dev
mount --make-rslave /mnt/disk/dev
mount --rbind /proc /mnt/disk/proc
mount --make-rslave /mnt/disk/proc
mount --rbind /sys /mnt/disk/sys
mount --make-rslave /mnt/disk/sys
mount --rbind /tmp /mnt/disk/tmp

more or less. Then do:

chroot /mnt/disk

and you are root within Mageia (no X11, which would require a more sophisticated chroot), where you can download a package and install it with the commands you know.
Comment 20 christian barranco 2023-12-29 18:50:46 CET
So, wiping out Windows has not helped... 

One thing that might complicate matters: I have LVM partitioning with LUKS encryption.
/boot is not encrypted though.

I will give a try with the chroot and keep you posted.
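
With LVM and LUKS in the picture, the root volume has to be unlocked and activated before the mount commands from comment 19 can work. A minimal sketch follows, assuming the LVM physical volume sits inside a single LUKS container (the actual layout on this machine is not stated in the thread). The function plan_encrypted_mount, the mapper name crypt_mga, and the device, volume group and logical volume names are all placeholders. The function only prints the commands so they can be reviewed before being run as root.

```shell
# Print (do not run) the commands that unlock a LUKS container holding an
# LVM volume group and mount its root logical volume.
# Arguments: LUKS device, volume group name, root LV name, [mount point].
plan_encrypted_mount() {
    luks_dev=$1
    vg_name=$2
    root_lv=$3
    target=${4:-/mnt/disk}
    cat <<EOF
cryptsetup open $luks_dev crypt_mga
vgchange -ay $vg_name
mount /dev/$vg_name/$root_lv $target
EOF
}
```

For example, plan_encrypted_mount /dev/sdb3 vg_mga root prints three commands; after running them, the remaining mounts for /boot, the EFI partition, /dev, /proc and /sys apply unchanged.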
Comment 21 Pascal Terjan 2023-12-30 01:54:28 CET
(In reply to Giuseppe Ghibò from comment #14)

> Weird because x86_64 bit kernel 6.5.13-desktop-6.mga9 has the NX bit set, if
> you try with:
> 
> journalctl -b | grep -i 'execute disable'
>
> 
> you get 
> <timestamp> localhost kernel: NX (Execute Disable) protection: active
> 
> so it should be something other, unless you booted with kernel parameter
> "noexec=off" which is what controls the NX bit.

I believe this is about something else: the state of the non-executable memory handed over to the kernel by the EFI stuff, rather than what the kernel itself allocates/manages.


There is a config option:

CONFIG_EFI_DXE_MEM_ATTRIBUTES: Adjust memory attributes in EFISTUB

UEFI specification does not guarantee all memory to be accessible for both write and execute as the kernel expects it to be. Use DXE services to check and alter memory protection attributes during boot via EFISTUB to ensure that memory ranges used by the kernel are writable and executable.

But we already have that option enabled.

I also see that this code changed recently: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11078876b7a6a1b7226344fecab968945c806832 but, judging from the date, that should already have been in the first 6.5.
Comment 22 christian barranco 2024-02-18 22:06:36 CET
(In reply to Giuseppe Ghibò from comment #19)
> Yes, from that live session too. Haven't found anything ready on out wiki.
> Try something like this on the booting host (the one you said it was able to
> boot). Suppose /dev/sdb is the mageia disk and /dev/sdb2 is the boot
> partition, and /dev/sdb3 the root partition (with /usr), and /dev/sdb1 is
> the EFI vfat partition. If you have different partition then you need to
> adapt according to them:
> 
> mkdir /mnt/disk/
> mount /dev/sdb3 /mnt/disk
> mount /dev/sdb2 /mnt/disk/boot
> mount /dev/sdb1 /mnt/disk/boot/EFI
> mount --rbind /dev/ /mnt/disk/dev
> mount --make-rslave /mnt/disk/dev
> mount --rbind /dev/pts /mnt/disk/dev/pts
> mount --make-rslave /mnt/disk/dev/pts
> mount --rbind /proc /mnt/disk/proc
> mount --make-rslave /mnt/disk/proc
> mount --rbind /sys  /mnt/disk/sys
> mount --make-rslave /mnt/disk/sys
> mount --rbind /tmp  /mnt/disk/tmp
> 
> more or less. Then do:
> 
> chroot /mnt/disk
> 
> and you are root within mageia (no X11, which would require a more
> sophisticaled chroot), where you can download a package and install it with
> the commands you know.

Hi.
So, I succeeded in updating to kernel 6.6.14 in the chroot, but it still freezes on a black screen after GRUB.
I also created a live USB with a persistent partition and updated it to kernel 6.6.14. Same thing: frozen on a black screen after GRUB.
I confirm Manjaro boots from a live USB.
Any other ideas?
Thanks
Comment 23 katnatek 2024-02-18 23:46:04 CET
(In reply to christian barranco from comment #22)
> 
> Hi.
> So, I succeeded to update to kernel 6.6.14 in chroot but, still, it does
> freeze on a black screen after the grub.
> I also created a live USB with persistent partition and updated it with
> kernel 6.6.14. Same thing: frozen on a black screen after the grub.
> I confirm Manjaro boots from a live USB.
> Any other idea?
> Thanks

Perhaps you can start by comparing the kernel options in Manjaro

cat /proc/cmdline

with those used by Mageia in /boot/grub2/grub.cfg
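
A small helper can make that comparison less error-prone than reading two long lines by eye. This is only a sketch: cmdline_diff is a hypothetical name, and the two command lines have to be pasted in as strings (the running one from /proc/cmdline, the other copied from the relevant "linux" line in grub.cfg).

```shell
# Print kernel parameters that appear in only one of two command lines.
# "< param" is only in the first string, "> param" only in the second.
cmdline_diff() {
    left=$(mktemp)
    right=$(mktemp)
    # One parameter per line, empty tokens dropped, sorted for comm.
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed '/^$/d' | sort -u > "$left"
    printf '%s\n' "$2" | tr -s ' ' '\n' | sed '/^$/d' | sort -u > "$right"
    comm -23 "$left" "$right" | sed 's/^/< /'
    comm -13 "$left" "$right" | sed 's/^/> /'
    rm -f "$left" "$right"
}
```

Entries such as BOOT_IMAGE= and root= will naturally differ between two distros; the interesting differences are options that affect EFI, ACPI or graphics handling.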
Comment 24 Giuseppe Ghibò 2024-02-18 23:48:37 CET
What about 6.6.17-1.mga9 (updates_testing)? Is the Manjaro kernel signed with shim?
Comment 25 Giuseppe Ghibò 2024-02-19 18:37:02 CET
> Any other idea?

Another idea is that the problem is not in the kernel but in GRUB itself. We use grub2-2.06 in sync with FC (actually our current 2.06 lags the FC one by 10-15 patches, which are cherry-picked from GRUB upstream). Manjaro instead uses GRUB 2.12. Creating a GRUB 2.12 mga package is not a trivial task, as it requires rebasing some patches (or trying to build a vanilla grub-2.12).

A further alternative is to add a manual entry so that the working GRUB chainloads the mga one, or to add entries there that load the Mageia vmlinuz and initrd (a bit complicated, but it might work), or to try rEFInd instead of GRUB.

Another thing in the kernel could be these keys, which we haven't enabled by default:

CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
CONFIG_INTEGRITY_PLATFORM_KEYRING=y
CONFIG_INTEGRITY_MACHINE_KEYRING=y
CONFIG_LOAD_UEFI_KEYS=y

a version of kernel with those configs enabled is here:

https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/07034376-kernel/
Comment 26 christian barranco 2024-02-21 21:31:55 CET
(In reply to Giuseppe Ghibò from comment #24)
> What about 6.6.17-1.mga9 (updates_testing)? Is manjaro kernel signed with
> shim?

Hi. 6.6.17 doesn't boot either.

How do I check whether Manjaro kernel is signed with shim?


Regarding #25: actually, when I updated the Surface via chroot from the Manjaro live session, I inherited the Manjaro GRUB for some reason. So that might mean the answer is not there?


I will give your copr version a try later in the week.

Thanks for all the support.
Comment 27 katnatek 2024-02-22 01:37:09 CET
(In reply to christian barranco from comment #26)
> (In reply to Giuseppe Ghibò from comment #24)
> > What about 6.6.17-1.mga9 (updates_testing)? Is manjaro kernel signed with
> > shim?
> 
> Hi. 6.6.17 doesn't boot either.
> 
> How do I check whether Manjaro kernel is signed with shim?
> 
Adapt this to Manjaro:

zgrep CONFIG_INTEGRITY /boot/config-6.6.14-desktop-2.mga9 
CONFIG_INTEGRITY=y
# CONFIG_INTEGRITY_SIGNATURE is not set
CONFIG_INTEGRITY_AUDIT=y
Comment 28 christian barranco 2024-02-23 20:12:17 CET
(In reply to Giuseppe Ghibò from comment #25)
> 
> Another thing in kernel could regarding these keys, that we have't enabled
> by default:
> 
> CONFIG_INTEGRITY_SIGNATURE=y
> CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
> CONFIG_INTEGRITY_TRUSTED_KEYRING=y
> CONFIG_INTEGRITY_PLATFORM_KEYRING=y
> CONFIG_INTEGRITY_MACHINE_KEYRING=y
> CONFIG_LOAD_UEFI_KEYS=y
> 
> a version of kernel with those configs enabled is here:
> 
> https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/
> mageia-9-x86_64/07034376-kernel/

Hi Giuseppe. Your kernel-desktop-6.6.17-2.mga9-1-1.mga9.x86_64.rpm also leads to a black screen after GRUB.


Hi katnatek, when booting Manjaro, I don't find any config* file in /boot
Comment 29 Giuseppe Ghibò 2024-02-23 21:05:21 CET
Which means the CONFIG_INTEGRITY_* options had no effect. You said that the Mageia GRUB in the bootloader was somehow overwritten by the Manjaro GRUB. Maybe you could reinstall the Mageia one?
Comment 30 christian barranco 2024-02-23 21:08:36 CET
(In reply to Giuseppe Ghibò from comment #29)
> Which means the CONFIG_INTEGRITY_* had no effects. You said that in some way
> mageia grub in the bootloader was overridden with manjaro grub. Maybe you
> could reinstall the mageia one?

I could, but booting a MGA9 persistent live ISO behaves the same way, and it has the Mageia GRUB.
Do you think it will change anything if I update it on the Surface itself?
Comment 31 Dave Hodgins 2024-02-23 21:51:35 CET
(In reply to christian barranco from comment #28)
<snip>

> Hi katnatek, booting with Manjaro, I don't get any config* file in /boot

Check "zgrep CONFIG_INTEGRITY /proc/config.gz".
Comment 32 Giuseppe Ghibò 2024-02-23 22:01:34 CET
(In reply to christian barranco from comment #30)

> Do you think it will change anything if I update it on the Surface itself?

It could narrow the possibilities.

Other ideas:

a) have you tried booting with rEFInd instead of grub2?

b) have you checked the boot order in the BIOS? What is it?

c) what is the current firmware version and its timestamp? Could it be rolled back safely (just the firmware)?
Comment 33 christian barranco 2024-02-24 17:45:44 CET
(In reply to Dave Hodgins from comment #31)
> (In reply to christian barranco from comment #28)
> <snip>
> 
> > Hi katnatek, booting with Manjaro, I don't get any config* file in /boot
> 
> Check "zgrep CONFIG_INTEGRITY /proc/config.gz".

Thanks Dave.

It gives:
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
CONFIG_INTEGRITY_PLATFORM_KEYRING=y
CONFIG_INTEGRITY_MACHINE_KEYRING=y
# CONFIG_INTEGRITY_CA_MACHINE_KEYRING is not set
CONFIG_INTEGRITY_AUDIT=y

@Giuseppe:

>a) have you tried with booting using rEFIND instead of grub2?
Yes, no change

>b) have you checked the boot order in BIOS? What is?
It is Mageia first. Please remember that GRUB does start and I get the GRUB menu. The issue is that it then gets stuck on a black screen.

>c) what is the current firmware version and its timestamp? Could it be rolled back safely (just the firmware)?
The issue appeared during a Windows 11 update. I have not been able to identify the culprit.
Comment 34 Giuseppe Ghibò 2024-02-24 17:53:27 CET
Looking around, you seem not to be alone: others had similar problems, though on older hardware such as the Surface 7. AFAIK they rolled back something hardware-specific related to a firmware update.

BTW, what if you don't boot in graphical mode? E.g. append "3 vga=normal nomodeset" to the boot command line, just to see whether it is something related to the graphics card.
Comment 35 christian barranco 2024-02-24 19:25:24 CET
Created attachment 14420
diff between Arch config file and MGA9 config file

I am not sure what to look for. Does it ring a bell for anyone?
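
A raw diff of two kernel configs is hard to read because ordering and comments differ. One option is to reduce both files to option/value pairs and print only the options whose values differ, optionally limited to a prefix such as CONFIG_EFI or CONFIG_INTEGRITY. The sketch below assumes plain-text config files (decompress /proc/config.gz first); config_diff is a hypothetical helper, not an existing tool.

```shell
# Print "OPTION left-value right-value" for each option whose value differs
# between two plain-text kernel config files. "n" stands for "is not set",
# "missing" for an option absent from that file.
# Arguments: left config, right config, [option prefix, default CONFIG_].
config_diff() {
    awk -v prefix="${3:-CONFIG_}" '
        {
            name = ""
            if ($0 ~ /^# CONFIG_[A-Za-z0-9_]+ is not set$/) {
                name = $2; val = "n"
            } else if ($0 ~ /^CONFIG_[A-Za-z0-9_]+=/) {
                eq = index($0, "=")
                name = substr($0, 1, eq - 1); val = substr($0, eq + 1)
            }
            if (name == "" || index(name, prefix) != 1) next
            if (FILENAME == ARGV[1]) left[name] = val; else right[name] = val
            seen[name] = 1
        }
        END {
            for (name in seen) {
                l = (name in left) ? left[name] : "missing"
                r = (name in right) ? right[name] : "missing"
                if (l != r) print name, l, r
            }
        }
    ' "$1" "$2" | sort
}
```

For instance, config_diff mga.config manjaro.config CONFIG_EFI would list only the EFI-related options that differ, which is the area the NX discussion in this thread points to.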
Comment 36 katnatek 2024-03-06 02:20:02 CET
Try adding noxconf to the kernel options. In the QA test of kernel 6.6.14 my graphics card was switched to another driver, and the presence of /etc/X11/xorg.conf blocked the graphical session from starting.
Comment 37 christian barranco 2024-03-06 20:38:25 CET
(In reply to katnatek from comment #36)
> Try add noxconf to kernel options, in the QA test of kernel 6.6.14 my
> graphic card was switched to other driver and the presence of
> /etc/X11/xorg.conf did block the possibility to start graphic session

Thanks, but it is already there when booting from a live USB, and that doesn't boot either. Any other ideas?
Comment 38 Kristoffer Grundström 2024-03-06 21:44:54 CET
I don't mean to sound like a fool, but can you reproduce the same issue with another RPM based distro?

CC: (none) => lovaren

Comment 39 christian barranco 2024-03-07 21:29:14 CET
(In reply to Kristoffer Grundström from comment #38)
> I don't mean to sound like a fool, but can you reproduce the same issue with
> another RPM based distro?

Hi. Indeed, I have tested many distros:
- Manjaro live boots and installs
- Nobara live boots and installs
- boot-repair-disk boots 
- Solus live does NOT boot

Not sure what to do with that yet, though.
Comment 40 katnatek 2024-03-08 19:07:10 CET Comment hidden (obsolete)
Comment 41 katnatek 2024-03-08 19:08:02 CET
Sorry wrong tab
Comment 42 christian barranco 2024-03-29 20:25:32 CET
Hi. Below is the feedback from a linux-surface kernel dev, based on a similar issue with Fedora, which seems to be solved. Does it ring a bell for anyone? Apparently something needs adjusting in the GRUB package?

"Its a difficult topic, and I don't even fully understand the issue myself
As far as we know, the issue is related to a security feature called NX mode
Essentially, NX mode means that memory should never be marked as writable and executable at the same time
The UEFI API lets you change this for allocated memory, and the Surface UEFI has started to honor these flags
i.e. previously you could mark a memory region as readable and executable, and then still write to it. Now this is not possible anymore
However, this means that applications have to support NX mode and need to properly set the properties of the memory they allocate
Applications can indicate compatibility with NX mode through a special flag, the UEFI is supposed to read this flag and enable NX mode only if the application support it
unfortunately for us, the Surface UEFI seems to enable NX mode unconditionally
This is just the "setup" though
The actual issue is this: When the kernel gets loaded, normally NX mode wouldn't play a role, because it is not loaded by the UEFI but by GRUB.
However, Fedora's GRUB contains a patch that makes it honor the flag I just mentioned. If it is present, GRUB will load the kernel into memory that is not writable. But the kernel needs some sections of its memory to be writable!
On a normal UEFI, this wouldn't be an issue, because Fedora's shim and Fedora's GRUB don't indicate compatibility with NX mode to the UEFI, so it should allow writing to memory that is not marked as writable
On the Surface UEFI, this mode is enabled unconditionally, so writing to memory that isn't writable freezes the application for security reasons
We can fix this in the surface kernel by removing the flag that indicates NX compatibility from the kernel image. That means GRUB won't bother with NX and load the kernel into memory that's readable, writable and executable
So it's a combination of a UEFI that enforces a feature against the UEFI spec, and a GRUB that tries to be compatible with said feature, but isn't."
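
To see whether a given kernel image advertises NX compatibility, one can inspect its PE header. The sketch below assumes the flag in question is the PE/COFF IMAGE_DLLCHARACTERISTICS_NX_COMPAT bit (0x0100) in the DllCharacteristics field, and that the vmlinuz file is an EFI-stub image starting with an MZ/PE header. The function names read_le and nx_compat_flag are made up for this example.

```shell
# Read an unsigned little-endian integer: FILE OFFSET NBYTES.
read_le() {
    le_value=0
    le_shift=0
    for le_byte in $(od -An -v -t u1 -j "$2" -N "$3" "$1" 2>/dev/null); do
        le_value=$(( $le_value + ($le_byte << $le_shift) ))
        le_shift=$(( $le_shift + 8 ))
    done
    echo "$le_value"
}

# Print "set", "clear" or "not-pe" for the NX_COMPAT bit of a PE image.
nx_compat_flag() {
    image=$1
    # "MZ" read as a 16-bit little-endian value is 0x5A4D = 23117.
    [ "$(read_le "$image" 0 2)" -eq 23117 ] || { echo not-pe; return 0; }
    # e_lfanew at offset 0x3c (60) holds the offset of the PE signature.
    pe_offset=$(read_le "$image" 60 4)
    # "PE\0\0" read as a 32-bit little-endian value is 0x00004550 = 17744.
    [ "$(read_le "$image" "$pe_offset" 4)" -eq 17744 ] || { echo not-pe; return 0; }
    # 4-byte signature + 20-byte COFF header + offset 70 in the optional header.
    dll_chars=$(read_le "$image" $(( $pe_offset + 94 )) 2)
    if [ $(( $dll_chars & 256 )) -ne 0 ]; then echo set; else echo clear; fi
}
```

Running nx_compat_flag on the Mageia vmlinuz and on the kernel image from a distro that boots would show whether the two differ in this respect, which is the difference the linux-surface explanation hinges on.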
Comment 43 Giuseppe Ghibò 2024-03-29 21:21:18 CET
You can try building your own local grub2 with patches 0255 to 0258 commented out in the file:

https://svnweb.mageia.org/packages/updates/9/grub2/current/SOURCES/grub.patches?revision=2052231&view=markup

and see how it goes.
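
Commenting out a range of patches by hand is easy to get wrong, so a small filter can help. The sketch below assumes the patch list uses the usual "PatchNNNN: file.patch" layout, one entry per line, which has not been verified here against the actual grub.patches file; comment_out_patches is a hypothetical helper name.

```shell
# Read a patch list on stdin and comment out entries whose patch number
# lies between FIRST and LAST (inclusive). Other lines pass through as-is.
# Assumes "PatchNNNN: file.patch" lines; adjust if the file differs.
comment_out_patches() {
    awk -v first="$1" -v last="$2" '
        /^Patch[0-9]+:/ {
            number = $1
            gsub(/[^0-9]/, "", number)      # keep only the digits of PatchNNNN:
            if (number + 0 >= first + 0 && number + 0 <= last + 0) {
                print "#" $0
                next
            }
        }
        { print }
    '
}
```

Something like comment_out_patches 255 258 < grub.patches > grub.patches.new would produce the edited list. Whether the build honours the change in this form depends on how the spec consumes the list, which is not checked here.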
