Bug 33865 - After kernel update "kernel-desktop-6.6.65-2.mga9", boot freezes after message "i915 0000:00:02.0: [drm] VT-d active for gfx access"
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 9
Hardware: All Linux
Priority: Normal (severity: critical)
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-12-19 10:48 CET by pat dealt
Modified: 2024-12-29 16:59 CET
CC List: 3 users

See Also:
Source RPM: kernel-desktop-6.6.65-2.mga9
CVE:
Status comment:


Attachments
Snapshot of screen log when it freezes (758.55 KB, image/jpeg)
2024-12-20 11:12 CET, pat dealt

Description pat dealt 2024-12-19 10:48:19 CET
Description of problem:

After the kernel update to version 6.6.65-2, the machine no longer boots.
The last message displayed during boot is:

 "i915 0000:00:02.0: [drm] VT-d active for gfx access"

The screen then freezes, and only a hardware power-off gets the computer restarted.
The issue occurs even with the failsafe option or any other kernel option.


This bug is similar to bug 33733 (with kernel-desktop-6.6.58-2.mga9).
Bug 33733 was fixed by the microcode-0.20241112-1.mga9.nonfree.noarch release.

It seems that microcode is a missing dependency of kernel-desktop, at least for the i915 Intel graphics modules.

There is probably an incompatibility between microcode-0.20241112-1.mga9.nonfree.noarch and kernel-desktop-6.6.65-2.mga9.
Comment 1 Lewis Smith 2024-12-19 20:53:10 CET
Thank you for the report.
Luckily your complaint is not universal, and I have just rebooted OK with that kernel, Intel graphics & the same driver:
Graphics:
  Device-1: Intel GeminiLake [UHD Graphics 600] driver: i915 v: kernel
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X:
    loaded: intel,v4l dri: iris gpu: i915 resolution: 1366x768~60Hz
  API: EGL v: 1.5 drivers: iris,swrast platforms: gbm,x11,surfaceless,device
  API: OpenGL v: 4.6 vendor: intel mesa v: 24.2.8 renderer: Mesa Intel UHD
    Graphics 600 (GLK 2)
  API: Vulkan v: 1.3.231 drivers: intel,llvmpipe surfaces: xcb,xlib

I presume you can boot your previous kernel to be able to carry on.
If you can, you should be able to attach to this bug the journal of a previous failed boot. Something like:
 # journalctl -b-x --no-hostname > file.txt
where x is how many boots back you want to go. Given that the boot fails, the journal should be small enough not to have to compress it; but do so if you wish.
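
A minimal sketch of that capture step, wrapped in a hypothetical helper (save_boot_journal is not an existing tool, and the output file name is only an example). It assumes the journal is persistent; running "journalctl --list-boots" first shows which boots were actually recorded:

```shell
# Hypothetical helper: save the journal of an earlier boot to a file
# so that it can be attached to the bug report.
#   $1 = how many boots back (1 = previous boot), $2 = output file
save_boot_journal() {
    offset="${1:-1}"
    out="${2:-boot-${offset}.txt}"
    # Reject anything that is not a plain number before calling journalctl
    case "$offset" in
        ''|*[!0-9]*) echo "offset must be a number" >&2; return 1 ;;
    esac
    journalctl -b "-${offset}" --no-hostname > "$out"
}
```

For example, "save_boot_journal 1 failed-boot.txt" would write the previous boot's journal to failed-boot.txt, provided that boot got far enough to record anything.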

Assigning directly to kernel; but please do try to attach an abortive journal.

Assignee: bugsquad => kernel

Comment 2 pat dealt 2024-12-20 11:12:55 CET
Created attachment 14816
Snapshot of screen log when it freezes

Thanks for your feedback.

To answer your question: yes, I can now boot with the previous kernel 6.6.61-desktop-1.mga9, but only after the microcode-0.20241112-1 update.
Otherwise I get the same issue.

To compare with your graphics configuration, here is mine:

[root@NUC12 ~]# inxi -G
Graphics:
  Device-1: Intel Alder Lake-P Integrated Graphics driver: i915 v: kernel
  Display: server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X:
    loaded: modesetting,v4l dri: iris gpu: i915 resolution: 1920x1080~60Hz
  API: OpenGL v: 4.6 vendor: intel mesa v: 24.2.8 renderer: Mesa Intel
    Graphics (ADL GT2)
  API: Vulkan v: 1.3.231 drivers: intel,llvmpipe surfaces: xcb,xlib
  API: EGL Message: EGL data requires eglinfo. Check --recommends.
[root@NUC12 ~]# uname -r
6.6.61-desktop-1.mga9

I agree with you, it's not universal: the latest kernel update (6.6.65-2) works perfectly on my second machine, a NUC11TNHi3, but not on this one, a NUC12WSHi5.

But the NUC11 i3 has one graphics module (UHD) and the NUC12 i5 has another (Iris Xe). Based on your comment #1, I assume you have a UHD graphics module.

I also noticed that the log from the NUC11 i3 contains no mention of "gfx access", whereas the log from the NUC12 i5 does.

Regarding the log: nothing is recorded, even with the option loglevel 7.
I suspect the journal is not yet in place, or not yet being written, when it crashes.

As you can see in the extract below, there is a gap between 09:46 (boot n-2, working) and 10:03 (boot n, working).
There is no trace of the failed boot n-1 (~10:01).

>>

déc. 20 09:46:35.787669 Mandrake kernel: net-fw DROP IN=enp100s0 OUT= MAC=48:21:0b:51:44:77:38:07:16:18:46:23:86:dd SRC=fe80:0000:0000:0000:3a07:1613:3718:4623 DST=fe80:0000:0000:0000:5983:c62f:dd83:c5ef LEN=80 TC=0 HOPLIMIT=64 FLOWLBL=770645 PROTO=TCP SPT=53856 DPT=5357 WINDOW=64800 RES=0x00 SYN URGP=0
-- Boot 2803b83174e54df68f170b43534e77f4 --
déc. 20 10:03:15.978308 Mandrake kernel: microcode: updated early: 0x434 -> 0x435, date = 2024-06-03
déc. 20 10:03:15.978338 Mandrake kernel: Linux version 6.6.61-desktop-1.mga9 (iurt@rabbit.mageia.org) (gcc (Mageia 12.3.0-3.mga9) 12.3.0, GNU ld (GNU Binutils) 2.40) #1 SMP PREEMPT_DYNAMIC Thu Nov 14 15:07:14 UTC 2024
déc. 20 10:03:15.978351 Mandrake kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.6.61-desktop-1.mga9 root=UUID=e1c9030f-67e4-4495-8750-9199f30ac4a9 ro splash quiet
déc. 20 10:03:15.978361 Mandrake kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
déc. 20 10:03:15.978369 Mandrake kernel: BIOS-provided physical RAM map:
déc. 20 10:03:15.978377 Mandrake kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
déc. 20 10:03:15.978385 Mandrake kernel: BIOS-e820: [mem 0x000000000009e000-0x000000000009efff] reserved

<<

Anyway, I took a snapshot of the screen after the freeze (attachment #1), and below is the log for the same moment from a working boot with the previous kernel.
>>

20.12.2024 10:03:15:993	kernel	Run /init as init process
20.12.2024 10:03:15:993	kernel	  with arguments:
20.12.2024 10:03:15:993	kernel	    /init
20.12.2024 10:03:15:993	kernel	    splash
20.12.2024 10:03:15:993	kernel	  with environment:
20.12.2024 10:03:15:993	kernel	    HOME=/
20.12.2024 10:03:15:993	kernel	    TERM=linux
20.12.2024 10:03:15:993	kernel	    BOOT_IMAGE=/boot/vmlinuz-6.6.61-desktop-1.mga9
20.12.2024 10:03:15:993	dracut	Mageia-9
20.12.2024 10:03:15:993	kernel	ACPI: bus type drm_connector registered
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] VT-d active for gfx access
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: vgaarb: deactivate vga console
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] Using Transparent Hugepages
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adlp_dmc.bin (v2.20)
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.29.2
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
20.12.2024 10:03:15:993	kernel	[drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
20.12.2024 10:03:15:993	kernel	ACPI: video: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
20.12.2024 10:03:15:993	kernel	input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input3
20.12.2024 10:03:15:993	kernel	fbcon: i915drmfb (fb0) is primary device
20.12.2024 10:03:15:993	kernel	fbcon: Deferring console take-over
20.12.2024 10:03:15:993	kernel	i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device

<<
Comment 3 Rubén Fernández 2024-12-20 21:30:18 CET
Hi, I may have encountered the same bug.
Just thirty minutes ago Mageia 9 couldn't boot. After Grub, it froze before reaching the login screen. I pressed 'escape' and saw this message:
"Failed to start systemd-sysctl.service"
I had to do a hard poweroff pressing the power button. When booting again, in Grub I chose kernel 6.6.61 and could boot. So the problem seems to be kernel 6.6.65.
I ran both "journalctl -b -1" and "journalctl -b -0" but it seems the abortive boot left no trace.
What info do you need to debug this? Is it the same bug at all?

CC: (none) => ruben33en-mandriva

Comment 4 pat dealt 2024-12-21 14:23:35 CET
Hello everybody,

Rubén, I'm not sure this is the same issue, because you got a response to the Esc keystroke and I don't: every keystroke is ignored in my case (Esc, Ctrl+C, Ctrl+D ...).

Anyway there are similarities such as 
 - the abortive boot left no trace
 - the need to push the power button to get out from this trap.

Never mind, I think my issue is linked to the Intel Iris Xe graphics module. Is that your case? What version of microcode are you using?

Keep in mind that when I filed this bug 33865, the overall status was:
 - kernel 52, 58, 61 working
 - kernel 65 not working
In the meantime I investigated further, with these results:

I uninstalled microcode-0.20241112-1.mga9.nonfree.noarch, then reinstalled it.

Now the status is:
 - only kernel 58 is working
 - kernel 52, 61, 65 not working
Strange, isn't it!

I suspect the problem lies in the regeneration of initrd-xxx.img, which is supposed to happen during kernel installation and also, via dracut, during microcode installation.
This file is probably corrupted after a kernel or microcode installation.
Any clue?
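
One way to test that suspicion is to look inside the initrd rather than guess. A rough sketch, assuming dracut's lsinitrd is available; initrd_i915_firmware is a hypothetical helper name, and the image path in the usage line is an assumption about where the images live:

```shell
# Hypothetical helper: list the i915 firmware files packed into an initrd.
# lsinitrd prints one line per file in the image; keep only i915 firmware.
#   $1 = path to the initrd image
initrd_i915_firmware() {
    image="$1"
    lsinitrd "$image" | grep 'firmware/i915/'
}
```

For example: initrd_i915_firmware "/boot/initrd-$(uname -r).img". An empty result for the failing kernel's image, next to a populated one for a working kernel's image, would point at the image contents rather than at the kernel itself.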
Comment 5 pat dealt 2024-12-26 13:24:59 CET
The graphics module (Intel Iris Xe) in the NUC12 i5 and i7 needs this additional package to be installed:
kernel-firmware-nonfree-20240909-1.mga9.nonfree
Otherwise the machine won't boot.
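
A quick check for this condition, as a sketch: check_pkgs is a hypothetical helper, and the unversioned package names in the usage line are taken from the package versions quoted in this report.

```shell
# Hypothetical helper: report whether each named package is installed.
# "rpm -q" exits non-zero when a package is not installed.
check_pkgs() {
    for pkg in "$@"; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "$pkg: MISSING"
        fi
    done
}
```

For example: check_pkgs kernel-firmware-nonfree microcode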

Resolution: (none) => FIXED
Status: NEW => RESOLVED

Comment 6 katnatek 2024-12-26 17:49:58 CET
(In reply to pat dealt from comment #5)
> The graphics card (intel iris Xe) included in NU12i5 and i7 needs
> installation of this additional package :
> kernel-firmware-nonfree-20240909-1.mga9.nonfree
> Otherwise it won't boot.

Giuseppe & draktools team, our tools should fetch the required package for the card

CC: (none) => ghibomgx, mageiatools

Comment 7 Giuseppe Ghibò 2024-12-27 12:09:09 CET
(In reply to katnatek from comment #6)
> (In reply to pat dealt from comment #5)
> > The graphics card (intel iris Xe) included in NU12i5 and i7 needs
> > installation of this additional package :
> > kernel-firmware-nonfree-20240909-1.mga9.nonfree
> > Otherwise it won't boot.
> 
> Giuseppe & draktools team, our tools should fetch the required package for
> the card

But wasn't kernel-firmware-nonfree already installed before the update, to get the Xe card working? Or was it removed by chance during some upgrade? In case of problems, it's worthwhile to file a separate bug report with the PCI IDs, so that it has a better chance of being analyzed.
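
For reference, the PCI IDs can be read with "lspci -nn". A small sketch, assuming the GPU sits at slot 00:02.0 as in the i915 messages earlier in this report; gpu_pci_id is a hypothetical helper:

```shell
# Hypothetical helper: print the [vendor:device] ID of a PCI device.
#   $1 = PCI slot, defaulting to 00:02.0 (the i915 device in this report)
gpu_pci_id() {
    slot="${1:-00:02.0}"
    # -nn shows numeric IDs next to the names, -s selects a single slot
    lspci -nn -s "$slot" | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}]'
}
```

Calling gpu_pci_id with no argument prints the bracketed ID pair for that slot, which can be pasted into the new report.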
Comment 8 pat dealt 2024-12-29 16:50:10 CET
You are right: at the initial Mageia 9 installation, the package was there.
Then it was uninstalled. I don't know when or why. Maybe I did it by mistake.
Never mind, thanks for your help.
Comment 9 Giuseppe Ghibò 2024-12-29 16:59:47 CET
The logs might tell you what you want to know. E.g.: journalctl --since '1 month ago' | grep \\[RPM\\] should show all package installations, removals, etc.
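
The same idea narrowed to one package, as a sketch; rpm_history is a hypothetical helper, and the only thing assumed about the log format is the [RPM] tag mentioned above:

```shell
# Hypothetical helper: show journal lines tagged [RPM] that mention a package.
#   $1 = package name, $2 = how far back to look (default: 1 month ago)
rpm_history() {
    pkg="$1"
    since="${2:-1 month ago}"
    # -F treats "[RPM]" and the package name as fixed strings, not patterns
    journalctl --since "$since" | grep -F '[RPM]' | grep -F -- "$pkg"
}
```

For example: rpm_history kernel-firmware-nonfree '3 months ago'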
