Description of problem:
Radeon-firmware prior to 20240811 is not sufficient to initialize the 890M GPU; firmware after 20240811 either causes amdgpu to hang or fails to load, resulting in a black screen.

Version-Release number of selected component (if applicable):

How reproducible:
Always.

Steps to Reproduce:
1. Install Mageia 9 on a Ryzen AI 9 HX 370 (a Beelink SER9 in this case).
2. Upgrade to Cauldron.
3. Install kernel 6.10 or later; in this case 6.11.4, built using "make oldconfig" with the default choices.
4. Generate a radeon-firmware-20240811 RPM using firmware from linux-firmware.git.

In this state the system works and boots correctly with full 3D acceleration. Newer firmware hangs on boot or leaves a black screen; older firmware reports "unable to initialize GPU". Once the newer firmware has been installed, either the system can be booted with the 6.6.57 kernel or an ssh session can be used to revert to 20240811.

The point of this seemingly ridiculous bug report is that distributions that ship the required kernel (6.10+), Mesa (24.2+) and firmware (20240811+) usually come with firmware newer than 20240811 and boot into a black screen, making installation difficult, if not impossible. Distributions I tried include Manjaro, Bazzite and Ubuntu 24.10. I would have filed a bug report with the firmware maintainers but there was no provision to do so.

I don't expect much from Mageia to resolve this issue other than having a potential workaround available should a bug report be filed for this hardware. I also wanted a way for Mageia users to get this hardware working.
In a surprising coincidence I noticed that Mageia updated all the firmware to the latest release for Cauldron. In a not surprising coincidence, the radeon-firmware-20240909 did not work with the Ryzen AI HX 370 processor and it was necessary to revert back to 20240811. It's good to see Cauldron up to date firmware-wise though.
(In reply to Alan Richter from comment #0)
> Description of problem:
> Radeon-firmware prior to 20240811 is not sufficient to initialize the 890m
> GPU, firmware after 20240811 either cause amdgpu to hang or fails to load
> resulting in a black screen.
>
> <snip>
>
> I don't expect much from Mageia to resolve this issue other than have a
> potential work around available should a bug report be filed for this
> hardware. I also wanted a way for Mageia users to get this hardware working.

CC'ing Morgan who has a talent for putting such things in the wiki.

Assigning to our kernel and drivers maintainers, in case they can come up with a solution. I'm reluctant to mark this as an upstream bug, even if you hit this issue in several distributions, because I understand from comment #0 that filing a bug report upstream is impossible (I don't have time to check).
Assignee: bugsquad => kernel
CC: (none) => fri, marja11
If many distros suffer, this really looks like something that should be fixed upstream.

Is there really nothing coming, e.g. for the next firmware version?

Our users generally cannot compile things themselves, nor handle SSH.
I do not find version 20240811 in our repos. Maybe we can provide it so users can downgrade to it?

Can the affected system boot to a command line, at least after entering some parameter in the GRUB menu?
Can it boot to a desktop by specifying some simple driver? That would be easier for inexperienced users.
(In reply to Morgan Leijström from comment #3)
> If many distros suffer, this really looks like fix should be done in some
> upstream.
>
> Really nothing coming? - i.e for next firmware version?

On the Linux Firmware site:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
there have been two broken releases, 20240909 and 20241017; neither works with the Ryzen AI HX 370 (or however it's spelled). 20240709 doesn't contain the firmware needed for the 370, so 20240811 is it.

> Our users generally can not compile themselves, nor handle SSH.
> I do not find version 20240811 in our repos?
> Maybe we can provide it so users can downgrade to it?

Mageia is unique in that it splits radeon-firmware off from the rest of linux-firmware (non-free), which makes it easier to keep various versions of radeon-firmware around. It's not terribly difficult to make a new radeon-firmware: get the firmware from the above site, copy *radeon*, *amdgpu* and WHENCE over to radeon-firmware-YYYYMMDD, scoop it up into a "tar.xz", fix the SPEC file, do an rpmbuild -ba and it's just there, super magic. Of course, if enabling ssh during install is a toughie, then building a new RPM from a tarball off of git might also be a little too hard.

> Can the affected system boot to command line?

Negative. After booting, "modprobe amdgpu" hangs and never returns; it simply runs away at 100% CPU, leaving the user with a blank black screen. Since amdgpu is uninitialized, ctrl-alt-F3 doesn't work; if ssh isn't enabled, there's no way to access the system.

> At least after entering some parameter in grub menu?

nomodeset works by preventing amdgpu from getting loaded, but the system isn't much fun to play with.

> Can it boot to desktop by specifying some simple driver? - easier for
> inexperienced users then.
Mageia 9 and Cauldron will boot into a VESA screen, which isn't much fun but is more usable than a black screen. This is because Cauldron is sticking with the 6.6.XX kernel; the 370 requires Linux 6.10, Mesa 24.2.X and firmware 20240811 (later is broken) to work with the 890M video. I've opened a forum thread on the Beelink site with more things I've found about this problem. I realize that this chip is extremely new, but it does pack the strongest iGPU available, it's the first hybrid AMD chip, with four Zen 5 cores and eight Zen 5c cores, and it's really, really efficient. This is the link to the Beelink forum (no login required): https://bbs.bee-link.com/d/1556-linux-on-ser9
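For anyone who wants to try the packaging steps described in this comment, here is a rough sketch of the staging part only. The function name and its three arguments are made up for illustration, it assumes the linux-firmware checkout has `radeon/` and `amdgpu/` directories plus a top-level `WHENCE` file, and it needs GNU tar with xz support. The SPEC file edit and the `rpmbuild -ba` run are left to the reader because the layout of the Mageia SPEC file is not shown in this report.

```sh
# Sketch only: stage a radeon-firmware-YYYYMMDD.tar.xz from a linux-firmware tree.
# make_radeon_firmware_tarball and its arguments are hypothetical, not a Mageia tool.
make_radeon_firmware_tarball() (
    src=$1        # path to an unpacked linux-firmware checkout
    version=$2    # snapshot date, e.g. 20240811
    out=$3        # directory that receives the tarball
    name="radeon-firmware-$version"

    mkdir -p "$out/$name" || exit 1
    # Copy the radeon and amdgpu firmware plus the WHENCE file.
    cp -r "$src/radeon" "$src/amdgpu" "$src/WHENCE" "$out/$name/" || exit 1
    # Pack the staged directory as tar.xz, then drop the staging copy.
    tar -C "$out" -cJf "$out/$name.tar.xz" "$name" || exit 1
    rm -rf "${out:?}/$name"
    echo "$out/$name.tar.xz"
)

# Example (paths are placeholders):
#   make_radeon_firmware_tarball ~/src/linux-firmware 20240811 ~/rpmbuild/SOURCES
#   # then update Version in the SPEC file and run: rpmbuild -ba <spec file>
```

The function prints the path of the tarball it created, so it can be fed straight into whatever comes next.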
After getting the 890M video working on Cauldron, getting it going in Mageia 9 has been more elusive. In Cauldron I have full hardware acceleration, but in mga9 it appears that glamor won't start. From /var/log/Xorg.log I see the following differences:

On Cauldron (happy)
(II) Loading sub module "glamoregl"
(II) LoadModule: "glamoregl"
(II) Loading /usr/lib64/xorg/modules/libglamoregl.so
(II) Module glamoregl: vendor="X.Org Foundation"
    compiled for 1.21.1.13, module version = 1.0.1
    ABI class: X.Org ANSI C Emulation, version 0.4
(II) AMDGPU(0): glamor X acceleration enabled on AMD Radeon Graphics (radeonsi, gfx1150, LLVM 17.0.6, DRM 3.59, 6.11.5)
(II) AMDGPU(0): glamor detected, initialising EGL layer.

On mga9 (not happy)
(II) Loading sub module "glamoregl"
(II) LoadModule: "glamoregl"
(II) Loading /usr/lib64/xorg/modules/libglamoregl.so
(II) Module glamoregl: vendor="X.Org Foundation"
    compiled for 1.21.1.13, module version = 1.0.1
    ABI class: X.Org ANSI C Emulation, version 0.4
(II) AMDGPU(0): Refusing to try glamor on llvmpipe
(EE) AMDGPU(0): glamor detected, failed to initialize EGL.
(WW) AMDGPU(0): amdgpu_glamor_pre_init returned FALSE, using ShadowFB

I went ahead and enabled updates_testing to get a fresher Mesa, and my best guess is that a newer LLVM may be required for the 890M. The SPEC files for mga9 and mga10 are virtually identical except for the changelogs. I did have to downgrade radeon-firmware to 20240811, otherwise the system hung when initializing amdgpu and never got to multi-user mode.
I narrowed the problem down to a single file: /usr/lib/firmware/amdgpu/dcn_3_5_dmcub.bin If it's 479008 bytes, it's good, if it's 479648, it's bad. Just six hundred extra bytes and the machine is munched.
As a proof of concept, I did a Manjaro-kde-24.1.1-241011 install. I used "nomodeset" on the Linux command line when booting the installer, finished the install, and booted for the first time, again using "nomodeset". I then replaced the dcn_3_5_dmcub.bin file with the one from 20240811, ran mkinitcpio -P to regenerate the initramfs, and rebooted without "nomodeset". The system had full hardware acceleration. This one bad file prevents Linux from booting in some cases and in others leaves a black screen and runaway processes. I'm only posting because this is one place where I can share what I consider important information about a single 400KB file that prevents Linux from booting on a new and very interesting piece of hardware, Strix Point.
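The file swap in that procedure can be sketched as a small helper that keeps a backup of whatever was installed. This is only an illustration: the function name is made up, the default firmware directory is the /usr/lib/firmware path given earlier in this report, and it assumes the distribution installs the file uncompressed (if it carries a compression suffix, the target name would differ).

```sh
# Sketch only: replace dcn_3_5_dmcub.bin with a known-good copy, keeping a backup.
# replace_dmcub is a hypothetical helper; run it as root on the real firmware tree.
replace_dmcub() (
    good=$1                           # known-good file, e.g. the one from 20240811
    fw_dir=${2:-/usr/lib/firmware}    # firmware root; override for testing
    target="$fw_dir/amdgpu/dcn_3_5_dmcub.bin"

    [ -f "$good" ]   || { echo "missing: $good" >&2; exit 1; }
    [ -f "$target" ] || { echo "missing: $target" >&2; exit 1; }
    cp -p "$target" "$target.bak" || exit 1   # keep the installed file
    cp "$good" "$target" || exit 1
    echo "replaced $target (backup at $target.bak)"
)

# After booting with "nomodeset":
#   replace_dmcub /path/to/20240811/dcn_3_5_dmcub.bin
#   mkinitcpio -P    # Manjaro, as above; other distributions use their own initramfs tool
```

Regenerating the initramfs afterwards matters because the old file may have been copied into it.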
So to summarize, for the dcn_3_5_dmcub.bin file we have:
- 20240811: 479008 bytes (md5sum: ba152d787495238b0f6b8c77752d865b) OK
- 20240909: 479468 bytes (md5sum: 0daa55c14259febd4e6200e80c1b9644) BROKEN
- 20241017: 482848 bytes (md5sum: 38110eb806722e8717e89922546147a9) BROKEN

What about:
- 20241110: 483104 bytes (md5sum: 579ce8bd7d0d0542cef4756d1c590ce5) ???
(file: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/dcn_3_5_dmcub.bin)

Does it work, or is it broken too?
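To check which of these files a system actually has, something like the following could be used. It is a sketch: the function names are made up, the checksums are only the three whose status is stated in the list above, and anything else is reported as UNKNOWN rather than guessed.

```sh
# Sketch only: map an md5sum of dcn_3_5_dmcub.bin to the status listed above.
classify_dmcub_md5() {
    case "$1" in
        ba152d787495238b0f6b8c77752d865b) echo "20240811 OK" ;;
        0daa55c14259febd4e6200e80c1b9644) echo "20240909 BROKEN" ;;
        38110eb806722e8717e89922546147a9) echo "20241017 BROKEN" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Print size, md5sum and status for a firmware file (defaults to the installed path).
check_dmcub_file() (
    f=${1:-/usr/lib/firmware/amdgpu/dcn_3_5_dmcub.bin}
    [ -f "$f" ] || { echo "not found: $f" >&2; exit 1; }
    size=$(( $(wc -c < "$f") ))
    sum=$(md5sum "$f" | cut -d' ' -f1)
    echo "$f: $size bytes, md5 $sum, $(classify_dmcub_md5 "$sum")"
)

# Example:
#   check_dmcub_file
```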
CC: (none) => ghibomgx
I built a radeon-firmware from the 20241110 tarball and installed it, and the system hard-locked on boot. The NIC never came up and of course the screen was black, so there was no way to interact with the system. I removed radeon-firmware-20241110 and rebooted the system; as expected, there were a lot of complaints about firmware failing to load. I then installed radeon-firmware-20241110 and did an rmmod amdgpu followed by modprobe amdgpu. The system stayed up, and the last line of dmesg was:
[drm] DMUB hardware initialized: version=0x09000800
but modprobe hung at 100% CPU and, as expected, the screen was black. Attempting to reboot failed; the NIC stayed up and answered pings, but sshd was stopped, so the system had to be hard reset. Replacing dcn_3_5_dmcub.bin with the file from 20240811 returns the system to full functionality with full hardware support. There was some speculation on the Beelink site that the AGESA in the BIOS was old or out of date, but I have the latest BIOS installed on my SER9, which is dated 9/10/2024. I do not know if this affects other Strix Point systems or if it's unique to the Beelink SER9.
In addition, I have successfully gotten the latest Manjaro and Ubuntu distributions working on my SER9 and have heard that a Mint user also got his system working once the dcn_3_5_dmcub.bin file was replaced.
Please check with radeon-firmware-20240909-2.mga{9,10} and see whether it works correctly (e.g. glmark2 score, etc.). It reverts both dcn_3_5{,_1}_dmcub.bin to release v0.0.227.0. Apparently no version beyond v0.0.227.0 works correctly.
Steps taken:
urpme radeon-firmware (the one I made)
sudo urpmi radeon-firmware (from cauldron)
rpm -qi radeon-firmware

Name        : radeon-firmware
Version     : 20240909
Release     : 2.mga10.nonfree
Architecture: noarch
Install Date: Mon 11 Nov 2024 01:30:10 PM MST
Group       : System/Kernel and hardware
Size        : 100095776
License     : Freeware
Signature   : RSA/SHA256, Mon 11 Nov 2024 12:18:02 PM MST, Key ID b742fa8b80420f66
Source RPM  : radeon-firmware-20240909-2.mga10.nonfree.src.rpm
Build Date  : Mon 11 Nov 2024 12:17:19 PM MST
Build Host  : localhost
Packager    : ghibo <ghibo>
Vendor      : Mageia.Org
URL         : http://ati.amd.com/
Summary     : ATI/AMD Radeon firmware files

Power off.
Unplug.
Wait a bit.
Turn on and . . . the system comes up correctly with full hardware acceleration.
Reboot and the system comes up correctly with full hardware acceleration.

From my viewpoint, this fixes the problem.
what is the score you get with glmark2?
glmark2 returns 17530, Gravity Mark 1600x900 windowed Vulkan returned a score of 8357, FPS of 50.0. This sure isn't llvmpipe.
any improvement on it using gamemoderun?
Negative there, gamemoderun results in a score of 17261. I have been playing Days Gone and there is a definite uplift from gamemoderun, which has been keeping the game from running on the Zen 5 compact cores. I got into the BIOS, reset everything to "Default Settings", re-ran glmark2 (not in gamemoderun) and got 17415. Just as a comparison, I ran glmark2 on a GMKtec NucBox K8 (AMD 8845HS, with 780M) and got 17158,
... continued. albeit on Mageia 9, not cauldron.
... more continued. For thoroughness, I ran glmark on the ubuntu 24.10 installation and got 14158 but that was using Xwayland and not Xorg.
The 8845HS and the HX 370 seem pretty close, considering that the HX 370 has surface-mounted LPDDR5-7500 while the 8845HS probably has DDR5-5600 in a slot. Actually, the 890M is probably the fastest integrated GPU available.
But on mga9 with radeon-firmware-20240909-2.mga9.nonfree does it boot and start?
If starting from the mga9 ISO, it is possible to install, but once a post-6.10 kernel is built and Mesa and radeon-firmware-20240909 are installed, the system hangs on boot when amdgpu initializes. Once dcn_3_5_dmcub.bin is replaced with the one from 20240811, mga9 comes up in full resolution BUT there is no hardware acceleration; vkcube won't come up, complaining about DRI3 missing, and glxinfo says "OpenGL renderer string: llvmpipe". I tried using systemd Xorg and several other things but I could not get hardware acceleration on mga9. Cauldron works fine though.

As for the 890M, it's an RDNA 3+ chip, while the 780M on Hawk Point is RDNA 3 and the 680M in my 6850U is RDNA 2. The "+" differentiates the 890M from the 780M, and according to Phoronix the 890M needed new PSP and DMCUB firmware that he got from AMD. Reference: https://www.phoronix.com/review/amd-ryzen-ai-9-hx-370
But was the kernel in mga10 (Cauldron), where you got it working with acceleration, 6.6.x (e.g. 6.6.60-desktop1) or a custom 6.11? (6.10 is EOL.)
The requirements for the 890M are:
- Kernel 6.10+
- Mesa 24.2+
- Firmware 20240811 (ONLY)

So for the kernel I just went to kernel.org, got the latest (I'm running 6.11.7), did a make oldconfig (default responses to questions), make -j $(nproc) (it took about 11 minutes), sudo make modules_install; sudo make install.
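A quick way to test the kernel and Mesa minimums listed above is a version comparison. This is a sketch under the assumption that GNU `sort -V` is available; `version_ge` and `check_kernel_for_890m` are made-up names, and the Mesa version has to be supplied by hand (for example from the "OpenGL version string" line of glxinfo).

```sh
# Sketch only: true when version $1 is at least version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the running kernel against the 6.10 minimum given above.
check_kernel_for_890m() {
    kernel=$(uname -r | cut -d- -f1)    # strip any "-desktop-1.mga10" style suffix
    if version_ge "$kernel" 6.10; then
        echo "kernel $kernel: new enough"
    else
        echo "kernel $kernel: older than 6.10"
    fi
}

# Examples:
#   check_kernel_for_890m
#   version_ge 24.2.3 24.2 && echo "Mesa new enough"
```

The firmware requirement is not a minimum, so it is deliberately left out of this check.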
what is 'lspcidrake -v | grep VGA' output for that chip?
lspcidrake -v | grep VGA returns nothing, however lspcidrake -v | grep -i display returns: amdgpu : Advanced Micro Devices, Inc. [AMD/ATI]|Device 150e [DISPLAY_OTHER] (vendor:1002 device:150e subv:1f66 subd:0030) (rev: c1)
Ok, it's that. Thanks.
A thread regarding this issue has been started on gitlab.freedesktop.org, the heavies at AMD are aware of this problem. Here is the link. https://gitlab.freedesktop.org/drm/amd/-/issues/3744#note_2666245
Thank you Alan for the hunting. Upstream aware -> setting UPSTREAM
Status comment: (none) => Upstream see c27
Keywords: (none) => UPSTREAM
After engaging AMD and having them look into the problem, it appears that the BIOS on the SER9 is misconfigured such that it does not correctly configure the muxes, and amdgpu hangs in DCN. From Mario Limonciello's post in this thread:
https://gitlab.freedesktop.org/drm/amd/-/issues/3744#note_2666245

------
FWIW we did manage to track down one of these and dig into the exact details of what is wrong. Their BIOS isn't configuring muxes properly, and it causes a hang in DCN. The DMCUB firmware that is loaded to DCN hardware isn't unique to Linux, it's the exact same binary used by Windows as well and thus a Windows driver with the same DMCUB firmware binary will also be affected. The proper solution is for them to fix the BIOS.
------

Thus, since this issue appears to be limited to the Beelink SER9, it would probably be prudent to revert Cauldron's radeon-firmware to its unmodified state. If Beelink fixes their BIOS then the modification won't be necessary; if they do not, then no up-to-date Linux will run on the SER9 and it will be relegated to Windows with only their drivers.

I apologize for sending the crew on the fruitless pursuit of an untamed ornithoid; it's just that this was one very strange and severe bug.
Nice, Alan. So it seems the cause is in BIOS of Beelink SER9. https://bbs.bee-link.com/d/1556-linux-on-ser9/54 + next post Possibly they may foul up more of their products if they are not aware. I think someone should contact bee-link https://www.bee-link.com/pages/contact-us and link to forum. - because they may not check forum.
There haven't been many updates on the resolution of this problem, other than that the issue also affects the Minisforum EliteMini AI370, which packs the same processor. Mario L. from AMD says that the muxes aren't being correctly initialized by the BIOS. The problem has been brought to the attention of both Beelink and Minisforum. The biggest difference, it appears, is that the mini computers don't use eDP for the display like laptops do. The AI370 and SER9 are the flagship products for these two manufacturers, yet neither will boot an adequately mature Linux without substituting that one 400KB firmware file.
So it seems this is pretty much stalled upstream, and it affects two major vendors of these mini PCs. I wonder if there are others. A BIOS update (none is available yet) is generally not very common for this kind of hardware. What I wonder is whether the GPU muxes can be accessed later, after BIOS initialization but before kernel graphics initialization, so that they can be correctly re-initialized through some kind of interface.
It appears that this problem may have come to a resolution, Beelink released an updated BIOS and when combined with the 20241210 firmware, the SER9 now boots, and operates correctly. Earlier versions of firmware crashed on execution or hung so at least for this lone, singular piece of hardware, the 20241210 radeon-firmware must be used. (I do roll my own RPMs when I'm fiddling with a problem, or I need something newer than Cauldron can provide.)
I saw kernel-mainline-6.14.4-1.mainline.mga10.src.rpm on the cauldron core SRPMS so I patched it up to 6.14.5, enabled CONFIG_DRM_ACCEL_AMDXDNA=m because everything is better when all that AI is enabled and rebuilt kernel-firmware-nonfree with 20250410. I plan to build and try kernel-stable-6.12.25-2.stable.mga10.src.rpm to make sure that the likely LTS kernel for Mageia 10 works with Strix Point.
There will be a kernel-firmware upgrade to the current version for Cauldron soon. Have you already found 20250410 to be stable? Note that CONFIG_DRM_ACCEL_AMDXDNA is not yet supported in kernel 6.12.27 nor in 6.6.x, because the drivers/accel/amdxdna code is not yet present there. What about the AI MAX+ 395, does it have the same features as the 370?
For both 6.14.5-mainline and 6.12.27-stable, firmware 20250410 appears to be stable, although "dmesg | grep firmware" reports:
amdgpu 0000:65:00.0: Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2
This could be due to something I did while building the 20250410 firmware RPM though. I couldn't detect any other errors apart from a message that appeared in Xorg.0.log:
(EE) AMDGPU(0): drmmode_do_crtc_dpms cannot get last vblank counter
I was waiting for a mini with the AI MAX+ 395, but since I live in the US, I may not be able to purchase one. As thrilling as being able to load amdxdna is, I've not found anything that does anything useful with it.
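Error -2 in that dmesg line is ENOENT, meaning the kernel did not find the file. A small filter like the one below lists every firmware name that failed that way, so it can be compared with what the RPM actually installed. The function name is made up, and the /usr/lib/firmware path in the example is the one used earlier in this report.

```sh
# Sketch only: read kernel log text on stdin and print each firmware name whose
# direct load failed with error -2 (ENOENT, file not found).
missing_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error -2.*/\1/p' | sort -u
}

# Example: list missing files, then look for them (possibly with a compression suffix).
#   dmesg | missing_firmware | while read -r fw; do
#       ls /usr/lib/firmware/"$fw"* 2>/dev/null || echo "not installed: $fw"
#   done
```

If the file is absent from the linux-firmware snapshot itself, the message would not point to a packaging mistake.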