Description of problem:
Just updated to the latest Cauldron on a system (already running Cauldron) where everything was working fine. I hadn't updated for 2 weeks (was on holiday). After 800+ updates and a reboot, the system comes up after grub and then shows just a blank screen - no keyboard, nothing! Only the magic keys work, to reboot. I tried to switch to nouveau - didn't help. I then uninstalled all the nvidia packages and ran XFdrake again - same issue.

Version-Release number of selected component (if applicable):
nvidia 515.76-1 with RTX 3060 card

How reproducible:
Every time - can't get a graphical interface. The same machine boots fine into Windows and Linux Mint.

Steps to Reproduce:
1. Updated to latest Cauldron
2. Reboot
3. BLACK SCREEN and frozen
Sorry for your angst.

> can't get graphical interface
Are you able to get a virtual console with Ctrl/Alt/Fn?

> tried to switch to nouveau
How did you manage that?

If so, please tell us about your system, post the output of:
 $ inxi -MSGxx
and attach the journal of the failed boot:
 # journalctl -b --no-hostname > journal.txt
 # xz journal.txt
The result to attach would be journal.txt.xz
CC: (none) => lewyssmith
No angst - just frustration ;-)

With the blank screen I cannot switch to a virtual console - I can only reboot with the magic key sequence.

I booted into recovery mode and ran XFdrake to switch to nouveau.

After today's update (which also included a new NVidia rpm) it still fails, but now the NVidia driver doesn't even load anymore.

When I try to manually load the nvidia module I get a constant stream of the following error:

nvrm the nvidia probe routine was not called for 1 device(s)

See attached for the details you requested.
Created attachment 13407 [details] journald output
Created attachment 13408 [details] inxi output
Nobody responded on the dev mailing list - so not sure if I'm the only one affected ?!? I can only report that the exact same machine works fine with latest NVidia drivers under Linux Mint . . .
Thank you for the attached information. As you suggest in comment 2, there is little evidence of video|VGA|nvidia messages in the journal. I do not see what more bugsquad can do, so assigning to kernel/drivers.
CC: lewyssmith => (none)
Assignee: bugsquad => kernel
lspcidrake -vv | grep VGA
lspcidrake -vv | grep 3D

would show the PCI IDs of your video cards.

lshw -c video

would also show which driver is currently in use.

You seem to be using kernel 5.19.12. Do you get the same problem with kernel 6.0.0 (in updates_testing)?

Note that the RTX 3060 might also be supported by the nvidia470 driver, should the 515 not work for some unknown reason (you just have to choose the entry "NVIDIA GeForce 635 to GeForce 920" in XFdrake to get it, even if XFdrake does not point to it automatically).

You can also check manually: in /usr/share/doc/x11-driver-video-nvidia-current/ or /usr/share/doc/x11-driver-video-nvidia470/ there is a file called README.txt, whose Appendix A lists the exact PCI IDs of the cards supported by the corresponding driver. This is redundant, as the tables are already merged into ldetect-lst, but you might check whether your card is unsupported.

Checking /var/log/Xorg.0.log after X has been started with the proprietary driver might also show some info about what's going wrong.

And last but not least, check that /proc/cmdline contains the boot parameter 'nokmsboot'. This is mandatory for the nvidia proprietary drivers to work on discrete desktop cards (it is not required for nouveau, or for laptops using Prime) and is automatically added by XFdrake to /etc/default/grub once the Nvidia proprietary drivers are configured. However, sometimes, especially when switching between proprietary and free drivers or manually installing, reinstalling, or uninstalling kernels and drivers, this parameter might get lost.
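The 'nokmsboot' check described above can be scripted. What follows is only a minimal sketch: the function name has_boot_param is made up for illustration, and GRUB_CMDLINE_LINUX_DEFAULT is assumed to be the variable in /etc/default/grub that carries the parameter (the comment above only names the file, not the variable).

```shell
#!/bin/sh
# has_boot_param PARAM [FILE]   (hypothetical helper, not part of any Mageia tool)
# Succeeds if PARAM appears as a whole word in FILE (default: /proc/cmdline).
has_boot_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put every token of the command line on its own line, then match exactly.
    tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$param"
}

if has_boot_param nokmsboot; then
    echo "nokmsboot is present on the running kernel command line"
else
    # Assumption: the parameter normally lives in GRUB_CMDLINE_LINUX_DEFAULT.
    echo "nokmsboot is missing; check GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub"
fi
```

Matching whole tokens avoids a false positive from a longer parameter that merely contains the string. The second argument lets the same function be pointed at a saved copy of a command line instead of the live one.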
CC: (none) => ghibomgx
Thx Giuseppe!

I know that this card works fine with the latest NVidia driver (515.65.01), as I am using the exact same machine under Linux Mint and it works fine . . . so regressing to the older nvidia 470 doesn't make sense ?!?

This machine was working fine under Cauldron with the latest nvidia drivers just prior to this posting - as mentioned, I was away for two weeks - came back and installed all Cauldron updates and then things broke :-(

I updated to kernel 6.0 - which now allows XFdrake to build the nvidia drivers correctly, and they now load properly - unfortunately, same issue - blank screen on normal boot and I can't do anything except a "magic keys" reboot - "nokmsboot" is in grub already . . .

I might just wait for the alpha / beta ISOs for Mageia 9 and try a fresh install - meanwhile my fallback is Linux Mint . . .

Once again, I have not seen any reply on the DEV list as to whether someone else is affected - so it might be just my system ?!?
Well, using nvidia470 was just a temporary suggestion to narrow things down by trying something else that might work. Probably some package deps (not necessarily nvidia's) just got messed up, or a package is missing. After last week's upgrade I myself got gdm not working anymore (on startup it just shows a white screen saying that "something wrong just happened"). Booting into init level 3 and then running startx manually does start plasma5 correctly. Another side effect: installing the package git-prompt makes the PS1 env var disappear, leaving every shell prompt empty. What is strange is that this happens on one system and not on another.
Hi

I also have an RTX 3060.

For information, I managed to get a display server with plasma using the open source driver once I deleted the /etc/X11/xorg.conf file. I can log in to my session with sddm and plasma works properly.

The proprietary driver nvidia 515.xx installs and the machine boots correctly. However, with this driver, logging in to my plasma session does not work.

With gdm and the proprietary driver, I get the same behaviour as described in this bug (https://bugs.mageia.org/show_bug.cgi?id=30936): "oh no, something has gone wrong".

With the open source driver, gdm is not displayed and I can't do anything on the machine.
CC: (none) => jeanmichel.varvou
For gdm, in the meanwhile you can switch to sddm using:

urpmi sddm
systemctl disable gdm.service
systemctl enable sddm.service
Hi

Thanks Giuseppe. Indeed, I can log in to a gnome session with the proprietary driver if I use sddm. So the problem is related to gdm.

With sddm working with gnome, I redid an installation with the proprietary driver via a net install and the Plasma desktop. This last test is conclusive.

So in summary:
=> gdm does not work;
=> the proprietary driver 515.76-3 works with plasma and gnome;
=> the open source driver works if the /etc/X11/xorg.conf file is deleted.
Note also that it was reported upstream that a black screen might occur on the RTX 30xx series with 515.76 when the X server is started on a boot display connected via an HDMI port. A workaround is to use DP instead of HDMI, or to connect the display's HDMI cable only after boot has completed. The previous 515.65 seems not to be affected.
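Before trying the HDMI/DP workaround it may help to confirm which output the display is actually attached to. The following is a sketch under assumptions: it assumes the driver in use exposes its connectors under /sys/class/drm (this may not be the case for the proprietary driver when kernel modesetting is off), and list_connected is a hypothetical helper name.

```shell
#!/bin/sh
# list_connected [DRM_ROOT]   (hypothetical helper)
# Prints the name of every DRM connector whose status file reads "connected".
list_connected() {
    root=${1:-/sys/class/drm}
    for status in "$root"/card*-*/status; do
        # Skip the unexpanded pattern when no connector directories exist.
        [ -r "$status" ] || continue
        if [ "$(cat "$status")" = "connected" ]; then
            basename "$(dirname "$status")"
        fi
    done
}

list_connected
```

A connector name containing HDMI in the output would suggest the machine is in the configuration the upstream report describes. No output at all only means nothing was found under the assumed path.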
Yeah, in a nvidia forum a dev responded that the issue with black screen is already root-caused and a fix is queued for next driver release...
Any idea when 520.56 might land in Cauldron? It's been out since the 12th
That's not the one we are waiting for, we are tracking Production branch, not feature branch... So either a new R515 branch release, or if nVidia switches to a new Production branch...
Well that's depressing :-( I guess I have to continue to use Linux Mint until I get a working version - according to this Linux Mint post: https://forums.linuxmint.com/viewtopic.php?f=47&t=383605&p=2245034#p2245034 - the bug is known and being fixed in the 520 version - not sure if it will be backported
Bugs like that pretty much always get fixed in the Production branch, as that's the "important one"...

Think of Feature vs Production branch as having the same process as our Cauldron vs Release branch...

New fixes/changes/drivers go to Cauldron first for initial tests, and if they behave, they land in the Release branch...

It's part of QA, as the feature branches can cause more breakages than they fix...
(In reply to Thomas Backlund from comment #18)
> bugs like that pretty much always get fixed in Production branch, as thats
> the "important one"...
>
> think of Feature vs Production branch having the same process as our
> Cauldron vs Release branch...
>
> new fixes/changes/drivers goes to Cauldron first for initial tests, and if
> they behave, they land in Release branch...
>
> It's part of QA, as the feature branches can cause more breakages than they
> fixes...

Understood! Thx.
Is there an easy way to revert to the previous build of the NVidia 5xx driver on Cauldron?? Under Mint, I just installed 510 for example and it works (515 is also broken). Thx
Depends on how "easy" you mean. AFAIK the previous 515.65 is not affected (though it had other bugs). In the past there was a mirror (belnet IIRC) that kept every RPM package ever issued, so it was easy to find older RPMs, but I think there are no such mirrors anymore.

This also raises one question, i.e. whether for some particular packages it is possible to keep the latest two versions in the mirrors (and thus also in the RPM metadata), similar to what we do for updates_testing.

Anyway, the "easiest" way that comes to my mind is to rebuild 515.65 from the svn archive, using:

mgarepo getsrpm -l -r 1875933 svn://svn.mageia.org/packages/nvidia-current/
bm -l nvidia-current-515.65.01-1.mga9.src.rpm

then install the local RPM packages it produces in:

/var/tmp/nvidia-current-515.65.01-1.mga9-topdir/RPMS/x86_64/

with urpmi --downgrade. This requires the "mgarepo" and "bm" commands, which should be installed from their respective packages.

But maybe there are also other ways.
(In reply to Giuseppe Ghibò from comment #21)
> Depends on how "easy" you mean. AFAIK just previous 515.65 is not affected
> (though it had other bugs). In the past there was some mirror (belnet IIRC)
> that was keeping all RPM packages issued, so that was easy to find older
> RPMs, but now I think there are no more such kind of mirrors.
>
> This also raise one question, i.e. whether for some particular package is it
> possible to keep the latest two versions in mirrors (and thus also in RPM
> metadata), similar to what we do for updates_testing.
>
> Anyway the "easiest" way that come to my mind is to rebuild the 515.65 from
> the svn archive, using:
>
> mgarepo getsrpm -l -r 1875933 svn://svn.mageia.org/packages/nvidia-current/
> bm -l nvidia-current-515.65.01-1.mga9.src.rpm
>
> then install the local RPM packages it produced into:
>
> /var/tmp/nvidia-current-515.65.01-1.mga9-topdir/RPMS/x86_64/
>
> with urpmi --downgrade. It requires the presence of the "mgarepo" and "bm"
> commands which should be installed from their respective packages.
>
> But maybe there are also other ways.

Wow - Thanks! On Linux Mint they simply offer the previous version as well as the latest "recommended" version. Maybe we should do something like that?? When a proprietary graphics driver is upgraded and fails to work, there should be a fallback to the last known working version . . . This would save some hassle and offer more "dummy proof" updates . . . . ?!?

Cheers,
Robert
Supporting all these versions (e.g. 515.43.04, 515.48.07, 515.57, 515.65.01, 515.76), and supporting them "smoothly", is IMHO beyond the capability of the current drakx utils. From what I remember these kinds of problems are also pretty rare; I'm surprised a new production branch is not released yet.

What could be worthwhile and simple to consider instead is keeping the latest two releases of a driver in the cauldron mirrors, more or less like we already do for nonfree/updates. E.g. if you look at:

https://distrib-coffee.ipsl.jussieu.fr/pub/linux/Mageia/distrib/8/x86_64/media/nonfree/updates/

you see that all the releases are kept there.

Of course my idea of keeping two releases fails if more than one release is published for a single version. E.g. nvidia-current-515.76-2.mga9 and nvidia-current-515.76-1.mga9 are the "two latest" published RPMs, but keeping just those would discard version 515.65.01 and earlier, which is not what we want.

It would probably be easier to just mark all the nvidia-current* RPM packages in cauldron as not to be discarded; then at some point we could clean older releases out of the mirrored RPMs by hand. Of course this wouldn't be "dummy proof".
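To illustrate the distinction made above between keeping the two latest RPMs and keeping the two latest upstream versions, here is a small sketch. It is not how the mirrors or the build system select packages: keep_latest_versions is a hypothetical helper, the 515.57-1 entry in the sample list is invented for the example, and the script assumes package names of the form name-VERSION-RELEASE and a sort that supports -V (GNU coreutils does).

```shell
#!/bin/sh
# keep_latest_versions N   (hypothetical helper, for illustration only)
# Reads package names "name-VERSION-RELEASE" on stdin and prints those that
# belong to the N highest upstream versions, whatever their release number.
keep_latest_versions() {
    n=$1
    pkgs=$(cat)
    # VERSION field: drop the trailing -RELEASE, then everything up to the last '-'.
    keep=$(printf '%s\n' "$pkgs" | sed -e 's/-[^-]*$//' -e 's/^.*-//' | sort -uV | tail -n "$n")
    printf '%s\n' "$pkgs" | while IFS= read -r p; do
        v=${p%-*}      # strip -RELEASE
        v=${v##*-}     # strip the package name
        if printf '%s\n' "$keep" | grep -qxF -- "$v"; then
            printf '%s\n' "$p"
        fi
    done
}

keep_latest_versions 2 <<'EOF'
nvidia-current-515.76-2.mga9
nvidia-current-515.76-1.mga9
nvidia-current-515.65.01-1.mga9
nvidia-current-515.57-1.mga9
EOF
```

With this sample input the script prints both 515.76 builds plus 515.65.01-1 and drops 515.57-1. Keeping simply the two newest RPMs would have left only the two 515.76 builds, which is exactly the failure case described.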
(In reply to Robert Fox from comment #22)
> Wow - Thanks! On Linux Mint they simply offer the previous version as well
> as the latest "recommended" version. Maybe we should do something like
> that?? When a proprietary graphics driver is upgraded and fails to work,
> there should be a fallback to the last known working version . . . This
> would save some hassle and offer more "dummy proof" updates . . . . ?!?

You do realize we have done that for years in stable releases, where we have previous releases in the updates tree...

You obviously need some manual intervention with urpmi --downgrade...
Linux Mint now has 520.56 available - so assuming it's now in the production branch
Created attachment 13455 [details] LinuxMint new versions
I can confirm that in the 520 branch this problem is resolved - just waiting for the Mageia Cauldron build . . .
(In reply to Robert Fox from comment #19)
> (In reply to Thomas Backlund from comment #18)
> > bugs like that pretty much always get fixed in Production branch, as thats
> > the "important one"...
> >
> > think of Feature vs Production branch having the same process as our
> > Cauldron vs Release branch...
> >
> > new fixes/changes/drivers goes to Cauldron first for initial tests, and if
> > they behave, they land in Release branch...
> >
> > It's part of QA, as the feature branches can cause more breakages than they
> > fixes...
>
> Understood! Thx.

Any timeline for getting the latest version (520) into Cauldron ?? The last updates in Cauldron were on October 7th - and this is holding me back from returning to my preferred Cauldron install . . .

Thx in advance,
Robert
AFAIK 520.56.06 is not a production branch release. IMHO Mint probably packages both the short-lived and long-lived releases and lets the user manually choose (or is it automatically detected?) which driver version to install from the GUI.

According to what is posted on the NVidia site:

Current production branch release: 515.76
Current new feature branch release: 520.56.06
Current beta release: 515.43.04
Legacy releases: 470.141.03 ...

From what I know, a production branch also has the advantage of getting several releases without supported cards being removed from one release to the next in the same series (at most new cards are added).

Usually the cadence of new NVidia driver releases is 4-6 weeks, so maybe a new upstream release is pretty close, who knows?
Seems to be a "feature" release according to this website - Mint is based on ubuntu: https://ubuntuhandbook.org/index.php/2022/10/nvidia-driver-520-install-ubuntu/ or here: https://www.phoronix.com/news/NVIDIA-520.61.05-Linux
BTW, Note also that 520.61.05 is prior to 520.56.06. AFAIK it's the version shipped in the nvidia cuda 11.8, and should be affected by the same bug (black screen) as 515.76.
I broke down and manually installed NVIDIA-Linux-x86_64-520.56.06.run - so a temporary fix until further notice! I've been using Linux Mint since the beginning of October (when this bug happened) - but I miss my Mageia! I fail to understand why NVidia doesn't either patch the 515 branch to fix this annoying and serious 3060 issue or release the 520 New Feature branch! Backporting such an important fix would be welcome (not blaming Mageia policies here). Just FYI
There was some activity yesterday: a new "beta" driver, 525.53, was released upstream. So a newer "non-beta" release should probably be expected soon, though it's not clear whether it will be a new production release or a new branch.
Fix is in nvidia-current-515.86.01-1.mga9 just pushed to buildsystem
Resolution: (none) => FIXED
Status: NEW => RESOLVED
Thanks. It is worthwhile to note that support for the new RTX 4090 (and H100), for those who have one, was left out of 515.86.01 (it is in 520.56/525.53).