Bug 30727 - After updating nvidia-current, Plasma cannot log out, reboot, etc. until reboot.
Summary: After updating nvidia-current, Plasma cannot log out, reboot, etc. until reboot.
Status: RESOLVED OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 8
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
 
Reported: 2022-08-09 01:02 CEST by Morgan Leijström
Modified: 2024-04-21 21:41 CEST
CC List: 1 user




Description Morgan Leijström 2022-08-09 01:02:36 CEST
This is an old issue.
For example Bug 20153 for nvidia 340, assumed to be fixed.
I am pretty sure there was another bug too, for nvidia-current, but my search-fu fails me, so I am filing this bug.

This hit me in both nvidia-current testing updates today (Bug 30722 and Bug 30724).
Here is the output in journal when clicking Plasma menu to reboot:

aug 08 12:03:52 svarten.tribun ksmserver-logout-greeter[586343]: Could not create scene graph context for backend 'opengl' - check that plugins are installed correctly in /usr/lib64/qt5/plugins
aug 08 12:03:52 svarten.tribun kernel: NVRM: API mismatch: the client has the version 515.65.01, but
                                       NVRM: this kernel module has the version 470.141.03.  Please
                                       NVRM: make sure that this kernel module and all NVIDIA driver
                                       NVRM: components have the same version.
aug 08 12:03:52 svarten.tribun ksmserver-logout-greeter[586343]: QGLXContext: Failed to create dummy context
aug 08 12:03:52 svarten.tribun ksmserver-logout-greeter[586343]: file:///usr/share/plasma/look-and-feel/org.kde.breeze.desktop/contents/components/UserDelegate.qml:34: ReferenceError: model is not defined
aug 08 12:03:52 svarten.tribun ksmserver-logout-greeter[586343]: file:///usr/share/plasma/look-and-feel/org.kde.breeze.desktop/contents/components/UserDelegate.qml:34: ReferenceError: model is not defined
aug 08 12:03:52 svarten.tribun ksmserver-logout-greeter[586343]: Failed to create OpenGL context for format QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(ResetNotification), depthBufferSize 24, redBufferSize -1, greenBufferSize -1, blueBufferSize -1, alphaBufferSize 8, stencilBufferSize 8, samples -1, swapBehavior QSurfaceFormat::DoubleBuffer, swapInterval 1, colorSpace QSurfaceFormat::DefaultColorSpace, profile  QSurfaceFormat::NoProfile)
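The key line in this log is the kernel's "NVRM: API mismatch" message: the userspace driver (the "client") has already been updated, while the old kernel module is still loaded. To confirm that a system is in this state, a minimal shell sketch like the following could pull both versions out of journal text. It is not part of any Mageia tool; the function name nvrm_mismatch_versions is hypothetical.

```shell
# Hypothetical helper: read journal text on stdin and print
# "client=<version> module=<version>" for the first NVRM API mismatch found.
# Returns 1 if no complete mismatch message is present.
nvrm_mismatch_versions() {
    local text client module
    text=$(cat)
    # "the client has the version 515.65.01, but" -> 515.65.01
    client=$(printf '%s\n' "$text" | grep -o 'client has the version [0-9.]*' |
        head -n 1 | awk '{print $NF}')
    # "this kernel module has the version 470.141.03.  Please" -> 470.141.03
    # (the sentence-ending dot is captured by [0-9.]* and stripped by sed)
    module=$(printf '%s\n' "$text" | grep -o 'kernel module has the version [0-9.]*' |
        head -n 1 | awk '{print $NF}' | sed 's/\.$//')
    [ -n "$client" ] && [ -n "$module" ] || return 1
    echo "client=$client module=$module"
}
```

Usage, assuming the systemd journal as above: `journalctl -k -b | nvrm_mismatch_versions`. For the log above it would report client=515.65.01 and module=470.141.03; since the NVRM message repeats many times, only the first occurrence is used.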
Comment 1 sturmvogel 2022-08-09 05:44:07 CEST
This mailing-list thread has an explanation and a possible solution (apparently not implemented yet):
https://ml.mageia.org/l/arc/dev/2021-12/msg00091.html

And here is how to fix your present problem (if you still see it):
https://forums.mageia.org/en/viewtopic.php?f=23&t=14234
Comment 2 sturmvogel 2022-08-09 06:06:36 CEST
Also a more precise solution from tmb:
https://bugs.mageia.org/show_bug.cgi?id=28632#c6
Comment 3 Morgan Leijström 2022-08-09 09:27:15 CEST
Many thanks for the digging!

Pity our users still see this on our main DE.

I will have a look at 28632#c6 next time. However, as for the cause, that c6 says it is caused by updating the kernel and nvidia at the same time, but I hit it after updating only nvidia-current!
Comment 4 Morgan Leijström 2022-08-09 09:30:17 CEST
The problem I experience differs from 28632 in that after updating Nvidia I cannot log out via the menu (instead I opened a console and ran "shutdown -r now"), but I have *no* problem after the reboot, since the versions then agree.
Comment 5 Giuseppe Ghibò 2022-08-09 11:25:24 CEST
FYI, the fix suggested on the ML was included in the Cauldron dkms package, but not in the mga8 dkms. For mga8, dkms would need to be backported from Cauldron or pushed to updates.

Note that there may still be conditions where updates don't work, for instance when an upgrade installs a newer release of the driver and the newer (upstream) driver drops support for your current card.

CC: (none) => ghibomgx

Comment 6 Morgan Leijström 2022-08-09 14:13:00 CEST
(In reply to Giuseppe Ghibò from comment #5)
> FYI the fix suggested in ML was included in dkms package of cauldron, but
> not on dkms of mga8. For mga8 it needs to be backported dkms from cauldron
> or in updates.

I think it would be a good idea to have that fix in mga8 updates_testing, so that those of us who experience the problem can more easily check whether it works.

 
> Note that there might be still some conditions where updates doesn't work.
> For instance when an upgrade install a newer release of the driver, and the
> newer (upstream) driver drops support for your current card.

When we know an update risks breaking things, it usually goes into backports, just as we already have a newer nvidia-current series in backports as well as an older series in updates.
Comment 7 Thomas Backlund 2022-08-09 20:35:59 CEST
(In reply to Morgan Leijström from comment #3)
> Many thanks for the digging!
> 
> Pity our users still see this on our main DE.
>

Correction...
it is not, and has never been, "our main DE"...

It's just one of the DEs we happen to provide...

But yes, it's extremely unstable on nvidia driver updates.
Comment 8 Giuseppe Ghibò 2022-08-27 10:21:44 CEST
(In reply to Morgan Leijström from comment #6)
> (In reply to Giuseppe Ghibò from comment #5)
> > FYI the fix suggested in ML was included in dkms package of cauldron, but
> > not on dkms of mga8. For mga8 it needs to be backported dkms from cauldron
> > or in updates.
> 
> I think it would be a good idea to have that fix in mga8 updates testing so
> we can check if it works more easily, we who experience the problem.
> 

There is now a dkms-2.0.19-43.1.mga8 package in updates_testing.
Comment 9 Morgan Leijström 2022-08-27 20:04:46 CEST
Sorry to say it did not fix the problem on my system

Test:
1. While using the nouveau driver and our normal update kernel 5.15.62-desktop-1.mga8:
2. Uninstalled the nvidia packages, then
  $ sudo urpmi x11-driver-video-nvidia-current-470.74-1.mga8.nonfree
 (I believe it is the oldest nvidia-current version we have that supports kernel 5.15)
  It installs, including dependencies, and builds the module; dkms status shows OK.
  In drakx11 I selected the proprietary driver.
3. Reboot: works OK.
4. $ sudo urpmi x11-driver-video-nvidia-current-470.94-1.mga8.nonfree
   It seems to proceed as usual.
5. Selecting reboot in the Plasma menu: nothing visible happens, and the log shows:
aug 27 19:42:20 svarten.tribun kernel: NVRM: API mismatch: the client has the version 470.94, but
                                       NVRM: this kernel module has the version 470.74.  Please
                                       NVRM: make sure that this kernel module and all NVIDIA driver
                                       NVRM: components have the same version.
aug 27 19:42:20 svarten.tribun ksmserver-logout-greeter[233378]: QGLXContext: Failed to create dummy context
aug 27 19:42:20 svarten.tribun ksmserver-logout-greeter[233378]: file:///usr/share/plasma/look-and-feel/org.kde.breeze.desktop/contents/components/UserDelegate.qml:34: ReferenceError: model is not defined
aug 27 19:42:20 svarten.tribun ksmserver-logout-greeter[233378]: Failed to create OpenGL context for format QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(ResetNotification), depthBufferSize 24, redBufferSize -1, greenBufferSize -1, blueBufferSize -1, alphaBufferSize 8, stencilBufferSize 8, samples -1, swapBehavior QSurfaceFormat::DoubleBuffer, swapInterval 1, colorSpace QSurfaceFormat::DefaultColorSpace, profile  QSurfaceFormat::NoProfile)

(In reality, the NVRM lines repeat many times.)
(I later rebooted from a terminal, and the driver works.)

$ rpm -qa | grep dkms
dkms-virtualbox-6.1.36-1.mga8
dkms-minimal-2.0.19-43.1.mga8
dkms-nvidia-current-470.94-1.mga8.nonfree
dkms-2.0.19-43.1.mga8


And yes, Plasma is "only" *one* of the two most used DEs in Mageia, one of the three we provide live ISOs for, and the one suggested by the classic installer.
Comment 10 Thomas Backlund 2022-08-27 20:10:52 CEST
(In reply to Morgan Leijström from comment #9)

> And yes Plasma is "only" *one* of the two most used DE in Mageia, one of the
> three we provide live ISOs of and is suggested by the classic installer.

:)

The reason it is suggested by the CI (classic installer) is not that we prefer it; it just hides a bug in the drakx installer that made it crash when nothing was selected by default...

It was supposed to be fixed properly later on, but...
Comment 11 Giuseppe Ghibò 2022-08-27 20:58:19 CEST
(In reply to Morgan Leijström from comment #9)

> Sorry to say it did not fix the problem on my system

After a round of rebooting, things seem to work, don't they? BTW, there is also nvidia 470.141, the latest.

The problem seems to be unloading the current nvidia kernel modules (nvidia-modules, nvidia-drm, nvidia-uvm) while X11 is in use. I wonder if there is a safe way to do this. In theory, there should be something that automatically does the following:

- switch from X11 to a kernel console/tty
- unload all the nvidia* kernel modules
- build the newer nvidia modules and load them for the current kernel (and possibly for other kernels)
- restart X11 (or xdm|sddm), even without rebooting, and run X11 with the newer nvidia driver.

which requires at least restarting X11.
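The steps listed above could be sketched as a shell script roughly like the following. This is an untested illustration, not an existing Mageia tool, and it rests on several assumptions: systemd's multi-user.target and graphical.target are used to leave and restart X11; the loaded modules carry the upstream names nvidia_uvm, nvidia_drm, nvidia_modeset and nvidia (check `lsmod` on your system); and the installed dkms is assumed to support `dkms autoinstall`. The names run, nvidia_unload_order and reload_nvidia are hypothetical. Unless DRY_RUN=0 is set, it only prints the commands.

```shell
# Hypothetical helper: print a command instead of running it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Given loaded module names (one per line), print the NVIDIA ones in a safe
# unload order: modules that depend on "nvidia" are listed before it.
nvidia_unload_order() {
    local m
    for m in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        if printf '%s\n' "$1" | grep -qx "$m"; then
            echo "$m"
        fi
    done
}

reload_nvidia() {
    local loaded m
    # Module names as listed by lsmod (NR > 1 skips its header line).
    loaded=$(lsmod | awk 'NR > 1 && $1 ~ /^nvidia/ { print $1 }')
    # 1. Leave X11: isolating the non-graphical target stops the display manager.
    run systemctl isolate multi-user.target
    # 2. Unload the old modules, dependents first.
    for m in $(nvidia_unload_order "$loaded"); do
        run rmmod "$m"
    done
    # 3. Build/install modules for the running kernel (assumes this dkms
    #    supports "autoinstall"), then load the new driver.
    run dkms autoinstall
    run modprobe nvidia
    # 4. Start the display manager / X11 again with the new driver.
    run systemctl isolate graphical.target
}
```

It would have to be sourced and run as root from a text console (e.g. Ctrl+Alt+F2), not from a terminal inside Plasma, because stopping the graphical target ends that session and the script with it: first `reload_nvidia` to review the planned commands, then `DRY_RUN=0 reload_nvidia`.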
Comment 12 Morgan Leijström 2022-08-27 21:08:17 CEST
(In reply to Giuseppe Ghibò from comment #11)
> After some rounds of reboots the things seems working, aren't them?

Yes, always right after the next boot.

The problem is that ordinary users don't know how to get out of it, and some who are afraid of the CLI may just pull the plug, possibly losing work, etc.



> The problem seems in unloading the current nvidia kernel modules
> I wonder if there is a safe way to do this.

Avoid getting into this trouble in the first place?

If I reboot and select another already-installed kernel, the modules are built for it at boot, and it works.

So what about simply not building the modules during installation (possibly only when Plasma or another potentially problematic DE is detected to be in use)?
(And showing a popup, like the one after installing a kernel, gcc, etc., asking the user to reboot to use the new nvidia driver.)
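The reboot prompt suggested above would first need to detect that the loaded kernel module and the installed driver differ. A minimal sketch under stated assumptions: the loaded module's version is read from the first line of /proc/driver/nvidia/version (present while the proprietary module is loaded), and the installed userspace version comes from the x11-driver-video-nvidia-current package via `rpm -q --qf`. The function names are hypothetical, and how a graphical popup would be shown is left out.

```shell
# Hypothetical helper: print the loaded module version from the text of
# /proc/driver/nvidia/version given on stdin (first "N.N[.N]" on line 1).
loaded_module_version() {
    head -n 1 | grep -o '[0-9][0-9]*\.[0-9][0-9.]*' | head -n 1
}

# Succeeds when both versions are known and differ, i.e. a reboot
# (or a module reload) is still needed. $1: loaded, $2: installed.
nvidia_reboot_needed() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

check_nvidia() {
    local loaded="" installed=""
    if [ -r /proc/driver/nvidia/version ]; then
        loaded=$(loaded_module_version < /proc/driver/nvidia/version)
    fi
    # rpm prints "package ... is not installed" on stdout and fails if absent.
    if installed=$(rpm -q --qf '%{VERSION}\n' x11-driver-video-nvidia-current 2>/dev/null); then
        installed=$(printf '%s\n' "$installed" | head -n 1)
    else
        installed=""
    fi
    if nvidia_reboot_needed "$loaded" "$installed"; then
        echo "NVIDIA driver updated ($loaded -> $installed): please reboot."
    fi
}
```

In the test from comment 9, the loaded module would be 470.74 and the installed driver 470.94, so check_nvidia would ask for a reboot; once the versions agree it prints nothing.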
Comment 13 Morgan Leijström 2022-09-07 22:37:22 CEST
I did not encounter this problem in the last two kernel tests...
https://bugs.mageia.org/show_bug.cgi?id=30813#c8
https://bugs.mageia.org/show_bug.cgi?id=30814#c3
Comment 14 Morgan Leijström 2022-09-07 22:38:18 CEST
Ah yes, I forgot: nowadays it is only a problem when updating nvidia...
Comment 15 Morgan Leijström 2024-04-21 20:51:39 CEST
This problem does not seem to occur in Mageia 9.

Resolution: (none) => OLD
Status: NEW => RESOLVED

Comment 16 Morgan Leijström 2024-04-21 21:41:13 CEST
More precisely: on mga9 there is no problem initiating the shutdown, but during the shutdown it then stops, waiting for something (I forget what; something to do with X11).

That part could be improved.
