| Summary: | Desktop crash maybe related to x11 nvidia driver update | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Christian C <bugzzzz> |
| Component: | RPM Packages | Assignee: | Kernel and Drivers maintainers <kernel> |
| Status: | RESOLVED OLD | QA Contact: | |
| Severity: | critical | ||
| Priority: | Normal | CC: | mageia, tmb |
| Version: | 7 | ||
| Target Milestone: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | x11-driver-video-nvidia340-340.108-1.mga7.nonfree.x86_64 ?? | CVE: | |
| Status comment: | |||
| Attachments: | Extract from /var/log/messages at the time of crash | ||
|
Description
Christian C
2020-01-06 09:42:52 CET
Created attachment 11442
Extract from /var/log/messages at the time of crash
Crap. 340.108 was supposed to have official kernel 5.4 support and was tested by some nvidia340 users without issues.

Does it work at all, or does it always crash?

I guess the nVidia devs forgot to test their changes with HARDENED_USERCOPY-enabled kernels :(

Technically it should still work, as we have enabled HARDENED_USERCOPY_FALLBACK, which will spit out the kernel trace as info but still keep working...

And the nvidia_stack_t symbol is in the binary-only code, so we can't patch it out :/

If you want to go back to the older driver:

dkms-nvidia340-340.107-12.mga7.nonfree.x86_64.rpm
nvidia340-cuda-opencl-340.107-12.mga7.nonfree.x86_64.rpm
nvidia340-devel-340.107-12.mga7.nonfree.x86_64.rpm
nvidia340-doc-html-340.107-12.mga7.nonfree.x86_64.rpm
x11-driver-video-nvidia340-340.107-12.mga7.nonfree.x86_64.rpm

Check which rpms are installed with:

rpm -qa |grep nvidia340

and then downgrade them. For example, if you have dkms-nvidia340 and x11-driver-video-nvidia340, you can do:

urpmi --downgrade dkms-nvidia340-340.107-12.mga7.nonfree x11-driver-video-nvidia340-340.107-12.mga7.nonfree

and then add the following lines to /etc/urpmi/skip.list:

/^dkms-nvidia340/
/^x11-driver-video-nvidia340/

and so on...

CC: (none) => tmb

FWIW, that log message has been present for a long time - see bug 24663 - without any apparent ill effects. So it may be a red herring.

CC: (none) => mageia

Yeah, I know, that's the stack trace printed out by HARDENED_USERCOPY_FALLBACK to notify users about it but "keep working", as I wrote in comment 2. But then in the log in comment 1 I see:

Jan 5 04:56:45 localhost kernel: [130066.500696] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ChID 0029, Class 00008597, Offset 00001b0c, Data 0000f000
Jan 5 04:56:45 localhost kernel: [130066.556059] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ChID 0029, Class 00008597, Offset 00001b0c, Data 0000f000
Jan 5 04:57:18 localhost kernel: [130099.586739] NVRM: Xid (PCI:0000:05:00): 6, PE0001
Jan 5 04:57:18 localhost okular[12927]: The X11 connection broke (error 1). Did the X11 server die?

which is why I asked: does it work at all, or does it always crash?

And the downgrade info is there so we can find out whether the problem goes away.

I rebooted at Jan 5 17:59:19 and for the moment, my X11 server is still alive. But half an hour later, I still got these messages, which I didn’t see this morning:

Jan 5 18:33:01 localhost kglobalaccel5[4882]: The X11 connection broke (error 1). Did the X11 server die?
Jan 5 18:33:01 localhost kscreen_backend_launcher[4892]: The X11 connection broke (error 1). Did the X11 server die?
Jan 5 18:33:01 localhost kuiserver5[6765]: The X11 connection broke (error 1). Did the X11 server die?
Jan 5 18:33:01 localhost org.a11y.Bus[5005]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Jan 5 18:33:01 localhost org.a11y.Bus[5005]: after 1633 requests (1633 known processed) with 0 events remaining.
Jan 5 18:33:01 localhost kactivitymanagerd[4968]: The X11 connection broke (error 1). Did the X11 server die?

I'll try the previous nvidia package the next time it crashes (or I have to reboot). I have some heavy tasks to finish. Or I'll look into my old /var/log/messages to find some X11 errors.
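For anyone searching older /var/log/messages files for earlier occurrences, the NVRM Xid lines quoted in this report can be pulled out with a short script. This is only a minimal sketch in Python, assuming the syslog layout matches the excerpts pasted in this report; the names `XidEvent`, `XID_RE` and `find_xid_events` are made up for illustration and are not part of any NVIDIA or Mageia tool.

```python
import re
from typing import Iterable, List, NamedTuple


class XidEvent(NamedTuple):
    timestamp: str  # syslog timestamp, e.g. "Jan 5 04:56:45"
    pci_bus: str    # PCI address reported by the driver
    xid: int        # Xid error number
    detail: str     # rest of the message after the Xid number


# Assumed line layout, taken from the excerpts in this report:
# "<Mon> <day> <HH:MM:SS> <host> kernel: [<uptime>] NVRM: Xid (PCI:<addr>): <n>, <detail>"
XID_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"\[\s*[\d.]+\]\s+NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\):\s+"
    r"(?P<xid>\d+),\s*(?P<detail>.*)$"
)


def find_xid_events(lines: Iterable[str]) -> List[XidEvent]:
    """Return one XidEvent per log line that looks like an NVRM Xid report."""
    events = []
    for line in lines:
        m = XID_RE.match(line.strip())
        if m:
            events.append(
                XidEvent(m["ts"], m["pci"], int(m["xid"]), m["detail"])
            )
    return events


if __name__ == "__main__":
    # Two lines copied from the log in this report; the second is not an Xid line.
    sample = [
        "Jan 5 04:57:18 localhost kernel: [130099.586739] NVRM: Xid (PCI:0000:05:00): 6, PE0001",
        "Jan 5 04:57:18 localhost okular[12927]: The X11 connection broke (error 1). Did the X11 server die?",
    ]
    for ev in find_xid_events(sample):
        print(ev.timestamp, ev.pci_bus, ev.xid, ev.detail)
```

To scan a real log, the open file object can be passed straight to `find_xid_events`, since it only needs an iterable of lines.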
It finally crashed at Jan 7 02:32:50 with the same errors:

Jan 7 02:32:32 localhost kernel: [117101.322769] NVRM: GPU at PCI:0000:05:00: GPU-2d5ce2d6-32ab-88b3-e5cd-97d122043eb4
Jan 7 02:32:33 localhost kernel: [117101.322774] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ChID 0028, Class 00008597, Offset 00001b0c, Data 0000f000
Jan 7 02:32:33 localhost kernel: [117101.415830] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ChID 0028, Class 00008597, Offset 00001b0c, Data 0000f000
Jan 7 02:32:50 localhost ksmserver[30719]: The X11 connection broke (error 1). Did the X11 server die?

I downgraded with:

urpmi --downgrade x11-driver-video-nvidia340-340.108-1.mga7.nonfree dkms-nvidia340-340.108-1.mga7.nonfree nvidia340-doc-html-340.108-1.mga7.nonfree

and what is surprising is that it seems to have installed the same versions:

rpm -qa |grep nvidia340
dkms-nvidia340-340.108-1.mga7.nonfree
x11-driver-video-nvidia340-340.108-1.mga7.nonfree
nvidia340-doc-html-340.108-1.mga7.nonfree

I got the same oops after reboot. Well, I'll see the difference in the coming hours.

I checked the files installed by urpmi --downgrade and they are the same as before. And in the repository, there is no 340-340.107 package, except for the devel rpm:

nvidia340-devel-340.107-9.mga7.nonfree.x86_64

What to do?

You need to specify the version to downgrade to, as I wrote in comment 2:

urpmi --downgrade dkms-nvidia340-340.107-12.mga7.nonfree x11-driver-video-nvidia340-340.107-12.mga7.nonfree

Sorry, I hadn't seen the version in the command line. But as I said in my last comment, there is no dkms-nvidia340-340.107-* in my repositories!

Well, you were right. I ran:

urpmi --downgrade x11-driver-video-nvidia340-340.107-12.mga7.nonfree dkms-nvidia340-340.107-12.mga7.nonfree nvidia340-doc-html-340.107-12.mga7.nonfree

and it completed!
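The mix-up in this exchange came from passing the currently installed version to urpmi --downgrade instead of the older one. As a rough illustration, the sketch below (Python, string handling only, nothing is executed) rewrites the names printed by rpm -qa into the arguments for the downgrade and into matching /etc/urpmi/skip.list patterns. The function names are hypothetical, and it assumes every listed package ends in the same version-release string and that the target version is actually available in the configured media.

```python
from typing import List


def package_name(pkg: str, current: str) -> str:
    """Strip the '-<version>-<release>' suffix from an rpm -qa style name."""
    suffix = "-" + current
    if not pkg.endswith(suffix):
        raise ValueError(f"{pkg!r} does not end with {suffix!r}")
    return pkg[: -len(suffix)]


def downgrade_targets(installed: List[str], current: str, target: str) -> List[str]:
    """Replace the installed version-release with the one to downgrade to."""
    return [package_name(pkg, current) + "-" + target for pkg in installed]


def skip_list_entries(installed: List[str], current: str) -> List[str]:
    """Build skip.list patterns in the /^name/ form used earlier in this report."""
    return ["/^" + package_name(pkg, current) + "/" for pkg in installed]


if __name__ == "__main__":
    # Package names as reported by `rpm -qa |grep nvidia340` in this report.
    installed = [
        "dkms-nvidia340-340.108-1.mga7.nonfree",
        "x11-driver-video-nvidia340-340.108-1.mga7.nonfree",
        "nvidia340-doc-html-340.108-1.mga7.nonfree",
    ]
    current = "340.108-1.mga7.nonfree"
    target = "340.107-12.mga7.nonfree"

    print("urpmi --downgrade " + " ".join(downgrade_targets(installed, current, target)))
    for entry in skip_list_entries(installed, current):
        print(entry)
```

The printed command is meant to be reviewed and run by hand as root; the sketch deliberately does not call urpmi itself.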
Started at Jan 7 10:53:25, crashed at:

Jan 9 00:19:10 rottennvidiadriver kscreenlocker_greet[7591]: The X11 connection broke: I/O error (code 1)
Jan 9 00:19:10 rottennvidiadriver ksmserver[31822]: The X11 connection broke (error 1). Did the X11 server die?

My conf:

rpm -qa|grep nvidia
dkms-nvidia340-340.107-12.mga7.nonfree
nvidia340-doc-html-340.107-12.mga7.nonfree
x11-driver-video-nvidia340-340.107-12.mga7.nonfree

Any advice?

Mageia 7 has been EOL since July 1st 2021. There will not be any further bugfixes for this release.

You are encouraged to upgrade to Mageia 8 as soon as possible.

@reporter, if this bug still applies to Mageia 8, please let us know.

@packager, if you work on the Mageia 7 version of your package, please check whether the issue is also present in the Mageia 8 package. In this case, please fix the Mageia 8 version instead.

This bug report will be closed OLD if there is no further notice before 1st September 2021.

Hi bug reporter and hi assignee and others involved,

Please reopen this bug report if it is still valid for Mageia 8 or 9 (cauldron), and change "Version:" in the upper left of this report accordingly.

This report is being closed as OLD because it was filed against Mageia 7, for which support ended on June 30th 2021.

Thanks,
Marja

Status: NEW => RESOLVED |