Description of problem:
My desktop crashed after disconnection from the X11 server. See the attached extract from /var/log/messages.

I had recently updated the x11 video driver:

Jan 2 09:08:04 localhost [RPM][27231]: install x11-driver-video-nvidia340-340.108-1.mga7.nonfree.x86_64: success

After reboot, I saw the following traces in dmesg:

[ 188.228647] NVRM: Your system is not currently configured to drive a VGA console
[ 188.228654] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
[ 188.228658] NVRM: requires the use of a text-mode VGA console. Use of other console
[ 188.228661] NVRM: drivers including, but not limited to, vesafb, may result in
[ 188.228663] NVRM: corruption and stability problems, and is not supported.
[ 188.279320] ------------[ cut here ]------------
[ 188.279325] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'nvidia_stack_t' (offset 11864, size 3)!
[ 188.279338] WARNING: CPU: 2 PID: 4328 at mm/usercopy.c:80 usercopy_warn+0x7d/0xa0
[ 188.279339] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_comment ip6table_mangle ip6table_nat ip6table_raw nf_log_ipv6 ip6table_filter ip6_tables xt_recent ipt_IFWLOG ipt_psd xt_set ip_set_hash_ip ip_set ipt_REJECT nf_reject_ipv4 xt_conntrack xt_hashlimit xt_addrtype xt_mark iptable_mangle iptable_nat xt_CT xt_tcpudp iptable_raw nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv4 iptable_filter af_packet cfg80211 rfkill vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia(PO) joydev raid1 kvm_amd ccp snd_hda_codec_hdmi kvm irqbypass sha1_generic input_leds wmi_bmof r8169 k10temp realtek libphy snd_hda_codec_via
[ 188.279366] snd_hda_codec_generic sp5100_tco ledtrig_audio i2c_piix4 snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core asus_atk0110 snd_hwdep snd_pcm snd_timer ide_pci_generic jmicron snd ide_core soundcore acpi_cpufreq evdev sch_fq_codel ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4 hid_generic usbhid hid uas usb_storage sr_mod ohci_pci serio_raw xhci_pci xhci_hcd ehci_pci ehci_hcd ohci_hcd usbcore ata_generic pata_acpi usb_common pata_jmicron video mxm_wmi i2c_algo_bit drm_kms_helper ttm wmi button drm dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nouveau]
[ 188.279387] CPU: 2 PID: 4328 Comm: Xorg Tainted: P O 5.4.6-desktop-2.mga7 #1
[ 188.279388] Hardware name: System manufacturer System Product Name/M4A87TD EVO, BIOS 2001 03/08/2011
[ 188.279390] RIP: 0010:usercopy_warn+0x7d/0xa0
[ 188.279392] Code: 0d 95 41 51 4d 89 d8 48 c7 c0 c7 7f 0c 95 49 89 f1 48 89 f9 48 0f 45 c2 48 c7 c7 18 a1 0d 95 4c 89 d2 48 89 c6 e8 ac 8d e1 ff <0f> 0b 48 83 c4 18 c3 48 c7 c6 9f 69 0c 95 49 89 f1 49 89 f3 eb 96
[ 188.279393] RSP: 0018:ffffa5db4080bbb8 EFLAGS: 00010286
[ 188.279395] RAX: 0000000000000000 RBX: ffff94e9acba5e58 RCX: 0000000000000006
[ 188.279395] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff94e9afa974c0
[ 188.279396] RBP: 0000000000000003 R08: 0000000000000457 R09: 0000000000000004
[ 188.279397] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[ 188.279397] R13: ffff94e9acba5e5b R14: ffff94e9acba5e58 R15: ffff94e9acba5ea0
[ 188.279399] FS: 00007f905c5a9940(0000) GS:ffff94e9afa80000(0000) knlGS:0000000000000000
[ 188.279400] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 188.279400] CR2: 00007f9057a2cf40 CR3: 00000002194ec000 CR4: 00000000000006e0
[ 188.279401] Call Trace:
[ 188.279406] __check_object_size+0x162/0x173
[ 188.279548] os_memcpy_to_user+0x21/0x40 [nvidia]
[ 188.279701] _nv001372rm+0xa5/0x260 [nvidia]
[ 188.279862] ? _nv004782rm+0x4eba/0x5500 [nvidia]
[ 188.280005] ? _nv004329rm+0xec/0xf0 [nvidia]
[ 188.280135] ? _nv004324rm+0xca/0x650 [nvidia]
[ 188.280266] ? _nv015124rm+0x576/0x5c0 [nvidia]
[ 188.280403] ? _nv000694rm+0x2e/0x60 [nvidia]
[ 188.280532] ? _nv000789rm+0x5f5/0x8b0 [nvidia]
[ 188.280657] ? rm_ioctl+0x73/0x100 [nvidia]
[ 188.280784] ? nvidia_ioctl+0x148/0x490 [nvidia]
[ 188.280924] ? nvidia_frontend_ioctl+0x2d/0x50 [nvidia]
[ 188.281051] ? nvidia_frontend_unlocked_ioctl+0x19/0x20 [nvidia]
[ 188.281054] ? do_vfs_ioctl+0xa4/0x630
[ 188.281056] ? ksys_ioctl+0x60/0x90
[ 188.281058] ? ksys_write+0x59/0xd0
[ 188.281060] ? __x64_sys_ioctl+0x16/0x20
[ 188.281062] ? do_syscall_64+0x5f/0x200
[ 188.281064] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 188.281066] ---[ end trace 3522e411b9a71731 ]---

Version-Release number of selected component (if applicable):
108-1.mga7.nonfree.x86_64

How reproducible:
??

Steps to Reproduce:
1. ??
2.
3.
Created attachment 11442
Extract from /var/log/messages at the time of crash
Crap. 340.108 was supposed to have official kernel 5.4 support and was tested by some nvidia340 users without issues.

Does it work at all, or does it always crash?

I guess nVidia devs forgot to test their changes with HARDENED_USERCOPY enabled kernels :(

Technically it should still work, as we have enabled HARDENED_USERCOPY_FALLBACK, which prints the kernel trace as info but still keeps working...

And the nvidia_stack_t symbol is in the binary-only code, so we can't patch it out :/

If you want to go back to the older driver:

dkms-nvidia340-340.107-12.mga7.nonfree.x86_64.rpm
nvidia340-cuda-opencl-340.107-12.mga7.nonfree.x86_64.rpm
nvidia340-devel-340.107-12.mga7.nonfree.x86_64.rpm
nvidia340-doc-html-340.107-12.mga7.nonfree.x86_64.rpm
x11-driver-video-nvidia340-340.107-12.mga7.nonfree.x86_64.rpm

Check which rpms are installed with:

rpm -qa |grep nvidia340

and then downgrade them. For example, if you have dkms-nvidia340 and x11-driver-video-nvidia340 you can do:

urpmi --downgrade dkms-nvidia340-340.107-12.mga7.nonfree x11-driver-video-nvidia340-340.107-12.mga7.nonfree

and then add the following lines to /etc/urpmi/skip.list:

/^dkms-nvidia340/
/^x11-driver-video-nvidia340/

and so on...
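The downgrade steps above can be sketched as a small helper. This is only an illustration, not a supported tool: the function names `base_name`, `build_downgrade_command` and `skip_list_entries` are hypothetical, the input is assumed to be the names printed by rpm -qa |grep nvidia340, and the script only builds the command and the skip.list patterns without running anything.

```python
import re

# Target version-release quoted in the comment above.
TARGET = "340.107-12.mga7.nonfree"

# Assumed shape of the names printed by `rpm -qa`, for example
# dkms-nvidia340-340.108-1.mga7.nonfree (an arch suffix is tolerated).
_NAME_RE = re.compile(r"^(.*?)-\d+\.\d+-\d+\.mga\d+\.nonfree(?:\.x86_64)?$")


def base_name(installed):
    """Strip the version-release suffix from an installed package name."""
    match = _NAME_RE.match(installed)
    if match is None:
        raise ValueError(f"unrecognised package name: {installed}")
    return match.group(1)


def build_downgrade_command(installed, target=TARGET):
    """Return the urpmi argument list, with the target version on every package."""
    names = sorted({base_name(pkg) for pkg in installed})
    return ["urpmi", "--downgrade"] + [f"{name}-{target}" for name in names]


def skip_list_entries(installed):
    """Return the patterns to add to /etc/urpmi/skip.list, one per package."""
    return [f"/^{name}/" for name in sorted({base_name(pkg) for pkg in installed})]


if __name__ == "__main__":
    installed = [
        "dkms-nvidia340-340.108-1.mga7.nonfree",
        "x11-driver-video-nvidia340-340.108-1.mga7.nonfree",
    ]
    print(" ".join(build_downgrade_command(installed)))
    print("\n".join(skip_list_entries(installed)))
```

The point of building the names this way is that every package carries the explicit 340.107-12 version, since passing the currently installed version to urpmi --downgrade does not select an older package.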
CC: (none) => tmb
Assignee: bugsquad => kernel
FWIW, that log message has been present for a long time - see bug 24663 - without any apparent ill effects. So it may be a red herring.
CC: (none) => mageia
Yeah, I know, that's the stack trace printed out by HARDENED_USERCOPY_FALLBACK to notify users about it while it "keeps working", as I wrote in comment 2. But then in the log in comment 1 I see:

Jan 5 04:56:45 localhost kernel: [130066.500696] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ChID 0029, Class 00008597, Offset 00001b0c, Data 0000f000
Jan 5 04:56:45 localhost kernel: [130066.556059] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ChID 0029, Class 00008597, Offset 00001b0c, Data 0000f000
Jan 5 04:57:18 localhost kernel: [130099.586739] NVRM: Xid (PCI:0000:05:00): 6, PE0001
Jan 5 04:57:18 localhost okular[12927]: The X11 connection broke (error 1). Did the X11 server die?

which is why I asked: does it work at all, or does it always crash?

The downgrade info is there to find out whether the problem goes away.
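To see quickly how often these NVRM Xid events show up in /var/log/messages, a small filter along the following lines may help. It is a rough sketch: the regular expression is inferred only from the log lines quoted in this report, and the names `XID_RE`, `parse_xid` and `count_xids` are made up for the example.

```python
import re
from collections import Counter

# Pattern inferred from the "NVRM: Xid (...)" lines quoted in this report.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<code>\d+), (?P<detail>.*)$"
)


def parse_xid(line):
    """Return a dict describing the Xid event in a log line, or None."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return {
        "pci": match.group("pci"),
        "code": int(match.group("code")),
        "detail": match.group("detail").strip(),
    }


def count_xids(lines):
    """Count how many times each Xid code appears in the given lines."""
    events = (parse_xid(line) for line in lines)
    return Counter(event["code"] for event in events if event is not None)


if __name__ == "__main__":
    sample = [
        "Jan 5 04:56:45 localhost kernel: [130066.500696] NVRM: Xid (PCI:0000:05:00): 13, "
        "Graphics Exception: ChID 0029, Class 00008597, Offset 00001b0c, Data 0000f000",
        "Jan 5 04:57:18 localhost kernel: [130099.586739] NVRM: Xid (PCI:0000:05:00): 6, PE0001",
        "Jan 5 04:57:18 localhost okular[12927]: The X11 connection broke (error 1). "
        "Did the X11 server die?",
    ]
    for code, hits in sorted(count_xids(sample).items()):
        print(f"Xid {code}: {hits} occurrence(s)")
```

Feeding it the lines of the real log file (for example via `open("/var/log/messages")`) gives a per-code count that can be compared before and after the downgrade.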
I rebooted at Jan 5 17:59:19 and for the moment, my x11 server is still alive. But half an hour later, I still got these messages I didn't see this morning:

Jan 5 18:33:01 localhost kglobalaccel5[4882]: The X11 connection broke (error 1). Did the X11 server die?
Jan 5 18:33:01 localhost kscreen_backend_launcher[4892]: The X11 connection broke (error 1). Did the X11 server die?
Jan 5 18:33:01 localhost kuiserver5[6765]: The X11 connection broke (error 1). Did the X11 server die?
Jan 5 18:33:01 localhost org.a11y.Bus[5005]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Jan 5 18:33:01 localhost org.a11y.Bus[5005]: after 1633 requests (1633 known processed) with 0 events remaining.
Jan 5 18:33:01 localhost kactivitymanagerd[4968]: The X11 connection broke (error 1). Did the X11 server die?

I'll try the previous nvidia package the next time it crashes (or I have to reboot). I have some heavy tasks to finish. Or I'll look into my old /var/log/messages to find some X11 errors.
It finally crashed at Jan 7 02:32:50 with the same errors:

Jan 7 02:32:32 localhost kernel: [117101.322769] NVRM: GPU at PCI:0000:05:00: GPU-2d5ce2d6-32ab-88b3-e5cd-97d122043eb4
Jan 7 02:32:33 localhost kernel: [117101.322774] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ChID 0028, Class 00008597, Offset 00001b0c, Data 0000f000
Jan 7 02:32:33 localhost kernel: [117101.415830] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ChID 0028, Class 00008597, Offset 00001b0c, Data 0000f000
Jan 7 02:32:50 localhost ksmserver[30719]: The X11 connection broke (error 1). Did the X11 server die?

I downgraded with:

urpmi --downgrade x11-driver-video-nvidia340-340.108-1.mga7.nonfree dkms-nvidia340-340.108-1.mga7.nonfree nvidia340-doc-html-340.108-1.mga7.nonfree

and what is surprising is that it seems to have installed the same versions:

rpm -qa |grep nvidia340
dkms-nvidia340-340.108-1.mga7.nonfree
x11-driver-video-nvidia340-340.108-1.mga7.nonfree
nvidia340-doc-html-340.108-1.mga7.nonfree

I got the same oops after reboot. Well, I'll see the difference in the coming hours.
I checked the files installed by urpmi --downgrade and they are the same as before. And in the repository, the only 340.107 package I can find is the devel rpm:

nvidia340-devel-340.107-9.mga7.nonfree.x86_64

What should I do?
You need to specify the version to downgrade to, as I wrote in comment 2:

urpmi --downgrade dkms-nvidia340-340.107-12.mga7.nonfree x11-driver-video-nvidia340-340.107-12.mga7.nonfree
Sorry, I hadn't seen the version in the command line. But as I say in my last comment, there is no dkms-nvidia340-340.107-* in my repositories !
Well, you were right. I ran : urpmi --downgrade x11-driver-video-nvidia340-340.107-12.mga7.nonfree dkms-nvidia340-340.107-12.mga7.nonfree nvidia340-doc-html-340.107-12.mga7.nonfree and it completed !
Started at Jan 7 10:53:25, crashed at:

Jan 9 00:19:10 rottennvidiadriver kscreenlocker_greet[7591]: The X11 connection broke: I/O error (code 1)
Jan 9 00:19:10 rottennvidiadriver ksmserver[31822]: The X11 connection broke (error 1). Did the X11 server die?

My conf:

rpm -qa|grep nvidia
dkms-nvidia340-340.107-12.mga7.nonfree
nvidia340-doc-html-340.107-12.mga7.nonfree
x11-driver-video-nvidia340-340.107-12.mga7.nonfree

Any advice?
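For comparing how long each driver version ran before X died, the start and crash times reported in this thread can be subtracted directly. The sketch below is an illustration only: syslog timestamps carry no year, so `ASSUMED_YEAR` is an assumption, `%b` parsing assumes English month names, and the helper names `parse_stamp` and `time_to_crash` are invented for the example.

```python
from datetime import datetime

# Assumption: syslog timestamps have no year, so one is supplied here.
ASSUMED_YEAR = 2020


def parse_stamp(stamp, year=ASSUMED_YEAR):
    """Parse a syslog-style timestamp such as 'Jan 7 10:53:25'."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def time_to_crash(start, crash, year=ASSUMED_YEAR):
    """Return the elapsed time between a start and a crash timestamp."""
    return parse_stamp(crash, year) - parse_stamp(start, year)


if __name__ == "__main__":
    # Start and crash times as reported in the comments of this thread.
    runs = {
        "340.108": ("Jan 5 17:59:19", "Jan 7 02:32:50"),
        "340.107": ("Jan 7 10:53:25", "Jan 9 00:19:10"),
    }
    for version, (start, crash) in runs.items():
        print(f"{version}: ran for {time_to_crash(start, crash)}")
```

With the times from this thread, that works out to 1 day 8:33:31 on 340.108 and 1 day 13:25:45 on 340.107, so both runs lasted roughly a day and a half before the X11 connection broke.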
Mageia 7 has been EOL since July 1st 2021. There will be no further bug fixes for this release.

You are encouraged to upgrade to Mageia 8 as soon as possible.

@reporter, if this bug still applies to Mageia 8, please let us know.

@packager, if you work on the Mageia 7 version of your package, please check whether the issue is also present in the Mageia 8 package. If so, please fix the Mageia 8 version instead.

This bug report will be closed as OLD if there is no further notice by 1st September 2021.
Hi bug reporter and hi assignee and others involved, Please reopen this bug report if it is still valid for Mageia 8 or 9(cauldron), and change "Version:" in the upper left of this report accordingly. This report is being closed as OLD because it was filed against Mageia 7, for which support ended on June 30th 2021. Thanks, Marja
Status: NEW => RESOLVED
Resolution: (none) => OLD