Bug 23587

Summary: Shutdown hang - nouveau strangeness
Product: Mageia Reporter: Frank Griffin <ftg>
Component: RPM Packages Assignee: Kernel and Drivers maintainers <kernel>
Status: NEW --- QA Contact:
Severity: normal    
Priority: Normal CC: marja11
Version: Cauldron   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: CVE:
Status comment:

Description Frank Griffin 2018-09-16 21:12:31 CEST
For a month or so now I have been seeing a shutdown hang, one that can be broken by Alt-SysRq-E.  When the magic key is used, I get the following kernel stack trace:

Sep 15 17:34:15 ftglap kernel: ------------[ cut here ]------------
Sep 15 17:34:15 ftglap kernel: nouveau 0000:01:00.0: timeout
Sep 15 17:34:15 ftglap kernel: WARNING: CPU: 0 PID: 6211 at drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c:86 nvkm_pmu_reset+0x14c/0x160 [nouveau]
Sep 15 17:34:15 ftglap kernel: Modules linked in: cmac rfcomm rpcsec_gss_krb5 nfsv4 nfs fscache ip_vs nf_conntrack af_packet vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep dm_mirror dm_region_hash dm_log dm_mod joydev arc4 hid_multitouch hid_generic spi_pxa2xx_platform 8250_dw snd_soc_skl uvcvideo snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp iTCO_wdt videobuf2_vmalloc snd_hda_ext_core snd_soc_acpi videobuf2_memops videobuf2_v4l2 iTCO_vendor_support videobuf2_common snd_soc_core videodev intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp iwlmvm coretemp snd_hda_codec_generic btusb media kvm_intel btrtl snd_compress btbcm ac97_bus btintel mac80211 kvm snd_hda_intel bluetooth snd_hda_codec irqbypass crc32_pclmul ecdh_generic crc32c_intel ghash_clmulni_intel snd_hda_core pcbc iwlwifi snd_hwdep snd_pcm
Sep 15 17:34:15 ftglap kernel:  snd_timer aesni_intel aes_x86_64 crypto_simd cfg80211 cryptd glue_helper snd intel_cstate intel_uncore ipmi_devintf idma64 intel_rapl_perf virt_dma mei_me soundcore input_leds asus_nb_wmi mei asus_wmi tpm_crb i2c_i801 sparse_keymap processor_thermal_device intel_lpss_pci tpm_tis rfkill intel_pch_thermal wmi_bmof intel_lpss intel_soc_dts_iosf thermal tpm_tis_core battery tpm int3400_thermal int3403_thermal acpi_thermal_rel ac int340x_thermal_zone rng_core asus_wireless evdev acpi_pad nfsd vboxguest cuse fuse auth_rpcgss nfs_acl lockd grace nvram sunrpc sch_fq_codel ip_tables x_tables ipv6 crc_ccitt autofs4 ipmi_msghandler nouveau xhci_pci xhci_hcd usbcore ttm serio_raw usb_common i915 i2c_hid hid mxm_wmi i2c_algo_bit drm_kms_helper wmi video button drm
Sep 15 17:34:15 ftglap kernel: CPU: 0 PID: 6211 Comm: net_applet Tainted: P        W  O      4.18.8-desktop-1.mga7 #1
Sep 15 17:34:15 ftglap kernel: Hardware name: ASUSTeK COMPUTER INC. X510UNR/X510UNR, BIOS X510UNR.302 11/14/2017
Sep 15 17:34:15 ftglap kernel: RIP: 0010:nvkm_pmu_reset+0x14c/0x160 [nouveau]
Sep 15 17:34:15 ftglap kernel: Code: 5c 41 5d 41 5e c3 48 8b 7d 10 48 8b 5f 50 48 85 db 74 1e e8 16 0b 21 e0 48 89 da 48 c7 c7 d5 7a 55 c0 48 89 c6 e8 ce 31 c7 df <0f> 0b e9 42 ff ff ff 48 8b 5f 10 eb dc 48 8b 5f 10 eb a5 90 0f 1f
Sep 15 17:34:15 ftglap kernel: RSP: 0018:ffffa35a827dfb78 EFLAGS: 00010286
Sep 15 17:34:15 ftglap kernel: RAX: 0000000000000000 RBX: ffff9047a4b02380 RCX: 0000000000000006
Sep 15 17:34:15 ftglap kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff9047aec16590
Sep 15 17:34:15 ftglap kernel: RBP: ffff9047a16e2400 R08: 0000000000000030 R09: 0000000000000004
Sep 15 17:34:15 ftglap kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff90479ecabde0
Sep 15 17:34:15 ftglap kernel: R13: ffff9047a1363cc0 R14: 0000000847f8c180 R15: ffff9047a4b8b0a0
Sep 15 17:34:15 ftglap kernel: FS:  00007f9faa5ba740(0000) GS:ffff9047aec00000(0000) knlGS:0000000000000000
Sep 15 17:34:15 ftglap kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 17:34:15 ftglap kernel: CR2: 00007fb0d38c4000 CR3: 0000000209dde005 CR4: 00000000003606f0
Sep 15 17:34:15 ftglap kernel: Call Trace:
Sep 15 17:34:15 ftglap kernel:  nvkm_pmu_init+0x16/0x40 [nouveau]
Sep 15 17:34:15 ftglap kernel:  nvkm_subdev_init+0xb2/0x200 [nouveau]
Sep 15 17:34:15 ftglap kernel:  nvkm_device_init+0x123/0x280 [nouveau]
Sep 15 17:34:15 ftglap kernel:  nvkm_udevice_init+0x41/0x60 [nouveau]
Sep 15 17:34:15 ftglap kernel:  nvkm_object_init+0x3e/0x100 [nouveau]
Sep 15 17:34:15 ftglap kernel:  nvkm_object_init+0x71/0x100 [nouveau]
Sep 15 17:34:15 ftglap kernel:  nvkm_object_init+0x71/0x100 [nouveau]
Sep 15 17:34:15 ftglap kernel:  nouveau_do_resume+0x28/0x150 [nouveau]
Sep 15 17:34:15 ftglap kernel:  nouveau_pmops_runtime_resume+0x88/0x150 [nouveau]
Sep 15 17:34:15 ftglap kernel:  pci_pm_runtime_resume+0x78/0xb0
Sep 15 17:34:15 ftglap kernel:  ? pci_restore_standard_config+0x40/0x40

So here's the strangeness: there are two video chips on this laptop.  One is an Intel 810 or later, and the other is an NVIDIA GeForce.  I have dkms-nvidia installed.  When I run XFdrake, it selects the Intel chip and ignores the NVIDIA one.  I get no prompt asking whether I want to use a proprietary driver for the NVIDIA chip.

So why is nouveau (which I think is an open-source NVIDIA driver) running at all?  If the system is using the Intel chip, neither nvidia nor nouveau should be running.
Marja Van Waes 2018-09-18 15:23:21 CEST

Assignee: bugsquad => kernel
CC: (none) => marja11

Comment 1 Frank Griffin 2018-12-04 19:04:08 CET
Still happening in current cauldron.
Comment 2 Frank Griffin 2019-02-19 16:35:13 CET
Minor changes, but still happening.  The hang is different, and occurs after the stacktrace at "Starting power off".  The new trace is:

Feb 17 09:53:29 ftglap kernel: ------------[ cut here ]------------
Feb 17 09:53:29 ftglap kernel: nouveau 0000:01:00.0: timeout
Feb 17 09:53:29 ftglap kernel: WARNING: CPU: 1 PID: 11912 at drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c:86 nvkm_pmu_reset+0x14c/0x160 [nouveau]
Feb 17 09:53:29 ftglap kernel: Modules linked in: iptable_filter uas rpcsec_gss_krb5 nfsv4 nfs fscache cmac rfcomm ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 af_packet bnep binfmt_misc sr_mod dm_mirror dm_region_hash dm_log dm_mod joydev arc4 hid_multitouch hid_generic spi_pxa2xx_platform 8250_dw uvcvideo intel_rapl videobuf2_vmalloc videobuf2_memops x86_pkg_temp_thermal intel_powerclamp videobuf2_v4l2 btusb videobuf2_common coretemp kvm_intel btbcm btrtl snd_soc_skl iwlmvm btintel videodev snd_soc_hdac_hda bluetooth kvm snd_hda_ext_core snd_soc_skl_ipc mac80211 snd_soc_sst_ipc snd_soc_sst_dsp iTCO_wdt iTCO_vendor_support snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_hdmi snd_hda_codec_generic snd_soc_core irqbypass media usb_storage ecdh_generic crc32_pclmul crc32c_intel snd_compress ac97_bus ghash_clmulni_intel snd_hda_intel iwlwifi snd_hda_codec aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate snd_hda_core intel_uncore snd_hwdep intel_rapl_perf snd_pcm cfg80211 idma64
Feb 17 09:53:29 ftglap kernel:  asus_nb_wmi asus_wmi virt_dma input_leds sparse_keymap snd_timer wmi_bmof snd rfkill mei_me tpm_crb processor_thermal_device mei intel_lpss_pci soundcore tpm_tis intel_lpss intel_pch_thermal intel_soc_dts_iosf thermal i2c_i801 tpm_tis_core tpm ac int3403_thermal int3400_thermal battery rng_core acpi_thermal_rel int340x_thermal_zone asus_wireless acpi_pad evdev nfsd cuse fuse auth_rpcgss nfs_acl lockd grace nvram sunrpc sch_fq_codel ip_tables x_tables ipv6 crc_ccitt autofs4 nouveau xhci_pci xhci_hcd usbcore ttm serio_raw usb_common i915 i2c_hid mxm_wmi hid i2c_algo_bit drm_kms_helper video wmi button drm [last unloaded: vboxdrv]
Feb 17 09:53:29 ftglap kernel: CPU: 1 PID: 11912 Comm: plymouthd Tainted: G        W  O      4.20.9-desktop-1.mga7 #1
Feb 17 09:53:29 ftglap kernel: Hardware name: ASUSTeK COMPUTER INC. X510UNR/X510UNR, BIOS X510UNR.302 11/14/2017
Feb 17 09:53:29 ftglap kernel: RIP: 0010:nvkm_pmu_reset+0x14c/0x160 [nouveau]
Feb 17 09:53:29 ftglap kernel: Code: 5c 41 5d 41 5e c3 48 8b 7d 10 48 8b 5f 50 48 85 db 74 1e e8 06 5e f6 de 48 89 da 48 c7 c7 31 93 84 c0 48 89 c6 e8 ce 6f 98 de <0f> 0b e9 42 ff ff ff 48 8b 5f 10 eb dc 48 8b 5f 10 eb a5 90 0f 1f
Feb 17 09:53:29 ftglap kernel: RSP: 0018:ffffb07483edb908 EFLAGS: 00010286
Feb 17 09:53:29 ftglap kernel: RAX: 0000000000000000 RBX: ffff979824805a10 RCX: 0000000000000006
Feb 17 09:53:29 ftglap kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff979826a564c0
Feb 17 09:53:29 ftglap kernel: RBP: ffff9798249ac800 R08: 0000000000000030 R09: 0000000000000004
Feb 17 09:53:29 ftglap kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff97981f88ef60
Feb 17 09:53:29 ftglap kernel: R13: ffff979821ed8a80 R14: 0000424c21f67960 R15: ffff9798248340b0
Feb 17 09:53:29 ftglap kernel: FS:  00007f664ee4c740(0000) GS:ffff979826a40000(0000) knlGS:0000000000000000
Feb 17 09:53:29 ftglap kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 17 09:53:29 ftglap kernel: CR2: 000000000227cac8 CR3: 00000001e31ee006 CR4: 00000000003606e0
Feb 17 09:53:29 ftglap kernel: Call Trace:
Feb 17 09:53:29 ftglap kernel:  nvkm_pmu_init+0x16/0x40 [nouveau]
Feb 17 09:53:29 ftglap kernel:  nvkm_subdev_init+0xb2/0x200 [nouveau]
Feb 17 09:53:29 ftglap kernel:  nvkm_device_init+0x123/0x280 [nouveau]
Feb 17 09:53:29 ftglap kernel:  nvkm_udevice_init+0x41/0x60 [nouveau]
Feb 17 09:53:29 ftglap kernel:  nvkm_object_init+0x3e/0x100 [nouveau]
Feb 17 09:53:29 ftglap kernel:  nvkm_object_init+0x71/0x100 [nouveau]
Feb 17 09:53:29 ftglap kernel:  nvkm_object_init+0x71/0x100 [nouveau]
Feb 17 09:53:29 ftglap kernel:  nouveau_do_resume+0x28/0x150 [nouveau]
Feb 17 09:53:29 ftglap kernel:  nouveau_pmops_runtime_resume+0x88/0x150 [nouveau]
Feb 17 09:53:29 ftglap kernel:  pci_pm_runtime_resume+0x74/0xd0
Feb 17 09:53:29 ftglap kernel:  ? pci_restore_standard_config+0x40/0x40
Feb 17 09:53:29 ftglap kernel:  __rpm_callback+0xca/0x1d0
Feb 17 09:53:29 ftglap kernel:  ? pci_restore_standard_config+0x40/0x40
Feb 17 09:53:29 ftglap kernel:  rpm_callback+0x1f/0x70
Feb 17 09:53:29 ftglap kernel:  ? pci_restore_standard_config+0x40/0x40
Feb 17 09:53:29 ftglap kernel:  rpm_resume+0x5c0/0x7f0
Feb 17 09:53:29 ftglap kernel:  __pm_runtime_resume+0x47/0x70
Feb 17 09:53:29 ftglap kernel:  nouveau_drm_open+0x39/0x170 [nouveau]
Feb 17 09:53:29 ftglap kernel:  ? _cond_resched+0x15/0x30
Feb 17 09:53:29 ftglap kernel:  ? kmem_cache_alloc_trace+0x1bb/0x1d0
Feb 17 09:53:29 ftglap kernel:  ? cap_inode_getsecurity+0x240/0x240
Feb 17 09:53:29 ftglap kernel:  drm_file_alloc+0x167/0x250 [drm]
Feb 17 09:53:29 ftglap kernel:  drm_open+0xa7/0x1e0 [drm]
Feb 17 09:53:29 ftglap kernel:  drm_stub_open+0xaf/0xf0 [drm]
Feb 17 09:53:29 ftglap kernel:  chrdev_open+0x9e/0x1a0
Feb 17 09:53:29 ftglap kernel:  ? cdev_put.part.3+0x20/0x20
Feb 17 09:53:29 ftglap kernel:  do_dentry_open+0x12f/0x330
Feb 17 09:53:29 ftglap kernel:  path_openat+0x32c/0x16a0
Feb 17 09:53:29 ftglap kernel:  ? filename_lookup.part.63+0xe0/0x170
Feb 17 09:53:29 ftglap kernel:  ? __check_object_size+0x15d/0x189
Feb 17 09:53:29 ftglap kernel:  do_filp_open+0x93/0x100
Feb 17 09:53:29 ftglap kernel:  ? vfs_statx+0x73/0xe0
Feb 17 09:53:29 ftglap kernel:  ? __check_object_size+0x15d/0x189
Feb 17 09:53:29 ftglap kernel:  do_sys_open+0x186/0x210
Feb 17 09:53:29 ftglap kernel:  do_syscall_64+0x55/0x100
Feb 17 09:53:29 ftglap kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 17 09:53:29 ftglap kernel: RIP: 0033:0x7f664f0d8a10
Feb 17 09:53:29 ftglap kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 36 48 8d 05 d7 59 0d 00 8b 00 85 c0 75 5a 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 84 00 00 00 48 83 c4 68 5b 5d c3 0f 1f 44
Feb 17 09:53:29 ftglap kernel: RSP: 002b:00007ffc6bc89260 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Feb 17 09:53:29 ftglap kernel: RAX: ffffffffffffffda RBX: 0000000002099f70 RCX: 00007f664f0d8a10
Feb 17 09:53:29 ftglap kernel: RDX: 0000000000000002 RSI: 000000000209b140 RDI: 00000000ffffff9c
Feb 17 09:53:29 ftglap kernel: RBP: 0000000000000002 R08: 000000000209a4c0 R09: 0000000000000000
Feb 17 09:53:29 ftglap kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007f664ee4c6c0
Feb 17 09:53:29 ftglap kernel: R13: 0000000000000000 R14: 00007f664f1c3538 R15: 00007f664f1c34a8
Feb 17 09:53:29 ftglap kernel: ---[ end trace ed5cb8771bf878dc ]---
Feb 17 09:53:29 ftglap systemd[1]: Started Show Plymouth Power Off Screen.
Feb 17 09:53:29 ftglap systemd[1]: Stopped Session c1 of user sddm.
Feb 17 09:53:29 ftglap systemd[1]: Removed slice User Slice of sddm.
Feb 17 09:53:29 ftglap systemd[1]: Stopping Login Service...
Feb 17 09:53:29 ftglap systemd[1]: Stopping Permit User Sessions...
Feb 17 09:53:29 ftglap systemd[1]: Stopped Permit User Sessions.
Feb 17 09:53:29 ftglap systemd[1]: Stopped target Remote File Systems.
Feb 17 09:53:29 ftglap systemd[1]: Unmounting /mnt/ftgme2.data...
Feb 17 09:53:29 ftglap systemd[1]: Unmounting /data/ftg/.thunderbird...
Feb 17 09:53:29 ftglap systemd[1]: Unmounting /mnt/ftgme2.data2...
Feb 17 09:53:29 ftglap systemd[1]: Unmounting /mnt/ftgme2...
Feb 17 09:53:29 ftglap systemd[1]: Unmounting /mnt/cauldron...
Feb 17 09:53:29 ftglap systemd[1]: Unmounting /mnt/ftgme2.usr.local...
Comment 3 Frank Griffin 2019-02-19 16:36:25 CET
Another change is that the oops happens without use of the magic key.