Description of problem:
When an AER (PCIe Advanced Error Reporting) error occurs, which I can trigger here fairly reliably, the kernel becomes unresponsive because the error isn't handled. The only way out is a cold reboot.

Version-Release number of selected component (if applicable):
kernel-server-4.14.20-1.mga6-1-1.mga6

How reproducible:
Fairly reliably.

Steps to Reproduce:
1. Addonics AD4ES6GPX4 HBA with Marvell 9705PM and 5 SSDs.
2. for n in a b c d e; do time dd if=/dev/sd$n bs=128k count=200000 of=/dev/null & done

[ 161.530384] pcieport 0000:00:1d.0: AER: Uncorrected (Fatal) error received: id=00e8
[ 161.538263] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=00e8(Receiver ID)
[ 161.549885] pcieport 0000:00:1d.0: device [8086:a118] error status/mask=00020000/00010000
[ 161.558412] pcieport 0000:00:1d.0: [17] Receiver Overflow (First)
[ 161.565324] pcieport 0000:00:1d.0: broadcast error_detected message
[ 161.565326] ahci 0000:44:00.0: device has no AER-aware driver
[ 161.565327] ahci 0000:45:00.0: device has no AER-aware driver
[ 162.630587] pcieport 0000:00:1d.0: Root Port link has been reset
[ 162.630604] pcieport 0000:00:1d.0: AER: Device recovery failed
[ 192.450092] ------------[ cut here ]------------
[ 192.454813] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:2725 rcu_process_callbacks+0x4d6/0x4f0
[ 192.463545] Modules linked in: xt_recent ip6table_nat nf_nat_ipv6 nf_nat xt_comment ip6t_REJECT nf_reject_ipv6 xt_addrtype bridge stp llc xt_mark ip6table_mangle nf_conntrack_snmp xt_tcpudp xt_CT ip6table_raw xt_multiport nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack xt_NFLOG nfnetlink_log xt_LOG nf_log_ipv6 nf_log_common nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack ip6table_filter ip6_tables x_tables af_packet msr sunrpc ses enclosure snd_hda_codec_hdmi snd_hda_codec_ca0132 i915 drm_kms_helper drm i2c_algo_bit intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel
[ 192.535227] kvm_intel snd_hda_codec iTCO_wdt iTCO_vendor_support kvm snd_hda_core nls_utf8 irqbypass intel_cstate nls_cp437 snd_hwdep mxm_wmi intel_uncore snd_pcm e1000e vfat intel_rapl_perf fat ixgbe snd_timer snd i2c_i801 soundcore alx ptp mpt3sas joydev pps_core mdio evdev dca input_leds raid_class scsi_transport_sas fan thermal wmi video acpi_pad button mei_me mei shpchp sch_fq_codel gpio_it87 it87 hwmon_vid efivarfs ipv6 crc_ccitt autofs4 algif_skcipher af_alg dm_crypt uas usb_storage hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc xhci_pci xhci_hcd aesni_intel aes_x86_64 crypto_simd cryptd glue_helper usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod ide_pci_generic ide_core
[ 192.601260] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.20-server-1.mga6 #1
[ 192.608653] Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 7/Z170X-Gaming 7, BIOS F22j 01/11/2018
[ 192.618654] task: ffffffff992124c0 task.stack: ffffffff99200000
[ 192.624695] RIP: 0010:rcu_process_callbacks+0x4d6/0x4f0
[ 192.630024] RSP: 0018:ffff8a908ec03f10 EFLAGS: 00010002
[ 192.635327] RAX: 0000000000000000 RBX: ffff8a908ec23180 RCX: ffff8a907c73e118
[ 192.642565] RDX: ffffffffffffd801 RSI: ffff8a908ec03f20 RDI: ffff8a908ec231b8
[ 192.649854] RBP: ffffffff99250380 R08: 0000000000000246 R09: 0000000000000001
[ 192.657143] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8a908ec231b8
[ 192.664414] R13: ffffffff992124c0 R14: fffffffffffffffb R15: 7fffffffffffffff
[ 192.671692] FS: 0000000000000000(0000) GS:ffff8a908ec00000(0000) knlGS:0000000000000000
[ 192.679927] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 192.685822] CR2: 00007f108506e140 CR3: 000000042a20a005 CR4: 00000000003606f0
[ 192.693093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 192.700399] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 192.707694] Call Trace:
[ 192.710199] <IRQ>
[ 192.712255] __do_softirq+0xf5/0x295
[ 192.715885] irq_exit+0xae/0xb0
[ 192.719075] smp_apic_timer_interrupt+0x70/0x130
[ 192.723788] apic_timer_interrupt+0x7d/0x90
[ 192.728070] </IRQ>
[ 192.730229] RIP: 0010:cpuidle_enter_state+0xa4/0x300
[ 192.735298] RSP: 0018:ffffffff99203e88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 192.743004] RAX: ffff8a908ec22480 RBX: 0000000000000006 RCX: 000000000000001f
[ 192.750240] RDX: 0000000000000000 RSI: 00000000258f0602 RDI: 0000000000000000
[ 192.757528] RBP: ffffffff992b8960 R08: ffff8a908ec214c4 R09: 0000000000000018
[ 192.764781] R10: 000000000000248b R11: 0000000000004d0c R12: ffff8a908ec2b400
[ 192.772062] R13: ffffffff992b8bb8 R14: 0000002cce52e067 R15: 0000002cceeb23d9
[ 192.779334] ? cpuidle_enter_state+0x92/0x300
[ 192.783798] do_idle+0x185/0x1e0
[ 192.787091] cpu_startup_entry+0x6f/0x80
[ 192.791078] start_kernel+0x4cb/0x4eb
[ 192.794812] secondary_startup_64+0xa5/0xb0
[ 192.799111] Code: 17 01 0f 8f 80 fd ff ff 48 8b 15 f6 25 17 01 48 89 93 b0 00 00 00 e9 6d fd ff ff 4c 89 f6 4c 89 e7 e8 5f 73 72 00 e9 eb fb ff ff <0f> 0b e9 9e fd ff ff 0f 0b e9 9d fc ff ff e8 e7 4f f9 ff 0f 1f
[ 192.818368] ---[ end trace 7872bf286971a2ac ]---
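The AER log line "error status/mask=00020000/00010000" reports two 32-bit registers of the root port: the Uncorrectable Error Status register and the Uncorrectable Error Mask register. The "[17] Receiver Overflow (First)" line is bit 17 of the status word. A minimal bash sketch for decoding such a pair is shown below; the helper name decode_aer_uncor is hypothetical, and the bit names are taken from the PCIe Base Specification's Uncorrectable Error Status layout as an assumption (only bits 4 to 25 are listed, everything else prints as "Unknown"). It prints only bits that are set in the status and not masked, which for this report yields just bit 17; bit 16 (Unexpected Completion) is set in the mask.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a PCIe AER Uncorrectable Error Status/Mask pair
# as printed by the kernel ("error status/mask=SSSSSSSS/MMMMMMMM").
# Bit names are an assumption based on the PCIe Base Specification layout.
AER_UNCOR_BITS=(
  [4]="Data Link Protocol Error"
  [5]="Surprise Down Error"
  [12]="Poisoned TLP"
  [13]="Flow Control Protocol Error"
  [14]="Completion Timeout"
  [15]="Completer Abort"
  [16]="Unexpected Completion"
  [17]="Receiver Overflow"
  [18]="Malformed TLP"
  [19]="ECRC Error"
  [20]="Unsupported Request"
  [21]="ACS Violation"
  [22]="Uncorrectable Internal Error"
  [23]="MC Blocked TLP"
  [24]="AtomicOp Egress Blocked"
  [25]="TLP Prefix Blocked"
)

# decode_aer_uncor STATUS MASK
# STATUS and MASK are hex digits without a 0x prefix, as in the kernel log.
# Prints one line per bit set in STATUS and clear in MASK, lowest bit first.
decode_aer_uncor() {
  local status=$((16#$1)) mask=$((16#$2)) bit
  for ((bit = 0; bit < 32; bit++)); do
    if (( ((status & ~mask) >> bit) & 1 )); then
      printf '[%d] %s\n' "$bit" "${AER_UNCOR_BITS[$bit]:-Unknown}"
    fi
  done
}

# Values from the log above.
decode_aer_uncor 00020000 00010000
```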
Assignee: bugsquad => kernel
CC: (none) => marja11
Got a very similar trace with an ASMedia controller without the AER ...

[10105.578852] ------------[ cut here ]------------
[10105.583583] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:2725 rcu_process_callbacks+0x4d6/0x4f0
[10105.592403] Modules linked in: xt_recent ip6table_nat nf_nat_ipv6 nf_nat xt_comment ip6t_REJECT nf_reject_ipv6 xt_addrtype bridge stp llc xt_mark ip6table_mangle nf_conntrack_snmp xt_tcpudp xt_CT ip6table_raw xt_multiport nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack xt_NFLOG nfnetlink_log xt_LOG nf_log_ipv6 nf_log_common nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack ip6table_filter ip6_tables x_tables af_packet msr sunrpc ses enclosure snd_hda_codec_hdmi snd_hda_codec_ca0132 i915 intel_rapl nls_utf8 x86_pkg_temp_thermal intel_powerclamp drm_kms_helper nls_cp437 coretemp vfat drm i2c_algo_bit
[10105.664977] fat kvm_intel ixgbe kvm snd_hda_intel e1000e snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm intel_cstate snd_timer intel_uncore snd iTCO_wdt iTCO_vendor_support ptp soundcore mxm_wmi intel_rapl_perf i2c_i801 pps_core wmi alx evdev input_leds joydev mpt3sas mdio raid_class scsi_transport_sas dca mei_me fan thermal video acpi_pad mei button shpchp sch_fq_codel gpio_it87 it87 hwmon_vid efivarfs ipv6 crc_ccitt autofs4 algif_skcipher af_alg dm_crypt uas usb_storage hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc xhci_pci aesni_intel xhci_hcd aes_x86_64 crypto_simd cryptd glue_helper usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod ide_pci_generic ide_core
[10105.730080] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.20-server-1.mga6 #1
[10105.737413] Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 7/Z170X-Gaming 7, BIOS F22j 01/11/2018
[10105.747414] task: ffffffff9e2124c0 task.stack: ffffffff9e200000
[10105.753457] RIP: 0010:rcu_process_callbacks+0x4d6/0x4f0
[10105.758777] RSP: 0018:ffff888e0ec03f10 EFLAGS: 00010002
[10105.764106] RAX: 0000000000000000 RBX: ffff888e0ec23180 RCX: 0000000180040002
[10105.771393] RDX: ffffffffffffd801 RSI: ffff888e0ec03f20 RDI: ffff888e0ec231b8
[10105.778639] RBP: ffffffff9e250380 R08: 00000000fb733b01 R09: 0000000180040002
[10105.785895] R10: ffff888e0ec03e30 R11: 0000000000000000 R12: ffff888e0ec231b8
[10105.793148] R13: ffffffff9e2124c0 R14: fffffffffffffffc R15: 7fffffffffffffff
[10105.800428] FS: 0000000000000000(0000) GS:ffff888e0ec00000(0000) knlGS:0000000000000000
[10105.808643] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10105.814477] CR2: 00007f41a1383000 CR3: 000000036820a003 CR4: 00000000003606f0
[10105.821782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10105.829010] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10105.836246] Call Trace:
[10105.838745] <IRQ>
[10105.840825] __do_softirq+0xf5/0x295
[10105.844447] irq_exit+0xae/0xb0
[10105.847670] smp_apic_timer_interrupt+0x70/0x130
[10105.852377] apic_timer_interrupt+0x7d/0x90
[10105.856632] </IRQ>
[10105.858772] RIP: 0010:cpuidle_enter_state+0xa4/0x300
[10105.863833] RSP: 0018:ffffffff9e203e88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[10105.871511] RAX: ffff888e0ec22480 RBX: 0000000000000006 RCX: 000000000000001f
[10105.878791] RDX: 0000000000000000 RSI: 00000000258f0602 RDI: 0000000000000000
[10105.886089] RBP: ffffffff9e2b8960 R08: ffff888e0ec214c4 R09: 0000000000000018
[10105.893360] R10: 000000000000194b R11: 0000000000002547 R12: ffff888e0ec2b400
[10105.900622] R13: ffffffff9e2b8bb8 R14: 00000930e2d78395 R15: 00000930e36fd284
[10105.907861] ? cpuidle_enter_state+0x92/0x300
[10105.912299] do_idle+0x185/0x1e0
[10105.915599] cpu_startup_entry+0x6f/0x80
[10105.919578] start_kernel+0x4cb/0x4eb
[10105.923321] secondary_startup_64+0xa5/0xb0
[10105.927576] Code: 17 01 0f 8f 80 fd ff ff 48 8b 15 f6 25 17 01 48 89 93 b0 00 00 00 e9 6d fd ff ff 4c 89 f6 4c 89 e7 e8 5f 73 72 00 e9 eb fb ff ff <0f> 0b e9 9e fd ff ff 0f 0b e9 9d fc ff ff e8 e7 4f f9 ff 0f 1f
[10105.946764] ---[ end trace 9332ec21af281a5e ]---
CC: (none) => herbert
Mageia 6 changed to end-of-life (EOL) status on 2019-09-30. It is no longer maintained, which means that it will not receive any further security or bug fix updates.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Mageia version.

Bug Reporter: Thank you for reporting this issue, and we are sorry that we weren't able to fix it before Mageia 6's end of life. If you are able to reproduce it against a later version of Mageia, you are encouraged to click on "Version" and change it to that version of Mageia.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Mageia release includes newer upstream software that fixes bugs or makes them obsolete.

If you would like to help fix bugs in the future, don't hesitate to join the packager team via our mentoring program [1] or join the teams that suit you best [2].

[1] https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
[2] http://www.mageia.org/contribute/

Best regards,
Aurélien
Bugsquad Team
CC: (none) => ouaurelien
Status: NEW => RESOLVED
Resolution: (none) => OLD