Trying to run a mass rebuild, I got a lot of processes blocked forever: mostly urpmi waiting for filetriggers, and once gconftool-2 calling /usr/bin/killall -q -HUP /usr/libexec/gconfd-2, which hangs. Some configure scripts also call ps, which hangs too.

Trying to attach to the killall process hangs as well.

The stack of the killall process:

[root@instance-2 pterjan]# cat /proc/4446/stack
[<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8106968b>] get_mm_exe_file+0x1b/0x40
[<ffffffff81228cc5>] proc_exe_link+0x55/0xa0
[<ffffffff81227fba>] proc_pid_follow_link+0x4a/0x70
[<ffffffff811cceda>] path_lookupat+0x61a/0xd10
[<ffffffff811cd5f6>] filename_lookup.isra.25+0x26/0x80
[<ffffffff811d0694>] user_path_at_empty+0x54/0xa0
[<ffffffff811d06f1>] user_path_at+0x11/0x20
[<ffffffff811c4012>] vfs_fstatat+0x52/0xa0
[<ffffffff811c44af>] SYSC_newstat+0x1f/0x40
[<ffffffff811c46ee>] SyS_newstat+0xe/0x10
[<ffffffff816c29ed>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Using kernel 3.18.3-server-1.mga5
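Since ps hangs here, the stuck tasks can also be listed by reading only /proc/PID/stat. The sketch below is not part of the original report: list_blocked_tasks and its optional root argument are hypothetical names, and it assumes that reading stat does not need the lock the hung tasks are waiting on, which has not been checked on this machine.

```shell
#!/bin/sh
# Sketch: list tasks in uninterruptible sleep (state D) without calling ps.
# list_blocked_tasks and its argument are hypothetical, not from the report.
list_blocked_tasks() {
    proc_root=${1:-/proc}   # overridable so the function can be tried on a fixture
    for d in "$proc_root"/[0-9]*; do
        [ -r "$d/stat" ] || continue
        stat=
        # One line per task; a task may exit between the glob and the read.
        IFS= read -r stat < "$d/stat" || [ -n "$stat" ] || continue
        # comm sits in parentheses and may contain spaces, so take the
        # state as the first field after the last ") ".
        rest=${stat##*) }
        state=${rest%% *}
        [ "$state" = D ] || continue
        comm=${stat#*(}
        comm=${comm%)*}
        printf '%s %s\n' "${d##*/}" "$comm"
    done
}
```

Calling list_blocked_tasks with no argument walks /proc and prints one "pid comm" line per task in state D; cat /proc/PID/stack can then be run on each of them.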
Also, after rebooting and restarting the build, it happened again. (It could be some package leaving a process in a strange state.)
# for d in *; do echo $d; readlink $d/exe; done

pointed me to 1671.

cat /proc/1671/cmdline also hangs.

# cat /proc/1671/stack
[<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[<ffffffff816c49e8>] page_fault+0x28/0x30
[<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[<ffffffff81013ec2>] do_signal+0x952/0xbb0
[<ffffffff81014190>] do_notify_resume+0x70/0x90
[<ffffffff816c37a2>] retint_signal+0x48/0x86
[<ffffffffffffffff>] 0xffffffffffffffff

Its fd directory gives me an indication; it is from the build of gcc:

l-wx------ 1 pterjan 1001 64 Jan 20 11:00 6 -> /home/pterjan/build/chroot_tmp/pterjan/chroot_cauldron.x86_64.0.20150119221317_279/home/pterjan/rpmbuild/BUILD/gcc-4.9.2/obj-x86_64-mageia-linux-gnu/gcc/testsuite/ada/acats1/tests/cb/cb1010d/cb1010d.log (deleted)

This is probably related to the hung tasks in the kernel logs:

[ 6120.690552] INFO: task c52103x:22055 blocked for more than 120 seconds.
[ 6120.692575] Not tainted 3.18.3-server-1.mga5 #1
[ 6120.694051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6120.696366] c52103x D ffff881a3fd932c0 0 22055 22052 0x00000000
[ 6120.698400] ffff88140c97bb30 0000000000000082 ffff8819abf8e590 00000000000132c0
[ 6120.701117] ffff88140c97bfd8 00000000000132c0 ffff881850d8a310 ffff8819abf8e590
[ 6120.703643] ffffffff8117f676 ffff8819abf8e590 ffff8818b4f0e4a0 ffff8818b4f0e4b8
[ 6120.706110] Call Trace:
[ 6120.706970] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6120.709045] [<ffffffff816be469>] schedule+0x29/0x70
[ 6120.710906] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6120.712481] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6120.714115] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6120.715461] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6120.716799] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6120.718149] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6120.719588] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6120.721058] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6120.722516] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6120.723793] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6120.725116] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6120.726512] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6120.728123] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6120.730040] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6120.732086] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6120.733734] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6120.735242] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6120.736970] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6240.730119] INFO: task c52103x:22055 blocked for more than 120 seconds.
[ 6240.731902] Not tainted 3.18.3-server-1.mga5 #1
[ 6240.733013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6240.735002] c52103x D ffff881a3fd932c0 0 22055 1 0x00000004
[ 6240.736674] ffff88140c97bb30 0000000000000082 ffff8819abf8e590 00000000000132c0
[ 6240.738463] ffff88140c97bfd8 00000000000132c0 ffff881850d8a310 ffff8819abf8e590
[ 6240.740169] ffffffff8117f676 ffff8819abf8e590 ffff8818b4f0e4a0 ffff8818b4f0e4b8
[ 6240.741968] Call Trace:
[ 6240.742580] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6240.743957] [<ffffffff816be469>] schedule+0x29/0x70
[ 6240.745107] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6240.746561] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6240.748140] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6240.749408] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6240.750683] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6240.751830] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6240.753108] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6240.754629] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6240.755873] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6240.757046] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6240.758340] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6240.759637] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6240.760879] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6240.762189] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6240.763617] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6240.764827] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6240.766094] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6240.767389] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6360.760116] INFO: task c52103x:22055 blocked for more than 120 seconds.
[ 6360.774011] Not tainted 3.18.3-server-1.mga5 #1
[ 6360.775031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6360.776668] c52103x D ffff881a3fd932c0 0 22055 1 0x00000004
[ 6360.778177] ffff88140c97bb30 0000000000000082 ffff8819abf8e590 00000000000132c0
[ 6360.779692] ffff88140c97bfd8 00000000000132c0 ffff881850d8a310 ffff8819abf8e590
[ 6360.781581] ffffffff8117f676 ffff8819abf8e590 ffff8818b4f0e4a0 ffff8818b4f0e4b8
[ 6360.783513] Call Trace:
[ 6360.784103] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6360.785484] [<ffffffff816be469>] schedule+0x29/0x70
[ 6360.786638] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6360.788125] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6360.789663] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6360.790860] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6360.792105] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6360.793132] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6360.794197] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6360.795344] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6360.796392] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6360.797488] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6360.798560] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6360.799615] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6360.800717] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6360.801837] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6360.803098] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6360.804213] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6360.805408] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6360.806665] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6480.800129] INFO: task c52103x:22055 blocked for more than 120 seconds.
[ 6480.808454] Not tainted 3.18.3-server-1.mga5 #1
[ 6480.810351] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6480.813013] c52103x D ffff881a3fd932c0 0 22055 1 0x00000004
[ 6480.815504] ffff88140c97bb30 0000000000000082 ffff8819abf8e590 00000000000132c0
[ 6480.818257] ffff88140c97bfd8 00000000000132c0 ffff881850d8a310 ffff8819abf8e590
[ 6480.821025] ffffffff8117f676 ffff8819abf8e590 ffff8818b4f0e4a0 ffff8818b4f0e4b8
[ 6480.823696] Call Trace:
[ 6480.824662] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6480.826747] [<ffffffff816be469>] schedule+0x29/0x70
[ 6480.828684] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6480.832699] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6480.835059] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6480.837565] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6480.841095] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6480.843038] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6480.848955] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6480.851874] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6480.853900] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6480.857069] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6480.858946] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6480.860848] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6480.862638] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6480.864538] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6480.866504] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6480.868290] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6480.870140] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6480.871924] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6480.873491] INFO: task c52104x:9701 blocked for more than 120 seconds.
[ 6480.875570] Not tainted 3.18.3-server-1.mga5 #1
[ 6480.877195] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6480.879655] c52104x D ffff881a3fd132c0 0 9701 9697 0x00000000
[ 6480.882041] ffff881751227b30 0000000000000082 ffff8810c2d74490 00000000000132c0
[ 6480.884318] ffff881751227fd8 00000000000132c0 ffff8813f7734510 ffff8810c2d74490
[ 6480.886380] ffffffff8117f676 ffff8810c2d74490 ffff8819b9c2b8e0 ffff8819b9c2b8f8
[ 6480.888496] Call Trace:
[ 6480.889192] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6480.890789] [<ffffffff816be469>] schedule+0x29/0x70
[ 6480.892097] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6480.893833] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6480.895621] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6480.897145] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6480.898639] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6480.900007] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6480.901623] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6480.903050] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6480.904614] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6480.906365] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6480.908214] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6480.910246] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6480.912055] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6480.913800] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6480.915451] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6480.916885] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6480.918574] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6480.920240] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6600.920124] INFO: task c52103x:22055 blocked for more than 120 seconds.
[ 6600.951937] Not tainted 3.18.3-server-1.mga5 #1
[ 6600.953545] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.955684] c52103x D ffff881a3fd932c0 0 22055 1 0x00000004
[ 6600.957942] ffff88140c97bb30 0000000000000082 ffff8819abf8e590 00000000000132c0
[ 6600.960221] ffff88140c97bfd8 00000000000132c0 ffff881850d8a310 ffff8819abf8e590
[ 6600.962655] ffffffff8117f676 ffff8819abf8e590 ffff8818b4f0e4a0 ffff8818b4f0e4b8
[ 6600.964905] Call Trace:
[ 6600.965624] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6600.967336] [<ffffffff816be469>] schedule+0x29/0x70
[ 6600.968541] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6600.969870] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6600.971298] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6600.972492] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6600.973670] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6600.974881] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6600.976136] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6600.977436] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6600.978637] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6600.979757] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6600.980945] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6600.981946] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6600.982949] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6600.984068] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6600.985294] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6600.986348] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6600.987482] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6600.988553] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6601.035480] INFO: task c52104x:9701 blocked for more than 120 seconds.
[ 6601.037186] Not tainted 3.18.3-server-1.mga5 #1
[ 6601.038721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6601.040680] c52104x D ffff881a3fd132c0 0 9701 1 0x00000004
[ 6601.042654] ffff881751227b30 0000000000000082 ffff8810c2d74490 00000000000132c0
[ 6601.044910] ffff881751227fd8 00000000000132c0 ffff8813f7734510 ffff8810c2d74490
[ 6601.046909] ffffffff8117f676 ffff8810c2d74490 ffff8819b9c2b8e0 ffff8819b9c2b8f8
[ 6601.048531] Call Trace:
[ 6601.049055] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6601.050335] [<ffffffff816be469>] schedule+0x29/0x70
[ 6601.051480] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6601.052988] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6601.054482] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6601.055582] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6601.056644] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6601.057728] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6601.058979] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6601.060203] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6601.061220] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6601.062187] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6601.063213] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6601.064311] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6601.065257] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6601.066287] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6601.067500] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6601.068458] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6601.069455] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6601.070753] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6721.070096] INFO: task c52103x:22055 blocked for more than 120 seconds.
[ 6721.072422] Not tainted 3.18.3-server-1.mga5 #1
[ 6721.074140] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6721.076677] c52103x D ffff881a3fd932c0 0 22055 1 0x00000004
[ 6721.079226] ffff88140c97bb30 0000000000000082 ffff8819abf8e590 00000000000132c0
[ 6721.081880] ffff88140c97bfd8 00000000000132c0 ffff881850d8a310 ffff8819abf8e590
[ 6721.084173] ffffffff8117f676 ffff8819abf8e590 ffff8818b4f0e4a0 ffff8818b4f0e4b8
[ 6721.086359] Call Trace:
[ 6721.087083] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6721.088536] [<ffffffff816be469>] schedule+0x29/0x70
[ 6721.089713] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6721.091345] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6721.092984] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.094334] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.095701] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6721.097118] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6721.098572] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6721.100226] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6721.102114] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6721.103858] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.105712] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6721.107688] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6721.109323] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6721.111192] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6721.113180] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6721.114852] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6721.116674] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6721.118517] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6721.120295] INFO: task c52104x:9701 blocked for more than 120 seconds.
[ 6721.122367] Not tainted 3.18.3-server-1.mga5 #1
[ 6721.123993] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6721.126589] c52104x D ffff881a3fd132c0 0 9701 1 0x00000004
[ 6721.129316] ffff881751227b30 0000000000000082 ffff8810c2d74490 00000000000132c0
[ 6721.132558] ffff881751227fd8 00000000000132c0 ffff8813f7734510 ffff8810c2d74490
[ 6721.137471] ffffffff8117f676 ffff8810c2d74490 ffff8819b9c2b8e0 ffff8819b9c2b8f8
[ 6721.144201] Call Trace:
[ 6721.145293] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6721.148190] [<ffffffff816be469>] schedule+0x29/0x70
[ 6721.150424] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6721.154303] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6721.156676] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.158910] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.160973] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6721.163021] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6721.165230] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6721.168602] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6721.170293] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6721.171872] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.173559] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6721.175371] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6721.176998] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6721.178700] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6721.180688] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6721.182201] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6721.183795] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6721.185418] [<ffffffff816c37a2>] retint_signal+0x48/0x86
[ 6721.187073] INFO: task c52104y:6850 blocked for more than 120 seconds.
[ 6721.189170] Not tainted 3.18.3-server-1.mga5 #1
[ 6721.190744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6721.193014] c52104y D ffff881a3fdd32c0 0 6850 6846 0x00000000
[ 6721.195174] ffff88119eeafb30 0000000000000082 ffff881372d14590 00000000000132c0
[ 6721.197567] ffff88119eeaffd8 00000000000132c0 ffff88189c53a4d0 ffff881372d14590
[ 6721.199912] ffffffff8117f676 ffff881372d14590 ffff88119d0177a0 ffff88119d0177b8
[ 6721.202559] Call Trace:
[ 6721.203396] [<ffffffff8117f676>] ? expand_downwards+0x86/0x2a0
[ 6721.205141] [<ffffffff816be469>] schedule+0x29/0x70
[ 6721.206595] [<ffffffff816c123d>] rwsem_down_read_failed+0xdd/0x120
[ 6721.208372] [<ffffffff813c8544>] call_rwsem_down_read_failed+0x14/0x30
[ 6721.210451] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.212059] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.213633] [<ffffffff816c08c7>] ? down_read+0x17/0x20
[ 6721.215167] [<ffffffff8105b21c>] __do_page_fault+0x42c/0x5c0
[ 6721.216947] [<ffffffff81079168>] ? __send_signal+0x178/0x4a0
[ 6721.218715] [<ffffffff8105b3d2>] do_page_fault+0x22/0x30
[ 6721.220335] [<ffffffff816c49e8>] page_fault+0x28/0x30
[ 6721.221935] [<ffffffff813c8675>] ? __clear_user+0x25/0x50
[ 6721.223515] [<ffffffff8102044e>] save_xstate_sig+0x20e/0x230
[ 6721.225243] [<ffffffff81013ec2>] do_signal+0x952/0xbb0
[ 6721.226814] [<ffffffff8109c99d>] ? set_next_entity+0x9d/0xb0
[ 6721.228731] [<ffffffff81490000>] ? regulator_min_uA_show+0x70/0x70
[ 6721.231164] [<ffffffff814a18d4>] ? pty_write+0x54/0x60
[ 6721.233206] [<ffffffff816bdf81>] ? __schedule+0x3a1/0x860
[ 6721.235284] [<ffffffff81014190>] do_notify_resume+0x70/0x90
[ 6721.237435] [<ffffffff816c37a2>] retint_signal+0x48/0x86
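The kernel repeats the same report every 120 seconds for each stuck task, so the log can be condensed to one line per task. A minimal sketch, assuming one kernel message per line as dmesg prints it; summarize_hung_tasks is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Sketch: count hung-task reports per task in kernel log text read from stdin.
# summarize_hung_tasks is a hypothetical helper, not from the report.
summarize_hung_tasks() {
    awk '/INFO: task .* blocked for more than/ {
             sub(/.*INFO: task /, "")               # drop timestamp and prefix
             sub(/ blocked for more than.*/, "")    # keep only "comm:pid"
             seen[$0]++
         }
         END { for (t in seen) print seen[t], t }' | LC_ALL=C sort -k2
}
```

Used as dmesg | summarize_hung_tasks, it prints a count followed by comm:pid for each task, which makes it easier to see that all the stuck tasks here are ACATS test binaries from the gcc build.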
Summary: ps/killall hangs => ada tests during gcc build trigger kernel problem
I will do more tests tonight (like building gcc on kernel-linus without other load on the machine).
Also, an important note: this is building on a tmpfs.
I could reproduce when building gcc on 3.18.3-server-1.mga5 without anything else running on the machine. I could not reproduce with -linus-3.18.3-1.mga5 (tried 3 times). I also could not reproduce with 3.18.2-server-1.mga5 (tried only once).
Assignee: bugsquad => tmb
Reproduced on 3.18.3-desktop-1.mga5 too
Hm, IIRC there was a thread recently on LKML regarding mm/thp relating to expand_downwards...

But it's weird that 3.18.2 worked and 3.18.3 does not, as I didn't change anything besides the upstream -stable patch (and dropped the merged ones). And since kernel-linus works (and is built with the same defconfig as the desktop kernel), I guess some of our other patches got in trouble with 3.18.3...

Hm, if you have time, can you try to disable the AUFS patches:

fs-aufs-3.18.patch
fs-aufs-3.18-modular.patch
fs-aufs-adapt-for-3.18.1-d_child-change.patch

and see if the problem goes away?
For the record, it also happens when building on ext4 instead of tmpfs.

I built a kernel without aufs (there is also a line to remove in the spec) but could still reproduce.

Given the nature of the problem, I will try without the x86-mm* patches.
I couldn't reproduce after dropping these 3 patches:

x86-mm-consolidate-VM_FAULT_RETRY-handling.patch
x86-mm-move-mmap_sem-unlock-from-mm_fault_error-to-c.patch
x86-mm-fix-VM_FAULT_RETRY-handling.patch

The second one looks suspicious (double up_read):

 up_read(&mm->mmap_sem);
 if (unlikely(fault & VM_FAULT_ERROR)) {
+        up_read(&mm->mmap_sem);
Ah, this seems to be due to the order of the patches being inverted: the first one was moving the up_read out, but is now adding it outside; this one was originally adding it inside, but the up_read was not supposed to already be there outside.
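One cheap sanity check after rediffing a patch series is to compare how many lines adding a given call each patch carries against the upstream version of the same patch. A minimal sketch; count_added is a hypothetical helper, and it only counts text, so a matching count says nothing about where the call ends up:

```shell
#!/bin/sh
# Sketch: count lines that a unified diff adds and that contain a fixed string.
# count_added is a hypothetical helper, not an existing tool.
# $1: fixed string to look for, $2: patch file
count_added() {
    # Keep "+" lines, skip the "+++ file" headers, then count fixed-string matches.
    grep '^+' "$2" | grep -v '^+++ ' | grep -F -c -- "$1"
}
```

Running count_added 'up_read(&mm->mmap_sem)' on the upstream patch and on the rediffed one should give the same number; a difference does not prove a bug, but is a reason to read the hunk again.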
I committed a possibly fixed patch, time to sleep, I'll test with it tomorrow.
Oops, well spotted... I wonder how I managed to get them reversed... :/ That is the correct fix.
Tried with the fixed patch, and things are good. Do you have another kernel planned soon or should I upload this one?
kernel-3.18.3-2.mga5 already building and should be uploaded within ~1 hour
Ah sorry, should have checked :)
Closing
Status: NEW => RESOLVED
Resolution: (none) => FIXED