Description of problem: When I tried to wake up the screens, it did not work, even though I could get the mouse cursor moving. After a while, I was able to switch to tty3. The journal shows that the graphics driver crashed. I had to reboot.
Environment: LXQt, using kwin and xscreensaver
Card:Intel 810 and later: Intel Corporation|HD Graphics 620 [DISPLAY_VGA] (vendor:8086 device:5916 subv:1043 subd:16e0) (rev: 02)
One internal monitor, one external monitor connected through HDMI.
Created attachment 11430 [details] Journal of the crash
Created attachment 11431 [details] Dump core of the crash
uname -a
Linux YZenbook.local 5.4.2-desktop-1.mga7 #1 SMP Thu Dec 5 17:40:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa --last |grep intel
x11-driver-video-intel-2.99.917-57.mga7.x86_64 lun. 04 nov. 2019 16:58:55 CET
lib64drm_intel1-2.4.100-1.mga7.x86_64 lun. 04 nov. 2019 15:51:03 CET
vaapi-driver-intel-2.3.0-2.mga7.x86_64 lun. 08 juil. 2019 08:03:39 CEST
intel-gpu-tools-1.23-3.mga7.x86_64 lun. 08 juil. 2019 07:50:20 CEST
Thanks for this report, Yves, and for the attached evidence. Can you give a bit of background:
- did this happen on first use of the system (and always since)?
- if not, is it frequent, occasional, or a one-off?
Assigning to the kernel/drivers group.
Assignee: bugsquad => kernel
Hello Lewis, This started today, for no apparent reason (no new settings, no updates). It seems to be linked to the screen saver, or to something that happens at the same time.
Hello, I found a recent bug report which seems similar: https://gitlab.freedesktop.org/drm/intel/issues/673 It seems that a patch to the Intel driver is needed.
CC: (none) => choucroot
Hello, same issue since kernel 5.4.2: frequent and unpredictable freezing. Here are some kernel dmesg logs:
[12651.016186] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
[12651.016189] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
... ...
[12651.016193] GPU crash dump saved to /sys/class/drm/card0/error
Unfortunately, the dump is empty:
# more /sys/class/drm/card0/error
No error state collected
(In reply to papoteur from comment #6)
> Hello,
> I found a recent bug report which seems similar.
> https://gitlab.freedesktop.org/drm/intel/issues/673
> It seems that a patch is needed against intel driver.

Thanks for the pointer, I will add it to the next kernel build.
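For anyone who wants to script the "is there actually a dump?" check described above, here is a minimal Python sketch. It assumes the sysfs path quoted in the dmesg message (the card index may differ on other machines) and treats the "No error state collected" text shown above as the only sign of an empty dump. The function names are illustrative, not part of any tool.

```python
from pathlib import Path

# Text returned by the error file when no dump is held (as quoted above).
NO_DUMP_MARKER = "No error state collected"


def has_crash_dump(text):
    """Return True if the error-state text looks like a real dump."""
    stripped = text.strip()
    return bool(stripped) and NO_DUMP_MARKER not in stripped


def read_error_state(path="/sys/class/drm/card0/error"):
    """Read the error state file; return '' if it is missing or unreadable.

    Assumption: card0 is the Intel GPU. Reading usually requires root.
    """
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return ""


if __name__ == "__main__":
    state = read_error_state()
    print("dump available" if has_crash_dump(state) else "no dump collected")
```

Running it as root right after a hang, before rebooting, gives the best chance of catching a dump that is worth attaching.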
CC: (none) => tmb
Does kernel-desktop-5.4.6-2.1.mga7 from: http://ftp.free.fr/mirrors/mageia.org/people/tmb/mga7/bugs/25930/ work any better ?
Hello. On my machine with this kernel, I see the following messages before lightdm starts:
i915 :Failed to idle engines, declaring wedged!
i915 :Failed to initialize GPU, declaring it wedged!
When I log in to Xfce, it is a mess: the Xfce panel doesn't work properly, and the multi-workspace feature is reduced to a single desktop where the windows no longer have a title bar (with the close button, maximize, ...).
Created attachment 11435 [details] Journal of a session with experimental kernel
On my side, the test is not good either. Using LXQt and kwin, kwin seems to restart several times until it says that too many crashes occurred. In one session, I saw a message suggesting adding drm.debug to the kernel command line. This is the journal from the session with drm.debug enabled.
There is now a kernel-5.4.7-1.mga7 in updates_testing that has an updated version of the fix sent by the Intel devs, so please try and see if that works any better
Hello, a very satisfying update. No hang for nearly 2 hours, a duration I could never reach with 5.4.6. Even though I don't have a precise reproducible scenario, I did a lot of tasks that usually ended in a hang with 5.4.6. Here the system is stable.
[afb@localhost Bureau]$ uname -r
5.4.7-desktop-1.mga7
[afb@localhost Bureau]$ w
 20:31:10 up 1:56, 1 user, load average: 0,29, 0,98, 0,85
UTIL. TTY LOGIN@ IDLE JCPU PCPU QUOI
afb tty1 18:34 1:56m 3:13 1.01s xfce4-session
Bravo!
Hello. I noticed a brief freeze of about 10 ms while watching a video, then everything went back to normal. But no hang this time! Here is the log from dmesg:
[ 6410.379742] i915 0000:00:02.0: Resetting rcs0 for stuck wait on rcs0
Maybe it has nothing to do with the fix, but I am passing on the information.
Created attachment 11446 [details] /sys/class/drm/card0/error after a GPU crash
Unfortunately, there are new crashes under 5.4.7-desktop-1.mga7. This time, they only occurred while browsing with the "Falkon" browser. The third time, oddly, it came back to life after a few seconds, so the file /sys/class/drm/card0/error was accessible. I added it to the attachments.
An update for this issue has been pushed to the Mageia Updates repository. https://advisories.mageia.org/MGASA-2020-0036.html
Status: NEW => RESOLVED
Resolution: (none) => FIXED
Reopening, because comment 11 and comment 16 say that the problem is still not fixed, and I have the exact same problem with kernels 5.4.6 and 5.4.10 (I cannot reproduce it with 5.3.13); see bug 26049 comment 20 and bug 26049 comment 22.
Status: RESOLVED => REOPENED
CC: (none) => LpSolit
Resolution: FIXED => (none)
Created attachment 11456 [details] GPU crash dump
I attached the output of /sys/class/drm/card0/error. Moreover, dmesg prints:
[ 3442.830245] SUPR0GipMap: fGetGipCpu=0xb
[ 3443.390048] vboxdrv: 000000003c425d44 VMMR0.r0
[ 3443.470930] vboxdrv: 00000000bf204d81 VBoxDDR0.r0
[ 3443.555065] vboxdrv: 0000000004369794 VBoxEhciR0.r0
[ 3915.972528] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
[ 3915.972529] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 3915.972530] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 3915.972530] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3915.972530] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 3915.972531] GPU crash dump saved to /sys/class/drm/card0/error
[ 3915.973535] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 3915.974263] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 3915.974402] i915 0000:00:02.0: Resetting chip for hang on rcs0
[...]
[ 3941.957842] i915 0000:00:02.0: Resetting rcs0 for stuck wait on rcs0
[ 3955.974046] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 3955.974841] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 3955.974941] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 3955.976782] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 3955.977564] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[ 4093.959834] i915 0000:00:02.0: Resetting rcs0 for stuck wait on rcs0
[...]
(In reply to Thomas Backlund from comment #12)
> There is now a kernel-5.4.7-1.mga7 in updates_testing that has an updated
> version of the fix sent by the Intel devs, so please try and see if that
> works any better

Per https://gitlab.freedesktop.org/drm/intel/issues/673#note_382292, no patch works with kernel 5.4 yet.
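A rough way to tally the i915 events in a dmesg excerpt like the one above is sketched below in Python. The patterns are copied from these log lines only, so other kernel versions may word the messages differently; the names `PATTERNS`, `classify` and `summarize` are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Ordered (label, pattern) pairs; "chip_reset" must come before
# "engine_reset" because both lines start with "Resetting ... for".
PATTERNS = [
    ("gpu_hang", re.compile(r"i915 \S+: GPU HANG:")),
    ("chip_reset", re.compile(r"i915 \S+: Resetting chip for")),
    ("engine_reset", re.compile(r"i915 \S+: Resetting \w+ for")),
    ("reset_timeout", re.compile(r"reset request timed out")),
]


def classify(line):
    """Return the label of the first matching pattern, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(dmesg_text):
    """Count the i915 hang/reset events found in a block of dmesg text."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Example: dmesg | python3 this_script.py
    for label, count in summarize(sys.stdin.read()).items():
        print(label, count)
```

Comparing the counts between two kernels over a similar uptime gives a crude measure of whether a build hangs less often.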
Per https://bugs.freedesktop.org/show_bug.cgi?id=111970#c35, we need to use drm-tip to fix this problem: https://cgit.freedesktop.org/drm-tip Is this the case?
(In reply to Frédéric "LpSolit" Buclin from comment #21)
> Per https://bugs.freedesktop.org/show_bug.cgi?id=111970#c35, we need to use
> drm-tip to fix this problem: https://cgit.freedesktop.org/drm-tip Is this
> the case?

It's not. Mageia 7 has lib64drm_intel1-2.4.100-1.mga7, which was packaged on Oct 17.
drm-tip is the upstream kernel tip tree for the drm subsystem, not libdrm.
And no, I won't pull drm-tip into a stable release.

(In reply to Frédéric "LpSolit" Buclin from comment #20)
> (In reply to Thomas Backlund from comment #12)
> > There is now a kernel-5.4.7-1.mga7 in updates_testing that has an updated
> > version of the fix sent by the Intel devs, so please try and see if that
> > works any better
>
> Per https://gitlab.freedesktop.org/drm/intel/issues/673#note_382292, no
> patch works with kernel 5.4 yet.

Well, the v2 of the patch we have at least makes it crash/hang less often...

Let's hope the Intel guys can sort this out for stable trees too.
To start with my conclusion, I have had what I now think is this problem in Gnome 3 since the 5.3.13-2 kernel, intel i915 driver. With the 5.4.12-1 kernel, logging in as a test user and then on the command line using su - mylogin has given me access to everything set up in my $HOME but reduced the frequency and intensity of problems greatly. Timeout on GPU recovery remains a problem for the i915 driver, which complains of it.

Just using a test user account had improved things so much I first thought the problem had gone away, but it recurred, though much diminished. I have a lot of stuff set up (languages, multiple browsers for special purposes, privoxy, geneweb genealogy server, etc.), which was one reason I upgraded to Mageia 7 rather than do a clean install, and rather than add all that to the test user account I tried the su - mylogin to see if that could provide at least limited functional use of my regular user account. I have been surprised at how well things have worked.

It appears the GPU recovery and rcs0 hang may be intensified by cruft in $HOME configuration files, with the Brave web browser the prime example at the moment. I have almost given up using Brave, but may try again using the testlogin to su - mylogin, or possibly mylogin to su - testlogin and then to su - mylogin, to get full access to my account's capabilities.
CC: (none) => jim.beard
*** Bug 26149 has been marked as a duplicate of this bug. ***
CC: (none) => antonin.roussel
*** Bug 26117 has been marked as a duplicate of this bug. ***
CC: (none) => nicolas
Hello, On my server, a similar desktop freeze occurs almost once a day: Bug 26149. I can still log on to it through an x2go session*, or ssh. That may be a way to grab information ... (I don't know which information!)
* I could not start my Firefox browser in the x2go session, nor kill the first Firefox instance which was running in the frozen session (several defunct 'Web Content' processes remain around).
By the way, I was wondering if there is a simple way to kill the whole frozen session from this ssh.
CC: (none) => eeeemail
A bit late, but you might try loginctl:
https://freedesktop.org/software/systemd/man/loginctl.html
e.g.
# loginctl list-sessions
SESSION UID USER SEAT TTY
c2 1000 username seat0
# loginctl terminate-session c2
I am experiencing this issue too; it is relieved by using kernel-desktop-5.3.13-2.mga7-1-1.mga7 with the matching -devel for now.
Hi everyone, this bug report refers to a number of serious issues with the i915 module. Personally I have followed the kernels since beta 1 (4.19.10), but I didn't notice abnormal GPU hangs until the 5.4.x series. I did have some crashes before, but couldn't pinpoint the root cause as I was also developing an OpenGL application which had bugs of its own.

A good summary of the issues involved can be found here:
https://linuxreviews.org/Kernel_5.4.1_And_5.3.14_Are_Released_Making_Linux_Users_With_Intel_iGPUs_Finally_Able_To_Use_5.3-Series_Kernels

Based on that I reverted to 5.0.7 and 5.3.13 for testing. They are currently running very stably on 2 systems (i5 and i7 Coffee Lake). I will probably not use 5.3.13 because of the nasty set_page_dirty bug (supposedly fixed in 5.3.14).

A good way to trigger the GPU hang is to use gdkgears (gdk3 has rather large performance problems which cause high CPU load, which in turn triggers the GPU hang more easily). Keep an ssh session open to kill it.

The article also says: "Going back to 5.0.21 or updating to 5.5 when it is released are viable solutions"
There is also this post:
https://linuxreviews.org/Linux_Kernel_5.5_Will_Not_Fix_The_Frequent_Intel_GPU_Hangs_In_Recent_Kernels
which basically says that the i915 will not be ready for the show in 5.5 either.

That leaves us with 5.0. Unfortunately, 5.0.7 was only available in beta 3. I still have a copy, but I can't seem to find it in the repo anymore. So, until a final solution is released, I hereby formally request that 5.0.7 be added to the repository again. Or perhaps even a new 5.0.21 build?
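To avoid picking the session ID by eye over a slow ssh link, the listing can be parsed with a few lines of Python. This is only a sketch under assumptions: it expects the column order shown in the example above (SESSION, UID, USER first) and uses only those three columns, since SEAT and TTY can be empty. The helper names are made up for illustration; `loginctl list-sessions` and `loginctl terminate-session` are the real commands from the comment above.

```python
import subprocess


def parse_sessions(output):
    """Return (session_id, uid, user) tuples from `loginctl list-sessions` text.

    The header line and the trailing "N sessions listed." footer are
    skipped because their second field is not a numeric UID.
    """
    sessions = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1].isdigit():
            sessions.append((fields[0], int(fields[1]), fields[2]))
    return sessions


def sessions_for_user(output, user):
    """Return the session IDs that belong to the given user name."""
    return [sid for sid, _uid, name in parse_sessions(output) if name == user]


def terminate(session_id):
    """Terminate one session; needs sufficient privileges (typically root)."""
    subprocess.run(["loginctl", "terminate-session", session_id], check=True)


if __name__ == "__main__":
    listing = subprocess.run(
        ["loginctl", "list-sessions"],
        capture_output=True, text=True, check=True,
    ).stdout
    for sid, uid, name in parse_sessions(listing):
        print(sid, uid, name)
```

Note that the ssh login used for the rescue is itself a session of the same user, so check the listing before terminating anything rather than killing every match.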
CC: (none) => smout.jan
Hi, killing the frozen session through remote ssh allowed me to open a new session from the usual login screen. But this new session was very, very slow, despite low CPU and memory use. So I ended up rebooting, in the clean way. Next time, I will have a look at gdkgears. (Thank you for the advice on tools.)
(In reply to Antonin Roussel from comment #31)
> Hi,
> killing frozen session through distant ssh allowed me to open a new session
> from usual login screen. But this new session was very very slow, despite of
> low CPU and low memory use. So I ended rebooting, in the clean way.
> Next time, I will give a look to gdkgears.
> (thank you for tools advices)

In the beginning I was even using the alt-sysrq keys out of frustration :-O
The terminal is very slow indeed, but when you already know which process to kill it becomes easier. If you don't know which one is causing the hang, then all that is left is a reboot :-/

ps: gdkgears is part of the gtk development suite. Check it out from the git repository and compile it...
I am getting the freezes and stuttering (while typing into the terminal, the screen will freeze for 1 to 5 sec) on a 5.1.4 kernel. A quick look at the logs does not really show anything.

As an additional possible symptom, the latest (day before yesterday) Google Chrome update stopped it from being able to show movies from TV channels. Some (gem.cbc.ca, globeltv.ca) show a black screen with no sound, although the thumbnail images do show pictures when I put the cursor on the movie's timeline. After reverting to an earlier version of Chrome, things work again, so it is not clear whether this is related to this bug.
CC: (none) => unruh
(In reply to w unruh from comment #33)
> I am getting the freezes and stettering (typing into the terminal, the
> screen will freeze for 1 to 5 sec ) on a 5.1.4 kernel. A quick look at the
> logs does not really show anything.
>
> As an additional possible symptom, the latest (day before yesterday) google
> chrome update stopped it being able to show movies from tv channels. Some
> (gem.cbc.ca, globeltv.ca) show a black screen with no sound, although the
> thumbnail images when I put the cursor on the timeline on the movie do show
> pictures. Reverting to an earlier version of chrome and things work again,
> so it is not clear if this is related to this bug.

When the kernel is at a constant speed and it is experiencing sudden freezes, the acceleration causes black video radiation in chrome :-D
Sorry, couldn't resist the temptation. It's not every day we get the occasion to greet a world-class physicist. Pleased to meet you, even if it is online...

Getting back on topic: if you are using Intel graphics then the freezing might be related, but I wasn't aware of problems with video playback. You might be seeing 2 different things here. If reverting the Chrome version helped, then at least that one had nothing to do with the i915 module. Did the freezing also go away?
Good to meet you as well. Anyway, yes, I agree that the fact that it depends on the Chrome version is certainly a hint that it has nothing to do with the bug in this thread. However, my systems use Intel graphics with the i915 module, and the Chrome bug must be relatively rare, since otherwise they would not have released the new version, and perhaps that "rarity" is associated with the Intel graphics. I.e., the bug in Chrome may be tickling the same bug in the Intel graphics. I reported it because of the association and the remote possibility that the bugs are related.

Anyway, this stuttering and temporary freezing are getting very annoying. The past couple of times it froze I pressed Ctrl-Alt-F3 and then Ctrl-Alt-F1 to get back to X, and the freeze had thawed. I could not see anything in the kernel messages (Ctrl-Alt-F12) which could explain the problem.

The problem is that this bug is making Mageia 7 on Intel graphics almost unusable. Note also that my problems are on kernel 5.1.4, not the 5.3 or 5.4 that others are running.
Sorry not to have said. I use Plasma as my DE.
(In reply to w unruh from comment #35)
> I reported it because of the association and the remote
> possibility that the bugs are related.

I suspect some race condition to be the culprit. From what I've seen, an application with relatively high CPU usage that uses the GPU for hardware-accelerated drawing will trigger a freeze. I thought that Chrome was not using hardware-accelerated video playback on Linux, but maybe they had a change of heart in the last update. That could explain both the 'black boxes' and the freezes. But that's just a wild guess.

> Anyway, this stuttering and temporary freezing are getting very annoying.
> The past couple of times it froze I did alt-ctrl-F3 and then Alt-ctl-F1 to
> get back to X and the freezing had been thawed.
> I could not see anything in the kernel messages (alt-ctrl-F12) which could
> explain the problem.

The tty loses track of previous pages the moment you switch consoles (there is no Pg Up). The complete log can be retrieved via the systemd journal. Here is how to check for i915 messages:
journalctl -a | grep i915
Add '-b' if you're only interested in messages from the current boot.

> The problem is that this bug is making Mageia 7 on
> Intel graphics almost unusable.

Couldn't agree more. I'm waiting for the mga kernel maintainer to upload a 5.0.x kernel for the Intel users. I have one on 1 system, but I'm missing the rpm or src.rpm to install it on other machines.

> Note also my problems are on kernel 5.1.4 not the 5.3 or 5.4 that others are
> running.

Yeah, I saw that. I did run 5.1.4 in June last year, but I wasn't doing any video playback in Chrome, nor was I using it as my main development machine. And my system logs don't go back far enough to do a post mortem...
When I encountered this, it was using a usb oscilloscope. I'd not encountered any issue with 5.4 kernels until that point. I very rarely play games but suppose that would have triggered it before now. It's been absolutely fine with kernel 5.3.13, which was just before the changes were introduced upstream. It seems they've been wrestling with it ever since.
On my main machine, the Opera browser has been particularly bad about crashing with associated rcs0 hangs, but I have found I can use ssh to log in from my backup machine and the problem seems not to occur. The initial response when Opera is launched via ssh may be of interest, due to the gpu-process multiple threads error message. To repeat, this works. Trying to use Opera after logging in directly to the machine crashes frequently.

ssh -l me mainmachine
[me@mainmachine ~]$ opera &
[1] 31858
[me@mainmachine ~]$ ATTENTION: default value of option vblank_mode overridden by environment.
[31890:31890:0210/100529.768926:ERROR:sandbox_linux.cc(369)] InitializeSandbox() called with multiple threads in process gpu-process.
[32002:32004:0210/100530.222585:ERROR:nss_util.cc(750)] After loading Root Certs, loaded==false: NSS error code: -8018
[32069:1:0210/100530.727697:ERROR:child_thread_impl.cc(864)] Receiver for unknown Channel-associated interface: chrome.mojom.SearchBouncer
[32100:1:0210/100530.754528:ERROR:child_thread_impl.cc(864)] Receiver for unknown Channel-associated interface: chrome.mojom.SearchBouncer
[31858:31876:0210/100530.915830:ERROR:nss_util.cc(750)] After loading Root Certs, loaded==false: NSS error code: -8018
Note that there is now a kernel-5.5.3-1 in updates_testing you can try...

Note that I haven't fixed virtualbox for the 5.5 series yet...
(In reply to Thomas Backlund from comment #40)
> Note that there is now a kernel-5.5.3-1 in updates_testing you can try...
>

Thank you Thomas. I will try that tomorrow. But I am a bit pessimistic, given what is reported here:
https://linuxreviews.org/Linux_Kernel_5.5_Will_Not_Fix_The_Frequent_Intel_GPU_Hangs_In_Recent_Kernels
(In reply to Thomas Backlund from comment #40)
> Note that there is now a kernel-5.5.3-1 in updates_testing you can try...
>
> Note that I havent fixed virtualbox for 5.5 series yet...

Actually, you might want to wait for the next build...

Upstream has identified 2 missing crash fixes in 5.5 (I already have some others added)
There is now a kernel-5.5.4-1.mga8 in testing
Mga7 too.
tmb <tmb> 5.5.4-1.mga7:
+ Revision: 1525734
- drm/i915: Serialise i915_active_acquire() with __active_retire()
Will give it a try.
Yeah, I meant the .mga7 one :)
Still waiting for my mirror to sync. It's about 8 hours behind at the moment. It might have to be tomorrow now.
Yeah, the mga8 distro rebuild makes mirroring slow
I've been using the 5.5.4-1 kernel this afternoon, doing the same things I was doing before when I ran into the problem, and have not had a recurrence. I also tried suspend/resume and attempting the same again. So far so good. I've watched some YouTube at the same time, which should use the card for decoding IINM, and played with glxgears and teapot. I also allowed the screen to dim and blank before reviving it, with no ill effects. I've followed the journal throughout and so far there is not one mention of it.

$ uname -a
Linux localhost.localdomain 5.5.4-desktop-1.mga7 #1 SMP Sat Feb 15 08:41:16 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ head /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
stepping : 9
microcode : 0xca
cpu MHz : 2364.522
cache size : 3072 KB
physical id : 0

$ lspcidrake -vvv | grep ^Card
Card:Intel 810 and later: Intel Corporation|HD Graphics 620 [DISPLAY_VGA] (vendor:8086 device:5916 subv:103c subd:8215) (rev: 02)
OK. I finally installed 5.5.4-1 and played around, stress testing with gdkgears + another OpenGL app. Normally that would make this system hang in under a minute. I am now at 1.5 hours. No hangs, no logs in the journal. So far so good ^_^
I'll let it run now for a couple of days and will report back on Thursday, when I will be using an additional machine.

# uname -a
Linux temp7 5.5.4-desktop-1.mga7 #1 SMP Sat Feb 15 08:41:16 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# head /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz
stepping : 10
microcode : 0xca
cpu MHz : 2100.038
cache size : 9216 KB
physical id : 0

# lspcidrake -vvv | grep ^Card
Card:Intel 810 and later: Intel Corporation|UHD Graphics 630 (Desktop) [DISPLAY_VGA] (vendor:8086 device:3e92 subv:1028 subd:085a)
Depends on: (none) => 26202
Upgrading the kernel to 5.5.4-desktop-1.mga7 also fixes the problem for me.
I also upgraded the kernel and so far in the last two days haven't got any hang. Videoconferencing with Chromium would always freeze Mageia 7, now it's working.
CC: (none) => ruben33en-mandriva
The other machine also behaves correctly. I have been working with it all day without a glitch. The journal has no trace of trouble...

# uname -a
Linux escher 5.5.4-desktop-1.mga7 #1 SMP Sat Feb 15 08:41:16 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# head /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
stepping : 10
microcode : 0xca
cpu MHz : 4298.168
cache size : 12288 KB
physical id : 0

# lspcidrake -vvv | grep ^Card
Card:Intel 810 and later: Intel Corporation|UHD Graphics 630 (Desktop) [DISPLAY_VGA] (vendor:8086 device:3e92 subv:1043 subd:8694)

Note: i915 loads firmware i915/kbl_dmc_ver1_04.bin (v1.4) from kernel-firmware-nonfree.
I currently have the released one: kernel-firmware-nonfree-20191220-1
But there is another one in updates_testing: kernel-firmware-nonfree-20200121
I am unsure whether this has an influence. I will keep it in mind when it reaches the updates repository.
An update for this issue has been pushed to the Mageia Updates repository. https://advisories.mageia.org/MGAA-2020-0059.html
Resolution: (none) => FIXED
Status: REOPENED => RESOLVED
Thank you Thomas