Bug 20452 - First init of Plasma Session Hangs (black screen, no mouse). AMD APU Kaveri A10
Summary: First init of Plasma Session Hangs (black screen, no mouse). AMD APU Kaveri A10
Status: RESOLVED INVALID
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-11 22:48 CET by Rick Stockton
Modified: 2017-03-16 11:39 CET
CC List: 4 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
urpmi stdout with --debug, vicinity of "bad decision" regarding fglrx (7.81 KB, text/plain)
2017-03-13 20:46 CET, Rick Stockton

Description Rick Stockton 2017-03-11 22:48:08 CET
The DM came up fine, and Gnome-3 works (with compositing) on the correct driver.

When initializing the first KDE session (after an online upgrade from 5.1 to Cauldron 2017-03-11), KDE pops up a panel saying that my "R7 Video" is too old for the fglrx driver module and must be switched down to the generic "ATI" driver. The message offers no choice except "OK". After I press "OK", the session hangs on a black screen with no mouse pointer.

WORKAROUND: Switching down to runlevel 3 and running drakx11 lets me choose the correct driver (module fglrx for 6400 and higher). Running the test and confirming this module allows KDE sessions to start up correctly after a restart.

My video is an ATI APU with Kaveri integrated "R7" graphics (no discrete card). Being 2016 vintage, it might be too *new* for that KDE video "verification test" to recognize.

(Card = "Kaveri [Radeon R7 Graphics]" , PCI Vendor ID x1002, Device ID x130f).

I tentatively rate this as "normal" because the problem looks so bad, even though the workaround is easy and "online upgrade" users tend to be very capable people.
Comment 1 Rick Stockton 2017-03-11 22:51:34 CET
I will shortly test whether DVD-based upgrade (using STA-2) has the same problem.

Running a GNOME-3 session (Xorg) works fine, and subsequent KDE sessions also work fine. Therefore, I suspect the error lies within a KDE migration tool.

CC: (none) => rickstockton

Marja Van Waes 2017-03-12 08:52:38 CET

CC: (none) => marja11
Assignee: bugsquad => kde

Comment 2 Nicolas Lécureuil 2017-03-12 09:03:02 CET
In Plasma we don't touch drivers (and Plasma is started as a user, so it can't touch drivers).

CC: (none) => mageia
Assignee: kde => kernel

Comment 3 Rick Stockton 2017-03-12 21:02:33 CET
(In reply to Nicolas Lécureuil from comment #2)
> in plasma we don't touch drivers ( and plasma is started as a user so can't
> touch drivers )

I agree, Plasma is not the source of this behavior. The "evil" pop-up message about switching drivers (from fglrx to ATI) was displayed in black on white, with no window decorations. It didn't _look_ like a Plasma complaint. Perhaps we've got a conflict between drakx11 (which works) and some kind of runlevel-5 "first time" video driver re-check within Mageia code. But even if it is an issue in "upstream" software, I think it would be "lower down" (maybe somewhere within or between KWin, Qt, and X11).

Maybe the problem occurs on the first execution of ANY compositing WM from runlevel 5? I can think of two tests, although it will take me about 7 hours to set up each test:

Test (1): Try logging in to GNOME (instead of KDE), and see whether the message and the resulting black screen occur there too. OR:

Test (2): After the message, shut down and restart, and see whether the black screen is resolved WITHOUT running drakx11.

----
Each test involves a relatively short clone of my MGA-5.1 hard drive onto a test hard drive of the same size (20 minutes), followed by a relatively long online upgrade (about 6-1/2 hours over my current wireless-N connection; my system is extremely "fat", which is relatively good for upgrade testing). Which test do you want first, and what logging is appropriate?
Comment 4 Rick Stockton 2017-03-13 20:43:37 CET
I'm in the process of a fresh online upgrade (from an up-to-date MGA 5.1), and it looks more like an installer problem. I ran "urpmi --auto-update --auto --debug" from inside a KDE Konsole, to easily search and copy the debug information. I expected to see some issues with "video drivers in use" during the upgrade, but I had also expected the mga6 versions of these packages to be 'promoted' in package selection processing and then 'updated', rather than 'removed':
   
created transaction for installing on / (remove=4, install=0, upgrade=12)
trans: scheduling removal of fglrx-kernel-desktop-latest-15.302-10.mga5.nonfree.x86_64
trans: scheduling removal of dkms-fglrx-15.302-4.mga5.nonfree.x86_64
trans: scheduling removal of fglrx-control-center-15.302-4.mga5.nonfree.x86_64
trans: scheduling removal of x11-driver-video-fglrx-15.302-4.mga5.nonfree.x86_64

I suggest reassigning this bug to the installer team, with a revised description. It's not a KDE issue: the error message occurs because fglrx has not been installed, and therefore wasn't built during the first boot of mga6. See the attached 'bug-20452-debug-log-2017-03-13.txt' for the relevant lines of urpmi logging.

SWAG: Perhaps it is an ordering problem. I think it should be possible to promote (and later install) these 4 packages sometime *after* the new kernel, and perhaps the corresponding dkms, have already been selected.
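To pull the relevant decisions out of a long `urpmi --debug` log like the attached one, a small filter can help. The following is a hypothetical Python sketch, not part of urpmi. It assumes the log lines look exactly like the excerpt above (no timestamps or other prefixes). It lists the packages scheduled for removal and cross-checks them against the transaction summary line.

```python
import re

# Sample lines copied from the urpmi --debug output quoted above.
LOG = """\
created transaction for installing on / (remove=4, install=0, upgrade=12)
trans: scheduling removal of fglrx-kernel-desktop-latest-15.302-10.mga5.nonfree.x86_64
trans: scheduling removal of dkms-fglrx-15.302-4.mga5.nonfree.x86_64
trans: scheduling removal of fglrx-control-center-15.302-4.mga5.nonfree.x86_64
trans: scheduling removal of x11-driver-video-fglrx-15.302-4.mga5.nonfree.x86_64
"""

# Assumed line format: "trans: scheduling removal of <package NEVRA>"
REMOVAL = re.compile(r"^trans: scheduling removal of (\S+)$")


def transaction_counts(log_text):
    """Return the remove/install/upgrade totals from the first transaction summary."""
    for line in log_text.splitlines():
        if line.startswith("created transaction"):
            return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
    return {}


def scheduled_removals(log_text, keyword=None):
    """List packages scheduled for removal, optionally only those containing keyword."""
    removed = []
    for line in log_text.splitlines():
        match = REMOVAL.match(line.strip())
        if match and (keyword is None or keyword in match.group(1)):
            removed.append(match.group(1))
    return removed


if __name__ == "__main__":
    counts = transaction_counts(LOG)
    fglrx = scheduled_removals(LOG, "fglrx")
    print(counts)
    print(fglrx)
    # True when every removal in the transaction is an fglrx package.
    print(len(fglrx) == counts.get("remove"))
```

On this excerpt, all four removals are fglrx packages, which matches `remove=4` in the summary line.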
Comment 5 Rick Stockton 2017-03-13 20:46:13 CET
Created attachment 9086 [details]
urpmi stdout with --debug,  vicinity of "bad decision" regarding fglrx
Comment 6 Thomas Backlund 2017-03-13 21:35:25 CET
There is no proprietary fglrx driver in Mageia 6.

AMD does not support any new kernels or Xorg servers, which is why we force the removal of the *fglrx* rpms on upgrade.

On first boot, ddh should detect the missing fglrx driver and switch to the free radeon or amdgpu driver...

In your case the hardware is supported by both drivers...

If you boot into runlevel 3, edit /etc/X11/xorg.conf,

change:

Driver "ati"

to

Driver "amdgpu"

and reboot. Does it work then?

Or simply rename xorg.conf and let the Xorg server run autodetection.
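The one-line edit described above can also be scripted. This is a hypothetical Python helper, not a Mageia tool. The Device section in `SAMPLE`, including its `Identifier`, is invented for illustration, and a real /etc/X11/xorg.conf will differ. It rewrites only uncommented `Driver "ati"` lines and keeps their indentation. Backing up the original file before writing the result is advisable.

```python
import re


def switch_driver(conf_text, old="ati", new="amdgpu"):
    """Rewrite `Driver "<old>"` lines to `Driver "<new>"`, keeping indentation.

    Only lines that begin (after spaces/tabs) with the Driver keyword are
    touched, so commented-out lines are left alone. xorg.conf keywords are
    case-insensitive, so the match is too.
    """
    pattern = re.compile(
        r'^([ \t]*Driver[ \t]+)"' + re.escape(old) + r'"',
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + '"' + new + '"', conf_text)


# Hypothetical Device section; a real /etc/X11/xorg.conf will differ.
SAMPLE = '''Section "Device"
    Identifier "device1"
    Driver "ati"
EndSection
'''

if __name__ == "__main__":
    print(switch_driver(SAMPLE))
```

The alternative suggested above, renaming xorg.conf so that the server autodetects the driver, needs no script at all.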

CC: (none) => tmb

Comment 7 Rick Stockton 2017-03-13 21:49:39 CET
I understand. I'm not sure which driver drakx11 will choose (probably x11-driver-video-ati). amdgpu was not installed by auto-update; I'll check whether drakx11 brings it in.

(We probably need to write an 'AMD Video' section in the Release Notes, next to the 'NVidia' discussion.) Thank you! Feel free to rewrite the bug's Description and Component.
Comment 8 Thomas Backlund 2017-03-13 21:54:06 CET
drakx11 will currently assign x11-driver-video-ati (which is why I mentioned you need to edit xorg.conf or rename it), but I'd like to know if the amdgpu one will work better, so please try both and report so we can default to the better working one.
Comment 9 Rick Stockton 2017-03-14 05:54:15 CET
The amdgpu X11 driver is present but fails drakx11 testing on my Kaveri APU and Dell monitor. x11-driver-video-ati also fails in inconsistent ways, and I didn't keep adequate notes. It's also possible that my overclocked GPUs (fine under 5.1 with fglrx) are pushed into errors by the current alternatives.

I've just cut back my overclock ratios. On my next rebuild and migration of the "upgrade disk", I'll rename xorg.conf at runlevel 3, and see what happens with Xorg auto-detecting. It will be a couple of days - my "real work" takes priority, unfortunately.
Comment 10 Rick Stockton 2017-03-14 18:59:36 CET
One quick test I can do on the current disk: I have the kernel boot parameter "VGA=xxx" present, and I'll try a run without it, since I see that KMS is required for amdgpu. I also need to check that the co-requisite Mesa libs are brought in. I might also blacklist radeon for amdgpu, and vice versa.
Comment 11 Rick Stockton 2017-03-15 05:34:12 CET
Lots of interesting test results using ATI/radeon, which runs as the default (per the following lsmod output):

amdgpu               1531904  0
amdkfd                139264  1
amd_iommu_v2           20480  1 amdkfd
radeon               1486848  43
i2c_algo_bit           16384  2 amdgpu,radeon
drm_kms_helper        135168  2 amdgpu,radeon
ttm                    90112  2 amdgpu,radeon
drm                   335872  18 amdgpu,radeon,ttm,drm_kms_helper
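Both amdgpu and radeon appear in the listing above, so the "Used by" count is what separates them. radeon has 43 references, while amdgpu has 0. That suggests radeon is the driver actually bound to the GPU and amdgpu is loaded but idle. The following is a hypothetical Python sketch that parses lsmod-style text and applies that heuristic. It assumes the usual "name size count users" column layout, and the use count is only an indicator, not proof of which driver owns the display.

```python
# lsmod output quoted in comment 11 (the header line is omitted there).
LSMOD = """\
amdgpu               1531904  0
amdkfd                139264  1
amd_iommu_v2           20480  1 amdkfd
radeon               1486848  43
i2c_algo_bit           16384  2 amdgpu,radeon
drm_kms_helper        135168  2 amdgpu,radeon
ttm                    90112  2 amdgpu,radeon
drm                   335872  18 amdgpu,radeon,ttm,drm_kms_helper
"""


def parse_lsmod(text):
    """Map module name -> (size in bytes, use count, modules that depend on it)."""
    modules = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Module Size Used by" header, and odd entries.
        if len(parts) < 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        users = [u for u in parts[3].split(",") if u] if len(parts) > 3 else []
        modules[parts[0]] = (int(parts[1]), int(parts[2]), users)
    return modules


def drivers_in_use(modules, candidates=("amdgpu", "radeon")):
    """Return candidate display drivers that are loaded with a non-zero use count."""
    return [name for name in candidates if name in modules and modules[name][1] > 0]


if __name__ == "__main__":
    mods = parse_lsmod(LSMOD)
    print(drivers_in_use(mods))  # ['radeon']
```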

test A: I stripped VGA=775 from my parameters, and ran 'telinit 5' from my root console.
A1: I chose 'Gnome on Xorg', and ran a good session (with applications and logoff).
A2: I chose 'Gnome on Xorg' a *second* time, and it died (the DM came back up).
A3: I chose 'Gnome on Wayland', and ran a good session.
A4: I chose 'Gnome on Wayland' a *second* time, and it died.
A5: I chose 'Ice' without Session, and it worked.
A6: I chose 'Ice' WITH Session, and it died.

test B: I left VGA=775 in the parameter string, again ran 'telinit 5' from my root console.
B1: 'Gnome on Wayland' worked.
B2: Next login, 'Gnome on Wayland' died.
B3: Next login, "IceWM with session" DIED - breaking the pattern of "binary flip-flop" in the previous tests. I had to crash the box.

test C: stripped VGA=775 again.
C1: "Gnome on X11" worked.
C2: "Plasma" *immediately after* Gnome on X11 *WORKED*.

I am now in that Plasma session. Next I will blacklist radeon, rename my largely stale xorg.conf file to prevent its use, and see if amdgpu comes up automatically. Restart coming, of course.
Comment 12 Rick Stockton 2017-03-15 07:10:55 CET
(In reply to Thomas Backlund from comment #6)
> 
> If you boot into runlevel 3, edit /etc/X11/xorg.conf,
> 
> change:
> 
> Driver "ati"
> 
> to
> 
> Driver "amdgpu"
> 
> and reboot, does it work then ?
> 
> or simply rename the xorg.conf and let xorg server run autodetection

Neither works. When I succeed in forcing startx to use amdgpu, it quickly dies with an error about running with a conflicting DRM version (V2; amdgpu requires V3). drm and/or drm_kms_helper are likely to blame. At other times, like right now, radeon is loaded even though I blacklisted it (the "gentle" way). Something else decides to load it later.

If I ask urpme to take out DRM V2, it asks whether I want to remove 1500 other packages which claim to require it (basically, every RPM with a GUI). I said no thanks - and I'm running under ati-radeon.

Do we possibly have a related issue with a single module named "drm", rather than separate "drm V2" and "drm v3"? I've got the RPMs, but I think that only V2 gets loaded.
Comment 13 Rick Stockton 2017-03-15 07:30:10 CET
(In reply to Rick Stockton from comment #0)
> The DM came up fine, and Gnome-3 works (with compositing) on the correct
> driver.

Not always.
> 
> When initializing a first KDE session (after online upgrade from 5.1 to
> Cauldron 2017-03-11), KDE pops up a panel saying that my "R7 Video" is too
> old for the fglrx driver module, and it must be switched down to generic
> "ATI". The message offers no choice except "OK". After pressing "OK" on the
> message, the session is hung on black screen with no mouse indicator.
> 
> WORKAROUND: Switching down to runlevel-3 and running drakx11 lets me choose
> the correct driver (module fglrx for 6400 and higher), running the test and
> confirming this module allows KDE Sessions to start up correctly after
> restart.
I think that my memory failed me in saying that... IIRC, the successful Plasma session was the second session after restart.

> 
> My video is: ATI APU, Kaveri integrated "R7" (no explicit card). Vintage
> 2016, it might be too *new* for that KDE video "verification test" to
> understand.
> 
> (Card = "Kaveri [Radeon R7 Graphics]" , PCI Vendor ID x1002, Device ID
> x130f).
> 
> I tentatively rate as "normal", because the problem looks so bad - even
> though the workaround is easy, and "online upgrade" users tend to be very
> capable persons.
No. The problem persists between logins: sometimes it happens, and sometimes it doesn't. It's approximately 50/50 with GNOME sessions, but I am able to run only ONE Plasma session per X restart, and it has to be the second login attempt, after running some other graphical desktop first.
Comment 14 Rick Stockton 2017-03-15 18:18:24 CET
I have just ordered an ATI video card, for the purpose of checking whether the hangs and errors are related to the use of a Kaveri APU (versus separate card).

There might be issues with suspend/resume operations on the A10 and similar AMD processors with built-in graphics. I expect the part to arrive on Saturday, 3/18. If we can resolve the Kaveri APU problems, this will also let us see whether "crossfire" works with either of the new AMD video drivers.

Summary: First init of KDE assigns wrong video driver for a Kavieri APU. Hangs (black screen, no mouse). => First init of Plasma Session Hangs (black screen, no mouse). AMD APU Kaveri A10

Comment 15 Rick Stockton 2017-03-16 11:25:25 CET
(In reply to Rick Stockton from comment #12)
> (In reply to Thomas Backlund from comment #6)
> > 
> > If you boot into runlevel 3, edit /etc/X11/xorg.conf,
> > 
> > change:
> > 
> > Driver "ati"
> > 
> > to
> > 
> > Driver "amdgpu"
> > 
> > and reboot, does it work then ?
> > 
> > or simply rename the xorg.conf and let xorg server run autodetection
> 
> Neither works. When I succeed in forcing startx to use amdgpu, it quickly
> dies with an error about trying to run with a conflicting DRM Version (V2,
> amdgpu requires V3). drm and/or drm_kms_helper are likely to blame. At other
> times, like right now, radeon is loaded - even though I blacklisted it, in
> the "gentle" way doing that. Something else decides to load it later.
> 
> If I ask urpme to take out DRM V2, it asks whether I want to remove 1500
> other packages which claim to require it (basically, every RPM with a GUI).
> I said no thanks - and I'm running under ati-radeon.
> 
> Do we possibly have a related issue with a single module named "drm", rather
> than separate "drm V2" and "drm v3"? I've got the RPMs, but I think that
> only V2 gets loaded.

The new kernel '4.9.15-desktop-1.mga6 #1 SMP' (and associated bits) allows amdgpu, which experienced the same problems until I ran "dracut -f" and cleaned up some incorrect links on my /boot drive. A *much smaller* problem still exists. Most of these comments no longer apply, and I'm creating a new bug to describe it more clearly.
Comment 16 Rick Stockton 2017-03-16 11:39:03 CET
Setting status = INVALID on this one. The replacement bug (smaller scope and better defined) is #20499.

Status: NEW => RESOLVED
Resolution: (none) => INVALID

