Bug 19892 - HID fails with 4.4.32
Summary: HID fails with 4.4.32
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Release (media or process)
Version: 5
Hardware: x86_64 Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2016-12-04 23:36 CET by Gilles Allard
Modified: 2017-03-02 17:32 CET

See Also:
Source RPM:
CVE:
Status comment:


Attachments
Required logfiles (330.00 KB, application/x-compressed-tar)
2016-12-05 14:20 CET, Gilles Allard
Required lsusb files (509 bytes, application/x-compressed-tar)
2016-12-05 17:32 CET, Gilles Allard
2 logfiles for 4.4.35. One good and 1 bad (56.51 KB, application/x-compressed-tar)
2016-12-06 15:02 CET, Gilles Allard
Logfile of 4.4.36 boot (190.30 KB, text/x-txt)
2016-12-08 01:08 CET, Gilles Allard
log of 4.4.36 boot with kernel parm: radeon.runpm=0 (155.53 KB, text/plain)
2016-12-11 21:06 CET, Gilles Allard

Description Gilles Allard 2016-12-04 23:36:47 CET
Description of problem:
I installed 4.4.32 recently (automatic update).
After reboot, all input devices of my Dell Precision M4800 failed. No trackpad, no trackpoint, no keyboard. Tried USB devices (mouse and keyboard) without success.
No way to login.
Rebooting with 4.4.30 solved the problem.
Systemd logs do not contain related info.

Version-Release number of selected component (if applicable):
4.4.32-desktop-1.mga5

How reproducible:
Every time I select 4.4.32

Steps to Reproduce:
1. Boot with default GRUB entry
Comment 1 Thomas Backlund 2016-12-05 09:39:32 CET
Hm, on a quick look nothing stands out between 4.4.30 and 4.4.32

Can you boot with 4.4.32 and wait about 30 seconds after it has fully booted to the login screen?

Then reboot into 4.4.30, open a terminal and, as root, run:

lspcidrake -v >lspcidrake.log
journalctl -b >boot-4.4.30.log
journalctl -b -1 >boot-4.4.32.log

and _attach_ the 3 logfiles created to this bugreport.
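
As a rough aid for reading the two journal dumps requested above, the following sketch pulls out only the kernel lines that mention a failure. The function name `kernel_problems` and the keyword list are assumptions made for illustration, not part of any Mageia tool, and the keywords may miss relevant messages.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel lines of a saved journal log
# that mention a common failure keyword.
# The keyword list is an assumption and can be extended as needed.
kernel_problems() {
    # Keep only lines logged by the kernel, then filter on keywords.
    grep ' kernel: ' "$1" | grep -i -E 'fail|error|stalled|lockup|timeout'
}

# Example usage with the file names from the comment above:
#   kernel_problems boot-4.4.30.log > problems-4.4.30.txt
#   kernel_problems boot-4.4.32.log > problems-4.4.32.txt
```

Comparing the two filtered files by eye (or with `diff`) is usually quicker than reading the full logs, though timestamps will make every line differ.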

CC: (none) => tmb

Marja Van Waes 2016-12-05 12:17:11 CET

Keywords: (none) => NEEDINFO
CC: (none) => marja11
Assignee: bugsquad => kernel

Comment 2 Gilles Allard 2016-12-05 14:20:04 CET
Created attachment 8726
Required logfiles

I don't know if it's related, but 4.4.32 boots quicker than 4.4.30.
Comment 3 Thomas Backlund 2016-12-05 17:18:15 CET
Interestingly, 4.4.32 seems to detect your hardware better, as on 4.4.30 you have a lot of lines like:

 local4.local kernel: pci 0000:04:03.0: BAR 14: no space for [mem size 0x00200000]
 local4.local kernel: pci 0000:04:03.0: BAR 14: failed to assign [mem size 0x00200000]

that are properly detected as resources in 4.4.32


Can you also try 4.4.36-1, which is available in core updates_testing?

Also, please provide the output of:
lsusb
lsusb -t
Comment 4 Gilles Allard 2016-12-05 17:32:18 CET
Created attachment 8728
Required lsusb files
Comment 5 Gilles Allard 2016-12-06 15:02:04 CET
Created attachment 8730
2 logfiles for 4.4.35. One good and 1 bad

I installed 4.4.35-2. It's the latest I found in "testing".
The first time I booted 4.4.35, it worked fine. A log is attached.
All subsequent boots failed after login. The system froze between login and the desktop appearing. There was no disk activity. Through SSH, I found that the CPU was idle.
I've attached a log of a failed boot.

I'm back to 4.4.30
I need guidance to diagnose the problem.

CC: (none) => gallard

Comment 6 Thomas Backlund 2016-12-07 08:42:35 CET
(In reply to Gilles Allard from comment #5)
> Created attachment 8730
> 2 logfiles for 4.4.35. One good and 1 bad
> 

Something went wrong here... the OK log is from a 4.1.15 kernel.

Anyway, can you try the new 4.4.36-2 in testing?

If you don't find it on your local mirror, grab it from here:
http://mirrors.kernel.org/mageia/distrib/5/x86_64/media/core/updates_testing/kernel-desktop-4.4.36-2.mga5-1-1.mga5.x86_64.rpm

and if you use any dkms packages you need the -devel package too:
http://mirrors.kernel.org/mageia/distrib/5/x86_64/media/core/updates_testing/kernel-desktop-devel-4.4.36-2.mga5-1-1.mga5.x86_64.rpm
Comment 7 Thomas Backlund 2016-12-07 08:44:24 CET
Also, does it make any difference whether it is a cold boot or a warm boot?

(cold boot = start from computer completely off)
(warm boot = reboot from working kernel)
Comment 8 Gilles Allard 2016-12-08 01:08:06 CET
Created attachment 8739
Logfile of 4.4.36 boot

4.4.36-2 behaves like 4.4.35-2: the display freezes after login.
Access through SSH shows that X is running and systemd has completed the boot.

The logfile contains some stack traces at the end, but I think they are unrelated to my problem.

Cold boot and warm boot behave identically.

Next step?
Comment 9 Thomas Backlund 2016-12-09 20:03:53 CET
Can you try kernel-linus-4.4.32-1.mga5, available in updates, to confirm whether it's an upstream issue (or not)?
Comment 10 Thomas Backlund 2016-12-09 20:16:54 CET
Oh, and the stack traces are an aftereffect of this:

déc 07 09:11:39 local4.local kernel: radeon 0000:01:00.0: ring 3 stalled for more than 10020msec
déc 07 09:11:39 local4.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000005e last fence id 0x000000000000005f on ring 3)
déc 07 09:11:39 local4.local kernel: radeon 0000:01:00.0: ring 3 stalled for more than 10521msec
déc 07 09:11:39 local4.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000005e last fence id 0x000000000000005f on ring 3)
déc 07 09:11:40 local4.local kernel: radeon 0000:01:00.0: ring 3 stalled for more than 11022msec
déc 07 09:11:40 local4.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000005e last fence id 0x000000000000005f on ring 3)
déc 07 09:11:40 local4.local kernel: radeon 0000:01:00.0: ring 3 stalled for more than 11523msec
déc 07 09:11:40 local4.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000005e last fence id 0x000000000000005f on ring 3)
déc 07 09:11:41 local4.local kernel: radeon 0000:01:00.0: ring 3 stalled for more than 12024msec
déc 07 09:11:41 local4.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000005e last fence id 0x000000000000005f on ring 3)
déc 07 09:11:41 local4.local kernel: radeon 0000:01:00.0: ring 3 stalled for more than 12525msec
déc 07 09:11:41 local4.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000005e last fence id 0x000000000000005f on ring 3)
déc 07 09:11:42 local4.local kernel: radeon 0000:01:00.0: ring 3 stalled for more than 13026msec
déc 07 09:11:42 local4.local kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000005e last fence id 0x000000000000005f on ring 3)
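
To see how often this pattern occurs in a saved boot log, a small sketch like the one below counts the "GPU lockup" lines and extracts the longest reported stall. The function names are hypothetical, and the patterns assume the message wording shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helpers for a saved journal log (e.g. from "journalctl -b").

# Number of lines that report a GPU lockup.
gpu_lockup_count() {
    grep -c 'GPU lockup' "$1"
}

# Longest "ring N stalled for more than <n>msec" value, in milliseconds.
# Prints nothing if the log contains no such line.
longest_stall_msec() {
    sed -n 's/.*stalled for more than \([0-9][0-9]*\)msec.*/\1/p' "$1" |
        sort -n | tail -n 1
}

# Example usage (log.txt is a placeholder file name):
#   gpu_lockup_count log.txt
#   longest_stall_msec log.txt
```

Run against only the lines quoted above, this would report 7 lockup lines and a longest stall of 13026 msec; the full attached log may contain more.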
Comment 11 Thomas Backlund 2016-12-09 20:20:28 CET
And I missed those in the 4.4.32 logs on first check...
Now, there are only a few GPU patches between 4.4.30 and 4.4.32, so it should hopefully not be hard to check which one triggers this.
Comment 12 Thomas Backlund 2016-12-09 22:47:13 CET
Can you try 2.1, 2.2 and 2.3 from:
ftp://ftp.free.fr/mirrors/mageia.org/people/tmb/5/bugs/19892/

and see if any of them locks up?
Comment 13 Thomas Backlund 2016-12-10 12:26:03 CET
Also, on 4.4.36-2, try adding

radeon.runpm=0

to the kernel command line. Does that change anything?
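
After rebooting with the extra parameter, it can be worth confirming that it actually reached the running kernel. The sketch below checks a command line string for an exact parameter; `has_kernel_param` is a hypothetical helper name, and reading `/proc/cmdline` as the default is the only system-specific part.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PARAM appears as a whole word on the
# kernel command line.
# Usage: has_kernel_param PARAM [CMDLINE]
# CMDLINE defaults to the contents of /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so only complete, space-separated words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on the running system:
#   has_kernel_param radeon.runpm=0 && echo "runpm is disabled on the command line"
```

When the radeon module is loaded, the value in effect should also be visible in `/sys/module/radeon/parameters/runpm`, assuming the driver exposes the parameter there.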
Comment 14 Gilles Allard 2016-12-11 21:06:48 CET
Created attachment 8752
log of 4.4.36 boot with kernel parm: radeon.runpm=0

2.1 seems OK (similar to 4.4.30-2)
2.2 freezes after login (tried twice)
2.3 freezes after login (tried once)

With the radeon kernel parameter, we get something different:
- after login, the desktop is displayed
- no mouse (neither touchpad, nor trackpoint, nor external USB)
- no keyboard (internal)
- no sshd
A logfile is attached.

For your information, the Dell Precision M4800 is an unusual machine.
It has 2 pointing devices (touchpad and trackpoint) and 2 graphics chipsets (Intel G4 and Radeon 8870M).
Comment 15 Thomas Backlund 2016-12-11 22:11:09 CET
OK, so here is a fully updated 4.4.38 with a "fix" for this. Try:

ftp://ftp.free.fr/mirrors/mageia.org/people/tmb/5/bugs/19892/kernel-desktop-4.4.38-1.1.mga5-1-1.mga5.x86_64.rpm
Comment 16 Gilles Allard 2016-12-13 21:00:37 CET
4.4.38-1 seems to work fine.
Thank you very much, Thomas.
May I know what the problem was (with some details)?
Will the fix be included in the mainline kernel?

Regards

Gilles Allard
Comment 17 Thomas Backlund 2016-12-30 16:35:58 CET
The bug was introduced by another "possible stability fix" in 4.4.31.
That change caused a regression on your hardware, and fixing it properly would need backporting several upstream changes, some of which could cause other regressions in the 4.4 branch.

So the "fix" I did was to revert the commit that introduced this problem, as it's better to stay on "known good and working" code and keep everyone happy.

The fix is now in the 4.4.39-1 kernel released as:
http://advisories.mageia.org/MGASA-2016-0429.html

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 18 Thomas Backlund 2017-01-28 09:39:01 CET
Upstream has now fixed the regression differently...

Can you test 4.4.45-1.mga5, currently in updates_testing, to confirm that it still works for you?
Comment 19 Gilles Allard 2017-02-10 23:14:33 CET
Hi Thomas!
Where can I find 4.4.45?
I can't find it.
"Core Update Testing" is enabled.
Comment 20 Gilles Allard 2017-03-02 16:46:35 CET
I can't find 4.4.45, but I tried 4.4.50 and it works well on my Dell Precision M4800.
Fixed and resolved.
Comment 21 Thomas Backlund 2017-03-02 17:32:23 CET
Great, I just wanted to verify that it still works for you...