Bug 13098 - kernel on Mageia 4 does not do KMS properly on Dell PowerEdge R610
Summary: kernel on Mageia 4 does not do KMS properly on Dell PowerEdge R610
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 4
Hardware: i586 Linux
Priority: Normal (severity: normal)
Target Milestone: ---
Assignee: Thomas Backlund
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 14301
Blocks:
Reported: 2014-03-26 23:12 CET by David Walser
Modified: 2014-11-15 19:35 CET

See Also:
Source RPM: kernel
CVE:
Status comment:


Attachments
dmesg output (67.07 KB, text/plain)
2014-05-01 22:46 CEST, David Walser
lsmod output from Dell server (7.21 KB, text/plain)
2014-09-17 14:56 CEST, David Walser
lsmod from 3.11.2 (7.41 KB, text/plain)
2014-09-18 22:22 CEST, David Walser
dmesg from 3.11.2 (67.40 KB, text/plain)
2014-09-18 22:22 CEST, David Walser

Description David Walser 2014-03-26 23:12:14 CET
Very early after the kernel loads, it switches the console resolution, presumably using KMS.  With the Mageia 3 kernel, this works fine.  With the Mageia 4 kernel, the locally attached screen looks frozen (only the one line about initializing the SAS adapter stays at the top), even though the system is actually booting just fine.

dmidecode says it has an embedded Matrox G200 Video card in the server.

David Walser 2014-04-28 23:41:16 CEST

Component: Release (media or process) => RPM Packages

Comment 1 Thomas Backlund 2014-05-01 22:11:30 CEST
Pasting specs from bug 13264


Dell PowerEdge R610
Intel Xeon X5670, 2 6-core CPUs with hyperthreading
Intel ICH9 chipset
Matrox G200eW WPCM450
Comment 2 David Walser 2014-05-01 22:18:31 CEST
Card:Matrox Millennium G series (single head): Matrox Electronics Systems Ltd.|MGA G200eW WPCM450 [DISPLAY_VGA] (vendor:102b device:0532 subv:1028 subd:0236) (rev: 0a)
Comment 3 Thomas Backlund 2014-05-01 22:36:57 CEST
Can you also attach a dmesg from that system?
Comment 4 David Walser 2014-05-01 22:46:27 CEST
Created attachment 5134 [details]
dmesg output
Comment 5 David Walser 2014-08-29 00:12:36 CEST
These lines look relevant:
[    4.574462] [drm:mga_vram_init] *ERROR* can't reserve VRAM
[    4.574467] mgag200 0000:06:03.0: Fatal error during GPU init: -6

I also see those with the 3.14.17 kernel in updates_testing.
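
For anyone checking another machine for the same failure, here is a minimal sketch of a filter that pulls such lines out of a kernel log. The function name drm_errors is hypothetical, and the pattern is only an assumption based on the two messages quoted above; other drivers may word their errors differently.

```shell
# drm_errors: read a kernel log on stdin and print lines that look like
# DRM errors or mgag200 init failures.
# Hypothetical helper; the pattern is derived only from the two messages
# quoted in this comment.
drm_errors() {
    grep -E '\[drm:[^]]*] \*ERROR\*|mgag200 .*Fatal error'
}

# Usage (reading the live kernel log may require root):
#   dmesg | drm_errors
```

Reading from stdin means the same filter can be run against a saved dmesg attachment as well as the live log.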
Comment 6 Thomas Backlund 2014-08-29 00:18:36 CEST
Is uvesafb loaded on that system?

If so, does it help to blacklist it?
Comment 7 David Walser 2014-08-29 00:25:29 CEST
It is not loaded.
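
A minimal sketch of the check, in case someone wants to repeat it on similar hardware. The helper name module_loaded is hypothetical; it only inspects lsmod-style text on stdin and makes no claim about how the check was actually done here.

```shell
# module_loaded NAME: succeed if NAME is listed in lsmod-style output on stdin.
# Hypothetical helper; it compares the first column exactly, so "fb" does
# not match "uvesafb".
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage:
#   lsmod | module_loaded uvesafb && echo "uvesafb is loaded"
# If it were loaded, a line "blacklist uvesafb" in a file under
# /etc/modprobe.d/ (file name of your choosing) would keep it from being
# auto-loaded.
```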
Comment 8 Thomas Backlund 2014-09-17 11:51:06 CEST
Can you attach the output of lsmod?
Comment 9 David Walser 2014-09-17 14:56:11 CEST
Created attachment 5422 [details]
lsmod output from Dell server
Comment 10 David Walser 2014-09-18 21:03:42 CEST
I just built 3.11.2 from our kernel package SVN revision 487545 and it works fine.
Comment 11 Thomas Backlund 2014-09-18 21:54:06 CEST
OK, so there are minimal changes in the GPU code between 3.11 and 3.12, so maybe it is not really the GPU code that has been broken, but some ACPI/mm/MTRR code change...

Can you get a dmesg and lsmod from running the 3.11 kernel too?
Comment 12 David Walser 2014-09-18 22:22:39 CEST
Created attachment 5423 [details]
lsmod from 3.11.2
Comment 13 David Walser 2014-09-18 22:22:57 CEST
Created attachment 5424 [details]
dmesg from 3.11.2
Comment 14 David Walser 2014-09-19 00:28:17 CEST
I'm not having any luck booting the first kernel I built in the bisect process.  When GRUB selects it, the screen immediately goes completely black, and a minute later dracut prints a bunch of errors about devices not existing (UUIDs that correspond to my swap, /, and /usr partitions).  Usually the first message on screen is about the megasas RAID adapter, so I'm guessing the kernel I built is failing to initialize the hardware RAID adapter properly.  I have no idea why, as I tried starting with the configs from both 3.11.2 and 3.12-rc5, both of which at least boot.

My process for building and installing the kernel was extracted from our kernel spec, so maybe you can find some flaw in my process here, but it looks right to me.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cd linux
git bisect start
git bisect bad v3.12-rc5
git bisect good v3.11

cp /boot/config-3.12.0-desktop-0.rc5.1.mga4 .config

make oldconfig
make -j 24 -s all
KernelVer=3.11.0 # based on the contents of the Makefile currently
install -m 644 System.map /home/admin/bisect/boot/System.map-$KernelVer
install -m 644 .config /home/admin/bisect/boot/config-$KernelVer
xz -c Module.symvers > /home/admin/bisect/boot/symvers-$KernelVer.xz
cp -f arch/x86/boot/bzImage /home/admin/bisect/boot/vmlinuz-$KernelVer
make INSTALL_MOD_PATH=/home/admin/bisect KERNELRELEASE=$KernelVer modules_install
rm -rf /home/admin/bisect/lib/firmware
find /home/admin/bisect/lib/modules -name "*.ko" | xargs -P 24 xz -6e
rm -f /home/admin/bisect/lib/modules/{build,source}
pushd /home/admin/bisect/lib/modules
/sbin/depmod -ae -b /home/admin/bisect -F /home/admin/bisect/boot/System.map-$KernelVer $KernelVer
pushd $KernelVer
modules=$(find . -name "*.ko.[gx]z")
echo $modules | xargs -P 24 /sbin/modinfo | perl -lne 'print "$name\t$1" if $name && /^description:\s*(.*)/; $name = $1 if m!^filename:\s*(.*)\.k?o!; $name =~ s!.*/!!' > modules.description
popd
popd
pushd /home/admin/bisect
tar -cvf ../${KernelVer}.tar lib/modules/$KernelVer boot/*
popd

su -
cd /
tar -xvf /home/admin/${KernelVer}.tar
/sbin/installkernel $KernelVer
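
KernelVer above was set by hand from the Makefile. Here is a minimal sketch of deriving it instead, so that each bisect step is installed under the string the tree itself reports. kernel_version_from_makefile is a hypothetical helper, and it assumes the top-level Makefile defines VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION as plain "NAME = value" lines.

```shell
# kernel_version_from_makefile FILE: print VERSION.PATCHLEVEL.SUBLEVEL
# followed by EXTRAVERSION, taken from the first definition of each
# variable in FILE.
# Hypothetical helper; assumes simple "NAME = value" assignments.
kernel_version_from_makefile() {
    awk -F ' *= *' '
        /^(VERSION|PATCHLEVEL|SUBLEVEL|EXTRAVERSION) *=/ && !($1 in val) {
            val[$1] = $2
        }
        END {
            print val["VERSION"] "." val["PATCHLEVEL"] "." val["SUBLEVEL"] val["EXTRAVERSION"]
        }
    ' "$1"
}

# Usage, from the top of the kernel tree:
#   KernelVer=$(kernel_version_from_makefile Makefile)
```

Running "make -s kernelversion" in the tree should print the same string, so the helper is mainly useful where invoking make is inconvenient.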
Comment 15 David Walser 2014-09-19 16:50:10 CEST
This really doesn't make sense.  I thought maybe the upstream kernel just didn't work and it was some patch in the Mageia kernel that made it work, but kernel-linus 3.12 from mga4 boots.  So, I don't understand why the one I built in Comment 14 won't boot.

Regardless, I found a convoluted way to create a patch from 3.11.0 to the current contents of my git bisect and integrate that into our RPM build procedure, so I managed to build a kernel in an RPM that boots and works.  So, step 1 => git bisect good.
Comment 16 David Walser 2014-09-19 23:27:49 CEST
I made it all the way through the git bisect and all of the kernels came up good.  So, the issue was caused by a config change in our package, namely here:
http://svnweb.mageia.org/packages/cauldron/kernel/current/PATCHES/configs/i386.config?r1=496716&r2=496715&pathrev=496716
Comment 17 Thomas Backlund 2014-09-19 23:45:45 CEST
Can you try to disable: CONFIG_FB_SIMPLE
Comment 18 David Walser 2014-09-20 00:45:27 CEST
(In reply to Thomas Backlund from comment #17)
> Can you try to disable: CONFIG_FB_SIMPLE

I've played with that option as well as CONFIG_X86_SYSFB.

X86_SYSFB = n, FB_SIMPLE = n, works
X86_SYSFB = y, FB_SIMPLE = y, doesn't work
X86_SYSFB = y, FB_SIMPLE = n, doesn't work
X86_SYSFB = n, FB_SIMPLE = y, works

So the issue is actually X86_SYSFB, a new option added during 3.12 development, which breaks our server's console when set to yes.

FB_SIMPLE does have a slight noticeable impact: with it set to y, I can at least see that megasas adapter initialization message at the beginning; with FB_SIMPLE set to n, I can't even see that.  So it appears FB_SIMPLE is better off staying as yes.
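
To see which of these combinations an installed kernel uses, here is a minimal sketch that reads the two options from a kernel config file. The helper names config_value and sysfb_console_ok are hypothetical, and the sketch assumes the config is installed as /boot/config-<version>, as in the file names used earlier in this bug.

```shell
# config_value OPTION FILE: print the value of CONFIG_OPTION in a kernel
# config file, or "n" when the option is unset or absent.
config_value() {
    value=$(sed -n "s/^CONFIG_$1=//p" "$2")
    echo "${value:-n}"
}

# sysfb_console_ok FILE: succeed when X86_SYSFB is not enabled, the only
# factor that mattered in the four combinations tested above.
sysfb_console_ok() {
    [ "$(config_value X86_SYSFB "$1")" = "n" ]
}

# Usage (assumes the config file name matches the running kernel):
#   config_value X86_SYSFB "/boot/config-$(uname -r)"
#   config_value FB_SIMPLE "/boot/config-$(uname -r)"
```

The check deliberately ignores FB_SIMPLE, since in the results above it changed only how much of the early console was visible, not whether the display worked.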
Comment 19 David Walser 2014-09-20 01:36:26 CEST
I rebuilt the mga4 updates_testing kernel with X86_SYSFB = n, and the display works again :D
Comment 20 claire robinson 2014-09-23 13:35:34 CEST
See also https://forums.mageia.org/en/viewtopic.php?f=7&t=8450
Comment 21 David Walser 2014-09-23 15:17:16 CEST
(In reply to claire robinson from comment #20)
> See also https://forums.mageia.org/en/viewtopic.php?f=7&t=8450

I haven't tried to run X on it.  I would think that would be a different issue.
Comment 22 Thomas Backlund 2014-09-28 20:50:17 CEST
@David:

There is now a kernel-3.14.19-1.mga4 building with X86_SYSFB disabled, which also includes the interesting VFS fix I mentioned on IRC.

X86_SYSFB is also disabled in Cauldron in the upcoming 3.17-rc7 kernel
Comment 23 David Walser 2014-10-01 19:35:18 CEST
Thanks Thomas!  This is now running on our production Squid server.
David Walser 2014-11-14 22:20:52 CET

Depends on: (none) => 14301

Comment 24 David Walser 2014-11-15 19:35:16 CET
Fixed in http://advisories.mageia.org/MGASA-2014-0453.html

Status: NEW => RESOLVED
Resolution: (none) => FIXED

