Bug 11188 - kernel(s) in Mageia 3 do not boot on Foxconn mini system
Status: RESOLVED OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 4
Hardware: i586 Linux
Priority: Normal critical
Target Milestone: ---
Assignee: Thomas Backlund
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-06 23:11 CEST by David Walser
Modified: 2015-09-21 13:27 CEST

See Also:
Source RPM: kernel-3.8.13.4-1.mga3.src.rpm
CVE:
Status comment:


Attachments

Description David Walser 2013-09-06 23:11:51 CEST
The system is this one:
http://www.newegg.com/Product/Product.aspx?Item=N82E16856119051

It has this 4GB RAM and SATA HDD:
http://www.newegg.com/Product/Product.aspx?Item=N82E16820231341
http://www.newegg.com/Product/Product.aspx?Item=N82E16822145454

The bootloader on this system is LILO.  The Mageia 2 (server) kernel still works.

The server kernel spontaneously reboots very soon after being loaded (after the BIOS data check successful message), with only some early boot messages shown very briefly (the last one about freeing initrd memory).

The desktop kernel goes into a kernel panic very quickly.

The desktop586 kernel does the same.  I don't think the output is the same as the desktop one, although I think the desktop one also says something about not syncing.  Here's the last few lines from desktop586:
Kernel panic - not syncing: fatal exception interrupt
Shutting down CPUs with NMI
Bad dumping mode, switching to all CPUs dump
Dumping ftrace buffer: (ftrace buffer empty)

Comment 1 David Walser 2013-09-06 23:33:30 CEST
Removing the nokmsboot option seems to have solved the spontaneous quick reboot.  Now all kernels give this output at the end:
Kernel panic - not syncing: fatal exception interrupt
Shutting down CPUs with NMI
Comment 2 David Walser 2013-09-07 00:00:34 CEST
The rest of the trace above that, as far as it's available on the crash screen, follows (had to copy it down by hand!).  Each line has a timestamp [ 0.173017 ] at the beginning of it.

[<c0628755>] bad_page+0xc5/0x86
[<c01f7e4a>] free_pages_prepare+0x12a/0x140
[<c01f7e81>] free_hot_cold_page+0x21/0x130
[<c01f7fd9>] __free_pages+0x49/0x70
[<c01f8066>] free_pages+0x26/0x30
[<c0130c2a>] free_init_pages+0xda/0x150
[<c08df5d9>] free_initrd_mem+0x1b/0x1d
[<c08cfa2e>] free_initrd+0x71/0x8a
[<c08cfd7c>] populate_rootfs+0x203/0x25d
[<c08cfb79>] ? maybe_link.part.2+0xf2/0xf2
[<c0101222>] do_one_initcall+0x112/0x160
[<c08cdad7>] kernel_init_freeable+0x113/0x1ad
[<c08cd44b>] ? do_early_param+0x7a/0x7a
[<c061a410>] kernel_init+0x10/0xd0
[<c0636ebb>] ret_from_kernel_thread+0x1b/0x30
[<c061a400>] ? rest_init+0x60/0x60
Code: c3 90 8d 74 26 00 e8 63 fc ff ff eb e8 90 55 89 e5 83 ec 0c 89 5d f4 89 75 f8 89 7d fc 3e 8d 74 26 00 89 cb 89 c7 c1 e9 02 89 d6 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 5d f4 8b 75 f8 8b 7d fc 89
EIP: [<c03f725d>] memcpy+0x1d/0x50 SS:ESP 0068:f186ba00
CR2: 000000008d68192c
---[end trace 435803b87998836e ]---
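
One quick consistency check for a hand-copied trace like this: each entry has the form symbol+offset/size, and an offset larger than the symbol's size usually points to a copying slip rather than a real frame. A minimal shell sketch of that check follows; check_trace_offsets is a hypothetical helper name, not an existing tool.

```shell
# check_trace_offsets: reads oops trace lines on stdin and prints every
# symbol whose offset is larger than its reported size.
# (Hypothetical helper for illustration only.)
check_trace_offsets() {
    # Pull out "symbol offset size" from entries like "[<addr>] sym+0x12/0x34".
    sed -n 's|.* \([A-Za-z_][A-Za-z0-9_.]*\)+\(0x[0-9a-fA-F][0-9a-fA-F]*\)/\(0x[0-9a-fA-F][0-9a-fA-F]*\).*|\1 \2 \3|p' |
    while read -r sym off size; do
        # Offset equal to size does occur on "?" entries, so only flag offset > size.
        if [ "$(( $off ))" -gt "$(( $size ))" ]; then
            echo "$sym"
        fi
    done
}
```

Fed the lines above, this should flag only the bad_page entry (offset 0xc5 against a size of 0x86), so that line was probably miscopied.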
Comment 3 David Walser 2013-11-29 02:22:38 CET
i586 kernel 3.10.19 also won't boot on this system.  The desktop kernels just spontaneously reboot immediately after loading the kernel.  With the server one I see some early boot messages from the kernel and then it spontaneously reboots, but I didn't see a crash message.
Comment 4 Thomas Backlund 2013-12-07 00:20:12 CET
does kernel-linus work ?
Comment 5 David Walser 2013-12-07 00:21:39 CET
(In reply to Thomas Backlund from comment #4)
> does kernel-linus work ?

Ahh, good question.  I tried desktop, desktop586, and server, but I didn't try linus.  I'll let you know as soon as I can (hopefully this weekend).
Comment 6 David Walser 2013-12-09 20:17:45 CET
(In reply to Thomas Backlund from comment #4)
> does kernel-linus work ?

No, it also immediately reboots the system when I try to boot it.
Comment 7 David Walser 2013-12-15 16:27:17 CET
I tried some older versions of kernel-linus from SVN:
3.5.3 (revision 281573) - works
3.6.6 (revision 311685) - works
3.7.1 (revision 329569) - works

So I guess the breakage happened during the 3.8 merge window.  Not quite sure how to go about trying to narrow it down further than that.
Comment 8 David Walser 2013-12-21 20:01:20 CET
I tried more kernel-linus versions:
3.8-rc2 (revision 340147) - works
3.8-rc3 (revision 344622) - works
3.8-rc5 (revision 392576) - works
3.8-rc6 (revision 394675) - works
3.8-rc7 (revision 397137) - same crash message as in Comment 2

The 3.10.24 kernel-server update still spontaneously reboots early in the process.
Comment 9 Thomas Backlund 2013-12-21 23:26:58 CET
Ok,
nothing really stands out in the commits between -rc6 and -rc7.

So a git bisect is still needed:

do:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cd linux
git bisect start
git bisect bad v3.8-rc7
git bisect good v3.8-rc6

copy working .config to tree, build, and install kernel and try to boot with it,
if it works do:

git bisect good

if not:

git bisect bad


then repeat the build/install/boot process

until git reports the first bad commit...

it shouldn't take many builds, as there are not that many commits between -rc6 and -rc7
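
As a rough check on "not many builds": bisection halves the candidate range at each step, so N commits need about log2(N) build-and-boot cycles. A minimal shell sketch of that arithmetic follows; bisect_steps is a hypothetical helper name, and the actual number of commits between the two tags is not stated in this report.

```shell
# bisect_steps N: print roughly how many build/boot cycles a bisect over
# N candidate commits needs (the range is halved each step).
# (Hypothetical helper for illustration only.)
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Usage inside the kernel clone:
#   bisect_steps "$(git rev-list --count v3.8-rc6..v3.8-rc7)"
```

For example, a range of 100 commits would come out at 7 cycles.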
Comment 10 David Walser 2014-01-08 17:26:13 CET
I'm building the last kernel in the bisect process now.  I won't be able to test it until I'm back home this weekend.  So far it's been git bisect good every single time, so the revision it's on now is 3992313488, which is the last commit before tagging rc7, and is merging of ARM stuff from Linaro.  I'll be surprised if this is really the commit that breaks it.  I'm a bit confused now, because rc7 definitely gives that error I mentioned earlier and doesn't boot.
Comment 11 David Walser 2014-01-08 18:11:55 CET
I guess since that last commit didn't change much, the build was very fast and finished before I could leave.  It also worked, so git thinks it broke when Linus tagged 3.8-rc7, which obviously isn't correct.  Not sure what to do now.

https://github.com/torvalds/linux/commits/v3.8-rc7
Comment 12 David Walser 2014-03-02 21:44:06 CET
Further Mageia 3 updates have killed this machine's ability to even boot from the old kernels.  I have two of them, so I tried upgrading the other one to Mageia 4, but it's no better there.  These machines are of no use to me now.
Comment 13 AL13N 2014-07-15 01:43:59 CEST
not sure if this is related or not, but i have a new atom based machine now (ASROCK chipset), that gives kernel panic immediately on first boot.

i see something with _cpu_idle in the stack trace followed by
"Shutting down CPUs with NMI"

the timestamp is 1.*

the weird part is that the PXE boot with rescue and install with mga4 worked...

CC: (none) => alien

Comment 14 David Walser 2014-07-15 02:07:54 CEST
Yes, odd indeed that the installer runs but the installed system does not.
Comment 15 AL13N 2014-07-15 08:22:22 CEST
after looking around, it might or might not be the same issue, but specifying "intel_idle.max_cstate=0" in kernel parameters does actually make it boot...
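
When testing a parameter like this, it helps to confirm that it actually reached the running kernel. A minimal shell sketch of that check follows; has_kernel_param is a hypothetical helper name, and the note about making the setting persistent assumes a LILO setup like the one in the original report.

```shell
# has_kernel_param PARAM [FILE]: succeed if PARAM appears as a whole word
# on the kernel command line (FILE defaults to /proc/cmdline).
# (Hypothetical helper for illustration only.)
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Usage on the booted system:
#   has_kernel_param intel_idle.max_cstate=0 && echo "parameter is active"
#
# Assumption: with LILO the parameter would normally go into the append="..."
# line of /etc/lilo.conf, followed by rerunning lilo.
```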
Comment 16 AL13N 2014-07-15 08:22:43 CEST
@David perhaps you could try this too? see if it works?
Comment 17 David Walser 2014-07-19 21:38:33 CEST
Didn't work on either of them.
Comment 18 Marja Van Waes 2015-03-31 16:04:28 CEST
Mageia 3 changed to end-of-life (EOL) status 4 months ago.
http://blog.mageia.org/en/2014/11/26/lets-say-goodbye-to-mageia-3/ 

Mageia 3 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Mageia
please feel free to click on "Version", change it against that version of Mageia,
and reopen this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

--
The Mageia Bugsquad

Status: NEW => RESOLVED
Resolution: (none) => OLD

Comment 19 David Walser 2015-03-31 21:14:50 CEST
I know Mageia 4 doesn't work on those machines, I haven't tried Mageia 5.

Status: RESOLVED => REOPENED
Version: 3 => 4
Resolution: OLD => (none)

Comment 20 AL13N 2015-04-01 22:04:12 CEST
you have tried a new kernel, right? iirc, i ran my machines with Mageia 4, but with kernel 3.17.2 at the time. of course, it might not be the same issue.

well, good luck with it.
Comment 21 David Walser 2015-04-01 23:54:51 CEST
Unfortunately I've had to retire the machines since they've been non-operational for so long.  I'm now using the DHCP on my router at home and have no internal DNS :o(

I'll have to play with them again once Mageia 5 is out to see if it'll run again.
Comment 22 David Walser 2015-05-11 02:00:30 CEST
I upgraded one of the machines to Mageia 5 as of this morning.  It works!!!!
Comment 23 David Walser 2015-05-11 02:05:55 CEST
I get a message right after I start to boot (from LILO) that looked like this:

Initial ramdisk loads below 4Mb.  Kernel overwrite is possible.
Comment 24 David Walser 2015-05-11 02:16:12 CEST
I regenerated the initrd with dracut and now LILO says
Loading linux... ..
Kernel and Initrd memory conflict

and now it won't boot :o(
Comment 25 David Walser 2015-05-11 02:29:32 CEST
Changed it to GRUB and it boots again.  The first message when it boots is now:

could not find module by name='r8169'

which means it probably isn't including it in the initrd, but that's the module for my eth0, so it probably should be in there.
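
Whether the module really is missing from the initrd can be checked from its file listing, and dracut can be told to include the driver explicitly. A minimal shell sketch follows; listing_has_module is a hypothetical helper name, and the initrd path in the usage lines is an assumption about this installation.

```shell
# listing_has_module MODULE: read an initrd file listing on stdin and
# succeed if a MODULE.ko file (possibly compressed) appears in it.
# (Hypothetical helper for illustration only.)
listing_has_module() {
    grep -q "/$1\.ko"
}

# Usage (the initrd path is an assumption, adjust as needed):
#   lsinitrd "/boot/initrd-$(uname -r).img" | listing_has_module r8169 \
#       || echo "r8169 is not in the initrd"
#
# To rebuild the initrd with the driver forced in:
#   dracut --force --add-drivers r8169 "/boot/initrd-$(uname -r).img" "$(uname -r)"
```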
Comment 26 Samuel Verschelde 2015-09-21 13:19:46 CEST
Mageia 4 changed to end-of-life (EOL) status on 2015-09-19. It is no longer 
maintained, which means that it will not receive any further security or bug 
fix updates.

Package Maintainer: If you wish for this bug to remain open because you plan to 
fix it in a currently maintained version, simply change the 'version' to a later 
Mageia version.

Bug Reporter: Thank you for reporting this issue and we are sorry that we weren't 
able to fix it before Mageia 4's end of life. If you are able to reproduce it 
against a later version of Mageia, you are encouraged to click on "Version" and 
change it against that version of Mageia. If it's valid in several versions, 
select the highest and add MGAxTOO in whiteboard for each other valid release.
Example: it's valid in cauldron and Mageia 5, set to cauldron and add MGA5TOO.

Although we aim to fix as many bugs as possible during every release's lifetime, 
sometimes those efforts are overtaken by events. Often a more recent Mageia 
release includes newer upstream software that fixes bugs or makes them obsolete.

If you would like to help fixing bugs in the future, don't hesitate to join the
packager team via our mentoring program [1] or join the teams that fit you 
most [2].

[1] https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
[2] http://www.mageia.org/contribute/
Comment 27 David Walser 2015-09-21 13:27:09 CEST
Closing this as OLD.

Status: REOPENED => RESOLVED
Resolution: (none) => OLD
