Bug 27796 - Memory test can cause reboot (Toshiba Portege R930)
Summary: Memory test can cause reboot (Toshiba Portege R930)
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Martin Whitaker
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-10 04:10 CET by Ben McMonagle
Modified: 2021-01-08 01:08 CET
CC List: 2 users

See Also:
Source RPM: pcmemtest-1.3
CVE:
Status comment:


Attachments
image of error message (269.83 KB, image/jpeg)
2020-12-10 04:38 CET, Ben McMonagle
journalctl -d as root output (172.12 KB, text/plain)
2020-12-11 02:25 CET, Ben McMonagle

Description Ben McMonagle 2020-12-10 04:10:32 CET
Description of problem: Attempting to run the memory test from the installer boot menu causes the system either to reboot immediately or to report an error. This is likely hardware-specific.


Version-Release number of selected component (if applicable):
Mageia-8-beta2-x86_64
DATE.txt: Tue 08 Dec 2020 06:18:06 PM CET
md5sum:   68c6b67cb8325716a3eaa314afedc658


How reproducible: every time


Steps to Reproduce:
1. Boot the current M8 beta-2 ISO
2. Run the memory test
Comment 1 Dave Hodgins 2020-12-10 04:37:03 CET
I have it running OK on my Asus laptop, currently at 4% of 16GB,
so it looks like it is a hardware problem.

CC: (none) => davidwhodgins
Resolution: (none) => WORKSFORME
Status: NEW => RESOLVED

Comment 2 Ben McMonagle 2020-12-10 04:38:15 CET
Created attachment 12066
image of error message
Comment 3 Dave Hodgins 2020-12-10 05:05:28 CET
https://en.wikipedia.org/wiki/Machine-check_exception

On my laptop, it completed the first pass over all 16GB of RAM, with all tests,
in 27 minutes and with no errors detected. When it started the second pass,
I pressed the escape key to reboot.
Comment 4 Martin Whitaker 2020-12-10 13:06:57 CET
According to https://en.wikipedia.org/wiki/Machine-check_exception, a machine check exception indicates a hardware fault and can't be caused by software, so Dave is probably right, but I'd like to see if we can find out what's causing it.

From the screenshot I deduce that the fault is occurring when the application is probing the ACPI tables to determine how many CPU cores are available. I've uploaded a small test ISO ("memtest.iso") to the "testing" directory on the rsync server that just contains a version of PCMemTest with SMP detection disabled. Could you try that, Ben, and see if it bypasses the fault?
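
For readers unfamiliar with that probe, here is a minimal sketch of how CPU cores can be counted from the ACPI MADT ("APIC" table). It is not PCMemTest's code: the function names are made up for illustration, and it parses a synthetic byte buffer rather than real firmware memory. It assumes the standard MADT layout (44-byte header, then typed entries) and counts Processor Local APIC (type 0) and Local x2APIC (type 9) entries that have the "enabled" flag set.

```python
import struct

MADT_HEADER_LEN = 44     # 36-byte ACPI table header + local APIC address + flags
ENTRY_LOCAL_APIC = 0     # Processor Local APIC entry (8 bytes)
ENTRY_LOCAL_X2APIC = 9   # Processor Local x2APIC entry (16 bytes)
FLAG_ENABLED = 0x1       # bit 0 of the per-processor flags field


def count_enabled_cpus(madt: bytes) -> int:
    """Count enabled processors listed in a MADT image (hypothetical helper)."""
    if len(madt) < MADT_HEADER_LEN or madt[:4] != b"APIC":
        raise ValueError("not a MADT")
    (table_len,) = struct.unpack_from("<I", madt, 4)
    end = min(table_len, len(madt))
    count = 0
    offset = MADT_HEADER_LEN
    while offset + 2 <= end:
        entry_type, entry_len = madt[offset], madt[offset + 1]
        if entry_len < 2 or offset + entry_len > end:
            break  # malformed entry: stop rather than read out of bounds
        if entry_type == ENTRY_LOCAL_APIC and entry_len >= 8:
            (flags,) = struct.unpack_from("<I", madt, offset + 4)
            count += flags & FLAG_ENABLED
        elif entry_type == ENTRY_LOCAL_X2APIC and entry_len >= 16:
            (flags,) = struct.unpack_from("<I", madt, offset + 8)
            count += flags & FLAG_ENABLED
        offset += entry_len  # other entry types (I/O APIC, etc.) are skipped
    return count


def build_madt(entries) -> bytes:
    """Build a synthetic MADT for experimentation (checksum left as zero)."""
    body = b"".join(entries)
    header = b"APIC" + struct.pack("<I", MADT_HEADER_LEN + len(body))
    return header + bytes(MADT_HEADER_LEN - len(header)) + body


def local_apic(uid: int, apic_id: int, flags: int) -> bytes:
    """Encode one Processor Local APIC entry."""
    return struct.pack("<BBBBI", ENTRY_LOCAL_APIC, 8, uid, apic_id, flags)
```

The bounds checks matter in this setting: a probe that trusts a bad length field can end up reading memory it should not touch.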

Resolution: WORKSFORME => (none)
Status: RESOLVED => REOPENED
Assignee: bugsquad => mageia
CC: (none) => mageia

Comment 5 Ben McMonagle 2020-12-10 19:21:45 CET
for some reason I cannot download the file

I tried mageiasync and myrsync
Comment 6 Dave Hodgins 2020-12-10 19:37:23 CET
(In reply to ben mcmonagle from comment #5)
> for some reason I cannot download the file
> 
> I tried mageiasync and myrsync

In the preferences, did you change the Source from
rsync://isoqa@bcd.mageia.org/isos/mageia8-beta2/
to
rsync://isoqa@bcd.mageia.org/isos/testing/

I was able to download it using mageiasync. Don't forget to change the
Source back after downloading that one.
Comment 7 Martin Whitaker 2020-12-10 19:38:55 CET
I used dorsync, which worked. But mageiasync wants an extra level of hierarchy. I've changed it now so it is in the testing/memtest directory.
Comment 8 Dave Hodgins 2020-12-10 19:41:27 CET
Oops. Guess I looked to see what was on the server and tested mageiasync just
after it was changed.
Comment 9 Ben McMonagle 2020-12-10 20:19:18 CET


(In reply to Martin Whitaker from comment #7)
> I used dorsync, which worked. But mageiasync wants an extra level of
> hierarchy. I've changed it now so it is in the testing/memtest directory.

thanks for the move.

OK. Apart from booting straight into the memtest, it works: F1 and Esc behave as expected.

thanks
Comment 10 Martin Whitaker 2020-12-11 01:45:06 CET
PCMemTest is directly bootable from the BIOS, so there's no intermediate bootloader on that test ISO.

Seems likely to be an ACPI problem. Do you need to use any of noacpi/noapic/nolapic on the boot command line on that machine? If not, could you attach the output of 'journalctl -b' run as root on that machine so I can see what the Linux kernel found in the ACPI tables?

In any case, I'm adding a startup option to enable/disable the probe for additional CPU cores. I'll make it default to disabled, as that's the safe option.
Comment 11 Ben McMonagle 2020-12-11 02:24:36 CET
(In reply to Martin Whitaker from comment #10)

> Seems likely to be an ACPI problem. Do you need to use any of
> noacpi/noapic/nolapic on the boot command line on that machine? 

no, it normally has no issues with defaults.
Comment 12 Ben McMonagle 2020-12-11 02:25:45 CET
Created attachment 12068
journalctl -d as root output
Comment 13 Martin Whitaker 2020-12-11 18:11:16 CET
I've added the option to enable/disable SMP by pressing F2 during the 5-second delay before the tests start running (if SMP is disabled, the ACPI probe won't be performed). I've also modified the ACPI probe code to match the order that Linux does it (and spotted and fixed a couple of bugs in the process). I've uploaded both 32-bit and 64-bit versions of the test ISO to the testing/memtest directory. Could you give them a try, Ben? Either should work for legacy boot; pick the appropriate version for UEFI boot. They work on all my machines (but then, so did the old version...).
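
As background on what such a probe involves, the sketch below shows the conventional legacy-BIOS way of locating the ACPI root pointer (RSDP): scan a memory region on 16-byte boundaries for the "RSD PTR " signature and accept a candidate only if its first 20 bytes sum to zero modulo 256. This is not the PCMemTest code, and the thread does not say which lookup order was adopted. The function names are hypothetical and the "memory" is a synthetic byte buffer. For ACPI 2.0+ structures a real probe would also verify the extended checksum, which is omitted here.

```python
import struct

RSDP_SIGNATURE = b"RSD PTR "   # 8 bytes, note the trailing space
RSDP_V1_LEN = 20               # length covered by the ACPI 1.0 checksum


def checksum_ok(data: bytes) -> bool:
    """ACPI checksum rule: all bytes must sum to zero modulo 256."""
    return sum(data) & 0xFF == 0


def find_rsdp(region: bytes):
    """Return the offset of the first valid RSDP in region, or None."""
    for offset in range(0, len(region) - RSDP_V1_LEN + 1, 16):
        if region[offset:offset + 8] != RSDP_SIGNATURE:
            continue
        if checksum_ok(region[offset:offset + RSDP_V1_LEN]):
            return offset
    return None


def make_rsdp(rsdt_address: int) -> bytes:
    """Build a synthetic ACPI 1.0 RSDP with a correct checksum."""
    raw = bytearray(
        RSDP_SIGNATURE
        + b"\x00"                          # checksum placeholder
        + b"OEMID "                        # 6-byte OEM ID (made up)
        + b"\x00"                          # revision 0 = ACPI 1.0
        + struct.pack("<I", rsdt_address)  # physical address of the RSDT
    )
    raw[8] = (-sum(raw)) & 0xFF
    return bytes(raw)
```

Rejecting candidates with a bad checksum is what keeps a scan like this from following a stray signature match into unrelated memory.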
Comment 14 Ben McMonagle 2020-12-11 21:30:47 CET
both do work with CSM/legacy boot

The machine will only boot the 64-bit version under UEFI, and that tests OK too.

F2 option works in both 32 & 64.

good to go?
Comment 15 Martin Whitaker 2020-12-12 12:01:47 CET
Thanks Ben. I've pushed pcmemtest-1.4 to cauldron, so it will be fixed on installed systems and on the 8-rc ISOs when they are built.

Source RPM: (none) => pcmemtest-1.3

Comment 16 Ben McMonagle 2021-01-08 01:08:54 CET
Works on mga8-rc.

Resolution: (none) => FIXED
Status: REOPENED => RESOLVED

