Bug 24089

Summary: System completely freezes with systemd logs mentioning FS being read-only
Product: Mageia Reporter: Augier <christophe>
Component: RPM Packages Assignee: Kernel and Drivers maintainers <kernel>
Status: RESOLVED OLD QA Contact:
Severity: critical    
Priority: Normal CC: basesystem, ftg, ghibomgx, kernel, marja11, ouaurelien
Version: 6   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: kernel? CVE:
Status comment:
Attachments: Smartmontools test result log
log.1.txt
log.2.txt

Description Augier 2018-12-29 16:44:07 CET
Description of problem:

This one is pretty strange. I listed the kernel as the problematic package, though I'm not sure the kernel itself is responsible. This bug happens on kernel 4.14 and kernel 4.9.56, and even after a fresh system install (with /home untouched and unformatted, though).

This bug happens quite randomly and, because of its very specific nature (the whole system freezes and no shell command can be executed, because the commands can no longer be found), I'm unable to collect any log when it happens.

The only thing I can do when it happens is switch to another virtual terminal (Ctrl+Alt+F2) and watch systemd print messages about /dev/sda being read-only (messages I can't collect, since I can't do anything on the system).


Version-Release number of selected component (if applicable):

Mageia 6, kernel 4.14 & 4.9

How reproducible:

As of now, this bug cannot be reproduced at will. I'm asking for help diagnosing what's happening.
Comment 1 Frank Griffin 2018-12-29 17:55:20 CET
I've seen this before.  It happens when disk errors occur on the root partition.  The kernel responds by making the partition read-only to minimize further damage.

If you're lucky, the bad blocks are within a log file of some sort.  Boot from a rescue disk and run fsck on the partition with badblocks enabled, and then try to reboot.  The reboot will start out with the partition read-write, and if you don't hit further errors, it should stay that way.
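A minimal sketch of how one could confirm, from a shell that still responds (or from a rescue system), whether the kernel has switched a partition to read-only as described above. It assumes a Linux system that exposes /proc/mounts; the helper names `mount_options` and `is_read_only` are hypothetical and not part of any tool mentioned in this report.

```python
"""Report whether a mount point is currently mounted read-only.

Assumes Linux /proc/mounts, one entry per line:
    device mountpoint fstype options dump pass
"""
from __future__ import annotations


def mount_options(mounts_text: str, mountpoint: str) -> list[str] | None:
    """Return the mount options of `mountpoint`, or None if it is not mounted."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Keep the last match: a later mount over the same path hides earlier ones.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text: str, mountpoint: str = "/") -> bool:
    """True if the mount point carries the standalone 'ro' option."""
    options = mount_options(mounts_text, mountpoint)
    return options is not None and "ro" in options


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError as err:
        print(f"cannot read /proc/mounts: {err}")
    else:
        state = "read-only" if is_read_only(text) else "read-write"
        print(f"/ is currently {state}")
```

Keeping the last matching entry matters on systems where an early `rootfs` entry for `/` precedes the real root filesystem. An `errors=remount-ro` option is only the ext4 policy that causes such a remount on errors, not a sign that it already happened, so the check deliberately looks only for the standalone `ro` flag.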

CC: (none) => ftg

Comment 2 Marja Van Waes 2018-12-31 18:27:03 CET
Thanks, Frank!

@ Augier

Was Frank's advice useful?

CC'ing the kernel maintainers.

CC: (none) => kernel, marja11

Comment 3 Augier 2019-01-01 15:31:41 CET
I already ran fsck before (though I'm not sure I ran it with the bad-block option). I doubt this is the problem, though, because I have done a clean install of / since then and the problem persists. Plus, if I hard-reboot the computer, the partition is no longer read-only. Someone on another channel suggested a possible hardware failure and advised me to run tests with smartctl, which I still have to do.

I regret that this problem is delaying my release of the Handbrake 1.2.0 package for Mageia.
Comment 4 Frank Griffin 2019-01-01 15:52:45 CET
>I doubt this is the problem, though, because I made a clean install of / since then and the problem persists.

If the bad blocks had not been reassigned, this is entirely possible.

>Plus, if I hard reboot the computer, the partition is not read-only anymore.

Of course.  The read-only condition is not inherent in the hardware, only in the currently running kernel instance.  As I said above, if you reboot, it will revert to read-write until another error forces it read-only again.
Comment 5 Augier 2019-01-01 16:12:20 CET
OK. I will run a few tests tomorrow, then, and get back to you.
Comment 6 Augier 2019-01-01 18:14:56 CET
I ran a SMART test this afternoon and, seemingly, it found no problem. I enclose the log as smart_result.log.
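A passing overall SMART verdict does not by itself rule out the unreassigned bad blocks mentioned above; the reallocated and pending sector counters are worth checking too. A minimal sketch, assuming the plain-text output of `smartctl -H -A` for an ATA/SATA drive (attribute table with RAW_VALUE as the tenth column) and /dev/sda as the device; the function names are hypothetical.

```python
"""Summarise the SMART health verdict and bad-sector counters of a drive.

Assumes the plain-text output of `smartctl -H -A` for an ATA/SATA drive,
where the attribute table columns are:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
"""
from __future__ import annotations

import re
import subprocess

HEALTH_RE = re.compile(r"self-assessment test result:\s*(\S+)")

# Counters related to bad blocks (already reassigned or still waiting to be).
SECTOR_ATTRIBUTES = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def overall_health(output: str) -> str | None:
    """Return the verdict word (e.g. 'PASSED'), or None if the line is absent."""
    match = HEALTH_RE.search(output)
    return match.group(1) if match else None


def raw_attribute(output: str, name: str) -> int | None:
    """Return the integer RAW_VALUE of attribute `name`, if present."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            raw = fields[9]
            return int(raw) if raw.isdigit() else None
    return None


def main(device: str = "/dev/sda") -> None:
    try:
        # smartctl usually needs root; its exit status is a bit mask, so no check=True.
        result = subprocess.run(
            ["smartctl", "-H", "-A", device], capture_output=True, text=True
        )
    except OSError as err:
        print(f"could not run smartctl: {err}")
        return
    print("overall health:", overall_health(result.stdout))
    for name in SECTOR_ATTRIBUTES:
        print(f"{name}: {raw_attribute(result.stdout, name)}")


if __name__ == "__main__":
    main()
```

Non-zero values for `Current_Pending_Sector` or `Offline_Uncorrectable` would point to unreadable sectors even when the overall verdict is PASSED.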
Comment 7 Augier 2019-01-01 18:15:34 CET
Created attachment 10631 [details]
Smartmontools test result log
Comment 8 Marja Van Waes 2019-01-02 20:45:01 CET
(In reply to Augier from comment #0)
 
> 
> The only thing I'm able to do when it happens is to switch to another shell
> (Ctrl+Alt+F2) and see systemd printing messages about /dev/sda being read
> only (messages I'm not able to collect since I can't do anything on the
> system).
> 

Where is your root partition?

When this happens again, then, immediately after reboot, as root run:

     journalctl -ab-1 > log.txt

Is log.txt indeed from the previous time you booted the system, as it should be?

If so, then please attach it.
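To narrow down such a log before attaching it, a small filter can list the lines that hint at a read-only remount or disk errors, plus the last entry written (roughly where logging stopped). A minimal sketch: the patterns are assumptions based on common ext4, block-layer and errno wording, not messages confirmed in this report, and should be adjusted to whatever appears on tty2.

```python
"""List lines of a journal export that hint at a read-only remount or disk
errors, and show the last entry that was written.

The patterns are assumptions (common ext4, block-layer and errno wording),
not messages confirmed in this bug report; adjust them as needed.
"""
import re
import sys

SUSPICIOUS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"remounting filesystem read-only",
        r"read-only file system",
        r"I/O error",
        r"EXT4-fs error",
    )
]


def find_suspicious(lines):
    """Yield (line_number, text) for each line matching one of the patterns."""
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in SUSPICIOUS):
            yield number, line.rstrip("\n")


def last_entry(lines):
    """Return the last non-blank line, roughly where logging stopped."""
    for line in reversed(lines):
        if line.strip():
            return line.rstrip("\n")
    return None


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "log.txt"
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            lines = f.readlines()
    except OSError as err:
        print(f"cannot read {path}: {err}")
    else:
        for number, text in find_suspicious(lines):
            print(f"{number}: {text}")
        print("last entry:", last_entry(lines))
```

If no pattern matches, the last entry alone still shows when the system stopped writing to disk, which can be compared with the time the freeze was noticed.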

Keywords: (none) => NEEDINFO

Comment 9 Augier 2019-01-03 20:10:33 CET
It happened last night. I stopped my computer from the shell and left it on the couch. Around 4:00 am, I found it still turned on. Switching to tty2, I saw the characteristic systemd messages about the FS being read-only. The clock showed 3:00 am, so I assume it had been in this state for an hour. I have no idea why the computer did not stop as I requested, but there is nothing in the logs after 22:53 yesterday evening. I included an excerpt covering the period between 22:41 and tonight's boot.
Comment 10 Augier 2019-01-03 20:11:13 CET
Created attachment 10634 [details]
log.1.txt
Comment 11 Marja Van Waes 2019-01-03 21:40:27 CET
Thanks, 

So one second after 

janv. 02 22:53:04 localhost org_kde_powerdevil[4096]: powerdevil: Suspend session triggered with QMap(("Explicit", QVariant(bool, true))("Type", QVariant(uint, 2)))
janv. 02 22:53:04 localhost org_kde_powerdevil[4096]: powerdevil: Suspend session triggered with QMap(("Explicit", QVariant(bool, true))("SkipFade", QVariant(bool, true))("Type", QVariant(uint, 2)))
janv. 02 22:53:04 localhost org_kde_powerdevil[4096]: powerdevil: Starting Login1 suspend job

logging stops, which suggests the disk was remounted read-only at that time.

CC'ing KDE team.

Source RPM: kernel-desktop-latest => kernel? powerdevil?
Keywords: NEEDINFO => (none)
CC: (none) => kde

Comment 12 Augier 2019-01-05 10:19:26 CET
It happened again during last boot, 2 days ago. I enclose the full log of the boot.
Comment 13 Augier 2019-01-05 10:20:00 CET
Created attachment 10646 [details]
log.2.txt
Comment 14 Marja Van Waes 2019-01-05 17:15:27 CET
(In reply to Augier from comment #12)
> It happened again during last boot, 2 days ago. I enclose the full log of
> the boot.

but there:

janv. 03 19:52:13 localhost kdeinit5[4180]: kscreen.kded: PowerDevil SuspendSession action not available!

So suspend no longer seems to be a possible culprit.


I'm lost; I have no idea how this can be debugged. If the entire filesystem becomes read-only, then moving /var/log/journal/ to a different (external) disk and linking /var/log/journal to it won't help.

Do you have a camera with which you could record the messages in tty2?
Comment 15 Augier 2019-01-05 17:36:44 CET
> Do you have a camera with which you could record the messages in tty2?

Yes, I could take photos or videos with my phone.
Comment 16 Giuseppe Ghibò 2019-01-05 18:27:24 CET
Are you sure your hardware is OK? For instance, you could run a memory test to check that none of your memory modules is faulty, e.g. this one: https://www.memtest86.com.

And try disabling nouveau (add nouveau.modeset=0 to the kernel command line at boot), relying only on the Intel integrated graphics of your Skylake Core i7 6700HQ.

Also check that it's not a temperature problem (running "sensors" should show the CPU and other temps).
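If `sensors` (from lm_sensors) is not installed, the kernel's thermal sysfs interface gives a rough reading. A minimal sketch, assuming /sys/class/thermal/thermal_zone*/temp holds millidegrees Celsius, as is usual on Linux; `read_zones` and `millidegrees_to_celsius` are hypothetical helper names, and which zones exist depends on the machine.

```python
"""Read temperatures from the Linux thermal sysfs interface, as a rough
alternative to `sensors` (lm_sensors).

Assumes each /sys/class/thermal/thermal_zone*/temp holds millidegrees Celsius
and each .../type holds a short zone name.
"""
from __future__ import annotations

import glob
import os


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs reading such as '45000\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_zones(root: str = "/sys/class/thermal") -> dict[str, float]:
    """Map 'thermal_zoneN (type)' to its current temperature in Celsius."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                celsius = millidegrees_to_celsius(f.read())
        except (OSError, ValueError):
            continue  # zone without a readable sensor
        temps[f"{os.path.basename(zone)} ({name})"] = celsius
    return temps


if __name__ == "__main__":
    zones = read_zones()
    if not zones:
        print("no thermal zones found")
    for zone, celsius in zones.items():
        print(f"{zone}: {celsius:.1f} C")
```

Running it periodically (or just before a freeze, if a pattern emerges) would show whether temperatures climb before the filesystem goes read-only.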

CC: (none) => ghibomgx

Comment 17 Marja Van Waes 2019-01-08 14:43:26 CET

Does this _never_ happen when you're typing or moving your mouse, but only after having been inactive for a while?

If so, then please check your "System Settings" => "Energy Saving" settings and such.
Comment 18 Augier 2019-01-08 18:18:03 CET
No, it really happens randomly: sometimes when I'm using the computer, sometimes when it's just locked. It happened yesterday while I was working on my server.
Comment 19 Marja Van Waes 2019-01-13 08:23:03 CET
(In reply to Augier from comment #18)
> No, it really happens randomly. Sometimes when I'm using the computer,
> sometimes when it's just locked. It happened yesterday when I was working on
> my server.

So I was wrong about powerdevil being a possible culprit, but since it hasn't been proven (yet) that this is a hardware issue: 

Maybe there's more information on tty2 after the freeze, if you manage to make it show more lines. In the past, that could be done by adding a "vga=" value to the kernel parameters in your bootloader screen. I can't find out how that should be done nowadays, maybe with "vconsole.font=", but then I don't know how to find the possible values. I hope you're better at figuring that out, or that one of the base system maintainers will tell :-)

Assignee: bugsquad => kernel
Source RPM: kernel? powerdevil? => kernel?
CC: kde => basesystem

Comment 20 Aurelien Oudelet 2020-08-16 22:23:05 CEST
Does this still apply with Mageia 7 or Cauldron?

CC: (none) => ouaurelien

Comment 21 Aurelien Oudelet 2020-08-26 11:33:12 CEST
Since we have not received feedback to the information we have requested above, we will assume the problem was not reproducible, or has been fixed in one of the updates we have released for the reporter's distribution.

Users who have experienced this problem are encouraged to upgrade to the latest update of our distribution, and if this issue turns out to still be reproducible in the latest update, please reopen this bug with additional information.

Closing as OLD.

Status: NEW => RESOLVED
Resolution: (none) => OLD