Bug 24978 - sick system, boot fails, likely multifactorial
Summary: sick system, boot fails, likely multifactorial
Status: RESOLVED WORKSFORME
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: Normal, Severity: critical
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard: 7
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-20 10:00 CEST by Tony Blackwell
Modified: 2019-07-03 11:59 CEST
CC List: 3 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
output of journalctl -r (263.69 KB, application/x-xz)
2019-06-20 10:07 CEST, Tony Blackwell

Description Tony Blackwell 2019-06-20 10:00:15 CEST
Immediate problem: the boot fails and restarts every 2-3 seconds, in a loop.
Hardware: Intel Core i7-7700K (4 cores, 8 threads), 32 GB RAM, SSD boot disk plus half a dozen others. GeForce GTX 1080 Ti graphics.
The upgrade from M6 to M7 x86_64 beta3 went well and the system was kept current. It then started behaving badly, mostly because it ran out of disk space, but in that context I was also concerned about a virus etc. and did an RC re-install. I used the existing partitions and formatted the install partition. It was OK for 2-3 days.

It then seemed to 'go bad' over a couple of days. opencpn, which had been working normally, came up with a greyed-out screen. The driver in use was nouveau; attempts to install the proprietary Nvidia driver failed after a partial download (never seen that before; the connection is a fast cable, so that's not the issue).

Now boot attempts cycle every 3 seconds or so. The system is only usable after booting via advanced options to the root recovery prompt, where init 5 brings up a graphical session, but it falls over again after power-down.

I'll attach the output of journalctl -r in compressed format once the bug is established.
Comment 1 Tony Blackwell 2019-06-20 10:07:54 CEST
Created attachment 11118
output of journalctl -r
Tony Blackwell 2019-06-20 10:08:37 CEST

Whiteboard: (none) => 7RC

Comment 2 Shlomi Fish 2019-06-20 13:46:32 CEST
(In reply to Tony Blackwell from comment #1)
> Created attachment 11118
> output of journalctl -r

Can you try disabling the Xorg / gdm service? See https://duckduckgo.com/?q=gdm+systemd&atb=v140-1&ia=web . It seems to fail there.
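One way to try this from the root recovery prompt is to make the next boot stop at the text console, so that no display manager is started at all. The following is only a minimal sketch, assuming the stock systemd targets; dm_off, dm_on and dm_default are hypothetical helper names, not Mageia tools.

```shell
# Hypothetical helpers (not part of Mageia); run as root.

# Make the next boot stop at the text console, so no display manager starts.
dm_off() {
    systemctl set-default multi-user.target
}

# Restore the normal graphical boot once testing is done.
dm_on() {
    systemctl set-default graphical.target
}

# Show which target the system currently boots into by default.
dm_default() {
    systemctl get-default
}
```

If a boot into the text console survives where the graphical boot cycles, that would point at the display manager or the graphics driver rather than at the earlier boot stages.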

CC: (none) => shlomif

Comment 3 Tony Blackwell 2019-06-21 11:14:49 CEST
Disabling gdm and enabling sddm instead does not change the behaviour; boot still fails and startup cycles every 2-3 seconds.
Tony
Comment 4 Lewis Smith 2019-06-23 12:34:58 CEST
Can we take it that the M7 system is up-to-date?
If I understand your description correctly, if you boot 'rescue' and then 'init 5', that session works OK; and it is the *next* normal re-boot that fails.

I was confused by the number of sessions the journal contained, and what they represented. Since the re-boot fault occurs rapidly, I wonder whether you could then re-boot into rescue as you describe and, once that session is up, save the journal file. It is possible to time-limit it, e.g. to today. The -r does not help.
This might enable comparison between the failing boot and one that (via rescue) works.
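The journal capture described here could be sketched roughly as follows. This is an illustration only: the function names and output file names are hypothetical, and the commands need root (or journal read access) on the affected machine.

```shell
# Hypothetical helpers for saving journals in chronological order (no -r).

# List the boots the journal knows about, with their offsets and time ranges.
list_boots() {
    journalctl --list-boots --no-pager
}

# Save one boot's journal to a file.
# $1 = boot offset as understood by `journalctl -b` (0 = current, -1 = previous)
# $2 = output file
save_boot_journal() {
    journalctl -b "$1" --no-pager -o short-iso > "$2"
}

# Save only today's entries, as an alternative to selecting by boot.
# $1 = output file
save_today() {
    journalctl --since today --no-pager > "$1"
}

# Example (file names are placeholders):
#   save_boot_journal 0  working-boot.log
#   save_boot_journal -1 failing-boot.log
```

Earlier boots only show up in the list if the journal is stored persistently (typically when /var/log/journal exists); otherwise only the current boot is available.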

Despite the earlier download problem, can you try the proprietary nVidia driver in case that alters things?

CC: (none) => lewyssmith

Comment 5 Martin Whitaker 2019-06-23 22:36:10 CEST
If the boot is failing after 2-3 seconds, I doubt the journal is helping us - the boot will still be running off the initrd, so nothing will be saved on disk.

A system going bad over a couple of days makes me immediately suspect a hardware problem. Have you tried running smartctl to check your disks are healthy?
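A quick health check with smartctl might look like the sketch below. It assumes the smartmontools package is installed and is run as root; check_disks is a hypothetical helper name and the device names in the example are placeholders.

```shell
# Hypothetical helper: print the SMART overall-health verdict for each disk given.
check_disks() {
    for dev in "$@"; do
        echo "== $dev"
        smartctl -H "$dev"
    done
}

# Example (substitute the real devices, e.g. as listed by lsblk):
#   check_disks /dev/sda /dev/sdb
```

The -H option only reports the drive's own overall verdict; smartctl -a on a single device prints the full attribute table and error log if a closer look is needed.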

CC: (none) => mageia

Comment 6 Tony Blackwell 2019-07-03 11:59:49 CEST
Martin: I take your point. Some of the issue was the system running out of disk space, and I wonder whether some corruption resulted. Disks are healthy.

Lewis, I absolutely appreciate your time and effort going through the journal - yes, I found it hard to track as well. Appreciate your suggestions as to how to narrow it down. 

This being my main system, I've decided to go to a clean install of released M7 x86_64, and after some days all is well. So, I've got over the problem without a satisfying explanation, but very grateful for the support and ideas from you both.

Marking as resolved. My thanks.
Tony

Status: NEW => RESOLVED
Whiteboard: 7RC => 7
Resolution: (none) => WORKSFORME

