Bug 16032 - GDM timing race issue on LiveDVDs occasionally ends up with a fail whale
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Release (media or process)
Version: Cauldron
Hardware: i586 Linux
Priority: High critical
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2015-05-24 10:31 CEST by Rémi Verschelde
Modified: 2017-01-07 17:03 CET

See Also:
Source RPM:
CVE:
Status comment:



Description Rémi Verschelde 2015-05-24 10:31:25 CEST
+++ This bug was initially created as a clone of Bug #15653 +++

Quoting Martin Whitaker from bug 15653 comment 83:
> Part of the problem is that there are several different bugs with the same
> "Oh no" end result. I've just done some testing with the QA pre-release
> 64-bit GNOME LiveDVD, and I'm seeing it occasionally fail  - maybe one in
> twenty boots (I'd never have noticed if it hadn't failed the first time I
> tried it!).
> 
> Looking in the journal from a failed attempt, the first abnormal message is
> 
>   systemd[1]: prefdm.service: main process exited, code=exited, status=1/FAILURE
>   systemd[1]: Unit prefdm.service entered failed state.
>   systemd[1]: prefdm.service failed.
>   gdm-Xorg-:0[3004]: (EE)
>   gdm-Xorg-:0[3004]: Fatal server error:
>   gdm-Xorg-:0[3004]: (EE) Server is already active for display 0
>   gdm-Xorg-:0[3004]: If this server is no longer running, remove /tmp/.X0-lock
>   gdm-Xorg-:0[3004]: and start again.
> 
> following shortly after
> 
>   finish-install[1986]: calling ask_gnome_reboot
> 
> Looks like there is a race here, and gdm is trying to restart the server
> before it has finished shutting down.
Rémi Verschelde 2015-05-24 10:40:46 CEST

Depends on: (none) => 16033

Rémi Verschelde 2015-05-24 10:43:03 CEST

Depends on: 15653 => (none)

Comment 2 Martin Whitaker 2015-05-28 21:59:57 CEST
This seemed to be fixed in round 2 but came back in round 4, and is still occurring in the latest round. Is the 2-second delay still present?

CC: (none) => tmb

Comment 3 Thomas Backlund 2015-05-28 22:09:19 CEST
Nope, round 4 onwards tested whether dropping "--no-block" from the systemctl restart command would make it wait long enough, but apparently not, so I will re-add the delay.
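
For illustration only: an alternative to a fixed delay is to wait until the old X server has released its lock file before restarting. The sketch below is not the workaround used in the Live images (that was a fixed delay). The lock path is taken from the journal message in the description; the function name, timeout, and polling interval are assumptions.

```python
import os
import time


def wait_for_lock_release(lock_path="/tmp/.X0-lock", timeout=10.0, poll_interval=0.1):
    """Poll until lock_path no longer exists.

    Returns True if the lock file disappeared before the timeout expired,
    False otherwise. The default path comes from the Xorg error message in
    this bug's description; the timeout values are arbitrary assumptions.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(lock_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

A caller would restart the display manager only once this returns True, and fall back to the fixed delay (or report an error) when it returns False.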
Comment 4 Rémi Verschelde 2015-06-02 11:37:12 CEST
Just a reminder for Thomas: don't forget to re-add the delay in the next round to hopefully work around this bug :)
Comment 6 Samuel Verschelde 2015-06-06 02:06:35 CEST
So it's fixed, isn't it?

Keywords: (none) => NEEDINFO

Comment 7 Thomas Backlund 2015-06-06 09:37:19 CEST
At least mitigated... not sure if we are able to avoid it on all hw, but that remains to be seen...
Comment 8 Alberto Girlando 2015-06-08 18:07:43 CEST
If you wish, I can test the new live *.iso on different systems. I have tested the RC on notebook with Intel 810 and later, Intel GMA 3600, ATI Radeon, and VBox emulator. On all these systems the result is the same when booting from CD or DVD, i.e. the "Oh no!" error.
If you wish me to make the test, simply let me know where I can get the new *.iso.
Comment 9 Lewis Smith 2015-06-08 20:45:22 CEST
Tried with latest (~6 June) Gnome Live DVD on real EFI hardware.
Booting from DVD, Live -> "Oh No! Something has gone wrong".
Booting from USB, Live -> runs OK, and installs OK from the desktop.
Comment 10 Nicolas Lécureuil 2015-06-15 10:09:00 CEST
Can you test with the latest ISO?
Comment 11 Lewis Smith 2015-06-15 11:05:18 CEST
(In reply to Nicolas Lécureuil from comment #10)
> can you test with latest iso ?
Have just done so, exactly the same result as my Comment 9.
Comment 12 Martin Whitaker 2015-06-16 00:09:44 CEST
(In reply to Nicolas Lécureuil from comment #10)
> can you test with latest iso ?

Using the latest (14th June) ISO, I've booted from a USB stick 25 times without a problem. Given the frequency of failure I was seeing, I'd want another 25 before I said it was fixed, but it looks good so far.

(In reply to Lewis Smith from comment #11)
> Have just done so, exactly the same result as my Comment 9.

I'd guess that's bug 16033.
Comment 13 Lewis Smith 2015-06-16 10:59:24 CEST
> (In reply to Lewis Smith from comment #11)
> > Have just done so, exactly the same result as my Comment 9.
> I'd guess that's bug 16033.
I agree; I should have been aware of that - and much sooner. I shall note on that the continuation of the problem, and unsubscribe from *this* bug.

CC: lewyssmith => (none)

Rémi Verschelde 2015-06-16 15:29:53 CEST

Depends on: 16033 => (none)
See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=16033

Comment 14 Rémi Verschelde 2015-06-20 11:16:27 CEST
According to comments 7 and 12, this bug seems to be worked around, and at any rate Mageia 5 has been released, so dropping priority ;)

Keywords: NEEDINFO => (none)
Priority: release_blocker => High

Comment 15 Alberto Girlando 2015-06-22 15:46:49 CEST
Tested Mageia 5 Gnome DVD on ATI Radeon and Intel 810 & later (two reasonably old systems); the error is still there. For me the workaround has been: boot in failsafe mode, then once in the maintenance console type "init 5". This worked on all the systems I tested, so perhaps it is worth adding this to the errata (maybe it is not compliant with systemd, but I am not used to it...)
Comment 16 Martin Whitaker 2015-06-22 21:11:53 CEST
Alberto, have you confirmed that you are seeing the timing race, and not one of the other issues that leads to the "Oh no" screen?
Comment 17 Alberto Girlando 2015-06-23 08:12:26 CEST
In the RC, I have found the error on five different computers, only one of which had the ATI driver. The workaround has been the same. That's why I was assuming the error was the timing race, but I am not sure. I am a simple user, trying to help, so unless precisely instructed, I do not know how to verify the origin of the error.
Comment 18 Martin Whitaker 2015-06-23 10:03:00 CEST
To verify what's causing the error, when you get the "Oh no" screen, press Ctrl-Alt-F2 to switch to a text login screen, log in as 'root', and type 'journalctl'. This will output the system log, using the 'less' pager. Search for the error message in this bug's description. '/' is the less command to start a search, so

  /Server is already active

should find it. If you get the message "Pattern not found", it's likely you are seeing a different bug, and we need to look through the system log for other error messages.
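
The same check can be scripted if the journal has been saved to a file. This is a minimal sketch, assuming the journal was dumped with `journalctl -b --no-pager > journal.txt`; the file name and the function name `find_race_lines` are illustrative and not part of any Mageia tool. The search string is the one from this bug's description.

```python
import re

# Signature of the timing race, taken from the journal excerpt in this
# bug's description. The display number is matched loosely in case it
# is not 0.
RACE_PATTERN = re.compile(r"Server is already active for display \d+")


def find_race_lines(journal_text):
    """Return the journal lines that contain the X server lock conflict."""
    return [line for line in journal_text.splitlines() if RACE_PATTERN.search(line)]


# Example use (hypothetical file name):
#   with open("journal.txt", encoding="utf-8", errors="replace") as fh:
#       hits = find_race_lines(fh.read())
#   An empty list corresponds to "Pattern not found" in less.
```

An empty result suggests a different bug is behind the "Oh no" screen, as with the interactive search.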
Comment 19 Samuel Verschelde 2016-10-10 17:27:56 CEST
Hi Alberto. Could you answer the question at comment #18?

Has anyone else recently got the timing race issue in Mageia 6 Lives?

Keywords: (none) => NEEDINFO

Comment 20 Alberto Girlando 2016-10-10 18:26:59 CEST
Hi Samuel,
unfortunately I had very little spare time for testing in this period. Anyway, I just tried mageia sta-1 live 32 bits on virtual box, and no problems. I will try to test other machines and let you know.

Hardware: x86_64 => i586

Comment 21 Marja Van Waes 2017-01-07 10:18:33 CET
(In reply to Samuel Verschelde from comment #19)
> Hi Alberto. Could you answer the question at comment #18?
> 
> Has anyone else recently got the timing race issue in Mageia 6 Lives?

(In reply to Alberto Girlando from comment #20)
> Hi Samuel,
> unfortunately I had very little spare time for testing in this period.
> Anyway, I just tried mageia sta-1 live 32 bits on virtual box, and no
> problems. I will try to test other machines and let you know.

In this bug report, no one has reported (possibly) still having this problem since 2015-06-23.

Closing as FIXED. Feel free to reopen if needed.

Status: NEW => RESOLVED
CC: (none) => marja11
Resolution: (none) => FIXED

Alan Augustson 2017-01-07 17:03:05 CET

CC: alan.augustson => (none)

