Bug 5274

Summary: boot after network minimal setup hangs a bit, and does not produce login on tty1
Product: Mageia Reporter: AL13N <alien>
Component: Installer Assignee: Colin Guthrie <mageia>
Status: RESOLVED FIXED QA Contact:
Severity: critical    
Priority: release_blocker CC: davidwhodgins, ennael1, mageia, thierry.vignaud, tmb
Version: Cauldron   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: CVE:
Status comment:
Attachments: mageia first boot output

Description AL13N 2012-04-07 20:52:34 CEST
i did a boot.iso network install, with a /boot, a swap and a / in btrfs

then chose minimal installation, but of course that didn't boot, since it likely doesn't have btrfs tools in minimal install

thus, i used the rescue to try and fix it, but, even though the rescue has btrfs, it cannot mount btrfs partitions... :-(

i'm not versed enough in btrfs to find out what is wrong (i never saw mount.btrfs so i assume it's named differently)

CC'ing knowledgeable people; i assume those who know what this is can fix it.

as a secondary (nice to have), in the installation, is there some kind of CAT_ target that is set when btrfs partitions are found? so that the rpmsrate can include btrfs tools in minimal install for people only having btrfs?

as it is, i cannot see a way to do a minimal installation when using btrfs.
AL13N 2012-04-07 20:52:45 CEST

CC: (none) => thierry.vignaud

AL13N 2012-04-07 20:53:51 CEST

CC: (none) => tmb

Comment 1 AL13N 2012-04-07 20:54:45 CEST
setting release blocker since it's an installer bug and thus cannot be fixed afterwards

Priority: Normal => release_blocker

Comment 2 Thierry Vignaud 2012-04-07 21:43:29 CEST
fixed in SVN: installing btrfs-progs if needed.
Comment 3 AL13N 2012-04-08 00:14:47 CEST
hmm, it seems looking through a rescue (i had to load btrfs module manually), i can see that btrfs-progs was installed...

since systemd was already running, i would guess dracut mounted / and so the btrfs module was loaded... so ... maybe this is not btrfs related...

trying to reproduce again to see why the first boot hangs
Comment 4 AL13N 2012-04-08 00:21:51 CEST
Created attachment 1945 [details]
mageia first boot output

added screenshot of what i see... perhaps one of these is a clue

(vbox VM btw)
Comment 5 AL13N 2012-04-08 00:55:49 CEST
ok, confirmed that this is nothing to do with btrfs.

how to reproduce:

1) use boot.iso (on a vbox?)
2) custom partitioning
3) (small /boot 500MB; swap 4GB; / 17GB)
4) choose custom
5) deselect everything in next screen
6) in next screen just proceed with defaults
7) after install reboot
8) boot up and see the screenshot

waited 30min once

always reproducible.

(i did choose belgian keyboard; and in summary selected Belgium)

Summary: rescue cannot mount btrfs, even though it has the tools, mount -t btrfs doesn't work => boot after network minimal setup hangs

Comment 6 Dave Hodgins 2012-04-08 02:12:18 CEST
I don't see anything in there about it trying to mount the root
filesystem, just /boot.

Try adding rd.break=pre-pivot as a kernel option.  That will
drop you to a bash shell after the root should be mounted,
but before the chroot to the on-disk /.  Only a limited
number of commands are available, though.

You can then try to modprobe the module, and manually mount
the / filesystem, if it hasn't already been mounted, to see
what (if any) error messages are shown.

CC: (none) => davidwhodgins

Comment 7 AL13N 2012-04-08 02:21:18 CEST
well, since systemd is starting, the / was mounted fine in dracut.

so i don't think this is related to / (what i thought first).

someone mentioned it could be related to bug 4772

maybe it is, but with a different cause.

in any case, it just doesn't boot at all, not even after 30min
Comment 8 AL13N 2012-04-08 02:25:35 CEST
huh, wait. i now seem to have a login on tty2 (not 1)...

will retry for same behavior with btrfs tomorrow, to see if it's not related after all...
Comment 9 Dave Hodgins 2012-04-08 02:55:36 CEST
Btw, systemd gets started before / gets mounted.
Comment 10 AL13N 2012-04-08 08:08:58 CEST
ok, after looking more at tty12 since the boot, i actually see that systemd is finished, but still it hangs. then i notice it's still dhcping on the unconfigured eth1 interface.

and after a few minutes, apparently there's a login process on tty2, but NOT tty1. maybe this is also some issue.

so i imagine the real problem here is that i didn't notice the hang due to tty1 not spawning a login process for some reason...
AL13N 2012-04-08 08:34:23 CEST

Summary: boot after network minimal setup hangs => boot after network minimal setup hangs a bit, and does not produce login on tty1

Comment 11 Colin Guthrie 2012-04-08 13:15:36 CEST
@Dave: systemd does not get started before / gets mounted... that's impossible unless you include systemd in initrd and that's not something we do!

@AL13N: If it's trying to dhcp on eth1, this means that the network-up script is probably still waiting for the network to be up.

This is likely holding up graphical logins (which wait for network-up in some circumstances).

Some tests that would be interesting would be:
 1. Do you have any NFS mounts in /etc/fstab? (e.g. from the install?)
 2. Is the network-auth sysconfig script enabled?

If there are no NFS mounts (or any network mounts actually) in fstab, or network-auth is NOT enabled, then graphical logins should start without waiting for network-up. In a minimal install, this could mean that no DM is available, but in this case it should display a warning message.
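
The first test above (are there any network mounts in /etc/fstab?) can be sketched in Python. This is only an illustration: `network_mounts` is a hypothetical helper, not something the installer or initscripts ship, and the set of filesystem types treated as "network" is an assumption. Entries carrying the `_netdev` mount option are counted as well.

```python
# Hypothetical helper for illustration; not part of Mageia's tooling.
# Assumption: these filesystem types are the ones treated as network mounts.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}


def network_mounts(fstab_text):
    """Return mount points of fstab entries that look like network mounts."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An entry needs at least device, mount point and filesystem type.
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",") if len(fields) > 3 else []
        if fs_type in NETWORK_FS_TYPES or "_netdev" in options:
            found.append(mount_point)
    return found
```

Calling `network_mounts(open("/etc/fstab").read())` on the installed system and getting an empty list would suggest that fstab is not what makes the graphical login wait for network-up.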

CC: (none) => mageia

Comment 12 Colin Guthrie 2012-04-08 13:18:22 CEST
Actually..... OK, I know what's up :)

I made the warning message shown on TTY1 when X fails to start wait for default.target to complete. I did this because systemd was still outputting messages and the warning message got lost. In your case, it's waiting until the network-up times out before showing the default.target is reached. Thus no warning is shown until this happens. If you are patient, network-up should eventually time out and the warning will be displayed.

However, I should not wait for default.target as that is bogus. I should actually just send SIGRTMIN+21 to pid 1 in the script. This will tell systemd to shut up and not output stuff any more and I can display the warning early.

I'll try this.
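
For reference, the signal mentioned above can be sketched in Python. This is a minimal illustration, not the actual script change: the function names are hypothetical, and it relies on systemd(1) documenting SIGRTMIN+21 as the signal that disables status messages on the console. Sending it to PID 1 needs root.

```python
import os
import signal

# Offset from SIGRTMIN that, per systemd(1), disables console status messages.
QUIET_OFFSET = 21


def quiet_signal_number():
    """Return the numeric value of SIGRTMIN+21 on this platform."""
    return signal.SIGRTMIN + QUIET_OFFSET


def silence_systemd_status(pid=1):
    """Ask systemd (PID 1 by default) to stop printing status messages.

    Hypothetical helper; must run as root to signal PID 1.
    """
    os.kill(pid, quiet_signal_number())
```

Computing the number from `signal.SIGRTMIN` rather than hard-coding it avoids relying on one platform's real-time signal numbering.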

Status: NEW => ASSIGNED
Assignee: bugsquad => mageia

Comment 13 AL13N 2012-04-08 14:17:51 CEST
(In reply to comment #12)
> Actually..... OK, I know what's up :)
> 
> I made the warning message shown on TTY1 when X fails to start wait for
> default.target to complete. I did this because systemd was still outputting
> messages and the warning message got lost. In your case, it's waiting until the
> network-up times out before showing the default.target is reached. Thus no
> warning is shown until this happens. If you are patient, network-up should
> eventually time out and the warning will be displayed.
> 
> However, I should not wait for default.target as that is bogus. I should
> actually just send SIGRTMIN+21 to pid 1 in the script. This will tell systemd
> to shut up and not output stuff any more and I can display the warning early.
> 
> I'll try this.

actually, i don't think so...

systemctl list-jobs showed nothing, so i assume systemd was ready. but alas, no "warning message" like the one i get on graphical targets.

tty2 and others did show a login, without any emergency.

perhaps the default target is not graphical atm? i'll check.
Comment 14 Colin Guthrie 2012-04-08 14:23:10 CEST
If the default target is not graphical, then the getty on tty1 should show up fine, (prefdm.service conflicts with it, hence why it doesn't show up for graphical logins). So systemd is thinking it's going to start prefdm.service (thus handling the conflicts fine), but for some reason the OnFailure of prefdm is failing to kick in.

Having looked at this myself in my VirtualBox, it seems an upstream patch breaks this onfailure handling. Reverting that patch seems to make things work.

Without doing anything special to work around it, can you update to systemd 44-7 and see if things work better?
Comment 15 AL13N 2012-04-09 14:02:20 CEST
it has emergency screen on tty1 again
Comment 16 Colin Guthrie 2012-04-09 15:02:34 CEST
Yup, a patch in systemd broke the OnFailure handling

https://bugs.freedesktop.org/show_bug.cgi?id=45511

I've reverted that patch pending a fuller fix (the general principle of the patch is good).

Assuming this doesn't fail again (it'll be on my radar to test any future patch), can we close this bug? I did several installs yesterday and all was as expected.
Comment 17 AL13N 2012-04-09 21:25:10 CEST
i was planning on keeping this open until you reintroduced the patch with a fix for this use case...

but if you think it needs to be resolved, and a new one opened later, then sure...
Comment 18 Anne Nicolas 2012-04-13 12:56:48 CEST
ok, let's close it then

Status: ASSIGNED => RESOLVED
CC: (none) => ennael1
Resolution: (none) => FIXED