Bug 13284

Summary: GUI (KDE and others) login completes before automount daemon is ready
Product: Mageia
Reporter: Mike Arnautov <mla>
Component: RPM Packages
Assignee: Colin Guthrie <mageia>
Status: RESOLVED OLD
QA Contact:
Severity: normal
Priority: Normal
CC: doktor5000, fri
Version: 4
Target Milestone: ---
Hardware: x86_64
OS: Linux
URL: https://forums.mageia.org/en/viewtopic.php?f=8&t=7401
Whiteboard:
Source RPM: systemd-208-10.5.mga4.src.rpm
CVE:
Status comment:
Attachments: systemd-analyze plot for fstab automount
systemd-analyze plot for autofs mount
Output of "journalctl -b -u athlon-btrfs.mount"
Systemd SVG plot after a problematic boot
References to autofs/automount in journalctl -b

Description Mike Arnautov 2014-04-29 19:28:04 CEST
Description of problem:

When remote file systems are mounted via autofs, GUI login (KDE, Gnome...) completes before the automount daemon is actually running. This is particularly serious if the whole home directory is automounted: the user gets logged into /, which is (of course) found to be non-writable.

If the home directory is local, and hence it is possible to log in, attempts to access automounted file systems result in "no such file or directory". This condition persists for up to 20+ seconds after login completes. During that time the automount daemon is absent from the process table. As soon as the daemon appears in the process table, the error messages cease and the remote file system is accessible after a brief pause.

The automount set up used is extremely simple. The auto.master file contains just one non-comment line:

/athlon auto.athlon

The indirect map auto.athlon also contains just one line:

btrfs -rw athlon:/btrfs

On the server, the relevant file system is listed in the exports file thusly:

/btrfs 192.168.1.0/8(no_root_squash,async,secure,no_subtree_check,rw)

The fact that the exported mount is from a BTRFS file system is not an issue: the same effect has been observed when exporting and automounting an ext4 file system.

Only GUI login is affected. Console login does not become available until the automount daemon is running. Curiously enough, the problem is alleviated (but not resolved) by shifting the autofs.service file from multi-user.target.wants to graphical.target.wants. This shortens the period of the automount process' absence by 10+ seconds.
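For the record, the shift described above amounts to moving the autofs enablement symlink from one target's .wants directory to the other's. A sketch, assuming the stock unit lives at /usr/lib/systemd/system/autofs.service (the path may differ; verify before running):

```shell
# Move autofs's enablement symlink from multi-user.target to
# graphical.target. Paths are assumptions based on a standard
# systemd layout; adjust to match the installed unit file.
wants=/etc/systemd/system
mkdir -p "$wants/graphical.target.wants"
ln -sf /usr/lib/systemd/system/autofs.service \
       "$wants/graphical.target.wants/autofs.service"
rm -f "$wants/multi-user.target.wants/autofs.service"
```

The supported way to express the same thing would normally be editing the unit's [Install] section (or a drop-in) rather than moving symlinks by hand.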

Version-Release number of selected component (if applicable):


How reproducible: On my machines, 100%


Steps to Reproduce:
1. Set up automount to mount home directories from a file server
2. Reboot
3. Login as soon as possible




Comment 1 Florian Hubold 2014-04-29 22:45:58 CEST
See also the forums thread https://forums.mageia.org/en/viewtopic.php?f=8&t=7401 for related information.

URL: (none) => https://forums.mageia.org/en/viewtopic.php?f=8&t=7401
CC: (none) => doktor5000

Comment 2 Colin Guthrie 2014-04-30 09:52:23 CEST
FWIW, a lot of automount setups can now be done inside systemd itself with .automount units or by specifying the x-systemd.automount option inside fstab. This generally simplifies the whole process.
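For reference, a native-systemd equivalent of the autofs setup in the description might look something like the following pair of units (a sketch only: the unit names are derived from the mount path, and the NFS details are taken from the reporter's maps):

```ini
# /etc/systemd/system/athlon-btrfs.mount
[Mount]
What=athlon:/btrfs
Where=/athlon/btrfs
Type=nfs
Options=rw,soft

# /etc/systemd/system/athlon-btrfs.automount
[Unit]
Description=Automount for /athlon/btrfs

[Automount]
Where=/athlon/btrfs

[Install]
WantedBy=multi-user.target
```

Enabling the .automount unit (not the .mount unit) makes systemd set up the autofs trap point and mount on first access.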

That said, external automounting is obviously still supported.

Can you test the following workaround to see if it's a valid solution:

As root:

1) mkdir -p /etc/systemd/system/autofs.service.d/
2) echo "Before=systemd-user-sessions.service" > /etc/systemd/system/autofs.service.d/mga13284.conf
3) Reboot.

If this is sufficient we can patch the official unit accordingly.
Comment 3 Mike Arnautov 2014-04-30 11:46:57 CEST
Have tried. Makes no difference. Sorry.

I did try using x-systemd.automount in fstab thusly (currently commented out):

# athlon:/btrfs /athlon/btrfs nfs soft,x-systemd.automount,noauto,_netdev 0 0

That has the unfortunate effect of making boot/login slower. By "boot/login" I mean the overall time before the desktop becomes fully available to users, including the remote file system mount. The extra time is about 15 to 20 seconds.

Interestingly, the delay with x-systemd.automount is about the same as one gets by just using normal nfs mount in fstab, which is what prompted me to use autofs in the first place.
Comment 4 Mike Arnautov 2014-04-30 11:50:56 CEST
Actually, your suggested workaround *does* have an effect compared to the standard arrangement of autofs.service being in multi-user.target.wants. From the end-user PoV the effect is the same as moving autofs.service into graphical.target.wants. With the latter arrangement your workaround shows no additional gains.
Comment 5 Colin Guthrie 2014-04-30 12:08:28 CEST
The extra 10 to 20 seconds sounds very much like the delay introduced by waiting for the network to initialise and come online. i.e. the intended behaviour!!

With a standard NFS mount in fstab, a delay will be automatically introduced as the filesystem is seen as critical to the boot process. i.e. any mountpoint not explicitly marked as non-critical (the "nofail" option) will have to be mounted and ready before the user sessions are allowed.

In the case of a network mountpoint, this has the additional effect of ensuring we wait for the network to be properly initialised such that we can mount things.

It seems to me that you're trying to work around things that are explicitly and deliberately done in order to ensure smooth ordering, and then complaining that things are no longer stable! :)

Theoretically if you combine x-systemd.automount and nofail options, I *think* this could have the desired effect. i.e. allow the user sessions to be started quickly, but still wait for the actual automount when it is requested. I'm not 100% sure of that as I'd have to look at the fstab generator code and double check the deps it adds to the generated mount/automount units, but it would certainly make sense to me that this combination *should* do what you want...
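Applied to the fstab line from comment 3, the suggested combination would read something like this (an untested sketch; only nofail has been added to the reporter's original options):

```
athlon:/btrfs  /athlon/btrfs  nfs  soft,x-systemd.automount,noauto,nofail,_netdev  0 0
```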
Comment 6 Mike Arnautov 2014-04-30 12:46:49 CEST
Um... I am not complaining! My users (family) are! :-)

Yes, one can use nofail in /etc/fstab, but I've just timed it:

Using systemd's automount via fstab, the first login immediately after boot took 34 seconds. Automounted fs available immediately when login completed.

Using "old-fashioned" autofs, the first login took 14 seconds, and the automounted fs was unavailable for another 6 seconds. That's 20 seconds altogether.

So why does systemd take an additional 14 seconds?
Comment 7 Colin Guthrie 2014-04-30 13:25:01 CEST
(In reply to Mike Arnautov from comment #6)
> Um... I am not complaining! My users (family) are! :-)

:)

> So why does systemd take an additional 14 seconds?

Good question. Perhaps you could upload the results of "systemd-analyze plot >plot.svg" from both scenarios?

I'm suspecting that there are other services involved which are holding up the boot. Perhaps the network-up service adds some additional safety margin that was somehow not triggered before.
Comment 8 Mike Arnautov 2014-04-30 16:39:17 CEST
Created attachment 5129 [details]
systemd-analyze plot for fstab automount
Comment 9 Mike Arnautov 2014-04-30 16:40:02 CEST
Created attachment 5130 [details]
systemd-analyze plot for autofs mount
Comment 10 Colin Guthrie 2014-04-30 16:59:38 CEST
Hmm, in both cases, it seems to take until about 57s for the "network-up.service" unit to complete. The main difference between the two plots is that in the fstab automount case the "athlon-btrfs.mount" seems to be ongoing - e.g. hanging... I'm guessing that you performed the plot before this unit was actually finished (likely due to it taking so long).

It would be very interesting to get the output from "systemctl status -n 0 athlon-btrfs.mount" after boot - specifically check the "Active:" line, and wait until it no longer says "Active: activating..." and either says "active (mounted)" or something showing a proper failure (e.g. "inactive (dead)").

When it reaches this state then the output from  "journalctl -b -u athlon-btrfs.mount" would be nice too.

But before doing this, can you try adding "systemd.log_level=debug" to the kernel command line for that boot just so you generate extra log info that could help!

Many thanks.

PS, I really wish you hadn't called your share btrfs! I know it's an NFS mount and thus the filesystem is irrelevant here, but my brain is still struggling with that name :p
Comment 11 Mike Arnautov 2014-04-30 17:13:58 CEST
OK. I'll try to do that, but it'll have to wait till tomorrow at least -- we have guests coming and I have to go off-line.

But... No, in both cases the plot was done after the remote file system became available.

Also, not to forget... While automounting via autofs is supported by systemd, graphical login really ought to be delayed until after autofs is ready -- as quite properly happens for console login.
Comment 12 Colin Guthrie 2014-04-30 17:56:00 CEST
(In reply to Mike Arnautov from comment #11)
> Also, not to forget... While automounting via autofs is supported by
> systemd, graphical login really ought to be delayed until after autofs is
> ready -- as quite properly happens for console login.

Yeah, I agree that regardless of finding fixes for the other ways, this should likely still be done!

Strangely tho', both graphical and console logins should be ordered After: systemd-user-sessions.service (unsurprisingly!), and thus whatever the situation is now, should be the same for both. What I mean is that this service is the main ordering point for when "logins are possible" - I suspect that it's only that tty1 isn't available until various other boot stuff is finished, but technically you could maybe proactively switch to tty2 during boot and actually login in much the same way as you login to a graphical login - i.e. before autofs is ready. If this is the case, then the fix suggested in comment 2 is likely the correct one.
Comment 13 Mike Arnautov 2014-05-01 15:40:56 CEST
There turns out to be a complicating factor, namely the history of mount requests to the file server (which, incidentally, is also a Mageia 4 machine). If the mount request to the file server is the first one the server has received since *its* boot time, then

a) If the client machine does automount via fstab, then it is very likely to take on the order of two minutes. Out of 6 or 7 such occasions, all but one were very slow.

b) If the client machine uses autofs, then the initial boot under the same conditions is very likely *not* to be slow. Out of the last 5 such boots I saw just one very slow one.

In all cases the server had been up for at least several minutes after completing its boot.

I cannot tell whether the apparent difference in the frequency of very slow mounts is significant, or down to somewhat unlikely but possible chance. It is also possible that after a prolonged time with no NFS access to it, the server spontaneously gets into the state causing the next mount request to be slow.

Worse, after all this playing about with the two machines I can no longer reproduce my original problem. Even when using autofs without your drop-in additional service, the remote mount is now always ready by the time my login completes (and the client machine's boot always takes 10 to 15 seconds longer than it previously used to with autofs). I have *no* idea what may have caused the difference. To the best of my knowledge I have undone all config changes introduced during my investigation of the problem.

Which leaves the family happy, but not me. The upshot is that I can no longer exhibit any untoward behaviour by systemd, so I am effectively wasting your time. :-( 

If you are still interested, I do have the additional info you requested and I can post it here. In the meantime, though, the sensible thing is probably to close the bug -- it can be reopened if the symptoms reappear.
Comment 14 Mike Arnautov 2014-05-01 15:44:21 CEST
Sorry, should have added... Console logins have always been precluded on tty2 as well as on tty1 until autofs was fully ready.
Comment 15 Mike Arnautov 2014-05-12 18:20:04 CEST
Well, after nearly two weeks the problem is back today, and I *swear* I haven't changed anything on either machine, other than applying routine updates as they come along.

To be quite clear... I am not talking about the two-minute boot/login, which is down to some delay on the server side (as demonstrated by the fact that it happens, though not always, when the server receives its first NFS mount request after the server's reboot).

I am talking about the situation where the boot and login take the usual amount of time (about 25 seconds from start of boot to the KDE login prompt, about 15 seconds from login to the desktop being available to the user, plus say 4 seconds to type ID and password in between). The desktop becomes available to the user before the automount daemon pops up in the process table, with the result that an attempt to access automounted content results in "no such file or directory".

I have now reproduced it both without and with the /etc/systemd/system/autofs.service.d/mga13284.conf (containing "Before=systemd-user-sessions.service", with no quotes.)

One point which may or may not be significant... During the period when I could not reproduce the problem, the login box (ID and password fields) would go away, but the screen would otherwise remain unchanged -- with the small cauldron in the bottom left corner. It would sit there for about 15 or so seconds before switching to the screen with the bigger cauldron and bubbles gradually rising from it.

When the problem does occur, the switch to the other screen happens within a few seconds. I do not recall a single exception to this relationship.
Comment 16 Mike Arnautov 2014-05-12 18:39:54 CEST
Created attachment 5151 [details]
Output of "journalctl -b -u athlon-btrfs.mount"
Comment 17 Mike Arnautov 2014-05-12 18:41:08 CEST
Created attachment 5152 [details]
Systemd SVG plot after a problematic boot
Comment 18 Mike Arnautov 2014-05-12 18:43:11 CEST
NB: both of the new attachments were generated after a boot with systemd.log_level=debug added to the grub boot entry, as requested.
Comment 19 Mike Arnautov 2014-05-13 14:11:02 CEST
Created attachment 5155 [details]
References to autofs/automount in journalctl -b

This may help. I grepped for autofs and automount in the output of journalctl -b and it shows the drop-in autofs.service.d/mga13284.conf being ignored due to " Assignment outside of section".
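That error is consistent with systemd requiring every drop-in directive to sit under a section header: the one-line file created in comment 2 contained only the Before= assignment. For the drop-in to parse, /etc/systemd/system/autofs.service.d/mga13284.conf would need to read something like:

```ini
[Unit]
Before=systemd-user-sessions.service
```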
Comment 20 Morgan Leijström 2014-06-14 02:37:34 CEST
I read somewhere that NFS may introduce delays if the time is not the same on the server and clients. So if you have not already, set up NTP time sync on all systems.

PS  Thanks Colin for mentioning the x-systemd.automount fstab option !
(I was reading up on the old way when I found this thread.)

CC: (none) => fri

Comment 21 Mike Arnautov 2014-06-14 21:52:14 CEST
NTP is running on both machines.
Comment 22 Samuel Verschelde 2015-09-21 13:20:20 CEST
Mageia 4 changed to end-of-life (EOL) status on 2015-09-19. It is no longer 
maintained, which means that it will not receive any further security or bug 
fix updates.

Package Maintainer: If you wish for this bug to remain open because you plan to 
fix it in a currently maintained version, simply change the 'version' to a later 
Mageia version.

Bug Reporter: Thank you for reporting this issue and we are sorry that we weren't 
able to fix it before Mageia 4's end of life. If you are able to reproduce it 
against a later version of Mageia, you are encouraged to click on "Version" and 
change it against that version of Mageia. If it's valid in several versions, 
select the highest and add MGAxTOO in whiteboard for each other valid release.
Example: it's valid in cauldron and Mageia 5, set to cauldron and add MGA5TOO.

Although we aim to fix as many bugs as possible during every release's lifetime, 
sometimes those efforts are overtaken by events. Often a more recent Mageia 
release includes newer upstream software that fixes bugs or makes them obsolete.

If you would like to help fixing bugs in the future, don't hesitate to join the
packager team via our mentoring program [1] or join the teams that fit you 
most [2].

[1] https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
[2] http://www.mageia.org/contribute/
Comment 23 Marja Van Waes 2015-10-27 06:57:36 CET
As announced over a month ago, Mageia 4 changed to end-of-life (EOL) status on 2015-09-19. It is no longer maintained, which means that it will not receive any further security or bug fix updates.

This issue may have been fixed in a later Mageia release, so, if you still see it and didn't already do so: please upgrade to Mageia 5 (or, if you read this much later than this is written: make sure you run a currently maintained Mageia version)

If you are able to reproduce it against a maintained version of Mageia, you are encouraged to 
1. reopen this bug report, by changing the "Status" from "RESOLVED - OLD" to "REOPENED"
2. click on "Version" and change it against that version of Mageia. If you know it's valid in several versions, select the highest and add MGAxTOO in whiteboard for each other valid release.
Example: it's valid in cauldron and Mageia 5, set to cauldron and add MGA5TOO.
3. give as much relevant information as possible. If you're not an experienced bug reporter and have some time: please read this page:
https://wiki.mageia.org/en/How_to_report_a_bug_properly

If you see a similar issue, but are _not_sure_ it is the same, with the same cause, then please file a new bug report and mention this one in it (please include the bug number, too). 


If you would like to help fixing bugs in the future, don't hesitate to join the
packager team via our mentoring program [1] or join the teams that fit you 
most [2].
[1] https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
[2] http://www.mageia.org/contribute/

Status: NEW => RESOLVED
Resolution: (none) => OLD