I've noticed this on several fresh installs on two different machines since cauldron reopened. After the install, the first boot appears to hang at "Checking for new hardware". It's not a true deadlock, as I let one of them sit and the boot actually proceeded after about 10-15 minutes, but a 10-15 minute timeout for anything is insane. If I CTRL-ALT-DELETE to force a reboot, systemd takes over the shutdown, and it happens normally. On subsequent boots, "Checking for new hardware" completes within 5 seconds. No idea what causes this, or where to look for diagnostics. Reproducible: Steps to Reproduce:
Look at journalctl output (search for service_harddrake)
Keywords: (none) => NEEDINFOCC: (none) => thierry.vignaudSource RPM: drakx-tools => harddrake
Here's the output from the first boot: ************************************************************************* Sep 01 13:28:04 localhost.localdomain service_harddrake[821]: ### Program is starting ### Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: dmidecode Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: modified file /etc/sysconfig/harddrake2/kernel Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: Autoconfiguring laptop tools since we switched between laptop and desktop systems Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /sbin/chkconfig --list --type xinetd Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /bin/systemctl disable --no-reload apmd.service Sep 01 13:28:10 localhost.localdomain kernel: ACPI Warning: \_SB_.PCI0.PEG0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95) Sep 01 13:28:10 localhost.localdomain kernel: ACPI Warning: \_SB_.PCI0.PEG0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95) Sep 01 13:28:10 localhost.localdomain kernel: nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x10000000 FAULT at 0x002a00 [ !ENGINE ] Sep 01 13:28:12 localhost.localdomain systemd[1]: Received SIGRTMIN+20 from PID 287 (plymouthd). Sep 01 13:28:31 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:28:32 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:28:33 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. ... repeats every second or so ... Sep 01 13:29:06 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:29:07 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:29:35 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:29:36 localhost.localdomain systemd[1]: Received SIGINT. Sep 01 13:29:36 localhost.localdomain systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive. Sep 01 13:29:36 localhost.localdomain systemd[1]: systemd-networkd.socket: Failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction is destructive. Sep 01 13:29:36 localhost.localdomain systemd[1]: systemd-networkd.socket: Unit entered failed state. Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Save boot dmesg content. Sep 01 13:29:36 localhost.localdomain kernel: audit_printk_skb: 3 callbacks suppressed Sep 01 13:29:36 localhost.localdomain kernel: audit: type=1131 audit(1441128576.919:39): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=mandriva-save-dmesg comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped target Sound Card. Sep 01 13:29:36 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=mandriva-save-dmesg comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /sbin/chkconfig --list --type xinetd ... Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Login Service. Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd ... Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Name Service Cache Daemon. Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /bin/systemctl enable --no-reload laptop-mode.service ************************************************************************* The "Stopped" line is probably the start of the CTRL-ALT-DELETE initiating shutdown. On the reboot, we have: ************************************************************************* Sep 01 13:30:12 localhost.localdomain service_harddrake[802]: ### Program is starting ### Sep 01 13:30:13 localhost.localdomain service_harddrake[802]: running: dmidecode Sep 01 13:30:13 localhost.localdomain service_harddrake[802]: modified file /etc/sysconfig/harddrake2/kernel Sep 01 13:30:14 localhost.localdomain service_harddrake[802]: created file /etc/sysconfig/harddrake2/previous_hw Sep 01 13:30:14 localhost.localdomain service_harddrake[802]: launched command: display_driver_helper --check-loaded ************************************************************************* and that's all.
Keywords: NEEDINFO => (none)
@Colin: it looks like if harddrake issues some systemctl commands during boot, systemd goes wild
CC: (none) => mageiaSource RPM: harddrake => harddrake, systemd
@Franck: please attach such info instead of pasting, as then it's wrapped and thus unreadable :-(
Yeah, the newer systemd versions have reworked a bit of the non-blocking stuff (although it could be the enable/disable logic too) that I put in a while back to avoid such situations. We should probably try and refactor as much of this as possible before MGA6 to try and avoid some of the calls rather than the sticking plaster I had in place previously. I'll try and take a look as I think there is also an issue about enabling/disabling when there is both a native systemd unit and a legacy sysvinit script (the latter should be dropped anyway as it has no value to us, but I think it's still a bug and thus should be fixed!)
(In reply to Thierry Vignaud from comment #4) > @Franck: please attach such info instead of pasting, as then it's wrapped > and thus unreadable :-( Good point. I usually look at these things with vi, so I'm too used to seeing it wrapped :-)
Still happening in current cauldron.
CC: (none) => eeeemail, tmb
It occurs to me that I've seen something similar (at least in symptom) in the past. I didn't do extensive diagnosis at the time, but it may have some bearing on this... A laptop of mine died because the power pin connection to the mainboard came detached and it couldn't charge or run from AC. As an interim solution, I simply transferred the hard drive to another laptop with different graphics, sound card, etc., specifications. When I tried to boot, it hung in the same way described here. I hadn't held out much hope that it would work, but I let it sit for about an hour. To my surprise, when I came back to it, the DM login screen was up, the system ran perfectly, and subsequent boots went fine. I assumed that harddrake2 had detected the hardware mismatch, and had triggered some sort of massive hardware analysis that configured all of the driver load specifications to the new hardware. I remember being pretty impressed with this. Could something be triggering this process unnecessarily here ?
CC: (none) => doktor5000
*** Bug 17299 has been marked as a duplicate of this bug. ***
CC: (none) => paiiou
Blocks: (none) => 15527
I have two remarks 1/ When I install Mageia, I have this message at the end of the installation of the packages: script failed for systemd-227-3mga6 script failed for pm-fallback-policy-0.1-14.mga5 2/ I also have a rather long wait (approximately 30 s) between the end of plymouth and the screen of sddm
(In reply to Georges Eckenschwiller from comment #10) > 1/ When I install Mageia, I have this message at the end of the installation > of the packages: > script failed for systemd-227-3mga6 > script failed for pm-fallback-policy-0.1-14.mga5 Hi, that doesn't belong here, it's covered in bug 17217 . > 2/ I also have a rather long wait (approximately 30 s) between the end of > plymouth and the screen of sddm That also doesn't really belong here, you should open a separate bug report if it's reproducible.
I do not know if it is important, but 1/ As indicated in bug 17299, It is mandriva-everytime which lasts a little more than 6 minutes 2/ I notice that my partitions sda6(/home), sda7-/var/www) and sda8(/var/ftp) aren't mounted at the end of the boot.
(In reply to Georges Eckenschwiller from comment #12) > 1/ As indicated in bug 17299 That's why it has been marked as a duplicate of this bug. > 2/ I notice that my partitions sda6(/home), sda7-/var/www) and > sda8(/var/ftp) aren't mounted at the end of the boot. That does not belong here.
Problem still present with systemd 228
Assigning to systemd maintainer and setting as blocker for Mageia 6.
Priority: Normal => release_blockerAssignee: bugsquad => mageiaSeverity: normal => major
Just tested as requested by coling with debug option for systemd. The point is it freezes very early in boot so we cannot see anything in a console. I'm using a test iso using last version of systemd which still fails in post-install: "ddebug.log:* urpmi error: ERROR: 'script' failed for systemd-229-1.mga6"
CC: (none) => ennael1
Created attachment 7491 [details] Debug shell for systemd
We could switch some "checked_on_boot => 1" into "checked_on_boot => 0" in http://gitweb.mageia.org/software/drakx/tree/perl-install/harddrake/data.pm Those days, there's quite a lot of things that are autoloaded by udev (sata, raid, pata, card reader, firewire, scsi, usb, cd/dvd, ...) Some are even obsolete (zip, ...) I'll push some fixes for this tonight Obviously those who take time are those who have a configurator and thus sg to do when hardware changes. We could also alter stage2 so that it dumps a dummy /etc/sysconfig/harddrake2/previous_hw so that we don't do anything on first boot if hw hasn't actually changed since installation.
I cannot reproduce this bug but I've pushed a patched harddrake service that will log what takes time. Please test with drakxtools-17.19-2.mga6
Working on it. I'll post here the result today
So I had to update release tag that was not updated so I used drakxtools-17.19-3.mga6 in test iso. Following logs for this. First boot will end at 21:21:01.
Logs can be found here: http://people.hupstream.com/~ennael/journal.log
This seems to be the issue and apmd is a legacy sysv init script: /etc/rc.d/init.d/apmd mars 06 21:21:02 localhost service_harddrake[686]: running: /bin/systemctl disable --no-reload apmd.service mars 06 21:21:04 localhost systemd[1]: Received SIGRTMIN+20 from PID 273 (plymouthd). mars 06 21:21:07 localhost systemd[1]: Received SIGRTMIN+20 from PID 273 (plymouthd). mars 06 21:23:02 localhost service_harddrake[686]: ERROR: killing runaway process (process=/bin/systemctl, pid=710, args=disable --no-reload apmd.service, error=ALARM at /usr/lib/libDrakX/run_program.pm line 241. ) mars 06 21:23:02 localhost service_harddrake[686]: running: /sbin/chkconfig --list --type xinetd mars 06 21:23:02 localhost service_harddrake[686]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd mars 06 21:23:02 localhost service_harddrake[686]: running: /bin/systemctl disable --no-reload laptop-mode.service -- Reboot -- Although not sure why it runs twice ... If we disable apmd at every boot via harddrake, why do we enable it during package installation in the first place? Can't we simply not enable it during apmd package installation and stop harddrake disabling it? But the systemd issue that coling meant still needs to be fixed.
We can make the installer to store a value for IS_LAPTOP in /etc/sysconfig/harddrake2/kernel so that harddrake doesn't play with that on first boot (thus no 2m timeout for apmd -- though we should investigate if it's OK on a laptop after that change)
I'm uploading a new stage2 with an untested fix
It looks good for me. I'll push my git tree later.
Keywords: (none) => PATCHStatus: NEW => ASSIGNED
commit 7e2d5f1447ad5e269ce8f5eab4a8b8ade7e96f85 Author: Thierry Vignaud <thierry.vignaud@...> Date: Mon Mar 7 10:25:32 2016 +0100 prevent slow boot on first boot (mga#16684) --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=7e2d5f1447ad5e269ce8f5eab4a8b8ade7e96f85
Fixed in last set of test isos
We would to check on a laptop though...
*** Bug 17584 has been marked as a duplicate of this bug. ***
CC: (none) => westel
Presumed fixed
Status: ASSIGNED => RESOLVEDResolution: (none) => FIXED
Blocks: 15527 => (none)
CC: paiiou => (none)