Description of problem: System stops loading after starting cpufreq. After forcing shutdown the system starts normally. How reproducible: 1. Install using Beta1 64bit DVD 2. Reboot 3. System stops loading after starting cpufreq service 4. Force shutdown by pressing the power button 5. System starts normaly
Are you sure it's cpufreq ? test with disabling the daemon ? (systemctl disable cpufreq.service )
Keywords: (none) => NEEDINFOSource RPM: (none) => cpufreq
Keywords: NEEDINFO => (none)CC: (none) => eandry, n54Blocks: (none) => 2120Summary: Unable to start all services at first boot. => boot hangs at 'Starting cpufreq (via systemctl)'
*** Bug 4826 has been marked as a duplicate of this bug. ***
CC: (none) => herbert
Created attachment 1691 [details] 1st boot
Created attachment 1692 [details] 2nd boot
Priority: Normal => release_blocker
*** Bug 4984 has been marked as a duplicate of this bug. ***
CC: (none) => mageiabugs
I can confirm this with Beta2 x86_64. Logs are identical. Difference: From the hanging system at first reboot I could shutdown with Ctrl+Alt+Del.
CC: (none) => molch.b
CC: (none) => mageia
I suspect *something* is going on here, even tho' you think it's hanged. Some questions: 1. Does it boot if you just leave it long enough (i.e. >5mins (maybe even 15-20 mins!) This should give all the timeouts time to finish. 2. Can you switch to tty2 and get a login prompt?
(In reply to comment #7) > I suspect *something* is going on here, even tho' you think it's hanged. Yes. When I did Ctrl+Alt+Del to reboot out of that I saw a long list of messages flying by (all of them starting with "Starting...." and "Started ....") so I guess you are right. > Some questions: > 1. Does it boot if you just leave it long enough (i.e. >5mins (maybe even > 15-20 mins!) This should give all the timeouts time to finish. > > 2. Can you switch to tty2 and get a login prompt? 1. I waited almost exactly 10 minutes (according to my wrist watch) 2. No. I checked all of the possible ttys (F2 -> F12).
Having just installed Beta 2 three times, I can confirm that the first boot takes over ten minutes.That's from grub to KDM screen. The "starting cpufreq" does eventually time out and the boot continues. Second boot is about 45 seconds.
CC: (none) => randy
Just adding a "me too" on this for Beta 2. On a fresh install, it hangs for 10 minutes on the cpufreq message, then continues the boot process. The problem is gone on subsequent boots.
CC: (none) => remco
But what is causing it? systemd-analyze blame - what's the output after delayed boot?
CC: (none) => sander.lepik
Yup, we need to do that first boot and get as much debug about it as possible. Finding out exactly what services caused the delay and time out. This will include the systemd-analyze blame output, but any other log output would be critical too e.g. it's worth doing systemd-journalctl -a >journal.log too and maybe capturing syslog output as well. That would be most helpful.
Being a newbie I would need a detailed description what to do before, while and after the first boot to get all info you need. I will do a fresh installation later this afternoon (standard x86_64 KDE with as many defaults as possible), so the opportunity is there. :)
@wobo: Well pretty much what I said above. As the first boot seems to be the bad case here, it could be something we only do once (e.g. mdv firsttime service etc.) or it could just be something that needs to write out some config and exits, but in a way that it doesn't track itself properly... who knows. But let the first boot happen, leave it to time out (as Remco reported and as I suspected earlier, it's really a matter of leaving it for a long time to let the timeouts happen and then letting the boot continue. It's critical that it is this boot that you let run to it's completion such that the data on the problems experienced then can be collated. The output from "systemd-analyze blame" is useful, as is the output from "systemd-journalctl -a". Also, "systemctl --failed" output and the subsequent "systemctl status xxx.service" for any services listed in the --failed output. Any other syslog output could also be useful. That's all I can think of for now - hopefully it'll be enough to point the finger. Thanks as always wobo :) KUTGW!
Sorry to ask again: 1. I do the installation and reboot. I let the machine do what it wants until the system is completely up. Then I go to a tty and do the following commands: systemd-analyze blame > blame.txt systemd-journalctl -a > journalctl.txt systemctl --failed > failed.txt systemctl status [failed services].service >> failed_services.txt syslog (part before and after the delay) Is that correct?
@wobo: Perfect! Thanks. If I think of anything else I'll post it here (hopefully before you reboot :D)
I should add lspci output to give information about the hardware? I should wrap all files into a bz2? Don't hurry, on my schedule for now is a nice after-lunch-nap (don't forget, I'm old & retired. :)
quickly for your information, the "faulty" service could be harddrake in mandrake_everytime; recently, I tried to add HARDDRAKE_ONBOOT=no in /etc/sysconfig/system at the end of the install before the first reboot, and in this case, the first boot was not delayed. Sadly, I have very few time currently, and I didn't have time to retry on a new install and verify that this is repeatable and not just a coincidence.
CC: (none) => lmenut
Ok, I will do that first before the end of the installation. If it works - is all the rest (as described in comment #15) still needed ? In other words: is that a possible solution or just a workaround until the real cause is found and fixed?
(In reply to comment #19) > In other words: is that a possible solution or just a workaround until the real > cause is found and fixed? Harddrake is probably still usefull and needed, so it's probably only an indication about the service/tool that needs a fix.
Created attachment 1824 [details] Output of journalctl
I ran the following command: systemd-analyze blame command not found updatedb locate systemd-analyze returns no results This is Mageia 2 BETA 2 32bit The system sat for 11 minutes before continuing cat failed_services.txt fedora-l...odules.service loaded failed failed Load legacy module configuration
journalctl shows: Mar 22 08:57:34 laptop.dd.inet service_harddrake[663]: running: /etc/rc.d/init.d/cpufreq start Mar 22 09:07:34 laptop.dd.inet service_harddrake[663]: ERROR: killing runaway process (process=/etc/rc.d/init.d/cpufreq, pid=690, args=start, error=ALARM at /usr/lib/libDrakX/run_program.pm line 92.
This is quite interesting Doug. For reference, systemd-analyze is in the systemd-tools package so you'll need to install that first (urpmi systemd-tools). It's interesting that harddrake itself runs the cpufreq stuff and that it ultimately seems to have it's own watchdog that takes 10 minutes..... All seems to tie in really. Will take some time to debug.
Just a "me too" confirmation: Same message as Doug, so I need not upload an attachment. This really seems to be the spot.
Hi, Do you have: cpufreq_performance in /etc/modprobe.preload.d/cpufreq.conf? This is no longer wanted. Seems I commented it out on my machine way back in January.... I'm not sure this is fundamentally the issue, but I'll certainly take a look.
I do have a cpufreq (No .conf) in the /etc/modprobe.preload.d The contents of which are: acpi-cpufreq cpufreq_performance cpufreq_powersave cpufreq_conservative cpufreq_ondemand Doug
Yup I'm just looking in drakx code... need to remove the performance one. Not sure it's relevant.
I need to do a test install, but I'm hoping that simply removing the performance module from the cpufreq.pm module in harddrake will avoid this runaway process problem... that's a bit optimistic perhaps however :)
I can confirm it. I have installed my system twice. The first boot: it was hanging 15-20 minutes at the cpufreq message. I think (I'm not sure), that after 50% of this time, there is appearing '[ok]' and new line. This was just an imporession I had.. maybe wrong. I'm attaching all the needed files (except blame).
Created attachment 1883 [details] bug 4772 2012.03.29 log
I have reproduced it again a few days ago... I have succeed to switch to TTY2-4 (I'm not sure what was that), and I saw a loop of errors. I have got a photo of it. And then I switched again to TTY1.. and it was hanging for 6 hours. I can confirm that the "[ok]" part is in the middle of pause.
Created attachment 1905 [details] cpu freq - infinite (?) loop with an error - photo of a monitor
OK, I think I worked out what is wrong here... it came to me in a dream :p (I fixed a similar issue in networkmanager yesterday, so my dreams were "primed"!!) It's basically a deadlock as systemctl waits to ensure it's ordering is correct. In harddrake (via mandrake_everytime) we're telling it to start a service that is order to start after a target that mandriva_everytime itself starts before... this is not good!!! I've worked around this by adding the --no-block option to systemctl. I've not had time to test, but this should fix the issue. I'd really appreciate it if people could test an install once the new initscripts rel = 15) is available to them in order to see if this is resolved.
I've reduced harddrake timeout from 10 to 2 mn in my git
CC: (none) => thierry.vignaud
(In reply to comment #35) > I've reduced harddrake timeout from 10 to 2 mn in my git As per bug 44, the timeout should be increased, or preferably made into a config variable that can easily be overridden via a kernel commandline option.
CC: (none) => davidwhodgins
I don't see the link with bug #44. harddrake service only run a couple basic stuff at startup. If they take more than a couple minutes, something is very wrong as seen in this bug report
(In reply to comment #37) > I don't see the link with bug #44. > harddrake service only run a couple basic stuff at startup. > If they take more than a couple minutes, something is very wrong as seen in > this bug report The 10 minute timeout is hard coded in line 87 of /usr/lib/libDrakX/run_program.pm. That's the same code that is used to run mkinitrd during bootloader-config. If that is reduced to 2 minutes, people with older i586 systems like mine, won't be able to install a kernel, even on real hardware, let alone under virtualbox.
I'm very aware of where came this timeout as a drakx manager for +10 years. If you actually read what I wrote or see what has been commited in SVN, you'll see that default timeout has _NOT_ been changed. What has been changed is timeout for commands runned by harddrake service and only those. Which doesn't include running bootloader-config btw.
(In reply to comment #34) > OK, I think I worked out what is wrong here... it came to me in a dream :p (I > fixed a similar issue in networkmanager yesterday, so my dreams were > "primed"!!) > > It's basically a deadlock as systemctl waits to ensure it's ordering is > correct. > > In harddrake (via mandrake_everytime) we're telling it to start a service that > is order to start after a target that mandriva_everytime itself starts > before... this is not good!!! > > I've worked around this by adding the --no-block option to systemctl. I've not > had time to test, but this should fix the issue. > > I'd really appreciate it if people could test an install once the new > initscripts rel = 15) is available to them in order to see if this is resolved. i have rel = 17 the first boot takes ages, i'm gonna try running it overnight to see if it ever gets out of it. is there a place where i can find these cyclic dependencies while in a rescue?
CC: (none) => alien
well, after 30min still no dice...
btw: bugzilla time seems to be 2hours extra
OK, so I think I can reproduce a similar problem in virtualbox. On first boot, there is a large delay (somewhere around 120s) due to numlock. I think harddrake does similar things with numlock as it does with cpufreq. I can confirm that it is numlock at fault but chrooting in to the install before reboot and removing the numlock rpm. Simply disabling it (chkconfig --del) will automatically reenable it again (first time boot works out whether or not it should be enabled). I'll try and dig further.
OK, so it seems my systemd support patches likely reintroduced this problem! Hopefully re-resolved in latest drakxtools. Again, I'm not 100% sure about the cpufreq issue, but I suspect it's the same thing and should be solved too, but I'll leave it open until we can confirm with RC installs on real systems.
So final RC is out. Can people experiencing this issue please check it with the RC? I'm hoping it's resolved...
Just installed Mageia 2 RC into VMPlayer virtual machine. During boot following messages passed by: Failed to start Load legacy module configuration [FAILED]
CC: (none) => isis2000
(In reply to comment #46) > Just installed Mageia 2 RC into VMPlayer virtual machine. > During boot following messages passed by: > > Failed to start Load legacy module configuration [FAILED] Completely unrelated and also harmless (it just means your modprobe.preload* files have some strange things in it, but it's not a big problem overall. It's likely that the modules are just already loaded via udev and thus don't need to be manually loaded. The bigger questions is whether or not you saw *this* bug?
(In reply to comment #47) > (In reply to comment #46) > > Just installed Mageia 2 RC into VMPlayer virtual machine. > > During boot following messages passed by: > > > > Failed to start Load legacy module configuration [FAILED] > > Completely unrelated and also harmless (it just means your modprobe.preload* > files have some strange things in it, but it's not a big problem overall. It's > likely that the modules are just already loaded via udev and thus don't need to > be manually loaded. > > The bigger questions is whether or not you saw *this* bug? Thanks for your reply Colin, and apologies for disturbing this thread. Answering your question: no i didn't see *this* bug ;)
x86_64: Tested KDE LiveCD and installation DVD - bug not showing.
I have not got this bug anymore.
Thanks all for testing. Colin, thanks for the time spent on this. On my netbook, which also had this problem, a fresh install of the RC no longer had this problem. Merci!
Status: NEW => RESOLVEDResolution: (none) => FIXED