Bug 14249 - Last update of dbus broken upower service
Summary: Last update of dbus broken upower service
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages (show other bugs)
Version: 4
Hardware: All Linux
Priority: Normal critical
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 14102
Blocks: 14251
  Show dependency treegraph
 
Reported: 2014-10-08 13:32 CEST by José Jorge
Modified: 2014-12-21 22:25 CET (History)
11 users (show)

See Also:
Source RPM: dbus
CVE:
Status comment:


Attachments
Log files as requested (108.50 KB, image/svg+xml)
2014-10-13 16:23 CEST, Peter Christy
Details
Log files as requested (104.91 KB, text/x-log)
2014-10-13 16:23 CEST, Peter Christy
Details
Log files as requested (155.17 KB, text/plain)
2014-10-13 16:24 CEST, Peter Christy
Details
Log files as requested (79 bytes, image/svg+xml)
2014-10-13 16:25 CEST, Peter Christy
Details
journalctl-after.log (188.89 KB, text/x-log)
2014-10-13 16:38 CEST, Lew Molina
Details
boot-before.svg (94.64 KB, image/svg+xml)
2014-10-13 16:38 CEST, Lew Molina
Details
journalctl-before.log (129.11 KB, text/x-log)
2014-10-13 16:39 CEST, Lew Molina
Details
journalctl-before-downgrade.log (182.13 KB, text/x-log)
2014-10-13 16:39 CEST, Lew Molina
Details
downgrade.log (1.91 KB, text/x-log)
2014-10-13 16:40 CEST, Lew Molina
Details
journalctl-after1streboot.log (121.44 KB, text/x-log)
2014-10-13 16:40 CEST, Lew Molina
Details
boot-after1streboot.svg (91.66 KB, image/svg+xml)
2014-10-13 16:41 CEST, Lew Molina
Details
journalctl dbus errors (28.74 KB, text/plain)
2014-11-19 11:04 CET, Christian D
Details
Dbus messages after slow start with dbus 1.6.18-1.8 (27.18 KB, application/octet-stream)
2014-11-20 09:24 CET, Herman Viaene
Details
1strebootoutput (10.07 KB, application/octet-stream)
2014-11-20 17:02 CET, Lew Molina
Details
outputafter1stshutdown (10.44 KB, application/octet-stream)
2014-11-20 17:03 CET, Lew Molina
Details
3rdrebootoutput (9.97 KB, application/octet-stream)
2014-11-20 17:03 CET, Lew Molina
Details
4threbootoutput (9.97 KB, application/octet-stream)
2014-11-20 17:04 CET, Lew Molina
Details
outputafter2ndshutdown (10.00 KB, application/octet-stream)
2014-11-20 17:05 CET, Lew Molina
Details
system-local (2.97 KB, application/octet-stream)
2014-11-20 17:07 CET, Lew Molina
Details
Full journal of slowstart with dbus 1.6.18-1.8 (208.98 KB, application/octet-stream)
2014-11-20 21:04 CET, Herman Viaene
Details
journalctl -ba output (121.81 KB, text/plain)
2014-11-21 23:10 CET, Olivier Delaune
Details
Corrected journal (239.34 KB, application/octet-stream)
2014-11-22 12:19 CET, Herman Viaene
Details

Description José Jorge 2014-10-08 13:32:16 CEST
Laptop with MGA4 i586 : I have a problem with this update : KDE alerts after a few minutes about powermanagement failure. Watching logs, upower service do not start anymore!

------------------
systemctl status upower.service
upower.service - Daemon for power management
   Loaded: loaded (/usr/lib/systemd/system/upower.service; enabled)
   Active: failed (Result: signal) since Qua 2014-10-08 10:52:06 CEST; 35min ago
     Docs: man:upowerd(8)
  Process: 756 ExecStart=/usr/libexec/upowerd (code=killed, signal=TRAP)
 Main PID: 756 (code=killed, signal=TRAP)
   CGroup: /system.slice/upower.service

Out 08 10:52:06 celeron.homelinuxserver.org upowerd[756]: (upowerd:756): UPower-ERROR **: failed to get pokit authority: Error init...ed out
Out 08 10:52:06 celeron.homelinuxserver.org systemd[1]: upower.service: main process exited, code=killed, status=5/TRAP
Out 08 10:52:06 celeron.homelinuxserver.org systemd[1]: Failed to start Daemon for power management.
Out 08 10:52:06 celeron.homelinuxserver.org systemd[1]: Unit upower.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.

Reproducible: 

Steps to Reproduce:
José Jorge 2014-10-08 13:33:10 CEST

Depends on: (none) => 14102

Comment 1 José Jorge 2014-10-08 13:33:44 CEST
This bug is confirmed by another user on  [discuss-fr] ML.
Comment 2 Thomas Backlund 2014-10-08 14:16:37 CEST
Can you confirm that downgrading dbus makes it work again?

CC: (none) => tmb

Comment 3 José Jorge 2014-10-08 14:23:20 CEST
Well, I don't get what has changed, but rebooting the system near 4 times made the trick... sorry for the noise.

Resolution: (none) => FIXED
Status: NEW => RESOLVED

Olivier Delaune 2014-10-08 19:00:29 CEST

Blocks: (none) => 14251

Comment 4 José Jorge 2014-10-09 08:34:39 CEST
Another system, x86_64 this time. Still the same problem, solved with 2 crazy reboots : nothing is responding, even sysrq help key. No console login possible.


 systemd-logind[936]: Failed to enable subscription: Did not receive a reply. Possible causes include: the remote application did not send a reply, the messag
 dbus[937]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
 dbus[937]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out
 dbus[937]: [system] Successfully activated service 'org.freedesktop.login1'
 upowerd[941]: (upowerd:941): UPower-ERROR **: failed to get pokit authority: Error initializing authority: Error calling StartServiceByName for org.freedeskt
 kernel: traps: upowerd[941] trap int3 ip:7fc904b40139 sp:7fff37a1b5b0 error:0
 systemd-logind[936]: New seat seat0.
 systemd[1]: upower.service: main process exited, code=killed, status=5/TRAP
 systemd[1]: Failed to start Daemon for power management.
 systemd[1]: Unit upower.service entered failed state.
Comment 5 Thomas Backlund 2014-10-09 12:09:55 CEST
Can you downgrade _only_ the dbus packages and confirm it starts working again....

maybe upower was/is abusing dbus, something that now isn't allowed anymore with the security fixes that went in.
Comment 6 José Jorge 2014-10-09 13:18:46 CEST
Sorry Thomas, but as the problem disappears after a few reboots, I cannot do this test.

Anyway, I already closed this bug, I only added some info to indicate it is not arch dependent.

Hardware: i586 => All

Comment 7 Peter Christy 2014-10-09 13:54:10 CEST
Just been hit by this bug myself on both my wife's machine (Intel, 64-bit) and my spare machine (AMD, 64-bit).

My wife carried out the update on her machine, and now it takes forever to log her in. However, if I log in as myself, the machine is its usual, slick self!

In other words, it appears to be the user who carried out the upgrade who gets hit with the slow login!

And, no, it doesn't improve, even after many reboots.

I'm the only login on my spare machine (my main runs Slackware), but I'll go and create another login to confirm this. I may be some time.......!!!

Thought I'd mention this before downgrading dbus in case it helps pinpoint the problem.

-- 
Pete

CC: (none) => christy

Comment 8 Peter Christy 2014-10-09 16:06:12 CEST
Looks like that was a red herring. After a couple of reboots, my login also became very slow - slow enough to make the machine almost unusable.

Downgrading dbus seems to have fixed them both, though.

-- 
Pete
Comment 9 James Kerr 2014-10-09 16:20:42 CEST
I've re-opened the bug since it seems that there may be a problem with the latest dbus.

Resolution: FIXED => (none)
Status: RESOLVED => REOPENED

Comment 10 Olivier Delaune 2014-10-09 18:54:54 CEST
@Peter how did you downgrade your dbus?

CC: (none) => olivier.delaune

Comment 11 Thomas Backlund 2014-10-09 18:58:52 CEST
Something like 

urpmi --downgrade dbus lib64dbus1_3


should work...

(remove "64" if you are on i586)
Comment 12 Olivier Delaune 2014-10-09 19:03:47 CEST
Hello Thomas, it does not work. I get

urpmi --downgrade dbus lib64dbus1_3
Afin de poursuivre la mise à jour, les paquetages suivants doivent être désinstallés :
dbus-1.6.18-1.4.mga4.x86_64
 (pour installer le paquetage dbus-1.6.18-1.4.mga4.x86_64)
lib64dbus1_3-1.6.18-1.4.mga4.x86_64
 (pour installer le paquetage lib64dbus1_3-1.6.18-1.4.mga4.x86_64) (o/N) o


    $MIRRORLIST: media/core/updates/dbus-1.6.18-1.4.mga4.x86_64.rpm
    $MIRRORLIST: media/core/updates/lib64dbus1_3-1.6.18-1.4.mga4.x86_64.rpm                                                                                   
installation de lib64dbus1_3-1.6.18-1.4.mga4.x86_64.rpm dbus-1.6.18-1.4.mga4.x86_64.rpm depuis /var/cache/urpmi/rpms                                          
Préparation...                   ############################################################################################################################
L'installation a échoué :       paquet dbus-1.6.18-1.4.mga4.x86_64 déjà installé
        paquet lib64dbus1_3-1.6.18-1.4.mga4.x86_64 déjà installé

Anyway, I also type systemctl -a | grep fail and this is what I get
avahi-daemon.service                                                                         loaded    failed     failed          Avahi mDNS/DNS-SD Stack
cpupower.service                                                                             loaded    failed     failed          Configure CPU power related settings
display-manager-failure.service                                                              loaded    inactive   dead            Display Manager Failure Message
speech-dispatcherd.service                                                                   loaded    failed     failed          Speech-Dispatcher an high-level device independent layer for speech synthesis.
udisksd.service                                                                              loaded    failed     failed          UDisks Daemon
upower.service                                                                               loaded    failed     failed          Daemon for power management
Comment 13 Thomas Backlund 2014-10-09 19:11:02 CEST
oh, sorry... you need to specify version...

so urpmi --downgrade dbus-1.6.18-1.3.mga4 lib64dbus1_3-1.6.18-1.3.mga4
Comment 14 Peter Christy 2014-10-09 20:55:20 CEST
Olivier,

I used:

urpmi --downgrade dbus-1.6.18-1.3.mga4 lib64dbus1_3-1.6.18-1.3.mga4 dbus-x11-1.6.18-1.3.mga4 

as suggested on the forum by Jim Kerr (Thanks, Jim!)

So far everything is now behaving normally on both machines.

When in the "fault" mode, the disk activity light only flickers sporadically and briefly during the login process. When the machine is logging in normally, the disk light is "on" pretty constantly. I get the impression that something is tying up the machines resources pretty thoroughly and slowing disk access to a crawl. I don't know if that's any help - just my 2p worth!

-- 
Pete
Florian Hubold 2014-10-11 16:31:50 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=14251

Comment 15 Florian Hubold 2014-10-11 16:32:24 CEST
This seems to affect quite a lot of users, and there's also major impact on KDE users, as noted in Bug 14251

CC'ing Luc on both bugs.
Florian Hubold 2014-10-11 16:32:32 CEST

CC: (none) => doktor5000

Comment 16 Thomas Backlund 2014-10-13 11:29:26 CEST
As I dont have access to any system that behaves this way...

Can atleast some of you do the following:

when you have the older working dbus installed:

reboot to get clean logs, login to KDE/Gnome/... and then do as root:

journalctl -b > journalctl-before.log
systemd-analyze plot >boot-before.svg

then install the problematic update again, reboot and login again and:

journalctl -b > journalctl-after.log
systemd-analyze plot >boot-after.svg

And attach the files to this bug.
Comment 17 Peter Christy 2014-10-13 16:23:10 CEST
Created attachment 5492 [details]
Log files as requested
Comment 18 Peter Christy 2014-10-13 16:23:48 CEST
Created attachment 5493 [details]
Log files as requested
Comment 19 Peter Christy 2014-10-13 16:24:23 CEST
Created attachment 5494 [details]
Log files as requested
Comment 20 Peter Christy 2014-10-13 16:25:58 CEST
Created attachment 5495 [details]
Log files as requested

Note: when I issued the command to generate this file, the system only produced an error message and a zero length file. I have pasted the error message into this file.
Comment 21 Peter Christy 2014-10-13 16:30:54 CEST
A couple of other bits of information: When logging in with the faulty dbus, the bootsplash after the login window gets to 3 bubbles over the cauldron fairly quickly, but then slows to a crawl.

SU ing in a konsole window also took forever to log in, and trying to save the zero length file after I'd pasted the error message in also took a very long time before the filename window opened. (I'm using KDE).

Not sure if this is relevant, but both machines showing the fault have NVidia graphics cards and the proprietary driver.

-- 
Pete
Comment 22 Lew Molina 2014-10-13 16:38:18 CEST
Created attachment 5496 [details]
journalctl-after.log

CC: (none) => lewmail101-mailbox

Comment 23 Lew Molina 2014-10-13 16:38:50 CEST
Created attachment 5497 [details]
boot-before.svg
Comment 24 Lew Molina 2014-10-13 16:39:25 CEST
Created attachment 5498 [details]
journalctl-before.log
Comment 25 Lew Molina 2014-10-13 16:39:53 CEST
Created attachment 5499 [details]
journalctl-before-downgrade.log
Comment 26 Lew Molina 2014-10-13 16:40:18 CEST
Created attachment 5500 [details]
downgrade.log
Comment 27 Lew Molina 2014-10-13 16:40:47 CEST
Created attachment 5501 [details]
journalctl-after1streboot.log
Comment 28 Lew Molina 2014-10-13 16:41:16 CEST
Created attachment 5502 [details]
boot-after1streboot.svg
Comment 29 Lew Molina 2014-10-13 16:41:58 CEST
Hello Thomas

I have done what you have requested above and kindly take some time reading all the steps I did.

First, my Laptop specs:
Model: MSI CX41
Processor: Intel i5 3210m
Graphics (Dual): AMD Radeon 7670m (discrete) and HD4000 (shared memory graphics)
Memory: 8 GiB DDR3
HDD: 750 GiB

-I did a fresh install of Mageia 4 (64bit) using a DVD.

-I selected KDE only as my DE. grub setup is quite different since I am dual booting from CentOS 6.5 x64 but nothing worth mentioning.

-English is the language and keyboard layout I used

-Did a custom partition of /boot / SWAP & /home partitions using ext4

-After setup is complete, I restarted the laptop then went to MCC > Software Management > Configure your sources. Removed CDROM as source and added http and ftp sources + the default mirror lists.

-Ran Updates via MCC > Software Management > Update your system

-I rebooted the system after completing the updates and this time I installed the following via MCC > Software Management > Install & Remove software)
conky, traceroute, hddtemp, lm-sensors, iotop and htop

-I did 2 restarts after installing the applications above and the issue of slow loading of desktop can be observed.

-I ran the following command as a root:
journalctl -b > journalctl-before-downgrade.log
(please see attached file journalctl-before-downgrade.log)

-I ran the follwing command as a root:
systemd-analyze plot >boot-before-downgrade.svg
...and got the output:
"Failed to issue method call: Activation of org.freedesktop.systemd1 timed out"

-I ran the following command as root:
urpmi --downgrade dbus-1.6.18-1.3.mga4 lib64dbus1_3-1.6.18-1.3.mga4 dbus-x11-1.6.18-1.3.mga4
(please see attached file downgrade.log for output)

-I restarted my laptop and run the following commands:
journalctl -b > journalctl-before.log
(please see attached file journalctl-before.log)
systemd-analyze plot >boot-before.svg
(please see attached file boot-before.svg)

-I rebooted 2 more times, without any issues, before running an update via MCC > Software Management > Update your system

-After the update, I rebooted and ran the following commands as root: (please note at this point, system seems to be behaving normally)
journalctl -b > journalctl-after1streboot.log
(Please see atttached file journalctl-after1streboot.log)
systemd-analyze plot >boot-after1streboot.svg
(Please see atttached file boot-after1streboot.svg)

-I rebooted again and this time the slow loading of desktop occurs again.
journalctl -b > journalctl-after.log
(Please see atttached file journalctl-after.log)
systemd-analyze plot >boot-after.svg
...and got the output:
"Failed to issue method call: Activation of org.freedesktop.systemd1 timed out"

File Upload Summary (Total of 7 files)
(Original Request) 3 files
journalctl-after.log
boot-after.svg (I did not uploaded this file since the file created was 0 bytes)
boot-before.svg
journalctl-before.log

(Extra files that I generated for comparison purposes) 4 files
journalctl-before-downgrade.log
downgrade.log (output of running urpmi --downgrade dbus-1.6.18-1.3.mga4 lib64dbus1_3-1.6.18-1.3.mga4 dbus-x11-1.6.18-1.3.mga4)
journalctl-after1streboot.log
boot-after1streboot.svg

I hope this helps.
Comment 30 José Jorge 2014-10-14 08:34:09 CEST
(In reply to Thomas Backlund from comment #5)
> systemd-analyze plot >boot-after.svg

I confirm, like in comment #29 that this command outputs nothing to STDOUT. AND STDERR says :
Failed to issue method call: Activation of org.freedesktop.systemd1 timed out
Comment 31 claire robinson 2014-10-14 14:03:09 CEST
Experienced this myself on my mums laptop. The only difference with this machine was it updated dbus and kernel at the same time. I suspect this might have something to do with it.

I had to boot runlevel 3 to get to a login. After downgrading dbus and rebooting it was back to normal. Updating dbus again caused no regression.

I have the journal if it's needed.
claire robinson 2014-10-14 18:41:00 CEST

CC: (none) => eeeemail

Comment 32 José Jorge 2014-10-15 13:33:12 CEST
(In reply to claire robinson from comment #31)
> I had to boot runlevel 3 to get to a login. After downgrading dbus and
> rebooting it was back to normal. Updating dbus again caused no regression.

I don't believe in a kernel and dbus sync update bug : the first laptop whom had this bug, then it seemed fixed after a few reboots, had the bug again today. Looks like a race condition, hence the fact the bug is not always there. Like the activation of org.freedesktop.systemd1 not done when upower wants it, and then upower blocking the activation.

Is there a way to give a delay to upower before it tries to start?
Comment 33 José Jorge 2014-10-20 16:01:46 CEST
the file 10003-config-change-default-auth_timeout-to-5-seconds.patch change the auth_timeout from 30 seconds to 5. Couldn't it be the problem?
Comment 34 christian fischer 2014-10-20 23:09:38 CEST
hello
my mini computer (processor AMD E1) has 2 disks : ssd 64 Go and an old 600 Go.
 Mageia4 on ssd as no problem .
 Mageia4 on old disk "classic" canot boot .
Timing ?
José is right !

CC: (none) => christian-maryse.fischer

Comment 35 christian fischer 2014-10-20 23:19:39 CEST
Other :
My old computer ( Intel, core2duo, 2 disks classic , update done ) can boot in text mode, but cannot in graphic mode ( with kdm )
In text mode, after login , startx work .
All is good , but Timing ?
Comment 36 Thomas Backlund 2014-10-21 12:30:40 CEST
(In reply to José Jorge from comment #33)
> the file 10003-config-change-default-auth_timeout-to-5-seconds.patch change
> the auth_timeout from 30 seconds to 5. Couldn't it be the problem?

Nope, it shouldn't be as that gets triggered only in unconfigured cases, wich is not the case here as /etc/dbus-1/session.conf overrides it.
Comment 37 Thomas Backlund 2014-10-21 12:41:19 CEST
So in order to bisect this issue can someone affected using x86_64 start testing theese: http://tmb2.mine.nu/Mageia/4/bugs/14249/x86_64/

Start with dbus-1.6.18-1.4.1, check if everything still works,
then go to -1.4.2, -1.4.3 .... and all the way up to -1.4.9 or until
you can trigger the issue.


I will release i586 testbuilds later when my other buildhost is up again
Comment 38 José Jorge 2014-10-21 16:06:40 CEST
I have submitted to core/updates_testing a dbus that fixed the boot problem on my 2 systems : to workaround a CVE, the timeout was changed from 30s to 5s, which triggered timeout bugs on some systems.

I have found that a 10s timeout both is better than 30s, and does not trigger timeouts anymore. Please, test dbus-1.6.18-1.5.mga4 and report if it fixes your systems, so that we can push it into regular updates.
Comment 39 Thomas Backlund 2014-10-21 16:36:39 CEST
(In reply to José Jorge from comment #38)
> I have submitted to core/updates_testing a dbus that fixed the boot problem
> on my 2 systems : to workaround a CVE, the timeout was changed from 30s to
> 5s, which triggered timeout bugs on some systems.
> 

Um, why did you add new patches that was pretty much copies of the old ones ?
http://svnweb.mageia.org/packages?view=revision&revision=792328

It breaks review process into some unneeded extra work...

And the proper way would have been to notify the maintainer (me) that you 
actually had tested changing the timeouts...

> I have found that a 10s timeout both is better than 30s, and does not
> trigger timeouts anymore. Please, test dbus-1.6.18-1.5.mga4 and report if it
> fixes your systems, so that we can push it into regular updates.

For this to be done properly it needs to be done for mga3 and cauldron too when we know what timeout works for all...
Comment 40 José Jorge 2014-10-21 17:31:13 CEST
(In reply to Thomas Backlund from comment #39)
> And the proper way would have been to notify the maintainer (me) that you 
> actually had tested changing the timeouts...

I'm sorry, you are right.
I thought no one had time to watch this regression which is affecting a lot of users, so I tried to advance. 

What I don't understand is how to update the patches without renaming them : mgarepo sync does nothing without that...
Comment 41 Olivier Delaune 2014-10-21 19:34:14 CEST
(In reply to Thomas Backlund from comment #37)
> So in order to bisect this issue can someone affected using x86_64 start
> testing theese: http://tmb2.mine.nu/Mageia/4/bugs/14249/x86_64/
> 
> Start with dbus-1.6.18-1.4.1, check if everything still works,
> then go to -1.4.2, -1.4.3 .... and all the way up to -1.4.9 or until
> you can trigger the issue.
> 
> 
> I will release i586 testbuilds later when my other buildhost is up again

I tested the steps 1 and 2 without any problem (Mageia runs normally). The problems (slowness) occur at the step 3.
Comment 42 Thomas Backlund 2014-10-21 22:41:43 CEST
(In reply to Olivier Delaune from comment #41)
> (In reply to Thomas Backlund from comment #37)
> > So in order to bisect this issue can someone affected using x86_64 start
> > testing theese: http://tmb2.mine.nu/Mageia/4/bugs/14249/x86_64/
> > 
> > Start with dbus-1.6.18-1.4.1, check if everything still works,
> > then go to -1.4.2, -1.4.3 .... and all the way up to -1.4.9 or until
> > you can trigger the issue.
> > 
> > 
> > I will release i586 testbuilds later when my other buildhost is up again
> 
> I tested the steps 1 and 2 without any problem (Mageia runs normally). The
> problems (slowness) occur at the step 3.

OK, so dbus-1.6.18-1.4.3 is the first failing one, can you test the new:

1.6.18-1.4.10 to 1.6.18-1.4.13

and let me know if any of them work ?
Comment 43 Christian D 2014-10-22 00:44:18 CEST
I also have 2 PCs that was affected by this problem.

One is based on a Phenom x4 945 - hd 1To, Mageia 4 x64. I did the tests proposed by Thomas Backlund :
- 1.6.18-1.4.1 : ok (2 reboot)
- 1.6.18-1.4.2 : ok (2 reboot)
- 1.6.18-1.4.3 : not ok : boot very slow
- 1.6.18-1.4.10 : ok (2 reboot)
- 1.6.18-1.4.11 : ok (2 reboot)
- 1.6.18-1.4.12 : ok (2 reboot)
- 1.6.18-1.4.13 : ok (3 reboot)

The second one is an older one based on a Athlon X2 5000 hd 250 Go. For this one, I reinstalled Mageia 4 (before I found this bug report) from the Live DVD kde x64 iso. I also did the updates and renamed .kde4 to .kde4-old to reset the kde configuration. Strangely, I cannot reproduce the problem. With the standard updates, the PC is now booting normaly ...

There are 2 other PCs closed to me with Mageia 4 x64 (one based on a Phenom x3 720 hd 3To, the other is a laptop with an i7 720qm + ssd) that are not affected by the problem ...

CC: (none) => christian_duhaut

Comment 44 Thomas Backlund 2014-10-22 08:45:46 CEST

OK, there is now a dbus-1.6.18-1.6.mga4 in updates_testing so both i586 and x86_64 users can test
Comment 45 Christian D 2014-10-22 09:28:16 CEST
I just tested the version dbus-1.6.18-1.6.mga4 in updates_testing x86_64 on my phenom x4 945 PC :
The result is not ok : first boot ok, second boot : very slow (more than 5 mns to start kde ...)
Thomas Backlund 2014-10-22 09:37:53 CEST

Attachment 5494 mime type: text/x-log => text/plain

Comment 46 Christian D 2014-10-22 09:53:14 CEST
(In reply to Christian D from comment #45)
> I just tested the version dbus-1.6.18-1.6.mga4 in updates_testing x86_64 on
> my phenom x4 945 PC :
> The result is not ok : first boot ok, second boot : very slow (more than 5
> mns to start kde ...)

I did 8 other reboots with this version without problem : so, globally 1 out of 10 reboot failed
Comment 47 Thomas Andrews 2014-10-22 18:11:28 CEST
Tested the 32-bit versions on the two affected machines I have with mixed results. They worked on a Dell Dimension E310 (P4 processor, desktop kernel) but did not work with the machine with an AMD Sempron 3100+ processor and the server kernel. Two boots OK, third one was even slower than before. Booting into safe mode and using Control-D to continue avoids the problem, as before.

I'm looking for the procedure to downgrade again. All I get is an error message telling me the ***.3 packages aren't in the database.

CC: (none) => andrewsfarm

Comment 48 Thomas Andrews 2014-10-22 20:55:32 CEST
Inability to downgrade was solved by changing from MIRRORLIST to the specific US kernel.org mirror.
Comment 49 Lew Molina 2014-10-23 16:19:38 CEST
Based on the steps I followed on the thread, https://forums.mageia.org/en/viewtopic.php?f=7&t=8564, I did not encounter the issue anymore.

I ran the following commands

urpmi.update "Core Updates Testing" (as a normal user)
urpmi --media "Core Updates Testing" dbus-1.6.18-1.6.mga4 lib64dbus1_3-1.6.18-1.6.mga4 dbus-x11-1.6.18-1.6.mga4 (as a root user)

I observed that the login and logout process was a bit slower compared to the files I was using previously.

dbus-1.6.18-1.3.mga4
libdbus1_3-1.6.18-1.3.mga4
dbus-x11-1.6.18-1.3.mga4

The difference in speed/responsiveness was somewhere between 5 to 20 seconds (rough estimate).

This was done on a 64bit laptop, MSI CX41 to be exact. After installing the patch, I did 5 consecutive reboots + 1 shutdown + another reboot after running updates via MCC (I did not get any updates).
Comment 50 José Jorge 2014-10-25 14:13:35 CEST
dbus 1.6.18-1.7 from updates-testing fixed the bug for me

Resolution: (none) => FIXED
Status: REOPENED => RESOLVED

Comment 51 Thomas Andrews 2014-10-25 15:20:46 CEST
.7 version working on the machine that the .6 version didn't fix. Have not yet tried it on the machine where the .6 version worked.
Comment 52 Simon McVittie 2014-11-18 12:30:39 CET
Reported upstream, <https://bugs.freedesktop.org/show_bug.cgi?id=86431>.

Mageia security / D-Bus maintainers: please report regressions upstream, particularly if they are regressions in security updates. We can't fix bugs we don't know about.

See Also: (none) => https://bugs.freedesktop.org/show_bug.cgi?id=86431
CC: (none) => simon.mcvittie

Comment 53 Thomas Backlund 2014-11-18 13:07:19 CET
Thanks Simon, will try to do better communicating with upstream
Comment 54 Simon McVittie 2014-11-18 19:27:26 CET
To people who have seen this bug in the past: I'm an upstream D-Bus maintainer, trying to get this fixed for everyone, but I haven't been able to make it happen on my own hardware without setting a pathologically small timeout.

I would be interested to know whether the people who were affected are using a non-local user database (NIS, SSSD, YP, etc.) or any other unusual system infrastructure.

Also, would it be possible for some of you to do some testing for me? 

What I would like tested is:

* Compile a version of D-Bus with the logging patch from
  https://bugs.freedesktop.org/show_bug.cgi?id=86431#c4
  (perhaps a Mageia developer could put a patched build somewhere
  for people to try?)

* Put this in /etc/dbus-1/system-local.conf:

  <busconfig>
    <limit name="auth_timeout">5000</limit>
  </busconfig>

* Reboot enough times that you see the symptoms of either this or
  Bug #14251

* Check journalctl for messages a bit like this:

  dbus[123]: [system] Connection has not authenticated soon enough, closing it
  (auth_timeout=5000ms, elapsed: 5001ms)

What I expect is that when your computer is in the bad/slow situation, you will see those messages, and when it is not, you won't.

After testing, delete system-local.conf to return to the default auth_timeout. That should put everything back to normal; reinstall version 1.6.18-1.8 of Mageia's dbus packages if necessary.
Comment 55 Thomas Backlund 2014-11-18 20:42:07 CET
(In reply to Simon McVittie from comment #54)

> What I would like tested is:
> 
> * Compile a version of D-Bus with the logging patch from
>   https://bugs.freedesktop.org/show_bug.cgi?id=86431#c4
>   (perhaps a Mageia developer could put a patched build somewhere
>   for people to try?)


I submitted a dbus-1.6.18-1.9.mga4 to mga4 core updates_testing with that patch applied.
Comment 56 Lew Molina 2014-11-19 01:41:01 CET
Hi Thomas,

Although I haven't experienced the problem in the current patch that was published by Mageia team, I would like to help Simon put out a fix for everyone regardless of Distribution.

Can you please post a step-by-step instructions on what to do and what output and/or files are needed to be generated by non technical users like me?

Thank you.
Comment 57 claire robinson 2014-11-19 09:41:13 CET
Reopening the bug for debugging.

The packages can be found in Core Updates Testing. To install then, as root, use the following commands for either i586 or x86_64 depending on your installation.

For i586:

# urpmi.update "Core Updates Testing"
# urpmi --media "Core Updates Testing" dbus dbus-x11 dbus-doc libdbus1_3


For x86_64:

# urpmi.update "Core Updates Testing"
# urpmi --media "Core Updates Testing" dbus dbus-x11 dbus-doc lib64dbus1_3


Reboot.

After each reboot, as root, check "journalctl -a -b | grep -i dbus" for any message like "Connection has not authenticated soon enough, closing it" or "Timing out authentication for connection"

Status: RESOLVED => REOPENED
Resolution: FIXED => (none)

Comment 58 claire robinson 2014-11-19 10:00:46 CET
I've not had the problems on this computer but testing the new dbus anyway mga4 64, no issues or journal messages after 3 reboots.
Comment 59 Herman Viaene 2014-11-19 10:36:37 CET
Mageia 4 64-bit on AMD Phenom Quadcore.
Installed dbus 1.9 from Core Updates Testing and created system-local.conf as indicated above.
First reboot OK, second deard slow, also opening "Home" from desktop takes ages.
Used journalctl as in Comment 57 and get following:
After reboot and a number of "Activating systemd ....."
I get the first
Nov 19 09:56:18 xxx.yyyy.zzzz dbus[832]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=5000ms, elapsed: 10449ms)
then two more "Activating systemd ....."
and then 3 like
Nov 19 09:56:33 xxxx.yyyy.zzzz dbus[832]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
followed by
Nov 19 09:56:56 xxxx.yyyy.zzzz dbus[832]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.19" (uid=500 pid=3488 comm="kdeinit4: kded4 [kdeinit]                       ") interface="org.freedesktop.DBus.ObjectManager" member="GetManagedObjects" error name="(unset)" requested_reply="0" destination="org.bluez" (uid=0 pid=741 comm="/usr/libexec/bluetooth/bluetoothd ")
and then 14 of these "Failed to activate  ...timed out"
then another 3 of "Rejected message ...."
then some 150 of these "Failed to activate  ...timed out"
where another five of these "Rejected message ...." are interspersed.

CC: (none) => herman.viaene

Comment 60 Thomas Backlund 2014-11-19 10:55:24 CET
For testers, remember this part:

* Put this in /etc/dbus-1/system-local.conf:

  <busconfig>
    <limit name="auth_timeout">5000</limit>
  </busconfig>

* Reboot enough times that you see the symptoms of either this or
  Bug #14251

* Check journalctl for messages a bit like this:

  dbus[123]: [system] Connection has not authenticated soon enough, closing it
  (auth_timeout=5000ms, elapsed: 5001ms)
Comment 61 Christian D 2014-11-19 11:04:40 CET
Created attachment 5611 [details]
journalctl dbus errors

Tested the version dbus-1.6.18-1.9.mga4 with system-local.conf as required above (5000ms) in updates_testing x86_64 on my phenom x4 945 PC :
- first reboot : login screen OK but starting KDE was very slow. A lot of timeout errors : see attached file dbus-problem-1st-and-3rd-reboot.txt
- second reboot : the same
- third reboot, after deleting system-local.conf : normal boot, but still a dbus error (see attached file)
Comment 62 Simon McVittie 2014-11-19 13:34:21 CET
Thanks, everyone!

Christian's log indicates that at least on Christian's system, the situation is as we suspect. In particular, this:

nov. 19 09:57:47 pc-bureau dbus[1216]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=5000ms, elapsed: 9545ms)

indicates that the system is *very* busy at that point: dbus-daemon scheduled a 5 second timer, then didn't get woken up until nearly 5 seconds later than it requested.

Herman's notes are also consistent with that.
Comment 63 Simon McVittie 2014-11-19 13:42:22 CET
Lew, Claire: if you want to replicate this testing, please follow the instructions from both Comment #57 and Comment #60. I would expect results similar to what Christian had in Comment #61.

Christian wrote:
> - third reboot, after deleting system-local.conf : normal boot,
> but still a dbus error (see attached file)

If you mean this:

"""
nov. 19 10:35:26 pc-bureau systemd[2472]: Failed to open private bus connection: Failed to connect to socket /run/user/500/dbus/user_bus_socket: No such file or directory
"""

then I think that's a systemd user-session weirdness, and is not harmful.
Comment 64 Christian D 2014-11-19 14:25:40 CET
(In reply to Simon McVittie from comment #63)
> Lew, Claire: if you want to replicate this testing, please follow the
> instructions from both Comment #57 and Comment #60. I would expect results
> similar to what Christian had in Comment #61.
> 
> Christian wrote:
> > - third reboot, after deleting system-local.conf : normal boot,
> > but still a dbus error (see attached file)
> 
> If you mean this:
> 
> """
> nov. 19 10:35:26 pc-bureau systemd[2472]: Failed to open private bus
> connection: Failed to connect to socket /run/user/500/dbus/user_bus_socket:
> No such file or directory
> """
> 
> then I think that's a systemd user-session weirdness, and is not harmful.

Thanks for the answer. This last error probably because I didn't close the user session before the reboot ...
Comment 65 Herman Viaene 2014-11-20 09:22:10 CET
On a Mageia4 64 bit AMD machine fully updated.
On cold boot this morning experienced again a very slow start. This machine has dbus 1.6.18-1.8.mga4, and has never been involved in testing.
There is no such time-out message as Simon indicated above, I piped the output of the command (Comment 57) to a file slowstart that I will upload.
Second boot went OK.
Comment 66 Herman Viaene 2014-11-20 09:24:14 CET
Created attachment 5616 [details]
Dbus messages after slow start with dbus 1.6.18-1.8
Comment 67 Simon McVittie 2014-11-20 12:58:36 CET
> 1.6.18-1.8.mga4 [...]
> There is no such time-out message as Simon indicated above

Revision -1.8 can never produce the message "Connection has not authenticated soon enough", even in situations where it would be correct, because the code to log that message does not exist in that version.

It would be great if you could test with -1.9, with or without the system-local.conf that reduces this timeout from 30s to 5s. That revision will log that message, if appropriate.

Also, how fast and how recent is this machine? Our working theory is still that this is much more likely to happen on old/slow hardware (I don't know whether the CPU or the disk is the important bit).

Finally, a full log from journalctl -ab (censor it if necessary) would be useful information, since it would indicate whether systemd is having other difficulties on your machine.
Comment 68 Lew Molina 2014-11-20 17:02:20 CET
Created attachment 5617 [details]
1strebootoutput
Comment 69 Lew Molina 2014-11-20 17:03:04 CET
Created attachment 5618 [details]
outputafter1stshutdown
Comment 70 Lew Molina 2014-11-20 17:03:51 CET
Created attachment 5619 [details]
3rdrebootoutput
Comment 71 Lew Molina 2014-11-20 17:04:37 CET
Created attachment 5620 [details]
4threbootoutput
Comment 72 Lew Molina 2014-11-20 17:05:15 CET
Created attachment 5621 [details]
outputafter2ndshutdown
Comment 73 Lew Molina 2014-11-20 17:05:38 CET
Hi,

I've performed the tests on my laptop. Kindly refer to the exact steps I did below.

I ran the following commands in my terminal as root:

urpmi.update "Core Updates Testing"
urpmi --media "Core Updates Testing" dbus dbus-x11 dbus-doc lib64dbus1_3

After completing the upgrade, I copied system.conf in /etc/dbus-1 directory via cp command and named it system-local.conf. Using vi, I added the tag <limit name="auth_timeout">5000</limit> on the bottom part of the file just above </busconfig>, please refer to "system-local" file that I attached, and then rebooted my laptop. The results below  were done in the exact order.

1st Reboot (after running update and modifying the system-local.conf file)
-boot time is around 1m 50s starting from the Mageia Grub(legacy) bootloader
-ran hddtemp (as root) in terminal and got the error below.
Error getting authority: Error initializing authority: Could not connect: Connection refused
-ran journalctl -a -b | grep -i dbus (as root) and the output of this is in "1strebootoutput"

1st laptop shutdown for 15 minutes
-boot time is around 1m 53s starting from the Mageia Grub(legacy) bootloader
-ran hddtemp (as root) in terminal and got the error below.
Error getting authority: Error initializing authority: Could not connect: Connection refused
-ran journalctl -a -b | grep -i dbus (as root) and the output of this is in "outputafter1stshutdown"

2nd Reboot
-boot time is around 1m 14s starting from the Mageia Grub(legacy) bootloader
-ran hddtemp (as root) in terminal and got the error below.
Error getting authority: Error initializing authority: Could not connect: Connection refused
-I failed to run "journalctl -a -b | grep -i dbus" because I accidentally rebooted my laptop so I do not have the output. 

3rd Reboot
-boot time is around 1m 14s starting from the Mageia Grub(legacy) bootloader
-ran hddtemp (as root) in terminal and got the error below.
Error getting authority: Error initializing authority: Could not connect: Connection refused
-ran journalctl -a -b | grep -i dbus (as root) and the output of this is in "3rdrebootoutput"

4th Reboot
-boot time is around 1m 14s starting from the Mageia Grub(legacy) bootloader
-ran journalctl -a -b | grep -i dbus (as root) and the output of this is in "4threbootoutput"

2nd laptop shutdown 3 minutes
-boot time is around 1m 28s starting from the Mageia Grub(legacy) bootloader
-ran journalctl -a -b | grep -i dbus (as root) and the output of this is in "outputafter2ndshutdown"

I did not encountered "Connection has not authenticated soon enough, closing it" or "Timing out authentication for connection" messages but the attached logs do have other error messages like "connection refused" and other errors.

After running all the tests, I reverted back to dbus-1.6.18-1.8.mga4, lib64dbus1_3-1.6.18-1.8.mga4 and dbus-x11-1.6.18-1.8.mga4. I ran the following command as root:

urpmi --downgrade dbus-1.6.18-1.8.mga4 lib64dbus1_3-1.6.18-1.8.mga4 dbus-x11-1.6.18-1.8.mga4

After completing the downgrade, I deleted /etc/dbus-1/system-local.conf and rebooted my laptop.

If you are interested to know my processor and hard disk drive here they are:

Processor: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz via cat /proc/cpuinfo

Hard Disk Drive: Hitachi HTS547575A9E384 via hdparm -i /dev/sda

My laptop is a MSI CX41. Hope this helps.
Comment 74 Lew Molina 2014-11-20 17:07:48 CET
Created attachment 5622 [details]
system-local
Comment 75 claire robinson 2014-11-20 17:34:02 CET
Thanks for the time you've put into this Lew. 

Journalctl -b shows output from the current boot only, so the logs you've attached after having troubles at shutdown using journalctl -b are unlikely to show relevant information, sadly.

You can use -b -1 to show the journal of the previous boot only though, which may show the problems at shutdown..

journalctl -a -b -1 | grep -i dbus > journalfrompreviousboot.txt
Comment 76 Simon McVittie 2014-11-20 19:52:35 CET
Lew, thanks for your efforts, but I'm afraid you put the wrong thing in system-local.conf, making these test results not useful.

What I intended you to have in system-local.conf is exactly this, and nothing more:

  <busconfig>
    <limit name="auth_timeout">5000</limit>
  </busconfig>

Configuration in system-local.conf adds to system.conf (via the <include> directive near the end of system.conf), it does not replace system.conf.

In this case, pasting the contents of system.conf into system-local.conf meant that dbus-daemon could not start at all, because system-local.conf was trying to include itself:

Nov 20 22:18:59 mageia.lew dbus-daemon[1003]: Failed to start message bus: Circular inclusion of file '/etc/dbus-1/system-local.conf'
Comment 77 Herman Viaene 2014-11-20 20:58:57 CET
(In reply to Simon McVittie from comment #67)
> > 1.6.18-1.8.mga4 [...]
> > There is no such time-out message as Simon indicated above
> 
> Revision -1.8 can never produce the message "Connection has not
> authenticated soon enough", even in situations where it would be correct,
> because the code to log that message does not exist in that version.

OK, so that explains why. 

> 
> It would be great if you could test with -1.9, with or without the
> system-local.conf that reduces this timeout from 30s to 5s. That revision
> will log that message, if appropriate.

No, this happened on my daily-use-PC, and I refrain from using anything on it that is not 100% standard approved. Sorry for that.


> 
> Also, how fast and how recent is this machine? Our working theory is still
> that this is much more likely to happen on old/slow hardware (I don't know
> whether the CPU or the disk is the important bit).

Well this is a (about) 2 year old AMD 6 core  processor FX-6100 with nVidia GeForce 6800 graphics.
I would also suppose this problem would occur more on slower hardware, but my experience (as far as it goes) is exactly the opposite. I explain:

This machine has suffered from the problem right from the beginning. I have a somewhat older laptop (AMD Phenom Quadcore) and that also had its share of the problems, but far less frequently.

AND

I have installed MGA-4 on a number of older machines (6 - 10 years old), some single core, some early dual core at my friends homes. Not all are running KDE, the eldest ones I installed with Xfce. But NONE of them reported any problem in the past weeks. And I think I can quite sure these people would ring me if their PC would start so smpw as mine did.

BUT

I also installed MGA-4-64 on a brand new PC (Intel i5), with an SSD, and I haven't had any complaints on this one either.

From the first two I could speculate that the more cores, the bigger the problem, but the last one (but I had only a single PC like that) would point to disk????

Sorry if I make the confusion even bigger than it was before. 
> 
> Finally, a full log from journalctl -ab (censor it if necessary) would be
> useful information, since it would indicate whether systemd is having other
> difficulties on your machine.
I presume you might prefer the one of the boot with the problem, I'll see if I can master journalctl 's options.
Comment 78 Herman Viaene 2014-11-20 21:04:02 CET
Created attachment 5625 [details]
Full journal of slowstart with dbus 1.6.18-1.8
Comment 79 Olivier Delaune 2014-11-21 23:10:51 CET
Created attachment 5628 [details]
journalctl -ba output

This is what I get with <limit name="auth_timeout">5000</limit>. My computer was slow since the first reboot.

This is my hardware
* Intel(R) Pentium(R) CPU G840 @ 2.80GHz
* Hard disk ST500DM002-1BD142
If you need more information, please ask.
Comment 80 Simon McVittie 2014-11-22 11:39:33 CET
Thanks everyone, I don't need any more test results with the 5000 timeout. I will revert the timeout change upstream soon, equivalent to what Mageia developers did for 1.6.18-1.8.

Herman wrote:
> Full journal of slowstart with dbus 1.6.18-1.8

I think you have a different slow startup issue here: nothing in Attachment #5625 [details] indicates problems with D-Bus. It might be best to report this separately.

Olivier wrote:
> journalctl -ba output

Attachment #5628 [details] indicates that reverting the timeout to what it used to be (as done in 1.6.18-1.8) fixes this for you. Please delete system-local.conf and everything should work after that.
Comment 81 Herman Viaene 2014-11-22 12:19:19 CET
Created attachment 5631 [details]
Corrected journal
Comment 82 Herman Viaene 2014-11-22 12:22:26 CET
I checked again, and now found I made a mistake in my journalctl command. New file is the full log of the slow boot where I previously sent the filtered log. Sorry for that.
Comment 83 Thomas Backlund 2014-11-22 12:28:40 CET
(In reply to Simon McVittie from comment #80)
> Thanks everyone, I don't need any more test results with the 5000 timeout. I
> will revert the timeout change upstream soon, equivalent to what Mageia
> developers did for 1.6.18-1.8.
>

I was thinking about this...

would it be wise to ship a:

/etc/dbus-1/system-local.conf

with this commented out:

<busconfig>
  <limit name="auth_timeout">5000</limit>
</busconfig>

 and an explanation about enabling it to harden the system security ?
Comment 84 Simon McVittie 2014-11-22 13:12:50 CET
The whole point of system-local.conf is that unlike system.conf, it does not exist unless added by a local sysadmin; so, no, I'm not going to start shipping it now.

In the development branch, I intend to put the various default limits in system.conf, commented out, so sysadmins can use them as a reference. I'll probably also add a README.denial-of-service or similar at some point, but that's less urgent than fixing this regression.

I would rather limit the amount of churn on the 1.8 stable branch, so I probably won't do the same there.

Reducing this timeout from 30 to 5 seconds was a hardening change: it reduces the worst-case denial of service by reducing the time by which broken or malicious processes can delay legitimate connections. On my test system, with 100 processes each connecting in a loop, a legitimate connection takes around half the auth_timeout; so I don't think a shorter timeout than 30s is necessary unless your users are actively hostile.

If your users *are* that hostile, then I think dbus is probably the least of your concerns. You'd also want to apply a relatively strict RLIMIT_NPROC, which would also mitigate that attack.
Comment 85 claire robinson 2014-11-22 20:51:58 CET
I'd like to thank you, on behalf of everybody, for following up on this Simon.
Comment 86 David Walser 2014-12-15 20:06:52 CET
Just FYI, we can update Cauldron 1.8.12, which contains some of the fixes we've added locally:
http://lists.freedesktop.org/archives/dbus/2014-November/016430.html
Comment 87 Simon McVittie 2014-12-15 21:51:14 CET
I think upgrading to 1.6.28 or 1.8.12 should make all your patches for this issue unnecessary, if the policy for the relevant part of your distribution allows that. (I realise you are probably stuck on 1.6.18 + selected patches for your stable release - I'm in a similar situation with 1.6.8 in Debian 7.)

If you have additional downstream patches fixing other bugs, please send them to the freedesktop.org Bugzilla (product/component: dbus/core).
Comment 88 claire robinson 2014-12-17 11:31:40 CET
Thanks Simon. Our maintainer, Thomas, is having connection issues at the moment after moving into a new house.
Comment 89 Thomas Backlund 2014-12-21 22:25:56 CET
Thanks Simon.

as you suspected I will keep mga4 on 1.6.18 + patches, but for Cauldron / upcoming mga5 I've updated to 1.8.12

Closing this as fixed

Status: REOPENED => RESOLVED
Resolution: (none) => FIXED


Note You need to log in before you can comment on or make changes to this bug.