Bug 6440 - Hung SATA drive after mount/unmount if drive is spun down
Status: RESOLVED OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 2
Hardware: x86_64 Linux
Priority: Normal
Severity: major
Target Milestone: ---
Assignee: Thomas Backlund
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-06-13 01:36 CEST by M D
Modified: 2013-11-23 16:16 CET

See Also:
Source RPM: udisks-1.0.4-6.mga2, udisks2-1.93.0-2.mga2, kernel-desktop
CVE:
Status comment:



Description M D 2012-06-13 01:36:57 CEST
I have several connected SATA drives.  I keep two of them in low-power mode (spun down) with the following in /etc/rc.d/rc.local:

/sbin/hdparm -S 6 /dev/sdc
/sbin/hdparm -S 6 /dev/sdd
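
For reference, the -S argument is an encoded standby timeout, not a number of seconds or minutes. Below is a minimal Python sketch of the encoding as described in the hdparm man page; the function name standby_timeout_seconds is made up for illustration and is not part of hdparm.

```python
def standby_timeout_seconds(value):
    """Translate an `hdparm -S` value into a timeout in seconds.

    Returns None when the value has no fixed duration
    (0 = timeouts disabled, 253 = vendor-defined, 254 = reserved).
    Encoding taken from the hdparm man page.
    """
    if not 0 <= value <= 255:
        raise ValueError("hdparm -S takes a value from 0 to 255")
    if value == 0:
        return None                      # standby timer disabled
    if value <= 240:
        return value * 5                 # multiples of 5 seconds
    if value <= 251:
        return (value - 240) * 30 * 60   # 1 to 11 units of 30 minutes
    if value == 252:
        return 21 * 60                   # 21 minutes
    if value == 255:
        return 21 * 60 + 15              # 21 minutes plus 15 seconds
    return None                          # 253 (vendor-defined) or 254 (reserved)


print(standby_timeout_seconds(6))  # 30
```

So with -S 6 the two drives are asked to spin down after about 30 seconds of inactivity.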

Those drives are not mounted on boot (they are used for backups).  As long as I don't touch the drives, everything works as expected.  They stay spun down and unmounted, and no errors appear.

But if I mount one of those drives (# mount /dev/sdc1 /mnt/sdc1), use it for a while, and then unmount it, some time later the hard drive activity light on that drive goes solid red and the following error messages appear in /var/log/messages:

-----------
Jun  4 20:09:23  kernel: [96812.704067] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun  4 20:09:23  kernel: [96812.704070] ata5.00: failed command: SMART
Jun  4 20:09:23  kernel: [96812.704074] ata5.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Jun  4 20:09:23  kernel: [96812.704075]          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jun  4 20:09:23  kernel: [96812.704077] ata5.00: status: { DRDY }
Jun  4 20:09:23  kernel: [96812.704080] ata5: hard resetting link
Jun  4 20:09:33  kernel: [96822.705066] ata5: softreset failed (1st FIS failed)
Jun  4 20:09:33  kernel: [96822.705070] ata5: hard resetting link
Jun  4 20:09:43  kernel: [96832.706059] ata5: softreset failed (1st FIS failed)
Jun  4 20:09:43  kernel: [96832.706063] ata5: hard resetting link
Jun  4 20:10:18  kernel: [96867.707077] ata5: softreset failed (1st FIS failed)
Jun  4 20:10:18  kernel: [96867.707082] ata5: limiting SATA link speed to 1.5 Gbps
Jun  4 20:10:18  kernel: [96867.707084] ata5: hard resetting link
Jun  4 20:10:23  kernel: [96872.909053] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun  4 20:10:28  kernel: [96877.909019] ata5.00: qc timeout (cmd 0xec)
Jun  4 20:10:28  kernel: [96877.909027] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun  4 20:10:28  kernel: [96877.909029] ata5.00: revalidation failed (errno=-5)
Jun  4 20:10:28  kernel: [96877.909033] ata5: hard resetting link
Jun  4 20:10:38  kernel: [96887.909058] ata5: softreset failed (1st FIS failed)
Jun  4 20:10:38  kernel: [96887.909062] ata5: hard resetting link
Jun  4 20:10:48  kernel: [96897.910023] ata5: softreset failed (1st FIS failed)
Jun  4 20:10:48  kernel: [96897.910028] ata5: hard resetting link
Jun  4 20:11:23  kernel: [96932.911019] ata5: softreset failed (1st FIS failed)
Jun  4 20:11:23  kernel: [96932.911024] ata5: hard resetting link
Jun  4 20:11:28  kernel: [96938.112046] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun  4 20:11:38  kernel: [96948.112067] ata5.00: qc timeout (cmd 0xec)
Jun  4 20:11:38  kernel: [96948.112075] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun  4 20:11:38  kernel: [96948.112077] ata5.00: revalidation failed (errno=-5)
Jun  4 20:11:38  kernel: [96948.112082] ata5: hard resetting link
Jun  4 20:11:48  kernel: [96958.112059] ata5: softreset failed (1st FIS failed)
Jun  4 20:11:48  kernel: [96958.112063] ata5: hard resetting link
Jun  4 20:11:58  kernel: [96968.112035] ata5: softreset failed (1st FIS failed)
Jun  4 20:11:58  kernel: [96968.112040] ata5: hard resetting link
Jun  4 20:12:33  kernel: [97003.113066] ata5: softreset failed (1st FIS failed)
Jun  4 20:12:33  kernel: [97003.113071] ata5: hard resetting link
Jun  4 20:12:38  kernel: [97008.314075] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun  4 20:13:08  kernel: [97038.314030] ata5.00: qc timeout (cmd 0xec)
Jun  4 20:13:08  kernel: [97038.314039] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun  4 20:13:08  kernel: [97038.314040] ata5.00: revalidation failed (errno=-5)
Jun  4 20:13:08  kernel: [97038.314042] ata5.00: disabled
Jun  4 20:13:08  kernel: [97038.325088] ata5: hard resetting link
Jun  4 20:13:18  kernel: [97048.326061] ata5: softreset failed (1st FIS failed)
Jun  4 20:13:18  kernel: [97048.326065] ata5: hard resetting link
Jun  4 20:13:28  kernel: [97058.327055] ata5: softreset failed (1st FIS failed)
Jun  4 20:13:28  kernel: [97058.327059] ata5: hard resetting link
Jun  4 20:14:03  kernel: [97093.328070] ata5: softreset failed (1st FIS failed)
Jun  4 20:14:03  kernel: [97093.328076] ata5: hard resetting link
Jun  4 20:14:09  kernel: [97098.529077] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun  4 20:14:09  udisksd[4587]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/ST32000542AS_9XW0AWPM: Error updating SMART data: sk_disk_smart_read_data: Input/output error (udisks-error-quark, 0)
Jun  4 20:14:09  kernel: [97098.540078] ata5: EH complete
Jun  4 20:19:10  udisksd[4587]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/ST32000542AS_9XW0AWPM: Error updating SMART data: sk_disk_check_sleep_mode: Operation not supported (udisks-error-quark, 0)
----------
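
A log like the one above can be condensed into a few counts, which makes it easier to compare one occurrence with the next. The following Python sketch is only an illustration: the regular expression and the event names (hard_reset, softreset_failed, and so on) are my own assumptions based on the message texts quoted above, not anything defined by the kernel or udisks.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Jun  4 20:09:33  kernel: [96822.705066] ata5: softreset failed (1st FIS failed)
KERNEL_LINE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\] (?P<port>ata\d+)(?:\.\d+)?: (?P<msg>.*)"
)

# Message prefixes seen in the log above, mapped to made-up event names.
EVENTS = {
    "hard resetting link": "hard_reset",
    "softreset failed": "softreset_failed",
    "failed to IDENTIFY": "identify_failed",
    "disabled": "device_disabled",
    "EH complete": "eh_complete",
}


def summarize(log_text):
    """Count known ATA error-handling events and measure the time they span.

    Returns (counts, seconds), where seconds is the difference between the
    last and the first kernel timestamp, or None if no line matched.
    """
    counts = Counter()
    first = last = None
    for line in log_text.splitlines():
        match = KERNEL_LINE.search(line)
        if match is None:
            continue  # udisksd lines and continuation lines are skipped
        stamp = float(match.group("ts"))
        if first is None:
            first = stamp
        last = stamp
        for prefix, name in EVENTS.items():
            if match.group("msg").startswith(prefix):
                counts[name] += 1
    seconds = None if first is None else last - first
    return counts, seconds


SAMPLE = """\
Jun  4 20:09:23  kernel: [96812.704080] ata5: hard resetting link
Jun  4 20:09:33  kernel: [96822.705066] ata5: softreset failed (1st FIS failed)
Jun  4 20:10:28  kernel: [96877.909027] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun  4 20:13:08  kernel: [97038.314042] ata5.00: disabled
Jun  4 20:14:09  kernel: [97098.540078] ata5: EH complete
"""

counts, seconds = summarize(SAMPLE)
print(dict(counts), round(seconds))
```

Fed the full excerpt instead of the five sample lines, the same function would show how many reset attempts the kernel made before it disabled ata5.00.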

After such an "explosion", I am unable to use the drive anymore; it spews errors if I try to touch it in any way.  AHCI hotswap is also broken in Mageia 2 (separate bug report: https://bugs.mageia.org/show_bug.cgi?id=6433 ), so unplugging and replugging the drive only makes the device node disappear for good.  The only way to recover from the above is to reboot the system.

I had been using this exact same arrangement under Mandriva 2010 and 2009 for years without a single problem.
Comment 1 Sander Lepik 2012-06-13 09:14:09 CEST
Thomas, any ideas?

CC: (none) => sander.lepik
Assignee: bugsquad => tmb

Comment 2 Marja Van Waes 2012-07-06 15:06:09 CEST
Please look at the bottom of this mail to see whether you're the assignee of this bug, if you don't already know.


If you're the assignee:

We'd like to know for sure whether this bug was assigned correctly. Please change status to ASSIGNED if it is, or put OK on the whiteboard instead.

If you don't have a clue and don't see a way to find out, then please put NEEDHELP on the whiteboard.

Please assign back to Bug Squad or to the correct person to solve this bug if we were wrong to assign it to you, and explain why.

Thanks :)

**************************** 

@ the reporter and persons in the cc of this bug:

If you have any new information that wasn't given before (like this bug being valid for another version of Mageia, too, or it being solved) please tell us.

@ the reporter of this bug

If you didn't reply yet to a request for more information, please do so within two weeks from now.

Thanks all :-D
Comment 3 M D 2012-07-06 15:22:24 CEST
I don't have much information to report or add.  I am still having the same problems and they are pretty irritating.  The only thing I can do once the drive is lost is to use the procedure discovered in https://bugs.mageia.org/show_bug.cgi?id=6433

This problem happens within the first day or so, every time the system is rebooted.  Once the problems are triggered and I go through all the steps in https://bugs.mageia.org/show_bug.cgi?id=6433 to recover the drive, and then set the sleep mode back (hdparm), the system will remain 100% stable until the next reboot.
Comment 4 M D 2013-01-18 21:42:06 CET
Mageia recently pushed through a kernel update to 3.4.24-desktop-3.mga2.  Unfortunately this has *NOT* fixed the problem I have reported.
Manuel Hiebel 2013-02-14 18:00:11 CET

Source RPM: lib64udisks2_0-1.93.0-2.mga2, udisks-1.0.4-6.mga2, udisks2-1.93.0-2.mga2, kernel-desktop-3.3.6-2.mga2-1-1.mga2 => udisks-1.0.4-6.mga2, udisks2-1.93.0-2.mga2, kernel-desktop

Comment 5 Manuel Hiebel 2013-10-22 12:20:50 CEST
This message is a reminder that Mageia 2 is nearing its end of life.
Approximately one month from now Mageia will stop maintaining and issuing updates for Mageia 2. At that time this bug will be closed as WONTFIX (EOL) if it remains open with a Mageia 'version' of '2'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Mageia version prior to Mageia 2's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Mageia 2 is end of life.  If you would still like to see this bug fixed and are able to reproduce it against a later version of Mageia, you are encouraged to click on "Version" and change it against that version of Mageia.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Mageia release includes newer upstream software that fixes bugs or makes them obsolete.

-- 
The Mageia Bugsquad
Comment 6 Manuel Hiebel 2013-11-23 16:16:21 CET
Mageia 2 changed to end-of-life (EOL) status on 22 November. Mageia 2 is no
longer maintained, which means that it will not receive any further security or
bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Mageia,
please feel free to click on "Version", change it to that version of Mageia,
and reopen this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

--
The Mageia Bugsquad

Status: NEW => RESOLVED
Resolution: (none) => OLD

