| Summary: | Ext4fs corruption occurred after resuming from hibernation | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Mike Burgener <mburgener> |
| Component: | RPM Packages | Assignee: | Thomas Backlund <tmb> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | major | ||
| Priority: | Normal | CC: | sysadmin-bugs, thierry.vignaud |
| Version: | Cauldron | Keywords: | NEEDINFO |
| Target Milestone: | Mageia 5 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | kernel-desktop-3.19.8-1.mga5 | CVE: | |
| Status comment: | |||
| Attachments: | dmesg output | ||
|
Description
Mike Burgener
2015-06-04 12:52:56 CEST
The exact kernel version would be nice, in addition to logs, if you can provide them.

Component:
Release (media or process) =>
RPM Packages

Of course. At the moment I'm hacking around to get them, working on a dd image made with testdisk from the Windows installation to my NAS. Perhaps I can even provide the dd image as a download on my tuxinator servers for debugging (after I have removed my personal data, once I can read it). I hope it's a hardware and not a software issue, so the release of 5 is not in danger.

Created attachment 6698 [details]
dmesg output

Got it booting after some manual fsck. The kernel log is attached; look for the "Ext4" messages.
Mike Burgener
2015-06-04 14:11:25 CEST
Priority:
Normal =>
High
Mike Burgener
2015-06-04 14:11:53 CEST
Target Milestone:
--- =>
Mageia 5

Linux hostname 3.19.8-desktop-1.mga5 #1 SMP Mon May 11 16:35:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I guess you were using RAID, weren't you? If so, you got hit by the infamous RAID bug that was fixed in 3.19.8-desktop-2.mga5, which was released 2 weeks ago:
" - md/raid0: fix restore to sector variable in raid0_make_request"
See http://lwn.net/Articles/645720/ for details.

CC:
(none) =>
thierry.vignaud

No, there is not and never was any RAID setup on that machine. It's an Acer Aspire S3 with a Samsung EVO 530 SSD, and the SSD health is OK.
regards Mike
Thierry Vignaud
2015-06-05 10:59:55 CEST
Attachment 6698 mime type:
text/x-log =>
text/plain

You forgot to say that this happened after hibernation!!!

Thomas, before hibernating, we can see several "BUG: Bad page map in process" messages.
After resuming, there are a lot of:
"EXT4-fs error (device sda5): ext4_read_block_bitmap_nowait:427: comm kworker/u16:7: Cannot get buffer for block bitmap"
mixed with a couple of:
"JBD2: Spotted dirty metadata buffer (dev = sda5, blocknr = 0). There's a risk of filesystem corruption in case of system crash."

Mike: Are you sure you didn't boot any other OS while Mageia was hibernated? Could you have booted e.g. a Windows with an ext4 driver, or another Linux that could have mounted the Mageia partition, thus causing differences between the on-disk state and what the suspended kernel kept in its suspend image?

Keywords:
(none) =>
NEEDINFO

Hi, sorry, I forgot that I sometimes use hibernation on that machine. Yes, I'm sure I did not boot any Windows on that machine before waking up from hibernation, and my Windows does not have any ext4 driver at the moment. However, the system works again now that fsck has repaired some stuff. I will keep an eye on the machine and check whether I get any weird kernel messages soon. I think this is no real blocker, and for the moment we can set the bug to a lower level.
regards Mike

Status:
NEW =>
UNCONFIRMED
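
For anyone triaging a similar dmesg attachment, the three message types quoted above can be counted with a short script. This is only a minimal sketch: the message strings are taken from this report, while the category names and the `classify` helper are made up for illustration and are not part of any Mageia tooling.

```python
import re
from collections import Counter

# Message fragments quoted in this report; the dictionary keys are hypothetical labels.
PATTERNS = {
    "bad_page_map": re.compile(r"BUG: Bad page map in process"),
    "ext4_error": re.compile(r"EXT4-fs error \(device (?P<dev>\w+)\)"),
    "jbd2_dirty": re.compile(r"JBD2: Spotted dirty metadata buffer"),
}


def classify(lines):
    """Count how many log lines match each of the patterns above."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines copied from the messages quoted in this bug.
sample = [
    "EXT4-fs error (device sda5): ext4_read_block_bitmap_nowait:427: "
    "comm kworker/u16:7: Cannot get buffer for block bitmap",
    "JBD2: Spotted dirty metadata buffer (dev = sda5, blocknr = 0). "
    "There's a risk of filesystem corruption in case of system crash.",
    "ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
]

counts = classify(sample)
for name in PATTERNS:
    print(name, counts[name])
```

To run it against a real log, pass the lines of the saved dmesg file to `classify` instead of `sample`.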
Mike Burgener
2015-06-05 13:52:51 CEST
Priority:
High =>
Normal

Any data corruption is still a major bug, raising severity a little bit.

Severity:
minor =>
major

Mike: did you run another Linux distribution before resuming?
Also, you should update to kernel-desktop-3.19.8-2.mga5, which has important fixes.

Source RPM:
kernel =>
kernel-desktop-3.19.8-1.mga5

One question... when did you install this system, and with what? mga4? mga5-beta? mga5-rc? ...
There was an older ext4 bug, fixed in 3.19.7, that could have caused this (the delayed extents bug we also squashed for mga4 in http://advisories.mageia.org/MGASA-2015-0236.html).
Having said that, I see several possibly related fixes in the upstream -stable queue, and there are also some specific to Samsung SSDs... I'll go review them...

Actually, this seems to be an issue of fsck running during resume, as I found a matching Fedora bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1174945
I've added the same fix as Fedora did, in dracut-038-19.mga5.
So when that gets installed, recreate your initrd with the new dracut.

Hmm, it's possible that I installed beginning with beta5, not sure anymore. To me the Fedora report looks different.
regards Mike

Perhaps also related to https://bugzilla.redhat.com/show_bug.cgi?id=1185640

Yes, and that's exactly what the fix I added to dracut should resolve.

OK nice, so I'll use hibernation a lot once I get that update/patch, to give it a test.
regards Mike

An update: I never had the issue again. However, after changing to another SSD in the same notebook I frequently get the messages below, but everything continues to work. What I also find interesting is the UDMA/133 message, as I think UDMA would be much too slow for SATA 6 Gb/s:
Jun 16 18:35:02 localhost kernel: ata4: SATA link down (SStatus 0 SControl 300)
Jun 16 18:35:02 localhost kernel: ata5: SATA link down (SStatus 0 SControl 300)
Jun 16 18:35:02 localhost kernel: ata2: SATA link down (SStatus 0 SControl 300)
Jun 16 18:35:02 localhost kernel: usb 1-1: reset high-speed USB device number 2 using ehci-pci
Jun 16 18:35:02 localhost kernel: usb 3-1: reset full-speed USB device number 2 using xhci_hcd
Jun 16 18:35:02 localhost kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880148c32780
Jun 16 18:35:02 localhost kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880148c327e0
Jun 16 18:35:02 localhost kernel: usb 3-4: reset high-speed USB device number 3 using xhci_hcd
Jun 16 18:35:02 localhost kernel: usb 1-1.3: reset high-speed USB device number 3 using ehci-pci
Jun 16 18:35:02 localhost kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 16 18:35:02 localhost kernel: ata1.00: configured for UDMA/133
Jun 16 18:35:02 localhost kernel: usb 1-1.4: reset full-speed USB device number 4 usi

OK, the UDMA message seems to be a legacy message, as the speed looks fine:
Timing O_DIRECT cached reads: 960 MB in 2.00 seconds = 479.95 MB/sec
Timing O_DIRECT disk reads: 1168 MB in 3.00 seconds = 388.88 MB/sec

It was a longtime disk issue, not kernel or fs related.

Status:
UNCONFIRMED =>
RESOLVED |
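
The conclusion that "configured for UDMA/133" is only a legacy label can be checked with a little arithmetic on the numbers posted above. The following is a rough sketch, not a diagnostic tool: the two input lines are copied from this report, the 133 MB/s figure is the nominal peak of the parallel-ATA UDMA/133 mode, and the helper names are invented for this example.

```python
import re

UDMA133_MB_S = 133.0  # nominal peak transfer rate of the UDMA/133 mode


def sata_payload_mb_s(line_rate_gbps):
    """Usable SATA bandwidth in MB/s: 8b/10b encoding carries 8 data bits per 10 line bits."""
    data_bits_per_s = line_rate_gbps * 1000 * 8 / 10  # megabits of payload per second
    return data_bits_per_s / 8  # convert megabits to megabytes


def hdparm_mb_s(line):
    """Extract the MB/sec figure from an hdparm timing line, or None if absent."""
    match = re.search(r"=\s*([\d.]+)\s*MB/sec", line)
    return float(match.group(1)) if match else None


# Lines copied from the kernel log and hdparm output quoted in this bug.
link = "ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"
disk = "Timing O_DIRECT disk reads: 1168 MB in 3.00 seconds = 388.88 MB/sec"

rate_gbps = float(re.search(r"SATA link up ([\d.]+) Gbps", link).group(1))
ceiling = sata_payload_mb_s(rate_gbps)
measured = hdparm_mb_s(disk)

print("link ceiling MB/s:", ceiling)
print("measured MB/s:", measured)
print("faster than UDMA/133:", measured > UDMA133_MB_S)
```

The measured disk read rate is well above 133 MB/s and below the link's payload ceiling, which is consistent with the drive running at SATA speed despite the UDMA/133 wording in the log.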