Hi, I just ran into filesystem corruption on my SSD (Samsung SSD 830). On my Windows installation (dual boot) everything is still OK, and the Samsung Magician application says the disk is OK. The smartctl application also showed me that everything is OK. It began with some weird messages about read errors appearing in syslog (journalctl -f). At the moment I'm unable to boot the machine. I will now try to recover and get some more info out of the non-booting installation. Regards, Mike
Exact version of the kernel would be nice, in addition to logs, if you can provide them.
Component: Release (media or process) => RPM Packages
Assignee: bugsquad => tmb
Source RPM: (none) => kernel
Of course. At the moment I'm hacking around to get them, working on a dd image made with testdisk from the Windows installation to my NAS. Perhaps I can even provide the dd image (after removing my personal data, once I can read it) as a download on my tuxinator servers for debugging. I hope it's a hardware and not a software issue, so the release of 5 is not in danger.
Created attachment 6698 [details] dmesg output
Got it booting after some manual fsck. The kernel log is attached; look for the "Ext4" messages.
Priority: Normal => High
Target Milestone: --- => Mageia 5
Linux hostname 3.19.8-desktop-1.mga5 #1 SMP Mon May 11 16:35:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
I guess you were using RAID, weren't you? If so, you got hit by the infamous RAID bug that was fixed in 3.19.8-desktop-2.mga5, which was released 2 weeks ago: " - md/raid0: fix restore to sector variable in raid0_make_request". See http://lwn.net/Articles/645720/ for details.
CC: (none) => thierry.vignaud
No, there is not and never was any RAID setup on that machine. It's an Acer Aspire S3 with a Samsung EVO 530 SSD, and the SSD health is OK. Regards, Mike
Attachment 6698 mime type: text/x-log => text/plain
Attachment 6698 description: logfiles => dmesg output
You forgot to say that this happened after hibernation!!!
Thomas, before hibernating, we can see several "BUG: Bad page map in process" messages.
After resuming, there are a lot of:
"EXT4-fs error (device sda5): ext4_read_block_bitmap_nowait:427: comm kworker/u16:7: Cannot get buffer for block bitmap"
mixed with a couple of:
"JBD2: Spotted dirty metadata buffer (dev = sda5, blocknr = 0). There's a risk of filesystem corruption in case of system crash."
Mike: Are you sure you didn't boot any other OS while Mageia was hibernated? Could you have booted e.g. a Windows with an ext4 driver, or another Linux that could have mounted the Mageia partition, thus causing differences between the on-disk state and what the suspended kernel kept in its suspend image?
Keywords: (none) => NEEDINFO
Summary: filesystem corruption occured using ext4 fs after working normaly => Ext4fs corruption occurred after resuming from hibernation
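To make it easier to check a saved dmesg or journal dump for the three message types quoted above, here is a minimal Python sketch. The function names and the `sample` list are made up for illustration; only the message texts come from this report, and the patterns are an assumption about how those lines look in a real log.

```python
import re
from collections import Counter

# Patterns built from the messages quoted in this report.
PATTERNS = {
    "bad_page_map": re.compile(r"BUG: Bad page map in process"),
    "ext4_error": re.compile(r"EXT4-fs error \(device (\w+)\)"),
    "jbd2_dirty_metadata": re.compile(r"JBD2: Spotted dirty metadata buffer"),
}


def count_messages(lines):
    """Count how many lines match each of the patterns above."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def ext4_devices(lines):
    """Return the set of device names seen in EXT4-fs error lines."""
    devices = set()
    for line in lines:
        match = PATTERNS["ext4_error"].search(line)
        if match:
            devices.add(match.group(1))
    return devices


# Hypothetical sample input; in practice, pass the lines of a saved log file.
sample = [
    "BUG: Bad page map in process",
    "EXT4-fs error (device sda5): ext4_read_block_bitmap_nowait:427: "
    "comm kworker/u16:7: Cannot get buffer for block bitmap",
    "JBD2: Spotted dirty metadata buffer (dev = sda5, blocknr = 0). "
    "There's a risk of filesystem corruption in case of system crash.",
]

print(count_messages(sample))
print(ext4_devices(sample))
```

Seeing "bad_page_map" hits before the hibernation point and "ext4_error" hits only after the resume would match the ordering described in this comment.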
Hi, sorry, I forgot that I sometimes use hibernation on that machine. Yes, I'm sure I did not boot any Windows on that machine before waking up from hibernation, and my Windows does not have any ext4 driver at the moment. However, the system works again now that fsck has repaired some stuff. I will keep an eye on the machine and check whether I get any weird kernel messages soon. I think this is no real blocker, though, and for the moment we can change the bug state to a lower level. Regards, Mike
Status: NEW => UNCONFIRMED
Ever confirmed: 1 => 0
Priority: High => Normal
Severity: critical => minor
Any data corruption is still a major bug, raising severity a little bit.
Severity: minor => major
Mike: did you run another Linux distribution before resuming? Also, you should update to kernel-desktop-3.19.8-2.mga5, which has important fixes.
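Whether a machine is still running the older kernel can be checked by comparing the release string from uname with the fixed one. The sketch below assumes the `version-flavour-build.distro` layout seen in this report (3.19.8-desktop-1.mga5 running, 3.19.8-desktop-2.mga5 fixed); the regular expression and function names are hypothetical and not part of any Mageia tool.

```python
import re

# Assumed layout, based on the strings in this report: 3.19.8-desktop-1.mga5
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-([a-z0-9]+)-(\d+)\.(mga\d+)$")


def parse_release(release):
    """Split a release string into a sortable key, the flavour and the distro tag."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    major, minor, patch, flavour, build, distro = match.groups()
    return (int(major), int(minor), int(patch), int(build)), flavour, distro


def is_older(running, fixed):
    """Return True if 'running' sorts before 'fixed' (same flavour and distro only)."""
    run_key, run_flavour, run_distro = parse_release(running)
    fix_key, fix_flavour, fix_distro = parse_release(fixed)
    if (run_flavour, run_distro) != (fix_flavour, fix_distro):
        raise ValueError("flavour or distro tag differs; not comparable")
    return run_key < fix_key


# Release strings as they appear in this report.
print(is_older("3.19.8-desktop-1.mga5", "3.19.8-desktop-2.mga5"))
```

On a live system the running value could be taken from `platform.release()` instead of a literal string.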
Source RPM: kernel => kernel-desktop-3.19.8-1.mga5
One question: when did you install this system, and with what? mga4? mga5-beta? mga5-rc? ... There was an older ext4 bug, fixed in 3.19.7, that could have caused that (the delayed extents bug we also squashed for mga4 in http://advisories.mageia.org/MGASA-2015-0236.html). Having said that, I see several possibly related fixes in the upstream -stable queue, and there are also some specific to Samsung SSDs. I'll go review them.
Actually, this seems to be an issue of fsck running during resume, as I found a matching Fedora bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1174945 I've added the same fix as Fedora did, in dracut-038-19.mga5. So when that gets installed, recreate your initrd with the new dracut.
Hmm, it's possible that I installed starting with beta5; I'm not sure anymore. To me the Fedora report looks different. Regards, Mike
perhaps also related to https://bugzilla.redhat.com/show_bug.cgi?id=1185640
Yes, and that's exactly what the fix I added to dracut should resolve.
OK, nice. So I'll use hibernation a lot once I get that update/patch, to give it a test. Regards, Mike
An update: I never had the issue again. However, after changing to another SSD in the same notebook, I frequently get these messages, but everything continues to work. What I also find interesting is the UDMA/133 message, as I think UDMA would be much too slow for SATA 6 Gb:
Jun 16 18:35:02 localhost kernel: ata4: SATA link down (SStatus 0 SControl 300)
Jun 16 18:35:02 localhost kernel: ata5: SATA link down (SStatus 0 SControl 300)
Jun 16 18:35:02 localhost kernel: ata2: SATA link down (SStatus 0 SControl 300)
Jun 16 18:35:02 localhost kernel: usb 1-1: reset high-speed USB device number 2 using ehci-pci
Jun 16 18:35:02 localhost kernel: usb 3-1: reset full-speed USB device number 2 using xhci_hcd
Jun 16 18:35:02 localhost kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880148c32780
Jun 16 18:35:02 localhost kernel: xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880148c327e0
Jun 16 18:35:02 localhost kernel: usb 3-4: reset high-speed USB device number 3 using xhci_hcd
Jun 16 18:35:02 localhost kernel: usb 1-1.3: reset high-speed USB device number 3 using ehci-pci
Jun 16 18:35:02 localhost kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 16 18:35:02 localhost kernel: ata1.00: configured for UDMA/133
Jun 16 18:35:02 localhost kernel: usb 1-1.4: reset full-speed USB device number 4 usi
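The SStatus values in these lines can be decoded to see what the link negotiated. The sketch below is a rough illustration and rests on two assumptions not stated in this report: that the kernel prints SStatus in hexadecimal, and that the value follows the SATA SStatus register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The lookup tables cover only the common values.

```python
# Assumed field meanings from the SATA SStatus register layout (not from this report).
DET_NAMES = {
    0: "no device detected",
    1: "device detected, no PHY communication",
    3: "device detected, PHY communication established",
    4: "PHY offline",
}
SPD_NAMES = {
    0: "no negotiated speed",
    1: "Gen1 (1.5 Gbps)",
    2: "Gen2 (3.0 Gbps)",
    3: "Gen3 (6.0 Gbps)",
}
IPM_NAMES = {
    0: "not present",
    1: "active",
    2: "partial power state",
    6: "slumber power state",
}


def decode_sstatus(hex_text):
    """Split an SStatus value (as printed in the log, assumed hex) into DET, SPD, IPM."""
    value = int(hex_text, 16)
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return det, spd, ipm


def describe(hex_text):
    """Return a human-readable summary of an SStatus value."""
    det, spd, ipm = decode_sstatus(hex_text)
    return "; ".join([
        DET_NAMES.get(det, f"DET={det}"),
        SPD_NAMES.get(spd, f"SPD={spd}"),
        IPM_NAMES.get(ipm, f"IPM={ipm}"),
    ])


# The two SStatus values that appear in the log above.
for raw in ("133", "0"):
    print(raw, "->", describe(raw))
```

Under these assumptions, "SStatus 133" on ata1 corresponds to an active Gen3 link, which agrees with the "SATA link up 6.0 Gbps" text on the same line, while "SStatus 0" on the other ports simply means nothing is attached.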
OK, the UDMA message seems to be a legacy one, as the speed looks fine:
Timing O_DIRECT cached reads: 960 MB in 2.00 seconds = 479.95 MB/sec
Timing O_DIRECT disk reads: 1168 MB in 3.00 seconds = 388.88 MB/sec
It was a long-standing disk issue, not kernel or filesystem related.
Status: UNCONFIRMED => RESOLVED
Resolution: (none) => INVALID
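The reasoning here can be checked with a little arithmetic: if the measured throughput is above the nominal 133 MB/s ceiling of the old parallel-ATA UDMA/133 mode, then the "configured for UDMA/133" message cannot be the real limit. The sketch below parses the two timing lines quoted in this comment; the constants for UDMA/133 and for SATA's 8b/10b encoding are general background rather than values from this report, and the function names are made up.

```python
import re

UDMA133_MB_S = 133.0  # nominal ceiling of parallel-ATA UDMA mode 6 (background fact)


def sata_payload_mb_s(line_rate_gbps):
    """Payload bandwidth of a SATA link, assuming 8b/10b encoding (10 wire bits per byte)."""
    return line_rate_gbps * 1000 / 10


def parse_mb_per_sec(line):
    """Extract the 'NNN.NN MB/sec' figure from a timing line, or None if absent."""
    match = re.search(r"=\s*([\d.]+)\s*MB/sec", line)
    return float(match.group(1)) if match else None


# Timing lines as quoted in this comment.
timings = [
    "Timing O_DIRECT cached reads: 960 MB in 2.00 seconds = 479.95 MB/sec",
    "Timing O_DIRECT disk reads: 1168 MB in 3.00 seconds = 388.88 MB/sec",
]

link_limit = sata_payload_mb_s(6.0)  # the log reported "SATA link up 6.0 Gbps"
for line in timings:
    rate = parse_mb_per_sec(line)
    print(f"{rate} MB/s: above UDMA/133 ceiling={rate > UDMA133_MB_S}, "
          f"within SATA link limit={rate <= link_limit}")
```

Both measured figures sit well above 133 MB/s and below the link's payload limit, which supports reading the UDMA/133 line as a legacy transfer-mode label rather than the actual speed.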