Bug 9714 - BtrFS failing on boot as root filesystem (btrfs: open_ctree failed) ...
Summary: BtrFS failing on boot as root filesystem (btrfs: open_ctree failed) ...
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: i586 Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Thomas Backlund
QA Contact:
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2013-04-12 19:16 CEST by George Mitchell
Modified: 2015-05-17 23:41 CEST

See Also:
Source RPM: kernel / btrfs-progs
CVE:
Status comment:


Attachments
This is the dump that I am getting from initramfs upon failure. (75.16 KB, text/plain)
2013-04-12 19:22 CEST, George Mitchell
From the latest boot failure of 04-19-13 (78.80 KB, text/plain)
2013-04-19 22:14 CEST, George Mitchell
/etc/fstab from system intermittently failing boot (665 bytes, text/plain)
2013-04-19 22:18 CEST, George Mitchell
Boot failure from 04-20-13 (104.13 KB, text/plain)
2013-04-21 02:09 CEST, George Mitchell
Possible dracut issue??? (6.53 KB, text/plain)
2013-06-04 17:24 CEST, George Mitchell
A classic narrative of the btrfs boot bug in action (44.98 KB, text/plain)
2013-06-14 15:53 CEST, George Mitchell

Description George Mitchell 2013-04-12 19:16:10 CEST
Description of problem: I am running Mageia 3 Beta 4 on a BtrFS RAID 1 root filesystem.  It runs fine when all the drives associated with the root filesystem are on the system host controller.  But when one or more drives are located on an add-on controller, boot fails when initramfs attempts to mount the root filesystem.  I have experienced this failure both with a PCI-X (NOT PCI-e!) 3ware hardware RAID card running in JBOD mode and with a PCI-X Syba Silicon Image based "fake RAID" card running naked (no software RAID involved).  It seems that initramfs initializes the controller cards immediately before attempting to mount the root filesystem.  This *might* indicate a timing issue in which the controller card is not yet ready when the initial mount is attempted.


Version-Release number of selected component (if applicable):


How reproducible:

Extend a BtrFS root filesystem onto a drive attached to an add-on controller card, then attempt to boot from it.


Comment 1 George Mitchell 2013-04-12 19:22:33 CEST
Created attachment 3738 [details]
This is the dump that I am getting from initramfs upon failure.

Attached is the sosreport.txt file from a failure.  These failures are happening ONLY when the filesystem is bridged across an add-on controller card.  With the 3ware card the failure was intermittent.  With the Syba card it is continuous.
Comment 2 George Mitchell 2013-04-12 21:23:23 CEST
I am now suddenly getting this failure even when all drives are on the host controller, so I am not sure where to go from here.  The failure appears right after the root mount, and the task that appears to come next is USB related.  So it is hard to know whether it is the mount that has failed or some USB initialization step.  One thing I am noticing with BtrFS on multiple drives: say you have a single BtrFS filesystem spanning /dev/sde1 and /dev/sdf1.  Suddenly you cannot mount /dev/sdf1, but can still mount /dev/sde1.  Then you do a scrub, find no errors, and now both /dev/sde1 AND /dev/sdf1 mount fine.  Sorry to be so verbose about this, but I just want to share all the info I can pull together, and hopefully something will lead to an answer.
Comment 3 George Mitchell 2013-04-13 05:09:11 CEST
I have now moved /usr out of the / partition and on to its own /usr partition.  Now root partition is successfully mounting repeatedly on boots BUT now /usr partition is failing to mount.  It really looks like dracut and resulting initrd might be having difficulty handling large btrfs partitions.  That would also explain why enlarging the root or usr partitions by adding drives would exacerbate the issue.  In every case I can mount these partitions with no problem externally from a functioning instance of Mageia.  There has got to be either a memory or timing issue going on here.
Comment 4 George Mitchell 2013-04-13 16:53:47 CEST
At this point I have reduced the size of the /usr partition by half (I had a big tarball sitting in it for testing) and now booting is uneventful again.  So it looks like this problem can be reproduced by just loading up the root (or /usr) partition with something like a very large tar file.  On my machine, doing that causes a mount failure on boot.
George Mitchell 2013-04-13 16:55:47 CEST

Summary: BtrFS failing on boot when file system devices are split between two controllers ... => BtrFS failing on boot when /root or /usr file system devices contain too much data ...

Manuel Hiebel 2013-04-13 19:30:57 CEST

Assignee: bugsquad => mageia

Comment 5 Thomas Backlund 2013-04-14 13:52:35 CEST
This is a kernel / btrfs-progs problem, not a dracut one:

[    5.366301] device label MAGEIA3BTR devid 3 transid 9386 /dev/sde1
[    5.367433] btrfs: disk space caching is enabled
[    5.376100] btrfs: failed to read chunk tree on sde1
[    5.410108] btrfs: open_ctree failed
[    5.428016] dracut: Checking, if btrfs device complete
[    5.443226] device label MAGEIA3BTR devid 3 transid 9386 /dev/sde1
[    5.444335] btrfs: disk space caching is enabled
[    5.445011] btrfs: failed to read chunk tree on sde1

I see some related fixes in upstream btrfs-progs git that I'll backport to current package.
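
For anyone triaging a similar sosreport, the failure lines quoted above can be pulled out of a saved kernel log mechanically. The following is only a minimal sketch: the function name `find_btrfs_failures` and the two marker strings are assumptions chosen to match the excerpt in this comment, not an exhaustive list of btrfs error messages.

```python
import re

# Matches kernel log lines such as "[    5.410108] btrfs: open_ctree failed".
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Assumed markers, taken from the excerpt above; other kernels may word these differently.
FAILURE_MARKERS = ("failed to read chunk tree", "open_ctree failed")


def find_btrfs_failures(log_text):
    """Return (timestamp, message) pairs for btrfs mount failures in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a timestamped kernel log line
        msg = match.group("msg")
        if any(marker in msg for marker in FAILURE_MARKERS):
            failures.append((float(match.group("ts")), msg))
    return failures


SAMPLE = """\
[    5.366301] device label MAGEIA3BTR devid 3 transid 9386 /dev/sde1
[    5.367433] btrfs: disk space caching is enabled
[    5.376100] btrfs: failed to read chunk tree on sde1
[    5.410108] btrfs: open_ctree failed
[    5.428016] dracut: Checking, if btrfs device complete
[    5.443226] device label MAGEIA3BTR devid 3 transid 9386 /dev/sde1
[    5.444335] btrfs: disk space caching is enabled
[    5.445011] btrfs: failed to read chunk tree on sde1
"""

for ts, msg in find_btrfs_failures(SAMPLE):
    print(f"{ts:10.6f}  {msg}")
```

Run against a full sosreport, this makes it easy to see how many mount attempts failed and how far apart they were.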

Status: NEW => ASSIGNED
CC: (none) => tmb
Assignee: mageia => tmb
Source RPM: dracut-025-5.mga3 => kernel / btrfs-progs

Comment 6 George Mitchell 2013-04-14 17:25:49 CEST
(In reply to Thomas Backlund from comment #5)
> This is a kernel / btrfs-progs problem, not a dracut one:
> 
> [    5.366301] device label MAGEIA3BTR devid 3 transid 9386 /dev/sde1
> [    5.367433] btrfs: disk space caching is enabled
> [    5.376100] btrfs: failed to read chunk tree on sde1
> [    5.410108] btrfs: open_ctree failed
> [    5.428016] dracut: Checking, if btrfs device complete
> [    5.443226] device label MAGEIA3BTR devid 3 transid 9386 /dev/sde1
> [    5.444335] btrfs: disk space caching is enabled
> [    5.445011] btrfs: failed to read chunk tree on sde1
> 
> I see some related fixes in upstream btrfs-progs git that I'll backport to
> current package.

Thanks Thomas for spotting this!  Other than this ONE ISOLATED problem, btrfs is running splendidly in every respect.  I have been waiting for a long time for it to get to the point of being reasonably stable and it is more than meeting my expectations.
Comment 7 Thomas Backlund 2013-04-16 22:19:10 CEST
btrfs-progs-0.20-0.rc1.20130117.2.mga3 uploaded with several upstream fixes and a couple of them should help / cope with open_ctree failures.
Comment 8 George Mitchell 2013-04-16 23:52:08 CEST
Thanks Thomas so much for helping to move this along.  I will mark this resolved if it proves to be a complete solution.  At this point I am finding that including the devices in fstab *seems* to help, but I still have failures sometimes.  I realize that btrfs is a new thing and running btrfs on root is a very new thing, so I have a lot of patience with this.  But I want to do as much as I can to spot issues and report them.
Comment 9 George Mitchell 2013-04-17 22:31:05 CEST
Well, I have btrfs-progs-0.20-0.rc1.20130117.2.mga3 installed now and the problem continues at this point.  But I haven't run dracut or anything like that to generate a new initrd.  What IS new is that this morning, after three boot fails, the boot got to the point where the open_ctree problem occurred and things suddenly got pretty amazing.  Instead of bombing out there as it has always done before, it just sat there pounding on the partition over and over with the text flying by on the screen until finally I started seeing green "OK"s pour out.  And it was system up.  This is something I have not seen before and it looks to me like progress, even if the booting kernel can just keep trying until it gets a good read.  - George
Comment 10 Thomas Backlund 2013-04-17 22:57:09 CEST
try to generate a new initrd so new btrfs-progs gets added in there too and see if it helps
Comment 11 George Mitchell 2013-04-17 23:13:35 CEST
A new kernel just came down a few minutes ago and is now installed.  I do believe this runs a fresh dracut process which I think *should* update everything?  So I will keep you up to date on what I find along the way.  - George
Comment 12 George Mitchell 2013-04-19 22:12:50 CEST
Thomas, things *seem* to be getting a bit better, although it is hard to tell.  There may be more than one thing going on here; it seems like different boots are going in slightly different directions, which seems strange to me.  With many of the boot failures I am left with no keyboard access, which tells me that either the booting kernel's IO has locked up or the booting kernel has dropped dead in a panic.  However, as I get the opportunity, I will pull the sosreports and pass them on.  I am going to pass along the most recent one from a few minutes ago.  The ONLY thing I see that looks strange to me is:

[   13.475329] dracut: Mounting /usr with -o device=/dev/sdc2,device=/dev/sdd2,device=/dev/sdf2,device=/dev/sdg2,relatime,device=/dev/sdc1,device=/dev/sdd1,device=/dev/sde1,relatime,ro

But you can read this stuff far better than I can so the whole sosreport will get attached.

Thanks so much for your help on this.  - George
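
What looks strange in the mount line quoted above can be made explicit by splitting the `-o` string apart. The sketch below is illustrative only: `analyze_mount_options` is a hypothetical helper, and nothing in this report establishes that the merged option list is the cause of the boot failures. It simply shows that the string carries devices from two partition sets and a repeated `relatime`.

```python
from collections import Counter


def analyze_mount_options(option_string):
    """Split a comma-separated -o string into devices and duplicated flags.

    Returns (devices, duplicated_flags), where duplicated_flags lists every
    non-device option that appears more than once.
    """
    options = [opt for opt in option_string.split(",") if opt]
    devices = [opt.split("=", 1)[1] for opt in options if opt.startswith("device=")]
    flags = Counter(opt for opt in options if not opt.startswith("device="))
    duplicated = sorted(flag for flag, count in flags.items() if count > 1)
    return devices, duplicated


# Option string copied from the dracut log line in this comment.
OPTIONS = (
    "device=/dev/sdc2,device=/dev/sdd2,device=/dev/sdf2,device=/dev/sdg2,relatime,"
    "device=/dev/sdc1,device=/dev/sdd1,device=/dev/sde1,relatime,ro"
)

devices, duplicated = analyze_mount_options(OPTIONS)
# Last character of each device name, i.e. the partition number.
partition_numbers = sorted({dev[-1] for dev in devices})

print("devices:", devices)
print("duplicated flags:", duplicated)
print("partition numbers seen:", partition_numbers)
```

The /usr mount is being given both the partition-2 devices and the partition-1 devices, which is the detail worth checking against the attached fstab.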
Comment 13 George Mitchell 2013-04-19 22:14:55 CEST
Created attachment 3770 [details]
From the latest boot failure of 04-19-13
Comment 14 George Mitchell 2013-04-19 22:18:57 CEST
Created attachment 3771 [details]
/etc/fstab from system intermittently failing boot

Just for the sake of reference, I am including my fstab file.
Comment 15 George Mitchell 2013-04-21 02:09:44 CEST
Created attachment 3778 [details]
Boot failure from 04-20-13
George Mitchell 2013-05-01 20:38:51 CEST

Summary: BtrFS failing on boot when /root or /usr file system devices contain too much data ... => BtrFS failing on boot as root filesystem (btrfs: open_ctree failed) ...

Comment 16 George Mitchell 2013-05-01 20:39:28 CEST
I modified the title of this bug report to hopefully better describe the problem.  - George
Comment 17 George Mitchell 2013-06-02 16:05:35 CEST
At this point I am even more suspecting that this is either a timing issue (for some reason initrd is unable to wait for the mount process to complete) or a resource issue (for some reason initrd does not have the memory to accommodate the mount process).  I am not convinced it is a btrfs problem because for a long time it has never ever occurred outside of the boot process.  The only caveat to that conviction would be if it is somehow related only to read-only mounts, and I can't see how that would be.  It is more like something in the initrd environment is causing this to happen.  I DO believe that I might have found a bandaid solution for the problem at this point.  I have so far completed four boots in succession over a period of three days without open_ctree failure, which is unusual.  This resolution has occurred after running the system 24 hours a day for nearly a week doing file-by-file defragmentation of both data and metadata system wide, a long and tedious process.  At this point I am maintaining this by strategically defragging, on a daily basis, any file or directory that shows recent modification.  This has also resulted in a significantly faster boot process with much less disk activity along the way.  And this seemingly has perhaps solved the open_ctree boot problem, although I am sure that just posting this will probably cause it to recur.  So I will let this play out for a few more weeks and keep you all up to date on how things go.  - George
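
The daily "defrag whatever changed recently" routine described above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure actually used in this report: the helper names are hypothetical, only regular files are considered (the comment also mentions directories), and the commands are built but deliberately not executed. `btrfs filesystem defragment` is the btrfs-progs subcommand for this; check its options on your own version before running anything.

```python
import os
import stat
import time


def recently_modified(root, max_age_days=1.0, now=None):
    """Return sorted paths of regular files under root modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime >= cutoff:
                found.append(path)
    return sorted(found)


def defrag_commands(paths):
    """Build (but do not run) one defragment command per path."""
    return [["btrfs", "filesystem", "defragment", path] for path in paths]


# Example: show what would be defragmented under a directory (read-only walk).
for command in defrag_commands(recently_modified(".", max_age_days=1.0)):
    print(" ".join(command))
```

Note that `os.walk` crosses mount points, so a real script would need to restrict itself to the btrfs filesystem in question.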
Comment 18 George Mitchell 2013-06-04 17:24:54 CEST
Created attachment 4100 [details]
Possible dracut issue???

Here is an example of open_ctree failures which ultimately resolved during the boot process.  And what apparently triggered that resolution is this line:

Jun 04 06:26:25 localhost.localdomain dracut: Scanning for all btrfs devices

So why did dracut take this long into the boot process to do the device scan since this is a KNOWN ISSUE with btrfs?
Comment 19 George Mitchell 2013-06-05 00:41:14 CEST
Alas, the above is probably just a coincidence.  After review of previous logs looking for a similar pattern nothing shows up.  And I am realizing that this is probably not the same "btrfs device scan" as the one I was thinking of.  But what continues to remain a pattern is that defragging of the root filesystem fixes the problem temporarily virtually every time.  So this has to have something to do with btrfs metadata fragmentation.
Comment 20 George Mitchell 2013-06-11 04:55:19 CEST
What I have learned so far:

1) The likely solution to this problem is adding "rootdelay=2" as an option on the kernel command line in grub.cfg.  Unfortunately, this option does not work, apparently due to some problem in dracut/initrd.  This option would give extra time for the btrfs root filesystem to settle before the boot process attempts to query it.  I have opened bug 10484 (https://bugs.mageia.org/show_bug.cgi?id=10484) in regard to this problem.

2) Thorough defragging consistently helps to avoid open_ctree boot failures.

3) Specifying device name (/dev/sd..) rather than UUID on the grub.cfg kernel command line also seems to help avoid this problem.
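
Points 1 and 3 both amount to edits of the kernel command line, which can be sketched as a small string transformation. Everything concrete here is a placeholder: `adjust_cmdline` is a hypothetical helper, and the UUID and device name in the example are made up for illustration. As noted in point 1, `rootdelay` did not take effect on this system (see bug 10484), so treat this as an illustration of the intended change rather than a verified fix.

```python
def adjust_cmdline(cmdline, root_device, rootdelay=2):
    """Return cmdline with root= pointing at root_device and rootdelay set.

    Existing root= and rootdelay= arguments are dropped; all other
    arguments (including rootflags=) are kept in their original order.
    """
    kept = [
        arg for arg in cmdline.split()
        if not arg.startswith(("root=", "rootdelay="))
    ]
    return " ".join(kept + [f"root={root_device}", f"rootdelay={rootdelay}"])


# Hypothetical example values; the UUID and device name are placeholders.
BEFORE = "BOOT_IMAGE=/boot/vmlinuz root=UUID=0000-placeholder ro splash quiet"
AFTER = adjust_cmdline(BEFORE, "/dev/sdc1")

print(AFTER)
```

The resulting string is what would go on the kernel line in grub.cfg; argument order does not matter to the kernel.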
Comment 21 George Mitchell 2013-06-14 15:53:48 CEST
Created attachment 4140 [details]
A classic narrative of the btrfs boot bug in action

This is the output of journalctl from a boot that eventually succeeded, but ONLY after dracut began and completed btrfs scans.  This is the second time I have observed this phenomenon, but this time it is VERY clear.  It raises the questions of why dracut does not do a routine initial btrfs scan, what initially triggers the scan when it does one, and why it waits so long to do it.  There just has to be copious evidence in this narrative that would suggest a dracut fix that would address and resolve this problem once and for all.
Comment 22 George Mitchell 2013-07-01 01:51:15 CEST
Not one boot failure since 06-11 with regular defragging and real device name on kernel command line.  Not a pretty solution, but it works.
Comment 23 Samuel Verschelde 2015-05-17 21:21:04 CEST
tmb, George, can I suppose it works way better in Mageia 4 & 5?

Keywords: (none) => NEEDINFO

Comment 24 George Mitchell 2015-05-17 22:42:30 CEST
Samuel, I certainly hope so, however I have yet to find out since I am still on Mageia 3.  I am hoping to transition from Mageia 3 32bit to Mageia 5 64bit soon and hopefully that will take care of this problem along with a few others.  I would say that you should probably mark this as resolved at this point since it is so stale that it has become meaningless.  Thanks for taking the time to revisit it.
Comment 25 Samuel Verschelde 2015-05-17 23:41:00 CEST
Thanks, marking it as resolved and counting on you for reopening if still valid.

Status: ASSIGNED => RESOLVED
Resolution: (none) => FIXED

