Bug 4750 - Unable to install Mageia beta 1 with raid ahci
Summary: Unable to install Mageia beta 1 with raid ahci
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: i586 Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assignee: Thierry Vignaud
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-29 14:46 CET by Fabrice Boyrie
Modified: 2012-05-26 16:26 CEST
CC: 7 users

See Also:
Source RPM: dracut
CVE:
Status comment:


Attachments
report.bug (301.50 KB, application/x-gzip)
2012-03-01 07:46 CET, Fabrice Boyrie

Description Fabrice Boyrie 2012-02-29 14:46:48 CET
Description of problem:
I have a new computer with two SATA hard disks configured as RAID1 using RAID AHCI (integrated Intel). During the installation, Mageia saw the RAID volume, no problem. But after installation, the kernel refuses to boot
with the message /dev/mapper/isw_baejehjid_Volume0p1 does not exist

dracut gives me a shell.
In fact, there is no /dev/mapper at all.
It seems there is no mdadm binary in the initrd files.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install mageia using ahci raid
2. Reboot
Comment 1 Olav Vitters 2012-02-29 14:50:46 CET
Did this work in Mageia 1?

CC: (none) => olav

Comment 2 Fabrice Boyrie 2012-02-29 15:38:42 CET
I don't know. The computer is new and I wanted to try the latest version.
If it is really necessary, I can download Mageia 1 and reinstall.

  For the moment, I'm trying to correct the problems at hand. Using the DVD to repair the install, automatic mounting didn't work.
mdadm -A -s sees two md devices, /dev/md127 and /dev/md126 (127 containing the metadata for 126), but leaves them inactive. I had to stop all the md devices, create an /etc/mdadm.conf by hand, and start /dev/md127 and /dev/md126. Now the volume is resyncing.
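
A minimal sketch of that recovery sequence, reconstructed from the description above (device names as reported; the exact steps taken on this machine may have differed):

mdadm --stop /dev/md126 /dev/md127        # stop the inactive arrays
mdadm --examine --scan > /etc/mdadm.conf  # generate the ARRAY lines for mdadm.conf
mdadm --assemble --scan                   # reassemble from mdadm.conf; resync then proceeds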

  On the hard disk, mdadm was not installed at all.

By the way, every 5 minutes my console displays
'INIT cannot execute /sbin/agetty'

But this is another bug.
Comment 3 Manuel Hiebel 2012-02-29 19:33:38 CET
This can be a side effect of the fix for bug 37.

I don't know if that can help the devs, but could you provide the file /root/drakx/report.bug.gz as an attachment?

Assignee: bugsquad => thierry.vignaud
Source RPM: (none) => draklive-installer-stage2

Comment 4 Thierry Vignaud 2012-02-29 20:28:35 CET
Yes, please provide it.
But I think it's a dracut issue instead.

CC: (none) => mageia
Source RPM: draklive-installer-stage2 => dracut

Comment 5 Fabrice Boyrie 2012-03-01 07:46:50 CET
Created attachment 1662 [details]
report.bug

This is the requested file.
Comment 6 Fabrice Boyrie 2012-03-01 07:56:35 CET
Some remarks.
_ It seems you want to use dmraid to mount the AHCI RAID. Why don't you use mdadm as in Fedora?
_ I created my own initrd to boot.
The syntax was
dracut --mdadmconf   --force-add "mdraid dm" --add-drivers  "raid1 ext4"   -M -f
Why are the raid1 and ext4 modules not integrated by default? (I was in a chroot from rescue mode, with /proc and /sys mounted in the chroot.)
_ The rescue mode doesn't work correctly with the RAID. When I asked it to mount the partition, it detected a RAID but mounted sda1!
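
For reference, the dracut invocation above with each flag annotated (flag meanings per dracut's documentation):

# --mdadmconf: copy the host's /etc/mdadm.conf into the initrd
# --force-add "mdraid dm": force-include the mdraid and dm dracut modules
# --add-drivers "raid1 ext4": force-include these kernel modules
# -M: list included modules during build; -f: overwrite an existing image
dracut --mdadmconf --force-add "mdraid dm" --add-drivers "raid1 ext4" -M -f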
Comment 7 Colin Guthrie 2012-03-01 12:06:52 CET
Which version did you install? beta1?

Regarding why the raid1 and ext4 modules are not included by default: they are only included if they are detected as being necessary. There is no need for someone to have raid drivers in their initrd if they do not use raid.

During the install, I suspect that we simply didn't "detect" that you'd set up raid. This will likely be due to how udev was used (or simply a bug in dracut: this one could possibly be to blame: http://colin.guthr.ie/git/dracut/commit/?id=004fd0557d24444d363620ed718027a0dc9a1ba2)

Can I ask a perhaps very subtle question? Did you create the raid during the install procedure (i.e. from blank disks) or did you pick an existing raid and simply format it? The difference is very subtle, but it might matter depending on when the initrd was generated.


Anyway, all the above aside, we now generate a non-hostonly initrd. That is an initrd that contains a lot of additional drivers etc. This takes a typical initrd size up from around 5 MB compressed to about 15 MB+. This is only done during install and from rescue boots (or basically any time the folder
/run/initramfs/ does not exist). This should solve your problem (although it does still cause problems for LVM systems with /usr on LVM but / on a regular ext* filesystem, but that's another problem).
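
For reference, a minimal sketch of the two modes using standard dracut flags (image paths are placeholders):

dracut --hostonly /boot/initrd-hostonly.img "$(uname -r)"    # only what this host needs (~5 MB)
dracut --no-hostonly /boot/initrd-generic.img "$(uname -r)"  # generic, many extra drivers (~15 MB+)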

And FWIW, we use pretty much the same dracut version as Fedora (we've actually been feeding lots of bugfixes upstream). Quite why it favours dm over md I don't know.
Comment 8 Fabrice Boyrie 2012-03-01 12:39:58 CET
_ I still don't understand why dracut has not included ext4 and raid1 by default.
In your /etc/dracut.conf.d/50-mageia.conf there is
hostonly=yes
  In my understanding of dracut, it should have included the modules required for my host (when I launched dracut, the chroot had access to /proc and /sys).
_ The computer was blank. From the BIOS I created a RAID 1 volume, and Mageia saw a blank disk without any partitions (while the RAID was syncing).
It is not the first time I have seen an error installing Linux on a newly created AHCI RAID volume. But usually the problem is during installation, not after.
_ It is not only dracut which favours dmraid instead of mdadm, it is the installer: mdadm was not installed at all on the computer.
Comment 9 Colin Guthrie 2012-03-01 12:47:38 CET
The problem is that dracut relies a lot on udev metadata for its detection. If this metadata is not present, then the detection may fail and thus the modules are not included.

So it's not about having access to /proc and /sys; it's much more about having a working udev daemon that is started during early boot (i.e. before any commands such as vgchange (LVM) and mdadm (RAID) are initiated).
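
As an illustration, the kind of udev metadata this detection depends on can be inspected like this (the device name is hypothetical; udevadm and these property prefixes are standard):

udevadm info --query=property --name=/dev/md126 | grep -E '^(MD_|ID_FS_)'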

Now, dracut itself does produce an initrd that ensures that udev is started early, but the installer and rescue systems are a little behind here in terms of that approach (they need work in that regard).

This is why the current idea is to generate a non-hostonly initrd during install and from rescue. This is done via a patch in our dracut package:

http://svnweb.mageia.org/packages/cauldron/dracut/current/SOURCES/0503-Handle-upgrades-better.patch?revision=213871&view=markup

Now as you can see, I only added this patch a week ago, which is why I asked what version you were using and if it was beta1 or cauldron... the latter should include this tweak, whereas the former will not. I'm sure you can also see that this patch overrides the hostonly=XXX value in any config file or on the command line.

Hopefully this would at least see the installer generate a working system.

I am not sure about the dm vs. md problem tho'.
Comment 10 Fabrice Boyrie 2012-03-01 13:03:51 CET
The install was from the beta1 x86_64 DVD.
As it took me a long time to install, I'd prefer not to reinstall.
But if there is a new DVD, I can test the rescue mode.
Comment 11 Colin Guthrie 2012-03-01 13:41:16 CET
Would you mind doing some very useful tests for me on the installed system? They would basically simulate what the installer does.

I presume from your previous comments that you are very confident playing with initrds and thus won't leave yourself with an unbootable system (i.e. you'll keep backups of your working initrd?).

If that's the case could you do the following:

1. Ensure you have updated to latest cauldron rpms, especially the latest dracut package.
2. Temporarily move the folder /run/initramfs to a different name to trigger the code hack linked in my previous comment.
3. Use the mkinitrd command (rather than dracut directly) to generate a new initrd.

This new initrd should be a generic initramfs and much larger than a hostonly one, but should contain everything you'd need to boot successfully (my previous comment about /usr on LVM notwithstanding...).

You should get a warning about the fact it's generating a generic initramfs too.

This *should* (in theory at least) be the same result you'd get from running the dracut command directly from a rescue boot.


FWIW, the mkinitrd command will basically run dracut, but it will add an argument to omit the network module, which adds about 10 MB of (in this context) useless stuff to the initrd.
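
In other words, roughly this (a sketch assuming the wrapper simply forwards to dracut; --omit is a standard dracut option, but the exact arguments Mageia's mkinitrd passes may differ):

dracut --omit network -f /boot/initrd-$(uname -r).img "$(uname -r)"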
Comment 12 Fabrice Boyrie 2012-03-01 14:18:19 CET
mkinitrd -v /boot/initrd-3.2.6-desktop-3.mga2.img.new 3.2.6-desktop-3.mga2
ll -h  /boot/initrd-3.2.6-desktop-3.mga2.img*
-rw-r--r-- 1 root root 5,8M févr. 29 18:58 /boot/initrd-3.2.6-desktop-3.mga2.img
-rw-r--r-- 1 root root  17M mars   1 14:10 /boot/initrd-3.2.6-desktop-3.mga2.img.new

If I decompress the img, /sbin/mdadm, /etc/mdadm.conf and the required modules are all present.
  I will try to reboot the computer with this img this evening, to be sure there are no conflicts between dmraid and mdadm.
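
A sketch of how such a content check can be done without decompressing by hand (lsinitrd ships with dracut; the image path matches the one above):

lsinitrd /boot/initrd-3.2.6-desktop-3.mga2.img.new | grep -E 'mdadm|raid1|ext4'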
Comment 13 Thierry Vignaud 2012-03-01 14:28:14 CET
(In reply to comment #9)
> Now, dracut itself does produce an initrd that ensures that udev is started
> early, but the installer and rescue systems are a little behind here in terms
> of that approach (they need work in that regard).

I'm not aware of any specific issue. What's needed?
Some missing udev rule? Something else?
 
> I am not sure about the dm vs. md problem tho'.

Both should work
Comment 14 Colin Guthrie 2012-03-01 15:16:06 CET
@Thierry, to be honest I'm not sure... I'm kinda going on a vague and inconclusive bug report about how reusing LVM volumes (rather than recreating them) resulted in an unbootable system (couldn't find the number off hand, but will post it when I dig it out). Bug #4562.

Now when I wrote the patch, I had it in my head that the installer would be OK, but I'm not so sure now, seeing as someone (in the above bug) had a problem relatively recently. They reused an existing LVM, which made me think that some vgchange et al. commands might have been run prior to udev starting in the installer. If that's the case, dracut will have problems if the volumes are reused rather than removed and recreated.

So the current approach just generates a non-hostonly initrd even in the installer. If we can confirm that the installer will not run vgchange, and we can trace the problem down a bit better, then we don't have to do the non-hostonly trick (and the installer can set an env var that dracut will look out for to disable the hack). This means that urpmi-based upgrades still get the non-hostonly initrd, but fresh installs will not.

It's all a bit complicated dealing with the different scenarios. The installer might be fine. If we test right now tho', the installer should go down the non-hostonly route, but I'll try and do some boot tests to see what metadata is in the udevadm database so that I can do some theoretical tests as to whether or not this will work.

Hope that explains the problem, but if not please do ask for further clarification :)
Comment 15 Colin Guthrie 2012-03-01 15:17:59 CET
@Fabrice just thought of a further test you could do too.

When you boot with the new (big) initrd, and assuming of course that it does actually work, can you then generate a new initrd normally, without any special options? i.e. just "dracut foo.img", and check that it includes all the necessary stuff second time round (after booting with a dracut-generated initrd)?

If that doesn't produce a working initrd, then there are still problems we need to address!
Comment 16 Fabrice Boyrie 2012-03-01 18:34:47 CET
The computer reboots correctly with the big initrd.
I've relaunched dracut without options (except -f) and it generated a new initrd which works (and is even smaller than my own initrd).
Comment 17 Thierry Vignaud 2012-03-01 21:47:35 CET
(In reply to comment #14)
> (...) made me think that some
> vgchange et al commands might have been run prior to udev starting in the
> installer. If that's the case dracut will have problems if they are reused
> rather than removed and recreated.

stage1 knows nothing about LVM/DM/MD (and has no userland anyway).
The code clearly runs udev first thing (well, actually mounting /sys is done first), as proven by the logs attached to this bug.
Modules are loaded quite some time later.
Same for running dmraid and the like.

BTW, it looks like we're missing libdmraid-events-isw.so from dmraid-event:

The dynamic shared library "libdmraid-events-isw.so" could not be loaded:
    libdmraid-events-isw.so: cannot open shared object file: No such file or directory
Comment 18 Thierry Vignaud 2012-03-01 22:43:01 CET
Colin, we may be missing some udev rules, don't you think?
You originally didn't include any when switching the installer to use udev.
I included the ones needed to make LVM work, but the odds are high we're missing some for raid.
Comment 19 Thierry Vignaud 2012-03-01 23:00:58 CET
We may also be missing udev rules for other HW support (PCMCIA, ...)

CC: (none) => mageia

Comment 20 Thierry Vignaud 2012-03-02 09:06:42 CET
Pascal, we only install dmraid when we see a dmraid device; shouldn't we install mdadm too?
See install::any::default_packages()

CC: (none) => pterjan

Comment 21 Pascal Terjan 2012-03-02 10:47:24 CET
I have probably missed things that happened over the last few years.

From what I remember:
- dmraid uses device-mapper (like LVM) to combine devices. It is used for fakeraid created by the BIOS.
- mdadm is for the md (multiple device) driver, not device-mapper. It is used for soft raid created from Linux.
Comment 22 Pascal Terjan 2012-03-02 11:40:13 CET
OK, so it seems mdadm can handle some Intel metadata, and in this case we should probably use it instead of mdadm.

Maybe first detect raid arrays with mdadm, then filter them out from the ones detected by dmraid.
Comment 23 Pascal Terjan 2012-03-02 11:40:47 CET
I mean "instead of dmraid" obviously.
Comment 24 Thomas Backlund 2012-03-02 15:25:26 CET
(In reply to comment #22)
> OK, so it seems mdadm can handle some Intel metadata, and in this case we
> should probably use it instead of mdadm.
> 

Intel has stated that they are moving all of their fakeraid support to mdadm, as it supports more features.

Heinzm has stated that dmraid is now in maintenance mode, so no new features will be added, only bugfixes... (Because of that I asked him if he could release a dmraid 1.0.0 final with all the fixes so far.)

> Maybe first detecting raid arrays with mdadm, then filtering them out from the
> ones detected by dmraid.

Well, filtering out is easy for the Intel part...

Anything softraid that starts with isw_* is Intel.


Oh, and current dracut/systemd/kpartx/..., since we sync with Fedora, assume anything isw_* is managed by mdadm, which might explain why I have a lot of problems on my workstation, where I so far use dmraid...

So I think we should modify the installer to always use mdadm for isw_*.
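
A sketch of that rule as shell (the isw_* naming convention is from the comment above; the variable and where such a check would actually live in the installer are hypothetical):

case "$raidset" in        # $raidset: name of the detected fakeraid set
  isw_*) tool=mdadm ;;    # Intel fakeraid: handled by mdadm nowadays
  *)     tool=dmraid ;;   # other fakeraid formats stay with dmraid
esac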

CC: (none) => tmb

Comment 25 Thierry Vignaud 2012-03-02 16:07:58 CET
Patches welcome. I guess it'll be either you or Pascal :-)
Comment 26 Greg McGee 2012-03-11 07:11:49 CET
I was just about to file a bug against the beta for similar reasons: no NVRAID detection during install.

(clean install from the 64-bit DVD)

The array preexisted and still works fine under Windows and in Mageia 1, etc.
(multi-boot box: Windows plus a MythTV recording XFS partition)

I found that dmraid and its deps were not installed after updating everything, period.
(I didn't attempt any troubleshooting until all updates were complete; it's a non-essential filesystem... I figured it might be fixed already.)

As soon as dmraid was installed, a simple dmraid -ay brought everything online, viewable in MCC and mounted in under 30 seconds.

Unless something I don't know about is actively UNinstalling them, I suggest it might be interesting to see if they are MIA from the installer ISO.

As the 64-bit ISO is installer-only, I can say dmraid IS present in the rescue image, but I can't really say for certain about the install image without another install...

CC: (none) => gjmcgee

Comment 27 Greg McGee 2012-03-11 07:18:16 CET
I also found that mdadm was not installed (I wasn't sure which was preferred).
Comment 28 Pascal Terjan 2012-03-12 00:28:38 CET
(In reply to comment #26)
> I was just about to file a bug to the beta for similar reasons, no NVRAID
> detection during install.

Please do; this is a different problem from the one reported here.

(In reply to comment #27)
> also found mdadm was not installed. (wasn't sure which was preferred)

It's not a preference question: mdadm only handles the Intel ones, not the NVIDIA ones, so you need dmraid.
Comment 29 Marja Van Waes 2012-05-26 13:07:56 CEST
Hi,

This bug was filed against cauldron, but we do not have cauldron at the moment.

Please report whether this bug is still valid for Mageia 2.

Thanks :)

Cheers,
marja

Keywords: (none) => NEEDINFO

Comment 30 Fabrice Boyrie 2012-05-26 14:10:23 CEST
I'd say the original bug is closed. There are still problems with Intel RAID AHCI, but they are more difficult to debug. (I've tried with two 3 TB hard disks and it doesn't work; I fear there are problems with disks over 2 TB. Purely software RAID works.)
Comment 31 Sander Lepik 2012-05-26 14:34:44 CEST
Can we close this bug then?

CC: (none) => sander.lepik

Comment 32 Fabrice Boyrie 2012-05-26 15:23:16 CEST
Yes.
Sander Lepik 2012-05-26 16:26:37 CEST

Keywords: NEEDINFO => (none)
Status: NEW => RESOLVED
Resolution: (none) => FIXED

