Bug 19953

Summary: 4TB unmounted unpartitioned drive makes installer freeze when bootloader is configured, but also X in installed cauldron when kernel is updated.
Product: Mageia Reporter: George Mitchell <george>
Component: RPM Packages Assignee: Base system maintainers <basesystem>
Status: RESOLVED OLD QA Contact:
Severity: critical    
Priority: Normal CC: davidwhodgins, kde, kernel, marja11, ouaurelien, thierry.vignaud, zen25000
Version: Cauldron   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: CVE:
Status comment:
Attachments: lspci output
fdisk output
System log at time of freeze
Syslog output from partially failed kernel upgrade via MCC

Description George Mitchell 2016-12-15 20:56:02 CET
When the network graphic installer attempts to configure the bootloader and encounters an unmounted unpartitioned disk drive, it locks up completely and the system must be powered off to regain control.  At the same time the activity LED for the controller managing the unpartitioned drive lights solidly and continuously until the machine is powered down.  The only way I could complete the install was to open the case and power off the offending 4TB hard drive.

The installer affected is the current Cauldron graphic installer.

Simply run the installer on a system with an unpartitioned hard drive.

Additionally, upgrading the kernel on a running Cauldron system has the same effect, locking the system up until the KDE session doing the upgrade is killed.  Making sure the unpartitioned drive is mounted before doing the upgrade *might* resolve the kernel upgrade problem.  I will post a follow-up on that when I get to it.  The control center bootloader configuration program DOES work fine when the unpartitioned drive is mounted.  I have not tried it with the unpartitioned drive unmounted.  In the future I will probably stop using unpartitioned drives, but this one has worked well for me in many ways.
Comment 1 Marja Van Waes 2016-12-15 23:25:32 CET
Depending on your tests the summary might need to be adjusted again.

I added the size of your HD because I have heard twice in recent weeks about problems Mageia users had with 4TB disks, and I have heard of no problems with smaller disks.

As last step of a kernel update, bootloader-config is run. That is similar to what happens when you configure your bootloader while installing Mageia.

Apart from Grub2, other tools are called (blkid, os-prober) to probe all available disks and partitions, so that all installed OSs will be found and added in the Grub2 boot menu. So the problem does not seem to be with installer, but with a tool.
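
To see what the probing tools are likely to find on such a disk, here is a minimal sketch that classifies a whole-disk device from `blkid -o export` output. The helper name `classify_blkid_export` is hypothetical and is not part of os-prober, drakx or bootloader-config; it only assumes that blkid prints KEY=value lines such as TYPE= for a filesystem and PTTYPE= for a partition table.

```shell
#!/bin/sh
# Hypothetical helper: read `blkid -o export` output on stdin and report
# whether the device carries a partition table or a filesystem written
# directly to the whole disk.
classify_blkid_export() {
    fstype=""
    pttype=""
    while IFS='=' read -r key value; do
        case "$key" in
            TYPE)   fstype=$value ;;   # filesystem type, e.g. btrfs
            PTTYPE) pttype=$value ;;   # partition table type, e.g. gpt or dos
        esac
    done
    if [ -n "$pttype" ]; then
        echo "partition-table:$pttype"
    elif [ -n "$fstype" ]; then
        echo "whole-disk-filesystem:$fstype"
    else
        echo "unknown"
    fi
}

# Usage on a real system, as root (adjust the device name to your disk):
#   blkid -o export /dev/sdX | classify_blkid_export
```

A result of `whole-disk-filesystem:...` would mean the disk has no partition table but is not blank either, which is the case a prober has to handle without assuming partitions.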

Can you please *attach* lspci.txt that is the result of: 


    lspcidrake -v > lspci.txt

and also attach fdisk.txt after running, as _root_:

    fdisk -l /dev/sd? > fdisk.txt


(Both from when that 4TB drive is powered on.)

Also, if you could attach the logs from such a kernel upgrade that froze X, that would be great.

Run, again as root, (after adjusting date & time so that your kernel update is included):

    journalctl -a --since="2016-12-11 09:00" --until="2016-12-11 10:00" > log.txt


and attach log.txt to this report.

CC: (none) => kernel, marja11, zen25000
Component: Installer => RPM Packages
Summary: Network Graphical Installer Freezes on Encountering Unpartitioned Hard Drive ... => 4TB unmounted unpartitioned drive makes installer freeze when bootloader is configured, but also X in installed cauldron when kernel is updated.
Keywords: (none) => NEEDINFO

Comment 2 George Mitchell 2016-12-16 04:53:20 CET
Created attachment 8784 [details]
lspci output
Comment 3 George Mitchell 2016-12-16 04:54:34 CET
Created attachment 8785 [details]
fdisk output
Comment 4 George Mitchell 2016-12-16 04:55:42 CET
Created attachment 8786 [details]
System log at time of freeze
Comment 5 George Mitchell 2016-12-16 05:37:43 CET
I am noticing from the log that there is an attempt to mount the partitionless drive as part of the process.  That certainly could explain the drive activity light remaining on for an extended period.  Mounting a 4TB volume takes a long time on this system as I am CPU constrained with an old dual core Pentium.  But it doesn't usually freeze the system up, but who knows?  Perhaps I have to try again and leave it to cook for a while and see what happens.  Perhaps I just did not wait long enough.  I don't see any critical errors on the log.
Comment 6 Marja Van Waes 2016-12-16 11:48:35 CET
(In reply to George Mitchell from comment #5)
> I am noticing from the log that there is an attempt to mount the
> partitionless drive as part of the process.  That certainly could explain
> the drive activity light remaining on for an extended period.  Mounting a
> 4TB volume takes a long time on this system as I am CPU constrained with an
> old dual core Pentium.  But it doesn't usually freeze the system up, but who
> knows?  Perhaps I have to try again and leave it to cook for a while and see
> what happens.  Perhaps I just did not wait long enough.  I don't see any
> critical errors on the log.

Well, there are 9 messages about 
> The X11 connection broke (error 1). Did the X11 server die?

The first thing I see after the disk was mounted is some plasmashell messages. I don't have the slightest idea whether they're related to X freezing. Even if they were related, they _cannot_ be the cause because you hit this bug in the traditional installer, too.

Dec 14 21:16:39 localhost ghmitch[30024]: 50mounted-tests: debug: btrfs volume 534d18b0-fc56-42e6-bfeb-c63b0f0bdc07 mounted
Dec 14 21:16:41 localhost plasmashell[6714]: QFileInfo::absolutePath: Constructed with empty filename
Dec 14 21:17:08 localhost kernel: BTRFS info (device sdi): disk space caching is enabled
Dec 14 21:17:29 localhost plasmashell[6714]: file:///usr/lib64/qt5/qml/QtQuick/Controls/Button.qml:96: TypeError: Cannot read property of null
Dec 14 21:17:30 localhost plasmashell[6714]: QXcbConnection: XCB error: 2 (BadValue), sequence: 12014, resource id: 81794524, major code: 141 (Unknown), minor code: 3

later there's (skipping some lines here and there):

Dec 14 21:17:38 localhost pulseaudio[5978]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"

Dec 14 21:17:38 localhost kdeinit5[6058]: kdeinit5: Fatal IO error: client killed

Dec 14 21:17:38 localhost org.a11y.atspi.Registry[6816]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"

Dec 14 21:17:38 localhost klauncher[6059]: The X11 connection broke (error 1). Did the X11 server die?

Dec 14 21:17:38 localhost pulseaudio[5978]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"

Dec 14 21:17:38 localhost kdeinit5[6058]: kdeinit5: Fatal IO error: client killed

Dec 14 21:17:38 localhost org.a11y.atspi.Registry[6816]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"

Dec 14 21:17:38 localhost klauncher[6059]: The X11 connection broke (error 1). Did the X11 server die?
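
For anyone who wants to count such messages in a saved log rather than by eye, a minimal sketch follows. The helper name `count_log_matches` is hypothetical; log.txt is assumed to be a journalctl dump like the one requested in comment 1.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that contain a fixed string.
# $1 = fixed string to search for, $2 = path to the log file
count_log_matches() {
    # -F treats the pattern as a literal string, -c prints the number of matching lines
    grep -c -F -- "$1" "$2"
}

# Usage:
#   count_log_matches 'The X11 connection broke' log.txt
#   count_log_matches 'fatal IO error' log.txt
```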

Guessing the problem is at basesystem level, and assigning accordingly

Keywords: NEEDINFO => (none)
Assignee: bugsquad => basesystem
CC: (none) => kde

Comment 7 Marja Van Waes 2016-12-16 11:50:55 CET
Oops, seems I pasted the last log lines twice, sorry about that!
Comment 8 George Mitchell 2016-12-16 16:32:07 CET
I will admit that I am seeing a lot of instability with the plasma desktop and it is freezing up periodically and randomly.  But when it happened with the installer there was no plasma desktop involved.  So they are clearly two different problems.  I am feeling that neither the installer nor the drakboot nor the package manager should be trying to mount volumes without the user's explicit permission.  The only reason the volume is being mounted is to try to discover if there are OS's on it.  The user should have the option of skipping that step because it may be unnecessary and can lead to trouble in some cases.
Comment 9 George Mitchell 2016-12-16 16:39:43 CET
I will follow up with more comments as I continue to learn more about what is going on with this.
Comment 10 Marja Van Waes 2016-12-16 17:40:53 CET
(In reply to George Mitchell from comment #8)
> But when it happened with
> the installer there was no plasma desktop involved.  So they are clearly two
> different problems.  

Well, in both cases X freezes when grub2 is configured, so I'm not yet convinced they're different issues. 

Could you please redo the installation with that disk powered on and unmounted and try, after the freeze, whether you can still switch to tty2 with "Ctrl+Alt+F2" ?

If that succeeds, then please attach a USB key and type 

      bug

That'll write report.bug to the USB key


If you cannot switch to tty2, then fetch /root/drakx/ddebug.log from the root partition you were installing to, before reusing that partition

You'll probably need to compress report.bug or ddebug.log. Please do that with xz:

   xz report.bug

or 

  xz ddebug.log


and attach report.bug.xz or ddebug.log.xz to this bug report.

(This bug report will be cloned if they are really two different bugs)

CC: (none) => thierry.vignaud

Comment 11 Marja Van Waes 2016-12-16 17:42:58 CET
(In reply to George Mitchell from comment #8)
> I am feeling that neither the installer nor the
> drakboot nor the package manager should be trying to mount volumes without
> the user's explicit permission.  The only reason the volume is being mounted
> is to try to discover if there are OS's on it.  The user should have the
> option of skipping that step because it may be unnecessary and can lead to
> trouble in some cases.

You can file an enhancement request for that.
Comment 12 Marja Van Waes 2016-12-16 17:50:49 CET
(In reply to Marja van Waes from comment #10)

> 
> Could you please redo the installation with that disk powered on and
> unmounted and try, after the freeze, whether you can still switch to tty2
> with "Ctrl+Alt+F2" ?
> 

Don't do a hard poweroff if that succeeds. When you're done writing report.bug to the USB key, you can use "Alt + Ctrl + Del" to reboot.

Or use Alt + SysRq , keep those two keys pressed, and very slowly type the sequence:     
                         R S E I U O
to poweroff cleanly.
Comment 13 Dave Hodgins 2016-12-16 17:59:53 CET
Given that mkfs can be run on a device, rather than a partition, I'm curious,
is the drive actually blank or has it been formatted previously, without a
partition table? I.E. does
dd if=/dev/sdi bs=512 count=1|od -x
show anything other than zeroes for the content?
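
The same check can be scripted, as a rough sketch. The function name `first_sector_is_zero` is hypothetical. Note that an all-zero first sector does not prove the disk is blank: btrfs keeps its primary superblock at an offset of 64 KiB, so a whole-disk btrfs filesystem may still show only zeroes here, and blkid is the more reliable test.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first 512 bytes of "$1" are all zero bytes.
# Returns 2 if the path cannot be read.
first_sector_is_zero() {
    [ -r "$1" ] || return 2
    # Read one 512-byte sector, delete every NUL byte, and count what is left.
    nonzero=$(dd if="$1" bs=512 count=1 2>/dev/null | tr -d '\000' | wc -c)
    [ "$nonzero" -eq 0 ]
}

# Usage, as root (adjust the device name to your disk):
#   if first_sector_is_zero /dev/sdX; then echo "first sector is all zeroes"; fi
```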

CC: (none) => davidwhodgins

Comment 14 Pascal Terjan 2016-12-16 18:00:05 CET
(In reply to Marja van Waes from comment #11)
> (In reply to George Mitchell from comment #8)
> > I am feeling that neither the installer nor the
> > drakboot nor the package manager should be trying to mount volumes without
> > the user's explicit permission.  The only reason the volume is being mounted
> > is to try to discover if there are OS's on it.  The user should have the
> > option of skipping that step because it may be unnecessary and can lead to
> > trouble in some cases.
> 
> You can file an enhancement request for that.

Such option was actually already added I believe https://bugs.mageia.org/show_bug.cgi?id=18538
Comment 15 George Mitchell 2016-12-17 05:43:18 CET
Whoa.  That is a huge request.  This installation took me three or four days to do and I really don't have the time to do it again. The network install is tedious because the Cauldron repository is constantly being updated and the installation breaks every time it hits an updated package and has to be manually restarted again with the existing installed packages being upgraded first.  But what I WOULD like to do is install a very simple desktop like xfce4 and try the kernel upgrade from there and see what happens.  When I do that, I will attach the syslog data just as I did previously with Plasma out of the picture.


(In reply to Marja van Waes from comment #10)
> (In reply to George Mitchell from comment #8)
> > But when it happened with
> > the installer there was no plasma desktop involved.  So they are clearly two
> > different problems.  
> 
> Well, in both cases X freezes when grub2 is configured, so I'm not yet
> convinced they're different issues. 
> 
> Could you please redo the installation with that disk powered on and
> unmounted and try, after the freeze, whether you can still switch to tty2
> with "Ctrl+Alt+F2" ?
> 
> If that succeeds, then please attach a USB key and type 
> 
>       bug
> 
> That'll write report.bug to the USB key
> 
> 
> If you cannot switch to tty2, then fetch /root/drakx/ddebug.log from the
> root partition you were installing to, before reusing that partition
> 
> You'll probably need to compress report.bug or ddebug.log. Please do that
> with xz:
> 
>    xz report.bug
> 
> or 
> 
>   xz ddebug.log
> 
> 
> and attach report.bug.xz or ddebug.log.xz to this bug report.
> 
> (This bug report will be cloned if they are really two different bugs)
Comment 16 George Mitchell 2016-12-17 05:46:17 CET
(In reply to Dave Hodgins from comment #13)
> Given that mkfs can be run on a device, rather than a partition, I'm curious,
> is the drive actually blank or has it been formatted previously, without a
> partition table? I.E. does
> dd if=/dev/sdi bs=512 count=1|od -x
> show anything other than zeroes for the content?

The drive is formatted btrfs with multiple btrfs volumes.  Mounting the drive takes forever, but mounting individual volumes on the drive is significantly faster.  I rarely mount the whole drive.
Comment 17 George Mitchell 2016-12-17 05:51:15 CET
(In reply to George Mitchell from comment #16)
> (In reply to Dave Hodgins from comment #13)
> > Given that mkfs can be run on a device, rather than a partition, I'm curious,
> > is the drive actually blank or has it been formatted previously, without a
> > partition table? I.E. does
> > dd if=/dev/sdi bs=512 count=1|od -x
> > show anything other than zeroes for the content?
> 
> The drive is formatted btrfs with multiple btrfs volumes.  Mounting the
> drive takes forever, but mounting individual volumes on the drive is
> significantly faster.  I rarely mount the whole drive.

And blkid should be able to tell the installer that.
Comment 18 George Mitchell 2016-12-17 05:58:18 CET
(In reply to Pascal Terjan from comment #14)
> (In reply to Marja van Waes from comment #11)
> > (In reply to George Mitchell from comment #8)
> > > I am feeling that neither the installer nor the
> > > drakboot nor the package manager should be trying to mount volumes without
> > > the user's explicit permission.  The only reason the volume is being mounted
> > > is to try to discover if there are OS's on it.  The user should have the
> > > option of skipping that step because it may be unnecessary and can lead to
> > > trouble in some cases.
> > 
> > You can file an enhancement request for that.
> 
> Such option was actually already added I believe
> https://bugs.mageia.org/show_bug.cgi?id=18538

Is that the check box determining whether os-prober gets run or not?  If so, I did not understand at the time that running os-prober would attempt to mount unmounted drives that were not partitioned, so my bad on that one if that is the case.  Now I know.  What I am really wondering and want to sort out at this point is whether this was really a malfunction or whether the system seemed frozen simply because the mount process was eating all my CPU time along with Plasma desktop.  I know that mounting that 4TB volume in one chunk is extremely CPU intensive on my system.
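
For reference, stock GRUB 2 has a setting that skips os-prober when the boot menu is regenerated. Whether the check box from bug 18538 maps to this setting is an assumption, not something confirmed in this report, and the grub2-mkconfig command and output path in the comment are likewise assumed for a Mageia install.

```shell
# Fragment for /etc/default/grub (a sketch, not a tested Mageia configuration).
# With this set, grub's 30_os-prober script does not run os-prober, so other
# disks are not probed or mounted when the menu is generated.
GRUB_DISABLE_OS_PROBER=true

# Afterwards the menu has to be regenerated as root, for example:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

The trade-off is that other installed operating systems are no longer added to the boot menu automatically.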
Comment 19 George Mitchell 2016-12-22 16:25:02 CET
Created attachment 8811 [details]
Syslog output from partially failed kernel upgrade via MCC

Today a new kernel upgrade came out for Cauldron.  I did the install via MCC on XFCE4 rather than buggy Plasma.  It finally completed without hanging but took forever.  On XFCE4 it requested specific permission to mount volumes, which I provided.  Everything went normally until it hit the unpartitioned 4TB drive.  At that point everything seemed to stop but I just waited and waited.  The MCC installer finally came back to life and completed normally with a message to reboot for the new kernel.  BUT the OS prober continued to be hung on /dev/sdi, the unpartitioned drive, and just kept going until I finally manually killed it and shut the system down.  The attached document contains the whole syslog output for the complete time period.
Comment 20 Aurelien Oudelet 2020-09-09 21:54:03 CEST
os-prober should not fail on an unpartitioned drive. 
However, it has been about 4 years since the last comment.

We are sorry, but we no longer maintain this version of Mageia. Please upgrade to the latest version and reopen this bug against that version if this bug exists there.
As a result, we are setting this bug to RESOLVED:OLD

Resolution: (none) => OLD
Status: NEW => RESOLVED
CC: (none) => ouaurelien