18666 – [GPT] It enters emergency mode after a reboot. Netinstall of June 8th

Status:	RESOLVED FIXED

Alias:	None

Product:	Mageia
Classification:	Unclassified
Component:	Installer
Version:	Cauldron
Hardware:	x86_64 Linux

Priority:	High
Severity:	critical
Target Milestone:	---
Assignee:	Thierry Vignaud
QA Contact:

URL:
Whiteboard:
Keywords:	NEEDINFO, PATCH

Duplicates (1):	18876
Depends on:
Blocks:

Reported:	2016-06-09 12:47 CEST by Bjarne Thomsen
Modified:	2016-07-31 17:31 CEST
CC List:	11 users

See Also:	17796
Source RPM:	parted, drakx-installer-stage2
CVE:
Status comment:

Attachments
Output from: journalctl -xb (115.88 KB, text/plain) 2016-06-09 12:47 CEST, Bjarne Thomsen
/root/drakx/report.bug.xz (149.12 KB, application/x-xz) 2016-06-10 13:15 CEST, Bjarne Thomsen
/root/drakx/report.bug.xz for an automatic install (203.74 KB, application/x-xz) 2016-06-16 13:00 CEST, Bjarne Thomsen
Output from: journalctl -xb (135.21 KB, text/plain) 2016-06-16 18:37 CEST, Bjarne Thomsen
prevent using the last 33 sectors (468 bytes, patch) 2016-07-05 15:50 CEST, Thierry Vignaud

Description Bjarne Thomsen 2016-06-09 12:47:08 CEST

Created attachment 7954 [details]
Output from: journalctl -xb

After installing from the netinstall ISO of June 8th (x86_64), a reboot enters emergency mode. The /home directory does not exist. The only option is to reboot again, but that enters the same state.

Comment 1 Marja Van Waes 2016-06-10 11:51:24 CEST

(In reply to Bjarne Thomsen from comment #0)
> Created attachment 7954 [details]
> Output from: journalctl -xb
> 
> After the netinstall ISO of June 8th (x86_64) a re-boot enters emergency
> mode. The /home directory does not exist. The only option is a re-boot, but
> that enters the same state.

Can you attach /root/drakx/report.bug.xz of that install, please?

Keywords: (none) => NEEDINFO
CC: (none) => marja11
Component: Release (media or process) => Installer
Assignee: bugsquad => thierry.vignaud

Comment 2 Bjarne Thomsen 2016-06-10 12:19:54 CEST

It is now clear that something went wrong with sda4 during auto allocation.
I have just done a manual partitioning and it is right now installing
packages. Let us see what happens.

Comment 3 Bjarne Thomsen 2016-06-10 13:15:11 CEST

Created attachment 7965 [details]
/root/drakx/report.bug.xz

The graphics driver does not work.

CC: (none) => bjarne.thomsen

Comment 4 Bjarne Thomsen 2016-06-10 13:21:40 CEST

But the manual disk partitioning worked for /dev/sda4 when formatted
with btrfs. Auto allocation gave a badly formatted superblock on sda4,
or a corrupted partition table. Is there something else I can try?
Like manual formatting with ext4?

Comment 5 Bjarne Thomsen 2016-06-10 16:49:59 CEST

I made a manual partitioning of the disk. But this time /dev/sda4 was
formatted as ext4, and a re-boot finished in emergency mode.
Furthermore fsck -t ext4 gave this message:

The filesystem size (according to the superblock) is 18760054
The physical size of the device is 18760049

This did not happen when the same partition was formatted as btrfs.

Comment 6 Bjarne Thomsen 2016-06-15 23:25:05 CEST

This bug is still in the netinstall of June 12. I manually decreased /dev/sda4 by
a couple of sectors, and now it booted. At the same time I selected the
Xfce desktop. This time I could log in. Then I installed both MATE and
KDE. MATE worked as expected. I presume that the KDE desktop in this version
is without menus, apart from a run-field to be selected by right clicking?

Comment 7 Thierry Vignaud 2016-06-16 10:28:03 CEST

I fail to see how this is an installer bug: the logs clearly show that the fs already exceeded the partition size at the beginning of the install.
Then you manually formatted sda4 as btrfs.
However, the journal logs you provided show an issue with an ext4-formatted sda4, which doesn't match your installation logs.
Please attach the older report.bug.xz in /root/drakx.

Comment 8 Bjarne Thomsen 2016-06-16 13:00:36 CEST

Created attachment 8002 [details]
/root/drakx/report.bug.xz for an automatic install

I reported the bug as a system bug for the netinstall iso.
It actually looks as if the installer mounts /home on /dev/sda4
It is still unclear to me why it fails on the first boot after the install.

Comment 9 Thierry Vignaud 2016-06-16 15:16:16 CEST

I don't understand:

1) fsck told you the device was 18760049 4k blocks.
That's 150080392 sectors.

2) fdisk output after partitioning shows the /dev/sda4 partition to be 150080398 sectors (234441614-84361216), which is 6 sectors (aka 3 KB) bigger than the size reported by fsck.
The result is the same at end of installation.

So that should be OK, though there shouldn't be such a discrepancy when rebooting.
What's more, kernel did succeed mounting the fs while in install.
Are you sure you didn't do anything between install & reboot?

Especially, blkid reports the fs UUID to be  "07c433f9-274e-4530-8209-b4910402ade0" after formatting.
Whereas your journal shows blkid to report "070945ebf-5df4-4861-969f-df4eec6ec758"

Actually, none of the fs UUIDs in your journal logs match the UUIDs set in /etc/fstab by the installer, which suggests that the report.bug.xz is not the right one...
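
The block/sector arithmetic in this comment can be checked with a short sketch. The numbers are the ones quoted in this report; the helper names are illustrative, and 512-byte sectors are an assumption. Note that fdisk lists an inclusive first..last range, so the sector count is one more than the plain difference.

```python
SECTOR_SIZE = 512   # bytes; assumed, not stated explicitly in the logs
BLOCK_SIZE = 4096   # bytes; the 4k block size reported by fsck


def sectors_in_range(first, last):
    """Number of sectors in an inclusive first..last range, as fdisk lists it."""
    return last - first + 1


def whole_blocks(sectors, block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """How many complete filesystem blocks fit into `sectors` sectors."""
    return sectors * sector_size // block_size


partition_sectors = sectors_in_range(84361216, 234441614)
print(partition_sectors)                 # 150080399
print(whole_blocks(partition_sectors))   # 18760049, the device size fsck reports
```

Under these assumptions the partition that fdisk shows holds exactly the 18760049 whole blocks that fsck calls the physical size of the device; the trailing 7 sectors are too small for another 4k block.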

Comment 10 Bjarne Thomsen 2016-06-16 18:14:45 CEST

I was careful not to do anything but to click on reboot.
I did not attach a new journal log after this install from
the new install ISO (12/6 instead of 8/6). It is still sitting as root.
Should I attach the current journal log?
Could there be something wrong with the ISO image? I did check the
sha512sum.

Comment 11 Thierry Vignaud 2016-06-16 18:16:22 CEST

Yes.
I would like to see the journal logs matching latest report.bug.xz

Comment 12 Bjarne Thomsen 2016-06-16 18:37:29 CEST

Created attachment 8008 [details]
Output from: journalctl -xb

This is the current journal log.

Thierry Vignaud 2016-06-17 14:11:54 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=17796

Thierry Vignaud 2016-06-17 14:15:51 CEST

Summary: It enters emergency mode after a reboot. Netinstall of June 8th => [GPT] It enters emergency mode after a reboot. Netinstall of June 8th

Comment 13 Bjarne Thomsen 2016-06-17 17:01:27 CEST

Interesting. I took the old Mageia 5 install DVD and installed from that.
It has the same problem with /dev/sda4 when I do an automatic install.
I have been running mga5 on this Intel Brix PC. I must have been using
btrfs on /dev/sda4.

Comment 14 Thierry Vignaud 2016-06-24 13:41:27 CEST

Can you try with drakx v17.45 when it lands on your favorite mirror?
You can do a net install using install/images/boot-nonfree.iso

You can check stage2 is up to date on your mirror by checking install/stage2/VERSION
eg:
http://distrib-coffee.ipsl.jussieu.fr/pub/linux/Mageia/distrib/cauldron/x86_64/install/images/boot-nonfree.iso
http://distrib-coffee.ipsl.jussieu.fr/pub/linux/Mageia/distrib/cauldron/x86_64/install/stage2/VERSION

Comment 15 Thierry Vignaud 2016-06-24 18:25:12 CEST

Just hinting on the autoinst keyword, I'm looking at your auto_inst.cfg.pl:
why do you provide _both_ :
- partitioning with autoallocate & clearall
- partitions with manually created partition

I would disable the autoallocate but that's harmless I think

That said, we got:
* tell kernel add (sda 4 84361216 150080432) force_reboot= rebootNeeded=
(...)
/dev/sda4  84361216 234441614 150080399 71.6G Linux filesystem

That's the only fs that got such a difference.

I wonder what's going on with libparted.
I don't understand how mkfs.ext4 can format a fs bigger than the containing partition???

Can you try lowering the sda4 size in your auto_inst.cfg.pl?
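
The two sizes quoted above can be compared directly. This is only a sketch using the figures from the log excerpt; the 4k block size and 512-byte sector size are assumptions, and `blocks_for` is an illustrative helper.

```python
SECTORS_PER_BLOCK = 4096 // 512  # 4k blocks on 512-byte sectors (assumed)

kernel_sectors = 150080432  # length passed in "tell kernel add (sda 4 ...)"
fdisk_sectors = 150080399   # length fdisk lists for /dev/sda4


def blocks_for(sectors):
    """Whole 4k blocks that fit into the given number of sectors."""
    return sectors // SECTORS_PER_BLOCK


print(kernel_sectors - fdisk_sectors)  # 33
print(blocks_for(kernel_sectors))      # 18760054
print(blocks_for(fdisk_sectors))       # 18760049
```

The results are the same two block counts that fsck printed in comment 5 (18760054 in the superblock, 18760049 for the device), which would be consistent with the filesystem being sized from the larger value given to the kernel.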

Comment 16 Bjarne Thomsen 2016-06-24 21:02:59 CEST

I have forgotten the details, but I started the manual modification
when the default choice failed. I never discovered this problem
with mga5, as I used btrfs for /dev/sda4.
I did try to reduce the partition by a block or two. Then it worked.
Where is this auto_inst.cfg.pl located?
Is that something I can do during installation?
Actually, your link
http://distrib-coffee.ipsl.jussieu.fr/pub/linux/Mageia/distrib/cauldron/x86_64/install/images/boot-nonfree.iso
is identical to the ISO:
Mageia-Cauldron-netinstall-nonfree-x86_64.iso
that I used (according to cmp).

Comment 17 Thierry Vignaud 2016-06-24 21:26:42 CEST

In comment #8, you said you did an automatic installation (which means using an auto_inst.cfg.pl file).
It looks like you meant something else then?
It really was a manual installation?

As for identical boot.iso, this is just my somewhat standard answer so that people can get latest stage1 (boot*.iso) & can check the actual stage2 version (which is way more important) so that they don't pick an outdated mirror.

Comment 18 Bjarne Thomsen 2016-06-24 22:07:01 CEST

I have now reconstructed what happened. As you know, I had been using btrfs for
/dev/sda4 mounted on /home in mga5. The installer asks me:
Use existing partitions?
I would like to try ext4 for /home, so the logical thing to do was
Erase and use entire disk?
I used to do that before UEFI. But now I get the counter question:
You MUST have a ESP FAT32 partition mounted in /boot/EFI!
After an erase? The only remaining option is
Custom disk partitioning?

I clear all partitions.
Create an ESP with mount point /boot/EFI
Create an ext4 partition with mount /
Create swap
Create an ext4 partition from the remaining space with mount point /home
I am now installing packages.

Comment 19 Bjarne Thomsen 2016-06-24 22:35:38 CEST

The same thing happened again. My standard solution is to use btrfs
or to reduce the size by a block, if ext4 is going to be used. But why?

Comment 20 Thomas Backlund 2016-06-24 22:41:35 CEST

Could be a rounding bug in the new e2fsprogs

CC: (none) => tmb

Thierry Vignaud 2016-06-24 22:54:55 CEST

Source RPM: (none) => e2fsprogs?

Comment 21 Thierry Vignaud 2016-06-24 23:00:51 CEST

But that would show up earlier, when mounting in installer, wouldn't it?
Unless mount doesn't complain but fsck does?

Comment 22 Bjarne Thomsen 2016-06-24 23:45:08 CEST

I am using the sliding bar in the installer, all the way to the right.
The unit is MB. I just reduced it by 1 unit.
At least it reproduces: Now it mounts sda4.

Comment 23 Thierry Vignaud 2016-06-25 07:41:30 CEST

The e2fsprogs maintainer failed to register, so he asked me to post this on his behalf:
" Hmmm.....  I can't replicate anything like this:

# dc
18760049 4096 *
p
76841160704
# truncate -s  76841160704 /u1/looptest.img
# losetup /dev/loop0 /u1/looptest.img
# /build/e2fsprogs/lib/ext2fs/tst_getsize /dev/loop0 4096
/dev/loop0 is device has 75040196 blocks.
# dc
75040196 4 /
p
18760049
q
# mke2fs -t ext4 /dev/loop0
mke2fs 1.43.1 (08-Jun-2016)
Discarding device blocks: done
Creating filesystem with 18760049 4k blocks and 4694016 inodes
Filesystem UUID: c9353cb2-d327-408c-b0d3-aebec0ce9eda
Superblock backups stored on blocks:
           32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
                  4096000, 7962624, 11239424

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# e2fsck -fy /dev/loop0
e2fsck 1.43.1 (08-Jun-2016)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop0: 11/4694016 files (0.0% non-contiguous), 340738/18760049 blocks

So it looks like it's all working correctly....  This is 1.43.1 and
not 1.43, but I'm pretty sure this wasn't in 1.43, either.  And you
really want to get 1.43.1 to get a bunch of other bug fixes:

   http://e2fsprogs.sourceforge.net/e2fsprogs-release.html#1.43.1"

Comment 24 Thierry Vignaud 2016-06-25 07:41:57 CEST

BTW I forgot to tell him we really are running 1.43.1

Comment 25 Bjarne Thomsen 2016-06-25 10:25:10 CEST

I tried mkfs -t ext4 /dev/sda4
at the root prompt after the failed mount of /dev/sda4
This file system has the size of the physical partition.
I can now mount this (empty) ext4 file system.
mkfs works in the installer and it works after the boot
into the system. But what happens with the fs on the way?

Comment 26 Bjarne Thomsen 2016-06-25 12:21:51 CEST

Very interesting. I had the idea not to boot my disk after an install.
Instead I went directly into rescue mode on the install CD.
Now the ext4 fs on /dev/sda4 is clean, and I can mount the partition.
After a reboot it booted correctly from my disk.
I have a strong suspicion that /dev/sda4 was not correctly unmounted.

I have developed a habit of removing the USB DVD drive when all
partitions have been unmounted.
Should I leave the DVD drive in until after the boot from my disk?
I am trying to do that after yet another install.

Comment 27 Bjarne Thomsen 2016-06-25 13:14:59 CEST

I was wrong! The problem is still present when I leave the DVD drive attached.
Why does it work when I boot directly into rescue mode after an install?

Comment 28 Bjarne Thomsen 2016-06-25 15:21:09 CEST

It does not reproduce! Only once out of 3 installs was the ext4 fs
in /dev/sda4 clean when I booted directly into rescue mode after the install.

Comment 29 Theodore Ts'o 2016-06-25 18:38:38 CEST

Copying in an e-mail message I just sent to Thierry, now that identity.mageia.org seems to be working properly (before it was telling me "come back later").

On Sat, Jun 25, 2016 at 07:43:27AM +0200, Thierry Vignaud wrote:
>
> Yep but that's with a loopback file.
> Couldn't there be different code paths hit in such a case versus a partition?

Well, in both cases it's a block device.  The key in any case would be
in how we detect the size of the partition.  And the ioctl to get a
block device size is a standard one.

It's worth trying to build tst_getsize, which is a test program in
lib/ext2fs, and see what it returns.  But the key though is that this
is the same function which gets used for mke2fs and e2fsck.

So if e2fsck is complaining that the size of the device is different
from the size of the file system, my guess is that either (a) the
device driver is a little buggy, and is returning different values at
different times --- for example, could it be that the version of the
kernel used in the installer is different from the version of the
kernel used when e2fsck is run?  Or, (b) the installer is explicitly
giving the size of the file system to mke2fs, and then using the -F
flag to prevent mke2fs from kvetching.

In other words, the exact same check is in mke2fs:

# mke2fs -t ext4 /dev/loop0 -b 4096 18760054 < /dev/null
mke2fs 1.43.1 (08-Jun-2016)
mke2fs: Filesystem larger than apparent device size.
Proceed anyway? (y,n)
# echo $?
1

So it will abort if stdin isn't available so the user can tell mke2fs
to proceed and do something insane.

# mke2fs -F -t ext4 /dev/loop0 -b 4096 18760054 < /dev/null
mke2fs 1.43.1 (08-Jun-2016)
Discarding device blocks: failed - Invalid argument
Creating filesystem with 18760054 4k blocks and 4694016 inodes
Filesystem UUID: 21e42521-9622-42f2-846e-44f52b8205bd
Superblock backups stored on blocks:
           32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
                  4096000, 7962624, 11239424

Allocating group tables: done
mke2fs: Attempt to write block to filesystem resulted in short write while zeroing block 18760038 at end of filesystem
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# e2fsck /dev/loop0
e2fsck 1.43.1 (08-Jun-2016)
The filesystem size (according to the superblock) is 18760054 blocks
The physical size of the device is 18760049 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>? cancelled!

/dev/loop0: ********** WARNING: Filesystem still has errors **********

As you can see mke2fs does the same check, using the exact same
library function found in ext2fs_get_device_size2().  The tst_getsize
function is simply a thin wrapper around this to query a device and
print the size in 1k blocks.

Something else you might try doing is to use blockdev --getsize64:

# blockdev  --getsize64 /dev/loop0
76841160704

This is an even thinner wrapper which only tries using the ioctl to
determine the device size.  (This is why I used a loop device; it's
the closest simulation to a partition).  ext2fs_get_device_size2()
will fall back to using a binary search if the ioctl is not available
(it will seek to successively larger and smaller sizes and see if a
read of one byte returns an error), but in practice for Linux systems
the BLKGETSIZE64 ioctl should always be present for all block devices.

So the first thing I would suggest is to try using mke2fs and e2fsck
directly, *without* getting the installer involved.  I'm almost 100%
sure the problem is either with the installer kernel or with the
installer scripts.  Failing that, there is almost certainly something
weird going on with that device, and I'd want to know a lot more about
the specific block device.  e.g., what does hdparm -i /dev/sdX,
blockdev --getpbsz /dev/sdX, blockdev --getalignoff /dev/sdX, blockdev
--getss /dev/sdX, blockdev --getiomin /dev/sdX, blockdev --getioopt
/dev/sdX, etc., return.

                                                - Ted
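
For readers who want to query a device size outside mke2fs, here is a minimal Python sketch. It is not the BLKGETSIZE64 ioctl and not the e2fsprogs code: it simply seeks to the end of the device, which on Linux is expected to give the same byte count for a block device and also works on a plain file. The function names are illustrative.

```python
import os


def device_size_bytes(path):
    """Size in bytes of a block device or regular file, found by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def device_size_blocks(path, block_size=4096):
    """Whole blocks of `block_size` bytes that fit on the device."""
    return device_size_bytes(path) // block_size


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        # e.g. python3 getsize.py /dev/sda4 (needs read permission on the device)
        print(device_size_bytes(sys.argv[1]), device_size_blocks(sys.argv[1]))
```

Running it on the partition just before mkfs and again after a reboot would show whether the kernel's idea of the partition size changes in between.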

CC: (none) => tytso

Comment 30 Theodore Ts'o 2016-06-25 18:42:31 CEST

I see you've already tried running mke2fs manually, and it works when you're not using the install/rescue CD.  As I said, I'm almost sure the problem is with your distro install/rescue kernel, and/or your installer scripts.

From comments 25-28, it sounds like the installer is doing something ***really*** weird.   I have to ask.  If you use some other installer for some other distro, (e.g., Debian) --- can you reproduce this at all?   That way maybe we can get some hints as to whether it is a hardware or installer specific problem.

Comment 31 Bjarne Thomsen 2016-06-25 20:23:21 CEST

I am installing ubuntu 14.04, and I have some questions about the mageia installer:
My disk is partitioned in this peculiar way:
empty space              1 MB
/dev/sda1   efi        313 MB
empty space              1 MB
/dev/sda2   ext4     47427 MB
empty space              1 MB
/dev/sda3   swap
empty space              1 MB
/dev/sda4   ext4     68095 MB
empty space              0 MB

What are these 1 MB empty spaces doing? NOTE: there is a 0 MB
empty space after /dev/sda4.

I am right now installing ubuntu with the very same partitions.

Comment 32 Bjarne Thomsen 2016-06-25 20:32:30 CEST

Correction: ubuntu 16.04. It works with ubuntu.

Comment 33 Bjarne Thomsen 2016-06-25 22:32:53 CEST

Starting from scratch in Ubuntu, I obtained
1MB empty space in front of the efi partition,
nothing between the remaining partitions, and finally
0MB of empty space after /dev/sda4.

Maybe I should try Fedora 24, if it is out.

Comment 34 Theodore Ts'o 2016-06-25 22:59:19 CEST

The 1MB empty spaces are because by default the GPT (aka EFI) partition table aligns all partitions to 1MB (2048 512-byte sectors) boundaries. So whatever partitioning system that Mageia is using is doing something rather.... unusual if it is not creating file systems that are padded out to the full aligned sizes of the partitions. In fact, because the partition boundaries are aligned to 1MB (2048 sector) boundaries, normally most distro installers / partition utilities will create the partitions so they are multiples of 1MB. So the fact that you had file systems that were these odd numbers: 18760049 4k-blocks is part of the tip-off that something strange is going on.

In fact, I suspect what Mageia is doing is that it is creating the partition table with one set of sizes, and then later on, the partitions are getting resized a second time. This is happening either after the file system is mkfs'ed, or after the root file system has been mounted and keeping the partitions busy, such that when the partition table is modified, the kernel is reporting EBUSY. Some partition table utilities will report this:

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.

If you then try creating a file system, it will be created using the old partition table, since the old partition table is still in effect. This means that the BLKGETSIZE64 ioctl will return the old partition sizes (for example). But then after you reboot, the new partition table, with the new starting offsets and new partition lengths, will be in effect, and Much Hilarity will then follow.
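
The 1MB alignment rule described at the start of this comment is easy to test against the numbers in this report. The sketch below assumes 512-byte sectors and 4k blocks; the helper names are illustrative.

```python
MIB = 1024 * 1024


def starts_on_mib_boundary(start_sector, sector_size=512):
    """True if the partition's first sector lies on a 1 MiB boundary."""
    return (start_sector * sector_size) % MIB == 0


def is_whole_mib(blocks, block_size=4096):
    """True if a size given in filesystem blocks is a whole number of MiB."""
    return (blocks * block_size) % MIB == 0


print(starts_on_mib_boundary(84361216))  # True: sda4 starts on a 1 MiB boundary
print(is_whole_mib(18760049))            # False: the "odd" size mentioned above
```

So the start of sda4 follows the usual alignment, while its length does not, which matches the observation that only the end of the partition looks unusual.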

Barry Jackson 2016-06-25 23:53:53 CEST

CC: (none) => zen25000

Comment 35 Bjarne Thomsen 2016-06-26 00:26:31 CEST

I have now installed Fedora Workstation 24 with the same number of partitions
without any problems. It runs quite nicely.

But it is still strange that it only happens on this particular
hardware: a GIGABYTE BRIX Pro (Ultra Compact PC Kit).
Also, the problem disappeared when I reduced the last partition by 1 MB,
and I had no problems with a btrfs on the last partition without reducing
the size.

Comment 36 Theodore Ts'o 2016-06-26 02:21:35 CEST

Note that btrfs won't **notice** if the file system size is larger than the partition size.   People can draw their own conclusions about the maturity of btrfs and btrfs-progs from this observation.

(This reminds me back in the early days, when people were debating over whether xiafs or ext2fs was the better file system.   The argument was made that xiafs was the better file system because ext2fs was reporting consistency problems.  Turns out both file systems had the same bug, but ext2fs had enough checks that it was vocal when it noticed a discrepancy, while xiafs was silently corrupting file systems... and yet people assumed xiafs was more mature because it wasn't throwing errors.  :-)


# /bin/rm /u1/looptest.img ; truncate --size 134217728 /u1/looptest.img
# losetup /dev/loop0 /u1/looptest.img 
# blockdev --getsize64 /dev/loop0
134217728
# mkfs.btrfs /dev/loop0 
btrfs-progs v4.5.2
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM (128.00MiB) ...
Label:              (null)
UUID:               0bef556d-3353-4850-8357-fa2fb2206ea6
Node size:          16384
Sector size:        4096
Filesystem size:    128.00MiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP              40.00MiB
  System:           DUP              12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1   128.00MiB  /dev/loop0

# btrfs check /dev/loop0
Checking filesystem on /dev/loop0
UUID: 0bef556d-3353-4850-8357-fa2fb2206ea6
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 114688 bytes used err is 0
total csum bytes: 0
total tree bytes: 114688
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 109471
file data blocks allocated: 0
 referenced 0
# losetup -d /dev/loop0
# truncate --size 125829120 /u1/looptest.img
# losetup /dev/loop0 /u1/looptest.img 
# blockdev --getsize64 /dev/loop0
125829120
# btrfs check /dev/loop0
Checking filesystem on /dev/loop0
UUID: 0bef556d-3353-4850-8357-fa2fb2206ea6
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 114688 bytes used err is 0
total csum bytes: 0
total tree bytes: 114688
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 109471
file data blocks allocated: 0
 referenced 0
#

Comment 37 Bjarne Thomsen 2016-06-26 12:02:35 CEST

For now I am installing mga6 in the partitions created as default by ubuntu.
ubuntu uses one big ext4 partition for /
The install works fine.

Comment 38 Bjarne Thomsen 2016-06-26 15:09:30 CEST

I had to do it. I partitioned sda with the ubuntu installer to have
the same number of partitions (4) as I created using the Mageia installer.
Then I instructed the Mageia installer to use the existing partitions.
After installation of Mageia6 I booted without any problems.

Comment 39 Thierry Vignaud 2016-06-26 20:04:56 CEST

(In reply to Theodore Ts'o from comment #29)
The command we run is in the attached logs:

* running: mkfs.ext4 -F -m 0 /dev/sda4

The fdisk call before formatting shows:
/dev/sda4  84361216 234441614 150080399 71.6G Linux filesystem

The fdisk call at end of installation still shows:
/dev/sda4  84361216 234441614 150080399 71.6G Linux filesystem

(In reply to Theodore Ts'o from comment #36)
@Pascal, @Anssi: it would be cleaner to round up end of partitions to 1MB boundaries on GPT disks too, not just their start point.

CC: (none) => anssi.hannula, pterjan

Comment 40 Thierry Vignaud 2016-06-26 21:49:13 CEST

(In reply to Theodore Ts'o from comment #29)
The installer uses the same kernel as the one that'll be installed.
(except for certain cases where we'll install a special kernel flavor for eg: servers, ...). But in the generic case, this is true.

(In reply to Theodore Ts'o from comment #34)
> In fact, I suspect what Mageia is doing is that it is creating the partition
> table with one set of sizes, and then later on, the partitions are getting
> resized a second time.  This is happening either after the file system is
> mkfs'ed, or after the  root file system has been mounted and keeping the
> partitions busy, such that when the partition table is modified, the kernel
> is reporting EBUSY.  Some partition table utilities will report this:
> 
> Command (m for help): w
> The partition table has been altered!
> 
> Calling ioctl() to re-read partition table.
> 
> WARNING: Re-reading the partition table failed with error 16: Device or
> resource busy.
> The kernel still uses the old table. The new table will be used at
> the next reboot or after you run partprobe(8) or kpartx(8)
> Syncing disks.

Our installer chokes when it fails to tell the kernel to reread the partition table. In such cases, it asks to reboot the machine.

Comment 41 Theodore Ts'o 2016-06-27 00:32:52 CEST

All I can say is that if you can get a reliable repro for this problem, it would be good to have a version of the installer which runs blockdev --getsize64 /dev/sdX *just before* mke2fs is run.   The fact that the bug reporter isn't able to reliably reproduce the problem, but is seeing that it reproduces in a flaky fashion, makes it ***extremely*** unlikely that it is a mke2fs problem with respect to rounding.

So if you are sure it is an e2fsprogs problem, please give me a reliable repro which calls e2fsprogs in isolation, since I really don't have the time to debug mageia's installation scripts.

I really don't think there will be any difference between a partition and loopback device (unless the partition table is failing to be reread), but if you are so certain it's e2fsprogs, despite my demonstration that it works just fine with a loopback device, then you should be able to construct a reliable repro using a partitioning utility.  Give me the steps for a reliable repro, and I'll be happy to take it from there.

Thierry Vignaud 2016-07-01 16:07:16 CEST

Priority: Normal => High
Severity: normal => critical

Comment 42 Max Perl 2016-07-01 17:01:28 CEST

Hello all,
I tried to install Mageia 6 with the new sta1 DVD and got the same error as Bjarne. Therefore I can confirm the bug and that the bug is still unsolved in the new stabilisation DVD...

CC: (none) => max.augsburg

Comment 43 Bjarne Thomsen 2016-07-03 21:51:25 CEST

Yes, indeed. I deleted and created sda4 during an install from the netinstall iso of July 1st. I did it twice and both times the reboot went into emergency.
Three times I booted into recovery and made a mkfs.ext4 /dev/sda4, followed by
an install to existing partitions. All 3 times the reboot after install worked.
The problem is related to partitioning followed by mkfs.ext4.
What if it is done from the live DVD?

Comment 44 Bjarne Thomsen 2016-07-05 14:27:05 CEST

I tried GParted-live (amd64) from 3 weeks ago.
This GNOME GUI from Debian had no problems partitioning and formatting /dev/sda4.
The mga6-netinstall-nonfree of July 5 worked well while installing on top of
these existing partitions.

Comment 45 Arne Spiegelhauer 2016-07-05 14:56:32 CEST

With the live PLASMA5 DVD, the same thing happens, however here it is easy to add a few debug printouts.
What seems to happen is this:
Installer decides to allocate all remaining sectors for the home partition. This is a problem since on a GPT disk, the last 33 sectors are used for a backup partition table.
c::disk_add_partition(), however, silently reduces the partition size by 33 sectors while the kernel is told (and apparently accepts) the full size.
The partition is then formatted according to the kernel values before these are refreshed for other reasons.
The following results, obtained at the point where the installer waits for confirmation for removing packages, are consistent with the above scenario:

- blockdev --getsz reports the full partition size
- fdisk warns about corrupt backup GPT table
- fdisk reports the reduced partition size
- fsck on the partition reports the full size and no errors
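
The scenario above can be reproduced on paper with the numbers from this report. The sketch below is not the drakx or libparted code; it only models a partition that asks for all remaining sectors and is then cut back so it does not overlap the 33 sectors of the backup GPT. The disk size (234441648 sectors) and the start of sda4 come from the logs quoted in this bug; the function names are illustrative.

```python
GPT_BACKUP_SECTORS = 33  # backup partition entries plus backup header


def last_usable_lba(total_sectors):
    """Last sector a partition may use on a GPT disk with `total_sectors` sectors."""
    return total_sectors - GPT_BACKUP_SECTORS - 1


def clamped_length(start, requested_length, total_sectors):
    """Length of a partition after trimming it to end at the last usable sector."""
    requested_end = start + requested_length - 1
    end = min(requested_end, last_usable_lba(total_sectors))
    return end - start + 1


total = 234441648                 # disk size in sectors
start = 84361216                  # first sector of sda4
requested = total - start         # "all remaining sectors"

print(requested)                                # 150080432, what the kernel was told
print(last_usable_lba(total))                   # 234441614, the end sector fdisk shows
print(clamped_length(start, requested, total))  # 150080399, the size fdisk shows
```

The 33-sector gap between the requested and the trimmed length is exactly the difference between the size given to the kernel and the size written to the partition table.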

Comment 46 Bjarne Thomsen 2016-07-05 15:22:49 CEST

That makes sense. I had seen the warning from fdisk, but I did not know that
the backup partition table does not belong to the partition. This is then why there is an unused space in the beginning of the disk for the real partition
table?

Comment 47 Thierry Vignaud 2016-07-05 15:33:33 CEST

Hmm, that's a hint.
Though in the provided install logs, the last partition ended at sector 234438863 whereas the disk has 234441648 sectors:

234441648-234438863 = 2785

So there should have been room.
Still, that points in the right direction.

Status: NEW => ASSIGNED
Source RPM: e2fsprogs? => parted, drakx-installer-stage2

Comment 48 Thierry Vignaud 2016-07-05 15:50:48 CEST

Created attachment 8130 [details]
prevent using the last 33 sectors

Can you try this patch when you live test?
As root, just run the following commands in a terminal:
cd /usr/lib/libDrakX
patch -p2 < /where/it/was/downloaded/18666.patch

Comment 49 Thierry Vignaud 2016-07-05 15:51:15 CEST

Note that this is partial as I think 33 sectors is the minimal GPT size

Comment 50 Thierry Vignaud 2016-07-05 15:58:49 CEST

aka we must find the right libparted API in order to get the partition table size.
Or we must ask libparted what size the partition really has.

Comment 51 Thomas Backlund 2016-07-05 16:06:44 CEST

nope, according to spec, first usable sector is LBA34, and the same in reverse for the end of the disk...

So we need to protect 33 sectors in beginning and 33 at end, and then align partitions on 1M boundary and we should be all good...
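
As a rough illustration of that rule, the sketch below computes the usable sector range of a GPT disk and then rounds a partition's start up and its end down to 1M boundaries. It assumes 512-byte sectors (so 1M is 2048 sectors); the function names are hypothetical and this is not the drakx implementation.

```python
ALIGN = 2048  # sectors per 1 MiB with 512-byte sectors (assumed)


def gpt_usable_range(total_sectors):
    """First and last usable LBA: 34 sectors are reserved at each end of the disk."""
    return 34, total_sectors - 34


def align_start(sector, align=ALIGN):
    """Round a start sector up to the next alignment boundary."""
    return -(-sector // align) * align


def align_end(last_sector, align=ALIGN):
    """Round an inclusive end sector down so the following sector is on a boundary."""
    return (last_sector + 1) // align * align - 1


first, last = gpt_usable_range(234441648)  # disk size from this report
print(first, last)         # 34 234441614
print(align_start(first))  # 2048
print(align_end(last))     # 234440703
```

With both ends aligned, a partition filling the rest of the disk would stop well before the backup GPT, and its size would be a whole number of megabytes.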

Comment 52 Thierry Vignaud 2016-07-05 16:12:46 CEST

So if anyone is able to reproduce this bug live, could you try my patch in attachment #8130 (see comment #48)?

Thierry Vignaud 2016-07-05 16:12:57 CEST

Keywords: (none) => PATCH

Comment 53 Arne Spiegelhauer 2016-07-05 16:32:16 CEST

Yes, I have completed a successful install and boot with the patch applied

CC: (none) => gm2.asp

Comment 54 Mageia Robot 2016-07-05 16:42:10 CEST

commit 767048570e8c44061cb0d6faf689698d3313870c
Author: Thierry Vignaud <thierry.vignaud@...>
Date:   Tue Jul 5 16:25:28 2016 +0200

    prevent GPT partition to use the 33 last sectors
    
    Resolves: mga#18666, mga#17796
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=767048570e8c44061cb0d6faf689698d3313870c

 Bug links:
   Mageia
      https://bugs.mageia.org/18666
      https://bugs.mageia.org/17796

Comment 55 Thierry Vignaud 2016-07-05 16:43:54 CEST

Closing then (Aligning end of partitions is another issue)

Status: ASSIGNED => RESOLVED
Resolution: (none) => FIXED

Comment 56 Thierry Vignaud 2016-07-06 13:44:39 CEST

*** Bug 18876 has been marked as a duplicate of this bug. ***

CC: (none) => LpSolit

Comment 57 Mageia Robot 2016-07-31 17:31:29 CEST

commit fa5d71d34b1193fea0e290bb0586d0ff02747366
Author: Thierry Vignaud <thierry.vignaud@...>
Date:   Wed Jul 27 17:20:33 2016 +0200

    first usable sector is LBA34 for GPT
    
    same rationale as in commit 767048570e8c44061cb0d6faf689698d3313870c for
    mga#18666
    
    this wasn't an issue as we later round partition on 1MB boundary but
    it's still cleaner/safer...
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=fa5d71d34b1193fea0e290bb0586d0ff02747366
