Bug 22032 - Resizing an encrypted XFS partition only resizes the LUKS container, not the XFS partition
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard: (MGA6)
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2017-11-15 22:49 CET by Nicolas Pomarède
Modified: 2019-05-14 18:40 CEST
CC List: 6 users

See Also:
Source RPM: drakx-installer-stage2, drakxtools
CVE:
Status comment:


Attachments
complete draks logs during the install, with the partitions creation (432.39 KB, application/x-compressed-tar)
2017-11-15 22:51 CET, Nicolas Pomarède
Proposed fix (1.38 KB, text/plain)
2017-11-27 16:15 CET, Martin Whitaker
report.bug from installation as requested above (161.05 KB, application/x-xz)
2019-04-14 11:39 CEST, Ulrich Beckmann
Formatting during installation (63.12 KB, image/png)
2019-04-14 11:54 CEST, Ulrich Beckmann
Mismatch of size physical volume/filesystem (105.54 KB, image/png)
2019-04-14 11:56 CEST, Ulrich Beckmann
cli output of cryptsetup luksDump (147.65 KB, image/png)
2019-04-14 12:06 CEST, Ulrich Beckmann
Proposed fix for second bug (1.83 KB, text/plain)
2019-04-14 19:32 CEST, Martin Whitaker

Description Nicolas Pomarède 2017-11-15 22:49:53 CET
While installing Mageia 6 on a new laptop, I chose to use encryption for the "/" partition. Everything went fine using the disk installer from the Mageia ISO image, and I now have an encrypted partition.

But the strange thing is that this "/" partition is only ~7 GB in size, whereas it was ~235 GB in the installer and is reported as 235 GB by lsblk.

df reports :

Filesystem                    Size    Used Avail Use% Mounted on
devtmpfs                      3,9G       0  3,9G   0% /dev
tmpfs                         3,9G       0  3,9G   0% /dev/shm
tmpfs                         3,9G    1,3M  3,9G   1% /run
/dev/mapper/crypt_nvme0n1p4   7,7G    6,9G  334M  96% /
tmpfs                         3,9G       0  3,9G   0% /sys/fs/cgroup
tmpfs                         3,9G    8,0K  3,9G   1% /tmp
/dev/nvme0n1p2                190M     62M  115M  35% /boot
/dev/nvme0n1p1                 50M    123K   50M   1% /boot/EFI
tmpfs                         789M     20K  789M   1% /run/user/1001


lsblk :

NAME                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb                   8:16   1  14,7G  0 disk
└─sdb1                8:17   1  14,7G  0 part  /run/media/test/KINGSTON
nvme0n1             259:0    0 238,5G  0 disk
├─nvme0n1p1         259:1    0    50M  0 part  /boot/EFI
├─nvme0n1p2         259:2    0   200M  0 part  /boot
├─nvme0n1p3         259:3    0     3G  0 part  [SWAP]
└─nvme0n1p4         259:4    0 235,3G  0 part
  └─crypt_nvme0n1p4 252:0    0 235,3G  0 crypt /


/etc/fstab :

/dev/mapper/crypt_nvme0n1p4 / ext4 noatime,acl 0 0
/dev/nvme0n1p2 /boot ext4 noatime,acl 1 2
/dev/nvme0n1p1 /boot/EFI vfat umask=000,iocharset=utf8 0 0
none /proc proc defaults 0 0
/dev/nvme0n1p3 swap swap defaults 0 0
Comment 1 Nicolas Pomarède 2017-11-15 22:51:09 CET
Created attachment 9793
complete draks logs during the install, with the partitions creation
Comment 2 Marja Van Waes 2017-11-16 00:07:04 CET
According to the logs, stage2's diskdrake tries GPT partitioning, but fdisk, both before and after, only sees /dev/sda with a dos disklabel type.

There are also a number of "modprobe: FATAL: Module <module name> not found" messages in the partitioning step; I don't know whether they matter.

Setting version to cauldron, because already released isos cannot be changed.
Adding "(MGA6)" to the whiteboard, because this issue was last seen with a Mageia6 iso.

Source RPM: (none) => drakx-installer-stage2
CC: (none) => marja11
Version: 6 => Cauldron
Assignee: bugsquad => mageiatools
Whiteboard: (none) => (MGA6)

Comment 3 Dave Hodgins 2017-11-16 09:54:54 CET
The modprobe errors are for hardware not present on the system, and not relevant
to the problem.

The lvm messages are just the installer checking to see if lvm has already
been used on the system, and installing the packages needed should there be
a decision to add lvm after installation. While luks encrypted block devices
do use the same device mapper module used by lvm, in this case lvm is not
being used.

Just fyi, a partition that uses luks encryption is treated by the kernel as a
block device, like an lvm volume group or a hard drive, that in theory can have
its own partition table and then be divided up into multiple logical
partitions, though diskdrake hides this and assumes the luks device will only
contain one partition. By the same token, a hard drive does not have to have
a partition table, as the entire drive can be one file system, though most
partitioning tools, including diskdrake, do not work with this.

It looks to me like the partition table entries were created, deleted, and
recreated. I see nothing in the log that would indicate why the mkfs.ext4
would have limited the size of the file system.

The only thing I can think of is that, with the deletion/recreation of the
partitions, an additional step forcing the kernel to re-read the partition
table was needed. My experience with gpt partitioning is still rather limited
compared to mbr style partitioning. I'll try to recreate the problem, though
that may take a while.

Adding Thomas and Martin to the cc list.

CC: (none) => davidwhodgins, mageia, tmb

Ulrich Beckmann 2017-11-17 04:51:00 CET

CC: (none) => bequimao.de

Comment 4 Martin Whitaker 2017-11-18 14:34:29 CET
I have tried to reproduce this (in VirtualBox) without success.

From Nicolas's ddebug.log, it looks like he first created a 16384000 sector (7.8GB) encrypted root partition:

* GPT partitioning: (add, 4, 6658048, 16384000)

Some errors occurred when initialising this partition:

* running: dmsetup table crypt_nvme0n1p4  
* could not find the device 259:7 for mapper/crypt_nvme0n1p4

and

* fs::get::device2part: unknown device <</dev/nvme0n1p4>>
* crypttab: unknown device /dev/nvme0n1p4 for crypt_nvme0n1p4

but then the encrypted partition appears to have been created and formatted successfully:

* formatting device mapper/crypt_nvme0n1p4 (type ext4)
* running: mkfs.ext4 -F /dev/mapper/crypt_nvme0n1p4
* running: tune2fs -c0 -i0 /dev/mapper/crypt_nvme0n1p4
tune2fs 1.43.4 (31-Jan-2017)
Setting maximal mount count to -1
Setting interval between checks to 0 seconds
* running: blkid -o udev -p /dev/mapper/crypt_nvme0n1p4
* blkid gave: ext4 fb30a9b9-d973-4c68-b94e-5c31062c482d

But then that partition is deleted and a new much larger partition is added (presumably by Nicolas):

* GPT partitioning: (del, 4, , )
* GPT partitioning: (add, 4, 6658048, 493457408)

At that point, if it was still set as an encrypted partition, the process of setting up and formatting the encrypted partition should have been repeated, but it wasn't. In my tests it is repeated, so I can only surmise that the earlier errors left the installer in a confused state.

Without a way to reproduce the bug, there's not much more I can do.
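
As a quick cross-check of the sizes quoted from the log above, the sector counts can be converted to GiB. This is a minimal sketch that assumes 512-byte logical sectors, which the log does not state; the helper name sectors_to_gib is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: logical sector size, not stated in the log


def sectors_to_gib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a sector count to GiB (2**30 bytes)."""
    return sectors * sector_bytes / 2**30


first = sectors_to_gib(16384000)    # first "add" entry in the log
second = sectors_to_gib(493457408)  # second "add" entry in the log

print(f"{first:.1f} GiB")   # 7.8 GiB
print(f"{second:.1f} GiB")  # 235.3 GiB
```

Under that assumption the first entry comes to about 7.8 GiB and the second to about 235.3 GiB, in line with the 7,7G that df shows for the filesystem and the 235,3G that lsblk shows for the partition.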
Comment 5 Nicolas Pomarède 2017-11-20 12:21:25 CET
hi

Regarding the fact that the partition was first created with a size of 7.8 GB: it was the last partition on the SSD, so I wanted to use all the remaining space. But when adding a partition, you can only enter the size in MB, not the number of the last cylinder to use (whereas you can when doing "modify partition"), so it's not possible to enter a size in MB that exactly fills the rest of the disk.

So I created this default partition, then used "modify partition", which tells me the possible value for "max cylinder", and I entered this value to extend the partition to fill the whole SSD.

It might be at this point that the installer should have reformatted the encrypted partition with the new size of 230 GB, but maybe it considered it already formatted and didn't issue another mkfs.ext4?
Comment 6 Martin Whitaker 2017-11-20 13:15:49 CET
(In reply to Nicolas Pomarède from comment #5)
> so I created this default partition, then I used "modify partition" , which
> tells me the value possible for "max cylinder" and I entered this value to
> extend the partition to fill the whole SSD.

I'm confused by this. I assume "modify partition" is what I see labelled as "Resize" - but there I only see values in MB, not in cylinders.

I did try using Resize when trying to reproduce the bug, but that also resulted in the encrypted partition being resized/reformatted.
Comment 7 Nicolas Pomarède 2017-11-20 13:37:35 CET
Sorry, my wording may not be accurate, as I don't have the installer running at the moment, but it was a two-step process in my case: create the partition with encryption first, then resize it afterwards to fill the whole SSD.
Comment 8 Martin Whitaker 2017-11-20 16:54:22 CET
I have now reproduced this bug in VirtualBox. It only occurs if you use an NVMe disk controller, not a SATA one. Steps to reproduce are:

1. Start with a GPT partitioned disk with 4 partitions. I don't think the size or type matters, but the number of pre-existing partitions does.
2. Boot from the 64-bit Classic Installer ISO in UEFI mode.
3. Choose Custom Partitioning.
4. Clear all existing partitions (use the "Clear all" button).
5. Select Expert Mode.
6. Create a new 50MB EFI system partition (/boot/EFI).
7. Create a new 200MB ext4 partition (/boot).
8. Create a new 3GB swap partition.
9. Create a new 8GB encrypted ext4 partition (/).

(this can also be done in non-UEFI mode - just create a BIOS boot partition instead of an EFI system partition).

When you create the encrypted partition, the partition table is written to disk, and the installer attempts to create and format the encrypted partition within the containing partition (see Dave's comment #3 for an explanation). This process is only partially successful, but the only indication of that in the GUI is that in the Details box, under "Encrypted", you get the additional line "(to map on crypt_nvme0n1p4)" and a new button labelled "Use" appears on the right-hand side. Clicking on the "Use" button then brings up a pop-up window with the error message "cryptsetup failed".

If you now try to resize the encrypted partition, the containing partition will get resized, but the encrypted partition inside will not - the connection between the two has been lost. But this is only temporary - after rebooting all is well.
Comment 9 Nicolas Pomarède 2017-11-20 17:03:57 CET
This looks like the steps I used when installing.
But why is the encrypted partition formatted while still in the partitioning part?

What I mean is that once you're done partitioning everything, you press 'done', then a new window appears with a list of all partitions, asking if you want to format them. Shouldn't the encrypted partition be formatted only here, and not in the previous steps where you just create / resize partitions?
Comment 10 Martin Whitaker 2017-11-20 17:35:16 CET
(In reply to Nicolas Pomarède from comment #9)
> This looks like the steps I used when installing.
> But why is the encrypted partition formatted while still in the partitioning
> part?

I've no idea. But that seems to be the way it has been designed to work. And if you don't start with four pre-existing partitions or if you aren't using an NVMe device, it does work.

I've never used encrypted partitions before. Hopefully someone who knows more about it will chip in.
Comment 11 Dave Hodgins 2017-11-20 19:00:27 CET
At that point, it isn't formatting the filesystem, it's formatting the
container for the encrypted block device. The steps involved in creating the
encrypted file system are ...

Allocate the space in the partition table, nvme0n1p4 in this case. This
is called the base device in luks terms.

cryptsetup luksFormat the base device, nvme0n1p4. Using the passphrase,
selected cipher, etc., some data is written to the device in encrypted form.
That data is needed later to verify the correct passphrase has been
provided, to avoid corrupting the data if an incorrect passphrase is entered.
This is done only once.

cryptsetup luksOpen the partition. This step makes the unencrypted version of
the data available as a new block device, crypt_nvme0n1p4 in this case.

The device crypt_nvme0n1p4 is then ready to be formatted with a file system
such as ext4, which is then ready for mounting.  As usual with diskdrake, this
can be done immediately, or since it's part of the install, at the end of the
partitioning step.
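
The sequence described above can be sketched as a list of commands. This only builds and prints the command lines, it does not run them. The function name and the default of ext4 are illustrative, options the installer actually passes (such as a key file, or -F for mkfs) are left out, and the base partition is assumed to have been allocated already.

```python
from typing import List


def luks_setup_commands(base_device: str, mapper_name: str,
                        fs_type: str = "ext4") -> List[List[str]]:
    """Return the argv lists for the three steps, in order (hypothetical helper)."""
    return [
        # Write the LUKS header to the base device (done only once).
        ["cryptsetup", "luksFormat", base_device],
        # Expose the decrypted view as /dev/mapper/<mapper_name>.
        ["cryptsetup", "luksOpen", base_device, mapper_name],
        # Create the file system on the mapped device, not on the base device.
        [f"mkfs.{fs_type}", f"/dev/mapper/{mapper_name}"],
    ]


commands = luks_setup_commands("/dev/nvme0n1p4", "crypt_nvme0n1p4")
for argv in commands:
    print(" ".join(argv))
```

The point of the sketch is the ordering: the file system is only created once the mapped device exists, so anything that breaks the link between base device and mapping also affects later formatting or resizing.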

I am quite familiar with luks. I wrote http://clients.teksavvy.com/~davidwhodgins/Luks-Howto.html way back when, before cryptmount was
available. :-)

Now that the specific test case has been found, it should be possible to fix,
but as diskdrake is written in perl (which I find nearly impossible to read),
I can't help with figuring out exactly where the problem is.
Comment 12 Martin Whitaker 2017-11-21 16:51:16 CET
This bug is occurring because when the partition table changes, the kernel/udevd allocates different minor device numbers to the partitions, but the installer doesn't pick up the new numbers. partition_table::write() does finish with the following lines:

    # get major/minor again after writing the partition table so that we got them for dynamic devices
    # (eg: for SCSI like devices with kernel-2.6.28+):
    fs::get_major_minor([ get_normal_parts($hd) ]);

but fs::get_major_minor() only reads new numbers if it doesn't already have them. I was able to fix this like so:

--- a/perl-install/fs.pm
+++ b/perl-install/fs.pm
@@ -144,7 +144,7 @@ sub get_major_minor {
        eval {
            my (undef, $major, $minor) = devices::entry($_->{device});
            ($_->{major}, $_->{minor}) = ($major, $minor);
-       } if !$_->{major};
+       };
     }
 }
 
but maybe there's a reason for not doing that elsewhere.

Sadly, testing this fix exposed another problem, which I've written up as bug 22059.
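
To illustrate why the conditional matters, here is a small Python model of the two behaviours. It is not the installer's code: the function names, the dictionary layout and the lookup callback are stand-ins, the eval error handling of the Perl original is omitted, and the device numbers (259:7 from the error message quoted in comment 4, 259:4 from the lsblk output in the description) are used purely as illustrative values.

```python
from typing import Callable, Dict, List, Tuple

Lookup = Callable[[str], Tuple[int, int]]


def get_major_minor_cached(parts: List[Dict], lookup: Lookup) -> None:
    """Old behaviour: only query devices that have no major number yet."""
    for part in parts:
        if not part.get("major"):
            part["major"], part["minor"] = lookup(part["device"])


def get_major_minor_always(parts: List[Dict], lookup: Lookup) -> None:
    """Patched behaviour: always re-read the numbers."""
    for part in parts:
        part["major"], part["minor"] = lookup(part["device"])


# Hypothetical state: the installer remembers 259:7, the kernel now reports 259:4.
kernel_view = {"nvme0n1p4": (259, 4)}
lookup = kernel_view.__getitem__

stale = [{"device": "nvme0n1p4", "major": 259, "minor": 7}]
fresh = [dict(stale[0])]

get_major_minor_cached(stale, lookup)
get_major_minor_always(fresh, lookup)

print(stale[0]["minor"], fresh[0]["minor"])  # 7 4
```

In this model the cached variant keeps the outdated minor number because a major number is already present, while the unconditional variant picks up the value the kernel currently reports.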

Keywords: (none) => PATCH

Comment 13 Dave Hodgins 2017-11-21 22:07:02 CET
The production and start of testing of the Mageia 6.1 iso images is still
waiting for kernel 4.14, which is waiting for VirtualBox 5.2.2.

If possible, it would be nice to have the fixes for these two bugs included
in those iso images.
Comment 14 Martin Whitaker 2017-11-24 12:52:04 CET
N.B. The reason we already have major/minor numbers for the new partitions is that fsedit::add() also calls fs::get_major_minor(). This seems wrong - it can't reliably get correct values until the partition table is written. But maybe that's needed when the partition is being added in a LVM...
Comment 15 Dave Hodgins 2017-11-24 18:46:52 CET
What really makes it more complicated is that a lvm volume group can contain
physical volumes that are stored in a luks encrypted block device, or a
luks encrypted block device can be stored inside of an lvm physical group.

I've used diskdrake in the past to have a luks encrypted device in an lvm
physical volume. It doesn't currently support lvm inside of luks.
Comment 16 Martin Whitaker 2017-11-27 16:15:05 CET
Created attachment 9804
Proposed fix

Here's a git formatted patch for the proposed fix.
Comment 17 Mageia Robot 2018-01-09 22:18:24 CET
commit eb497cece2871e59a2f981602e44f303826ffeab
Author: Martin Whitaker <mageia@...>
Date:   Mon Nov 27 10:20:34 2017 +0000

    Make fs::get_major_minor() unconditionally read the device numbers.
    
    Thus ensuring we get the correct device numbers after writing a
    partition table (mga#22032).
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=eb497cece2871e59a2f981602e44f303826ffeab
Comment 18 Ulrich Beckmann 2019-04-12 21:46:43 CEST
I have seen this issue before and think that it is a general error regardless of MBR, GPT, LVM partitioning or disk type, and still valid.

Reproducing in a VM (Qemu/KVM)
Choose the size of an encrypted root device 10 GB in expert mode.
The partition table and Luks headers are written to the disk.
Choose the option Resize to 16 GB.
The physical volume is now 16 GB, the filesystem remains at 10 GB.
My guess is that the step # cryptsetup resize ... is missing.

Ulrich
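
A mismatch of the kind reported above can be checked by comparing the sizes the kernel reports for the partition and for the mapped device. The sketch below parses text in the /proc/partitions format, where the #blocks column is in 1 KiB units. The sample content, the device names and the 32 MiB allowance for the LUKS header are all assumptions made for illustration, not values from this report.

```python
from typing import Dict

# Hypothetical /proc/partitions content; the numbers are invented.
SAMPLE = """\
major minor  #blocks  name

   8        0   20971520 sda
   8        5   16777216 sda5
 252        0   10485760 dm-0
"""

# Assumed allowance for the LUKS header, so a healthy mapping is not flagged.
HEADER_SLACK_BYTES = 32 * 2**20


def parse_proc_partitions(text: str) -> Dict[str, int]:
    """Return {device name: size in bytes}."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with a numeric major number.
        if len(fields) == 4 and fields[0].isdigit():
            sizes[fields[3]] = int(fields[2]) * 1024
    return sizes


def mapper_lags_partition(sizes: Dict[str, int], partition: str, mapper: str,
                          slack: int = HEADER_SLACK_BYTES) -> bool:
    """True if the mapping is smaller than its partition by more than the slack."""
    return sizes[partition] - sizes[mapper] > slack


sizes = parse_proc_partitions(SAMPLE)
print(mapper_lags_partition(sizes, "sda5", "dm-0"))  # True
```

On a real system the text would come from reading /proc/partitions, and the name of the mapped device (dm-0 here) would have to be looked up first.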
Comment 19 Martin Whitaker 2019-04-13 19:46:04 CEST
(In reply to Ulrich Beckmann from comment #18)
> Reproducing in a VM (Qemu/KVM)
> Choose the size of an encrypted root device 10 GB in expert mode.
> The partition table and Luks headers are written to the disk.
> Choose the option Resize to 16 GB.
> The physical volume is now 16 GB, the filesystem remains at 10 GB.
> My guess is that the step # cryptsetup resize ... is missing.

Sorry, can't reproduce. The relevant part of the installer log shows:

* running: cryptsetup luksClose crypt_sda5
* partition::dos::write sda
* tell kernel del (sda 5  ) force_reboot= rebootNeeded=
* tell kernel add (sda 5 1015808 32752822) force_reboot= rebootNeeded=
* running: udevadm settle
* running: cryptsetup luksOpen /dev/sda5 crypt_sda5 --key-file /tmp/.dmcrypt_key-396
* running: udevadm settle
* running: dmsetup table crypt_sda5
* running: blkid -o udev -p /dev/mapper/crypt_sda5
* blkid gave: ext4 de35c45d-2148-46c1-863c-3800ce7c055b 
* dmcrypt: found mapper/crypt_sda5 type ext4 with rootDevice sda5
* resize2fs /dev/mapper/crypt_sda5 to size 4096000 in block of 4096 bytes
* running: resize2fs -pf /dev/mapper/crypt_sda5 4096000
resize2fs 1.45.0 (6-Mar-2019)
Resizing the filesystem on /dev/mapper/crypt_sda5 to 4096000 (4k) blocks.
The filesystem on /dev/mapper/crypt_sda5 is now 4096000 (4k) blocks long.

Please attach the installer report.bug.xz from your test.
Comment 20 Martin Whitaker 2019-04-13 20:57:14 CEST
Original bug confirmed fixed on Mageia-7-beta3-x86_64 ISO using procedure from comment 8.
Comment 21 Dick Gevers 2019-04-13 22:21:41 CEST
(In reply to Martin Whitaker from comment #20)
> Original bug confirmed fixed on Mageia-7-beta3-x86_64 ISO using procedure
> from comment 8.

Consequently I close the bug. (Anyone disagreeing could reopen...)

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 22 Ulrich Beckmann 2019-04-14 11:39:36 CEST
Created attachment 10926
report.bug from installation as requested above

I tested with Mga7 beta 3 x86_64 from 2019-04-12, Classic ISO. The root filesystem was xfs. I did that to underline my assumption that the issue is independent of the filesystem.
Comment 23 Ulrich Beckmann 2019-04-14 11:54:03 CEST
Created attachment 10927
Formatting during installation
Comment 24 Ulrich Beckmann 2019-04-14 11:56:45 CEST
Created attachment 10928
Mismatch of size physical volume/filesystem

This mismatch is permanent. Screenshot taken after reboot.
Comment 25 Ulrich Beckmann 2019-04-14 12:06:26 CEST
Created attachment 10929
cli output of cryptsetup luksDump
Comment 26 Ulrich Beckmann 2019-04-14 12:12:08 CEST
You see under 
Data segment 0: crypt: length (whole device)

# xfs_growfs /
would now enlarge the filesystem to the complete extent.
The original error condition might be a fixed length.

Ulrich

Resolution: FIXED => (none)
Status: RESOLVED => REOPENED

Comment 27 Martin Whitaker 2019-04-14 12:44:14 CEST
OK, so this is a different bug. If I select ext4, the ext4 partition is resized. If I select xfs, the xfs partition isn't resized.

Summary: Encrypted partition has wrong size in ext4 FS => Resizing an encrypted XFS partition only resizes the LUKS container, not the XFS partition

Comment 28 Martin Whitaker 2019-04-14 19:22:15 CEST
In fact the differentiating factor is whether or not the partition can be resized without loss of data. xfs_growfs requires the partition to be mounted, but diskdrake does not allow mounted partitions to be resized, so it will always perform a lossy resize on an xfs partition.

When performing a lossless resize, diskdrake does

  cryptsetup luksClose <partition>
  write partition table
  cryptsetup luksOpen <partition>
  resize partition

but when doing a lossy resize, it just does

  cryptsetup luksClose <partition>
  cryptsetup luksOpen <partition>

postponing the partition table write until it exits or until it formats the partition. This means that the encrypted partition mapper still retains the old size, as can be seen in /proc/partitions.
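
The difference between the two sequences can be modelled with a toy state machine. This is a simplified sketch, not diskdrake's logic: the class and method names are invented, sizes are whole GB taken from the 10 GB to 16 GB example in comment 18, the LUKS header overhead is ignored, and the mapping is assumed to take whatever size the partition has at the moment it is opened.

```python
from dataclasses import dataclass


@dataclass
class EncryptedPartition:
    table_gb: int       # partition size currently written to the partition table
    requested_gb: int   # new size chosen in the GUI, not yet written
    mapper_gb: int = 0  # size of the open mapping; 0 means closed

    def luks_close(self) -> None:
        self.mapper_gb = 0

    def write_partition_table(self) -> None:
        self.table_gb = self.requested_gb

    def luks_open(self) -> None:
        # Simplification: the mapping gets the partition's size at open time.
        self.mapper_gb = self.table_gb


def lossless_resize(p: EncryptedPartition) -> None:
    p.luks_close()
    p.write_partition_table()
    p.luks_open()
    # the filesystem resize would follow here


def lossy_resize(p: EncryptedPartition) -> None:
    p.luks_close()
    p.luks_open()
    # the partition table write is postponed until later


a = EncryptedPartition(table_gb=10, requested_gb=16, mapper_gb=10)
b = EncryptedPartition(table_gb=10, requested_gb=16, mapper_gb=10)
lossless_resize(a)
lossy_resize(b)

print(a.mapper_gb, b.mapper_gb)  # 16 10
```

In the model, reopening the mapping before the table is written simply recreates it at the old size, which matches the behaviour described for the lossy path.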

Source RPM: drakx-installer-stage2 => drakx-installer-stage2, drakxtools

Comment 29 Martin Whitaker 2019-04-14 19:32:54 CEST
Created attachment 10931
Proposed fix for second bug

Tested in diskdrake on a running system.
Comment 30 Martin Whitaker 2019-04-16 23:04:49 CEST
Tested in the installer, and pushed to git.
Comment 31 Ulrich Beckmann 2019-05-14 17:22:45 CEST
Tested now with the first round of Mageia 7 rc1, classical installer.

Works now! The encrypted xfs filesystem is now resized to the full extent.

Ulrich

Resolution: (none) => FIXED
Status: REOPENED => RESOLVED

Comment 32 Morgan Leijström 2019-05-14 18:40:51 CEST
Thanks!
I think I have stumbled on this in the past but had no time to investigate then.

CC: (none) => fri

