Bug 20074 - Partitions with Type: Empty corrupt the partition table
Summary: Partitions with Type: Empty corrupt the partition table
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages (show other bugs)
Version: Cauldron
Hardware: All Linux
Priority: release_blocker critical
Target Milestone: ---
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard:
Keywords: 6sta2, PATCH
Depends on:
Blocks:
 
Reported: 2017-01-04 23:53 CET by Marja Van Waes
Modified: 2017-09-08 03:40 CEST (History)
16 users (show)

See Also:
Source RPM: drakxtools
CVE:
Status comment: Fixed in the pre-release QA Live isos from February 5, not yet in CI isos. Patches not in git, because they haven't been accepted


Attachments
Patch to prevent kernel rescanning part way though writing the partition table (6.33 KB, text/plain)
2017-01-22 01:55 CET, Martin Whitaker
Details
Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (3.68 KB, text/plain)
2017-01-22 01:56 CET, Martin Whitaker
Details
Patch to preserve Empty partitions rather than treating them as free space (1.62 KB, text/plain)
2017-01-22 01:57 CET, Martin Whitaker
Details
Patch to preven Empty or BIOS_GRUB partitions being counted as free space (1.09 KB, text/plain)
2017-01-22 01:58 CET, Martin Whitaker
Details
Patch to add POD documentation for new partition table write API (1.34 KB, text/plain)
2017-01-28 18:19 CET, Martin Whitaker
Details
Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v2) (2.61 KB, patch)
2017-01-30 12:28 CET, Thierry Vignaud
Details | Diff
Patch to add POD documentation for new partition table write API (v2) (1.52 KB, patch)
2017-01-30 12:31 CET, Thierry Vignaud
Details | Diff
Patch to prevent kernel rescanning part way though writing the partition table (v2) (6.21 KB, patch)
2017-01-30 16:36 CET, Thierry Vignaud
Details | Diff
Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v3) (2.66 KB, patch)
2017-01-30 16:36 CET, Thierry Vignaud
Details | Diff
Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v4) (2.68 KB, text/plain)
2017-02-04 17:09 CET, Martin Whitaker
Details
Patch to fix another partition table sync problem (1.42 KB, text/plain)
2017-02-04 21:03 CET, Martin Whitaker
Details
report.bug (23.79 KB, application/x-xz)
2017-02-21 07:19 CET, ben mcmonagle
Details

Description Marja Van Waes 2017-01-04 23:53:31 CET
+++ This bug was initially created as a clone of Bug #19935 +++
(Custom disk partitioning requests BIOS boot partition on non-GPT disk, creating one and continuing leads to a corrupted partition table and thus unusable disk.)

Martin Whitaker discovered that the corrupted partition table is a separate issue, see:
https://bugs.mageia.org/show_bug.cgi?id=19935#c27
> The real problem is that the BIOS boot partition is given the Empty
> partition type. In some places Empty partitions are assigned partition
> numbers, in other places they aren't, so the drakx view of partition numbers
> gets out of step with the kernel.
> 
> This doesn't just apply to drakx; when booting or when using the partx
> command to reload the kernel partition table, Empty partitions get assigned
> partition numbers, but when using the partprobe command to reload the kernel
> partition table, they don't.
Comment 1 Martin Whitaker 2017-01-09 22:54:29 CET
After a lot of experimentation, I think there are two distinct bugs:

1) The drakx code uses a partition type of Empty (0x00) to tag free regions (i.e. sectors that aren't in any partition. So when a real partition with a partition type of Empty is found, this gets treated as a free region, instead of being preserved as a real partition.

The cleanest fix for this (IMO) would be to use a special (and non-valid) partition type to tag free regions. But this would require quite widespread changes to the code. A simpler and safer alternative is to detect Empty partitions and convert their partition type to a special value. Simplest of all is to reuse the special 'BIOS_GRUB' value, as at the moment auto-allocation still requires/creates BIOS boot partitions on non-GPT disks (which get converted to Empty partitions when the partition table is written).

@Thierry, WDYT?

2) When a partition table is written to disk, something triggers the kernel to automatically reload its partition table cache (I assumed this was a udev rule, but running 'udevadm control --stop-exec-queue' doesn't seem to delay it). When one of the partitions is an Empty partition, the kernel then doesn't properly process subsequent requests to add or delete partitions from its cache (via the BLKPG ioctl). I've no idea what's really going on here (and the behaviour is quite erratic), but running 'udevadm settle' before issuing the add or delete requests seems to fix it. Of course it may just be the extra delay which is "fixing" the problem.
Comment 2 Martin Whitaker 2017-01-14 19:23:44 CET
More experimentation shows my 'udev settle' fix for the second issue is not reliable. Unless somebody has an idea for how to fix this, I think the only safe thing is to detect Empty partitions when reading a DOS partition table and refuse to continue, and to refuse to write DOS partition tables containing Empty partitions.

This would also require improvements to the partitioning wizard so that it no longer creates BIOS boot partitions for non-GPT disks.
Comment 3 Charles Edwards 2017-01-15 00:13:52 CET
Did you see my post Fri to <qa-discuss@ml.mageia.org> about re-starting the stage2
after|if you hit the 'BIOS boot' will allows Custom partitioning to complete wihout
a BIOS boot?

If you think it might offer insight I can include attachments both created during the same install:
report1.bug-----created when I hit the BIOS boot error
report2.bug-----created after re-starting stage2 and making it through Custom partitioning and am at the package selection stage

I have tested the i586 Mga6-stg2 install in this manner 7 or 8 times and the procedure has worked every time.
Comment 4 Martin Whitaker 2017-01-15 00:35:53 CET
(In reply to Charles Edwards from comment #3)
> Did you see my post Fri to <qa-discuss@ml.mageia.org> about re-starting the
> stage2 after|if you hit the 'BIOS boot' will allows Custom partitioning to
> complete wihout a BIOS boot?

I did, but there's an easy fix for the problem with Custom (manual) partitioning (https://bugs.mageia.org/attachment.cgi?id=8834 attached to bug 19935). The problem is that if you use any of the auto-partitioning schemes, they still create a BIOS boot partition, which gets translated into an Empty partition on a non-GPT disk. If you subsequently use diskdrake, it doesn't recognise Empty partitions, leading either to it crashing, or worse, corrupting the partition table.
Comment 5 Martin Whitaker 2017-01-22 01:53:16 CET
(In reply to Martin Whitaker from comment #1)
> 2) When a partition table is written to disk, something triggers the kernel
> to automatically reload its partition table cache (I assumed this was a udev
> rule, but running 'udevadm control --stop-exec-queue' doesn't seem to delay
> it). When one of the partitions is an Empty partition, the kernel then
> doesn't properly process subsequent requests to add or delete partitions
> from its cache (via the BLKPG ioctl). I've no idea what's really going on
> here (and the behaviour is quite erratic), but running 'udevadm settle'
> before issuing the add or delete requests seems to fix it. Of course it may
> just be the extra delay which is "fixing" the problem.

What's going on here is that if none of the partitions of a DOS-partitioned disk are currently mounted, the kernel automatically rescans the partition table after it has been written. There is then a race between the kernel performing this rescan and diskdrake telling it what changes have been made. Because diskdrake describes the changes as a sequence of add or delete operations, it needs to win the race (as that sequence is relative to the initial state of the partition table, not the newly rescanned state).

The automatic rescan appears to be triggered when the raw device is closed after being written to. This leads to a further problem. diskdrake currently writes the DOS partition information as a sequence of open/write/close operations. The first in the sequence writes the primary partition table, the second in the sequence writes the first extended partition table, and so on. Each step in the sequence triggers a rescan of the partition table, but because the extended partition tables form a linked list, rescanning when only part of the list has been written can lead to the kernel reading invalid information, which gets it even more confused.

There doesn't appear to be any automatic rescanning performed for GPT disks, so it's only DOS-partitioned disks that suffer from this bug.

Following is a sequence of patches that I believe fix all the above bugs (along with the reordering patch from bug 19935 which has already been applied). Having received no guidance, I've gone for the simplest solution for handling Empty partitions (as described in comment 1). For fixing the race, I've made diskdrake detect whether no partitions are mounted, and if so, rely on the automatic rescan rather than telling the kernel about the changes.

I've done a fair amount of testing on both DOS and GPT partitioned disks in VirtualBox without seeing any problems. I've left the behaviour for BSD/Mac/Sun partitioned disks unchanged, as I know nothing about those.

N.B. When testing this, people need to be aware that diskdrake preferentially displays the file system type it finds in a partition, rather than the base partition type. So if you change a partition type (or delete a partition and create a new one staring at the same sector) but don't format it, you'll still see the old file system type. This is particularly noticeable with Empty partitions, as they aren't formatted. It might be worth making diskdrake ignore any file system it finds in an Empty partition.

Keywords: (none) => PATCH

Comment 6 Martin Whitaker 2017-01-22 01:55:04 CET
Created attachment 8880 [details]
Patch to prevent kernel rescanning part way though writing the partition table
Comment 7 Martin Whitaker 2017-01-22 01:56:15 CET
Created attachment 8881 [details]
Patch to avoid race by not telling kernel about changes when it will automatically rescan the table
Comment 8 Martin Whitaker 2017-01-22 01:57:15 CET
Created attachment 8882 [details]
Patch to preserve Empty partitions rather than treating them as free space
Comment 9 Martin Whitaker 2017-01-22 01:58:19 CET
Created attachment 8883 [details]
Patch to preven Empty or BIOS_GRUB partitions being counted as free space
Comment 10 Thierry Vignaud 2017-01-22 10:14:32 CET
Comment on attachment 8880 [details]
Patch to prevent kernel rescanning part way though writing the partition table

It might be worth to amend your commit chlog in order to describe more the actual changes:
- adding a start_write() API returning a file handle one the raw device
- write() takes that file handle as parameter
- end_write() closes the file handle

Or even better document it as POD in partition_table.pm

I just added some basic POD to it as an example
Comment 11 Thierry Vignaud 2017-01-22 10:24:50 CET
See "perldoc ./partition_table.pm" for testing

As for the 2 latest patches, I disagree with them & I would prefer to fix the real bug.

Where do you see that BIOS_GRUB part on MBR disk?
Under UEFI with a GPT & a MBR partitionned disk?
Comment 12 Martin Whitaker 2017-01-22 12:57:30 CET
(In reply to Thierry Vignaud from comment #10)
> I just added some basic POD to it as an example

Did you forget to attach this?

(In reply to Thierry Vignaud from comment #11)
> See "perldoc ./partition_table.pm" for testing
> 
> As for the 2 latest patches, I disagree with them & I would prefer to fix
> the real bug.

There are two real bugs:

1) If an Empty partition exists on a DOS partitioned disk, diskdrake will silently delete it (and its view of the partitions then gets out of sync with the kernel's view). This should be fixed, regardless of whether or not we fix the other bug. See comment 1 for alternatives. At this stage of mga6 development, I think the cleanest fix is too big a change, but it could go on the todo list for mga7.

2) The partitioning wizard still creates BIOS boot partitions (which translate to Empty) partitions on DOS partitioned disks. This also should be fixed, but if we fix (1) then it is at least harmless.

> Where do you see that BIOS_GRUB part on MBR disk?
> Under UEFI with a GPT & a MBR partitionned disk?

To reproduce:

1. Boot a Live DVD ISO in Virtual Box with an attached empty virtual hard disk.
2. Create and format a FAT32 partition at the beginning of the disk.
3. Run draklive-install, select custom partitioning, and click on the 'Auto allocate' button.

Unless you've applied the above patches, you won't see that a BIOS boot partition has been created, but fdisk will show it, e.g.

[root@linux live]# fdisk -l /dev/sda
Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot    Start      End  Sectors  Size Id Type
/dev/sda1  *        2048  8390655  8388608    4G  b W95 FAT32
/dev/sda2        8392671 41929649 33536979   16G  5 Extended
/dev/sda5        8392704  8401994     9291  4.5M  0 Empty
/dev/sda6        8404992 36772784 28367793 13.5G 83 Linux
/dev/sda7       36775936 41929649  5153714  2.5G 82 Linux swap / Solaris

I would expect the same behaviour if you choose the "Use available space" option. 

This occurs because the is_boot_bios_part_needed() can't identify the root device until a /boot or / partition have been allocated (and defaults to assuming a GPT disk), and the BIOS boot partition is allocated first.
Comment 13 Thierry Vignaud 2017-01-22 18:01:45 CET
1) See http://gitweb.mageia.org/software/drakx/log/?id=0a19f082255e3f581074ca82d25cce826f59d370

2) So this bug is specific to draklive-install ?
Comment 14 Marja Van Waes 2017-01-22 18:46:11 CET
(In reply to Thierry Vignaud from comment #13)

> 
> 2) So this bug is specific to draklive-install ?

No, see bug 20161 where I reproduced it twice with QA's current 32bit CI
Comment 15 Martin Whitaker 2017-01-22 19:18:33 CET
(In reply to Marja van Waes from comment #14)
> (In reply to Thierry Vignaud from comment #13)
> 
> > 
> > 2) So this bug is specific to draklive-install ?
> 
> No, see bug 20161 where I reproduced it twice with QA's current 32bit CI

And the other three bugs described here, which the patches address, affect diskdrake, draklive-install, and the classic installer.
Comment 16 Martin Whitaker 2017-01-28 18:19:44 CET
Created attachment 8901 [details]
Patch to add POD documentation for new partition table write API
Comment 17 Thierry Vignaud 2017-01-30 12:28:08 CET
Created attachment 8906 [details]
Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v2)

I rewritten 2nd patch to be less intrusive
Also I feared about LVM and the like...

Attachment 8881 is obsolete: 0 => 1

Comment 18 Thierry Vignaud 2017-01-30 12:31:35 CET
Created attachment 8907 [details]
Patch to add POD documentation for new partition table write API (v2)

enhanced doc

Attachment 8901 is obsolete: 0 => 1

Comment 19 Thierry Vignaud 2017-01-30 16:36:39 CET
Created attachment 8908 [details]
Patch to prevent kernel rescanning part way though writing the partition table (v2)

simplify by moving copies of noop funcs into the base class
Comment 20 Thierry Vignaud 2017-01-30 16:36:46 CET
Created attachment 8909 [details]
Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v3)

use the right parent class :-(
Comment 21 Rémi Verschelde 2017-02-01 08:36:32 CET
@ Martin, if you think the patch set looks good as is, could you try to apply it on a test ISO so that we can test how it turns out?
Comment 22 Martin Whitaker 2017-02-01 09:38:34 CET
(In reply to Rémi Verschelde from comment #21)
> @ Martin, if you think the patch set looks good as is, could you try to
> apply it on a test ISO so that we can test how it turns out?

I could, but don't we need to fix bug 20161 too? I could provide a patch for that as well, but really need an answer to my question (bug 20161, comment 17).
Comment 23 Martin Whitaker 2017-02-04 17:09:23 CET
Created attachment 8926 [details]
Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v4)

Fix comments - the default is to assume the kernel *doesn't* reread the partition table.

Attachment 8909 is obsolete: 0 => 1
Attachment 8906 is obsolete: 0 => 1

Comment 24 Martin Whitaker 2017-02-04 21:03:20 CET
Created attachment 8927 [details]
Patch to fix another partition table sync problem

Further testing using diskdrake in Vbox showed no problems when no partitions were mounted and the kernel was automatically rescanning the partition table, but showed there was still a problem when a partition was mounted and diskdrake was telling the kernel about the changes. This only manifested when there was an Empty partition present on a DOS partitioned disk. Deleting another partition would cause the kernel to also forget about the Empty partition (and renumber the remaining partitions).

By experimentation I found that if I removed the

  udevadm ctrl --stop-exec-queue

and

  udevadm ctrl --start-exec-queue

from tell_kernel(), this bug went away. I can't explain why this works - you'll need a kernel expert for that.
Comment 25 Martin Whitaker 2017-02-04 21:05:15 CET
(In reply to Rémi Verschelde from comment #21)
> @ Martin, if you think the patch set looks good as is, could you try to
> apply it on a test ISO so that we can test how it turns out?

I've rebuilt the Live ISOs in the QA mageia6-pretesting rsync directory to include all the above patches.
Comment 26 Marja Van Waes 2017-02-05 09:19:45 CET
(In reply to Martin Whitaker from comment #25)
> (In reply to Rémi Verschelde from comment #21)
> > @ Martin, if you think the patch set looks good as is, could you try to
> > apply it on a test ISO so that we can test how it turns out?
> 
> I've rebuilt the Live ISOs in the QA mageia6-pretesting rsync directory to
> include all the above patches.

Thanks, Martin.

I'm in Brussels for Fosdem, now, but intend to travel home soon to test the patches later today or this evening. However, if you do already have enough feedback from other testers, then please say so :-)
Comment 27 Marja Van Waes 2017-02-05 22:48:18 CET
With the 32bit live XFCE iso from last night:

Even with two empty partitions, one of them at the beginning of the disk, the other on the extended partition, there's no sign of partition table corruption :-)

Status comment: (none) => Fixed in the pre-release QA isos from February 5

Marja Van Waes 2017-02-06 16:05:29 CET

Status comment: Fixed in the pre-release QA isos from February 5 => Fixed in the pre-release QA Live isos from February 5, not yet in CI isos

Comment 28 Dave Hodgins 2017-02-09 20:10:42 CET
All of my testing with the pre-testing iso images indicates the patches do fix the problem.

Need clarification on whether the patches only apply to the installer, or whether diskdrake needs to be fixed too, and have the patches been applied in git, etc.

CC: (none) => davidwhodgins

William Kenney 2017-02-09 21:20:29 CET

CC: (none) => wilcal.int

Comment 29 Martin Whitaker 2017-02-10 00:49:22 CET
(In reply to Dave Hodgins from comment #28)
> Need clarification on whether the patches only apply to the installer, or
> whether diskdrake needs to be fixed too, and have the patches been applied
> in git, etc.

The patches apply to common code shared by the installer and diskdrake, so both are fixed. The only outstanding issue with diskdrake is the need to prevent Xfce/Thunar automounting partitions whilst diskdrake is working (see bug 20247).

The patches have not been applied in git. Thierry said in comment 11 that he disagreed with some of them (although he didn't explain why). Without his approval, I won't push anything.

Note that the patches attached to bug 20161 and bug 20164 also need to be applied.
Comment 30 Marja Van Waes 2017-02-11 20:52:47 CET
(In reply to Martin Whitaker from comment #29)
> (In reply to Dave Hodgins from comment #28)
> > Need clarification on whether the patches only apply to the installer, or
> > whether diskdrake needs to be fixed too, and have the patches been applied
> > in git, etc.
> 
> The patches apply to common code shared by the installer and diskdrake, so
> both are fixed. The only outstanding issue with diskdrake is the need to
> prevent Xfce/Thunar automounting partitions whilst diskdrake is working (see
> bug 20247).
> 
> The patches have not been applied in git. Thierry said in comment 11 that he
> disagreed with some of them (although he didn't explain why). Without his
> approval, I won't push anything.
> 
> Note that the patches attached to bug 20161 and bug 20164 also need to be
> applied.

@ martinw, tv, pterjan, ennael

Wouldn't it be better to let Martin create a separate git branch of software/drakx, so that it is easier for ennael to use the most recent proposed patches when building classical isos, and easier for pterjan and tv to review the patches and then merge them when found to be OK?

Is pushing a separate branch possible for martinw, or is sysadmin intervention needed for that?

@ pterjan

Since diskdrake is/was mostly your baby: would you have time to review all the patches? (Yeah, they're in several bug reports now :-( )

Status comment: Fixed in the pre-release QA Live isos from February 5, not yet in CI isos => Fixed in the pre-release QA Live isos from February 5, not yet in CI isos. Patches not in git, because they haven't been accepted

Comment 31 Dave Hodgins 2017-02-11 22:02:51 CET
When the patches are eventually applied to git, the packages rebuilt, and the iso images regenerated, we will have to do testing to ensure that has all been done properly.

Please go ahead and apply them to the main git repo, and let's get the rest of the work underway. Until that's done, sta2 is on hold, indefinitely.

If problems are found at during that testing, we'll deal with those problems then.
Comment 32 Martin Whitaker 2017-02-12 09:30:53 CET
(In reply to Dave Hodgins from comment #31)
> Please go ahead and apply them to the main git repo, and let's get the rest
> of the work underway. Until that's done, sta2 is on hold, indefinitely.

OK, I'll do this today. I don't like bypassing the maintainers, but lacking any response, what else can we do.
Comment 33 Anne Nicolas 2017-02-12 12:16:08 CET
or as a test you could create a temporary branch to apply all those patches. Then we can test it and at least give some feedbacks
Comment 34 Mageia Robot 2017-02-12 16:00:02 CET
commit 105d6bea5e69ee7011e5c468ba3cd05eb999b359
Author: Martin Whitaker <mageia@...>
Date:   Sat Jan 21 19:38:55 2017 +0000

    Ensure the kernel doesn't rescan a partially written partition table (mga#20074).
    
    When no partitions on a DOS-partitioned disk are mounted, the kernel
    automatically rescans the partition table when the file handle to the
    raw device is released. Currently the code opens and closes the raw
    device when writing the primary partition table and when writing each
    extended partition table segment. As the extended partition table
    segments form a linked list, this allows the kernel to get in and
    rescan the table when the list is not in a coherent state. This patch
    changes the code to open the raw device before writing the primary
    partition table and to close it only after writing the last extended
    partition table segment.
    
    The behaviour for other partition table types is unchanged.
    
    v2 (tvignaud): simplify by moving copies of noop funcs into the base class
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=105d6bea5e69ee7011e5c468ba3cd05eb999b359
Comment 35 Mageia Robot 2017-02-12 16:00:13 CET
commit d274ab5b4c48d0a91a535c976a9e7a9a78b56f05
Author: Martin Whitaker <mageia@...>
Date:   Sat Jan 21 20:00:20 2017 +0000

    Don't tell the kernel about partition table changes when it rescans them automatically (mga#20074).
    
    When no partitions on a DOS-partitioned disk are mounted, the kernel
    automatically rescans the partition table when it is written to disk.
    We shouldn't then try to update the kernel's view of the partition
    table, as the list of deltas we have recorded is relative to the
    previous state of the partition table, not the newly rescanned state.
    
    The behaviour for other partition table types is unchanged.
    
    v2 (tvignaud): just make base class assume the kernel doesn't reread, only mbr
    subclass overrides need_to_tell_kernel() in order to be smarter
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=d274ab5b4c48d0a91a535c976a9e7a9a78b56f05
Comment 36 Mageia Robot 2017-02-12 16:00:25 CET
commit 631a08ffe0557b1df214a9d08c7c01745eb18107
Author: Martin Whitaker <mageia@...>
Date:   Sat Jan 21 20:23:18 2017 +0000

    Preserve DOS "Empty" partitions instead of treating them as free space (mga#20074).
    
    To minimise the changes this close to mga6 release (and until we fix
    the bug in partition auto-allocation that mistakenly creates BIOS boot
    partitions non-GPT disks), reuse the BIOS_GRUB flag for flagging Empty
    partitions, as the real partition ID (0x00) is used to flag free space.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=631a08ffe0557b1df214a9d08c7c01745eb18107
Comment 37 Mageia Robot 2017-02-12 16:00:33 CET
commit 9f0f880c31175e4941dde0c0facee61ca8b11ef4
Author: Martin Whitaker <mageia@...>
Date:   Sat Jan 21 20:33:43 2017 +0000

    Don't treat Empty or BIOS_GRUB partitions as free space (mga#20074).
    
    The subroutine isEmpty() is used to identify free space on the disk
    (not DOS "Empty" partitions). Because we use a string to flag Empty
    and BIOS_GRUB partitions, rather than a numeric value, the numeric
    equality operator gives false positives.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=9f0f880c31175e4941dde0c0facee61ca8b11ef4
Comment 38 Mageia Robot 2017-02-12 16:01:12 CET
commit 3a76f08fbdd7dc83dc260e6af4f482c600774576
Author: Martin Whitaker <mageia@...>
Date:   Sat Feb 4 17:11:10 2017 +0000

    Fix erratic behaviour when telling kernel to delete partitions (mga#20074).
    
    When telling the kernel about changes to a DOS partition table, if a
    partition was deleted on a disk that also contained an Empty partition,
    the kernel also removed the Empty partition from its cached partition
    table (and renumbered the other partitions).
    
    Experimentation showed that leaving the udev exec queue active whilst we
    were telling the kernel about the changes fixed this problem, although I
    don't have an explanation for why it does.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=3a76f08fbdd7dc83dc260e6af4f482c600774576
Comment 39 Martin Whitaker 2017-02-12 16:17:01 CET
(In reply to Anne Nicolas from comment #33)
> or as a test you could create a temporary branch to apply all those patches.
> Then we can test it and at least give some feedbacks

You're the boss ;-)

I've pushed all my fixes to the branch user/martinw/mga6 in software/drakx. This includes fixes for bug 16055, bug 19888, bug 19979, bug 20161, bug 20164, and bug 20247 as well as for this bug. I've also include Thierry's patch for bug 14699 so we can test it on the next round of Live ISOs.

Note that the fixes for bugs 14699 and 20247 are the only ones not on the last round of Live ISOs, but I have slightly modified one other fix because I found there was still a lingering race with the kernel updating its view of the partition table. I've tested all this locally, including tests with a mixed DOS and GPT system, and not observed any problems other than bug 20264.

I've put a source RPM built from the head of my branch on rabbit in

  /home/martinw/drakxtools-17.72-0.mga6.src.rpm

(using -0 to indicate this is a test build)

Note that these bugs affect diskdrake as well as the installer, so you'll want new RPMs as well as a patched stage 2 for the classic ISOs.
Comment 40 Anne Nicolas 2017-02-12 16:28:00 CET
Thanks a lot for all your work Martin :) I'll test it tonight in a build.
Comment 41 Marja Van Waes 2017-02-13 16:56:06 CET
pterjan told on IRC that he is way too busy to review the patches in http://gitweb.mageia.org/software/drakx/log/?h=user/martinw/mga6 any time soon.
Comment 42 Martin Whitaker 2017-02-20 22:38:12 CET
(In reply to Martin Whitaker from comment #5)
> What's going on here is that if none of the partitions of a DOS-partitioned
> disk are currently mounted, the kernel automatically rescans the partition
> table after it has been written. There is then a race between the kernel
> performing this rescan and diskdrake telling it what changes have been made.
> Because diskdrake describes the changes as a sequence of add or delete
> operations, it needs to win the race (as that sequence is relative to the
> initial state of the partition table, not the newly rescanned state).

I've finally tracked down what causes the kernel to rescan the partition table. Whenever udevd receives a change event for a raw disk device it is watching, it asks the kernel to reread the partition table by calling the BLKRRPART ioctl. BLKRRPART is only successful if none of the partitions on that device are currently mounted.

The /lib/udev/rules.d/60-block.rules file is what causes the raw disk devices to be watched by udevd. This file is not present in the cut-down system used to run the classic installer, so my previous fix for this issue needs to be modified to not apply when running the classic installer. This accounts for the "Internal error: Unknown device sdaX" bug seen when testing the Feb 16th CI ISOs (as reported in bug 20247 comment #21).

I'll push a fix for this to the user/martinw/mga6 branch as soon as I've tested it doesn't break the Live installer.
Comment 43 Anne Nicolas 2017-02-20 22:42:25 CET
(In reply to Martin Whitaker from comment #42)
> (In reply to Martin Whitaker from comment #5)

> 
> I'll push a fix for this to the user/martinw/mga6 branch as soon as I've
> tested it doesn't break the Live installer.

Thanks a lot Martin. I'll pick it to rebuild test stage2 and rebuild test isos
Comment 44 Mageia Robot 2017-02-21 00:14:47 CET
commit ca43ff9c8b9a53cb1d4d68173794556519bac257
Author: Martin Whitaker <mageia@...>
Date:   Mon Feb 20 21:42:27 2017 +0000

    Always tell the kernel about partition table changes when running the classic installer (mga#20074).
    
    The automatic rescan of the partition table is triggered by udevd. The
    udev rule that causes this is not present on the cut-down system that
    runs the classic installer.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=ca43ff9c8b9a53cb1d4d68173794556519bac257
Comment 45 ben mcmonagle 2017-02-21 07:19:22 CET
Created attachment 8972 [details]
report.bug

Mageia-6-sta2-x86_64-DVD.iso
build 20170220 12:49

still valid

install to HDD with no partition table.
partitioning option: "erase and use entire disk"

CC: (none) => westel

Comment 46 Thierry Vignaud 2017-02-22 01:48:17 CET
(In reply to Martin Whitaker from comment #42)
Nope the right fix is to include 60-block.rules in installer.
See http://gitweb.mageia.org/software/drakx/tree/perl-install/install/share/list.xml#n73
Comment 47 Martin Whitaker 2017-02-22 09:45:40 CET
(In reply to Thierry Vignaud from comment #46)
> (In reply to Martin Whitaker from comment #42)
> Nope the right fix is to include 60-block.rules in installer.
> See
> http://gitweb.mageia.org/software/drakx/tree/perl-install/install/share/list.
> xml#n73

I think the test for its existence should remain, in case someone removes it again in the future.

It would be better to directly test whether udevd is watching the devices, but I can't find any way to achieve that.
Comment 48 Rémi Verschelde 2017-02-22 09:55:39 CET
(In reply to Martin Whitaker from comment #47)
> I think the test for its existence should remain, in case someone removes it
> again in the future.

I agree.

So what's left to do is to add 60-block.rules to that list.xml Thierry linked in comment 46. Could one of you do it asap so that Anne can get the updated drakx for her upcoming classical ISOs?

Then let's hope the addition doesn't introduce regressions, but at least we'll have more consistency between live and classical ISOs, so that's a sane change to do anyway IMO.
Comment 49 Martin Whitaker 2017-02-22 10:07:45 CET
If Thierry has no objections, can we also merge my branch and do a proper release?
Comment 50 Nicolas Lécureuil 2017-02-22 10:09:20 CET
isos are near to be available for QA with your changes, i think that if tests are OK we will be able to merge ( don't shoot me thierry ;) )

CC: (none) => mageia

Comment 51 Nicolas Lécureuil 2017-02-22 10:11:58 CET
(In reply to Thierry Vignaud from comment #46)
> (In reply to Martin Whitaker from comment #42)
> Nope the right fix is to include 60-block.rules in installer.
> See
> http://gitweb.mageia.org/software/drakx/tree/perl-install/install/share/list.
> xml#n73

how to check if this is available or not ? i can check on latest isos
Comment 52 Mageia Robot 2017-02-22 12:11:52 CET
commit bdac0595323a6417fb7365038c9777cde0060f58
Author: Nicolas Lécureuil <neoclust@...>
Date:   Wed Feb 22 12:11:45 2017 +0100

    - Add 60-block.rules in the installer (mga#20074)
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=bdac0595323a6417fb7365038c9777cde0060f58
Comment 53 Marja Van Waes 2017-02-22 13:06:45 CET
(In reply to Mageia Robot from comment #52)
> commit bdac0595323a6417fb7365038c9777cde0060f58
> Author: Nicolas Lécureuil <neoclust@...>
> Date:   Wed Feb 22 12:11:45 2017 +0100
> 
>     - Add 60-block.rules in the installer (mga#20074)
> ---
>  Commit Link:
>   
> http://gitweb.mageia.org/software/drakx/commit/
> ?id=bdac0595323a6417fb7365038c9777cde0060f58

Shouldn't that be committed to Martin's branch, too (since that is the one used for the isos)?
Comment 54 Rémi Verschelde 2017-02-22 13:09:41 CET
(In reply to Marja van Waes from comment #53)
> (In reply to Mageia Robot from comment #52)
> > commit bdac0595323a6417fb7365038c9777cde0060f58
> > Author: Nicolas Lécureuil <neoclust@...>
> > Date:   Wed Feb 22 12:11:45 2017 +0100
> > 
> >     - Add 60-block.rules in the installer (mga#20074)
> > ---
> >  Commit Link:
> >   
> > http://gitweb.mageia.org/software/drakx/commit/
> > ?id=bdac0595323a6417fb7365038c9777cde0060f58
> 
> Shouldn't that be committed to Martin's branch, too (since that is the one
> used for the isos)?

Personally I think we've given Martin's patches enough time for a preliminary review. The first tests have been successful, so I think we can merge to master, and if some of the patches need to be improved or reverted, that's not catastrophic either; commits don't cost much :P
Comment 55 Marja Van Waes 2017-02-22 13:13:46 CET
(In reply to Rémi Verschelde from comment #54)
> (In reply to Marja van Waes from comment #53)

> > 
> > Shouldn't that be committed to Martin's branch, too (since that is the one
> > used for the isos)?
> 
> Personally I think we've given Martin's patches enough time for a
> preliminary review. The first tests have been successful, so I think we can
> merge to master, and if some of the patches need to be improved or reverted,
> that's not catastrophic either; commits don't cost much :P

I'm fine with merging now, before building new isos.

That would make all translation updates available in the isos, too :-)
Comment 56 Samuel Verschelde 2017-02-22 13:30:23 CET
Agreed too
Comment 57 Nicolas Lécureuil 2017-02-22 14:01:53 CET
(In reply to Marja van Waes from comment #53)
> (In reply to Mageia Robot from comment #52)
> > commit bdac0595323a6417fb7365038c9777cde0060f58
> > Author: Nicolas Lécureuil <neoclust@...>
> > Date:   Wed Feb 22 12:11:45 2017 +0100
> > 
> >     - Add 60-block.rules in the installer (mga#20074)
> > ---
> >  Commit Link:
> >   
> > http://gitweb.mageia.org/software/drakx/commit/
> > ?id=bdac0595323a6417fb7365038c9777cde0060f58
> 
> Shouldn't that be committed to Martin's branch, too (since that is the one
> used for the isos)?

i mailed martin about my commit :)

Anne's isos are done against martin's branch ( + my commit ) ( for the next iso )
Comment 58 Mageia Robot 2017-02-22 14:19:37 CET
commit c1affe2ea3ec6d38f153d06ae2503a43430403f2
Author: Nicolas Lécureuil <neoclust@...>
Date:   Wed Feb 22 12:11:45 2017 +0100

    - Add 60-block.rules in the installer (mga#20074)
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=c1affe2ea3ec6d38f153d06ae2503a43430403f2
Comment 59 Thierry Vignaud 2017-02-22 14:56:04 CET
I don't have much time (one baby and another one is coming) whereas Martin has obviously spent some time on testing and finding solutions, so I think we should merge his work.

But there's a couple small things that annoy me:
==================================================
- I'm a bit disappointed we've both auto_allocate_bios_boot_parts()
  and fs::any::is_boot_bios_part_needed() but Martin seems to have
  usage cases for relaxing the later so...
- I don't think it's an issue to mount ESP when not in EFI
  (we do mount windows/... partitions so why not ESP).
  Just my 2 cents
- A real issue IMHO:
  we now abuse 'BIOS_GRUB' flag for empty partitions:
  * "Preserve DOS "Empty" partitions instead of treating them as free space (mga#20074)"
  I would prefer we fix the root issue instead of the symptoms...
  Obviously harder so maybe for a later cleanup


Then here's more real peer review:
==================================

1) About "Fix auto-allocation of BIOS boot partitions (mga#20161, mga#19888)": I find a little bit ugly to do a loop on disk when we really are looking at a disk I fear there's a risk of allocating several partitions if a bug would happen later.
I think auto_allocate_bios_boot_parts() shoud first find the right device then allocate the partition outside the loop.
Also I would make more obvious the test for missing parameter.
Last but not least, the code doesn't pass with 'use strict' pragma (undeclared variables, check with perl_checker)
So I suggests sg like this (untested):

sub auto_allocate_bios_boot_parts {
    my ($all_hds, $o_hd) = @_;
    return if !$o_hd;
    # only proceed with the selected device:
    my ($hd) = grep { $o_hd->{device} ne $_->{device} } @{$all_hds->{hds}};
    # only proceed if the selected device is GPT:
    return if ($hd->{pt_table_type} || partition_table::default_type($hd)) ne 'gpt';
    # check if a BIOS boot partition already exists
    my @parts = map { partition_table::get_normal_parts($_) } $hd;
    return if any { isBIOS_GRUB($_) } @parts;
    # try to allocate a BIOS boot partition
    my $suggest = { mntpoint => "", size => MB(1), pt_type => 'BIOS_GRUB', ratio => 1, maxsize => MB(2) };
    allocatePartitions($all_hds, [ $suggest ], $hd);
}

I think it's clearer/more obvious this way.


2) About "Allow a BIOS boot partition to be located on any disk (mga#20161)", a previous commit says is_boot_bios_part_needed()'s checks would be relaxed but it introduces a new predicate: "all disks are (or will be) GPT"
That's too much a restriction IMHO.
We should support a system with GPT disks & MBR disks. We just have to only consider GPT disks when checking whether a new ESP is needed

3) About "Exclude x11-driver-video-vmware from unneeded packages (mga#19979)", as I said in the bug report, we still have a very old issue b/c unlike what tmb though, this is not preventing some packages to be removed by draklive-install (whether or not they actually could be removed on the installed system). 
So a later urpme --auto-orphan will still offer to remove them
Obviously this is not a block for this merge but in eventually draklive-install should:
- tell urpmi there's packages that needs to be kept (what a manual "urpmi x11-driver-video-vmware" would do)
- do not blindly keep some packages, but do so according to the installed HW
See https://bugs.mageia.org/show_bug.cgi?id=19979#c4

4) About "Always tell the kernel about partition table changes when running the classic installer", we should remove the added test for the udev rule as it's now included in installer, so the commit should only update the comment but for "This file is not present in the cut-down system used to run the classic installer, so we always need to tell the kernel in that case." that's no more true

9) One last thing though: I would prefer user/martinw/mga6 branch to be rebased on current master before being merged (we tend to prefer a linear history for drakx). You'll need to fix */NEWS conflict though...

============================================
Again, thanks Martin for the hard work.

Though, as noted in my above (1) point, you should run perl_checker on your code before committing, it would catch such things as new variables without declaration ("my $foo"), ...
If you use emacs, look at top of /usr/share/doc/perl_checker/perl_checker.html for how to use perl_checker within emacs.
If you use vim, see /usr/bin/perl_checker-vim
(though I don't think anybody has used perl_checker in vim for 10 years...)
See also /usr/share/doc/perl-MDK-Common/tutorial.html
It really helps in finding stupid mistakes in changes before pushing though from a quick check, your other commits should be OK.

Also, I should share with you the perl-Parted bindings I worked on but I never have time to test more of partition_table::gpt with it, so it's stupid it's resting in its place for 2 years...
It would enables us to write all parts in a GPT at once w/o udev actions on each part....
Comment 60 Thierry Vignaud 2017-02-22 14:57:35 CET
Argh, obviously my suggestions should have been:

sub auto_allocate_bios_boot_parts {
    my ($all_hds, $o_hd) = @_;
    return if !$o_hd;
    # only proceed with the selected device:
    my ($hd) = grep { $o_hd->{device} ne $_->{device} } @{$all_hds->{hds}};
    return if !$hd;
    # only proceed if the selected device is GPT:
    return if ($hd->{pt_table_type} || partition_table::default_type($hd)) ne 'gpt';
    # check if a BIOS boot partition already exists
    my @parts = map { partition_table::get_normal_parts($_) } $hd;
    return if any { isBIOS_GRUB($_) } @parts;
    # try to allocate a BIOS boot partition
    my $suggest = { mntpoint => "", size => MB(1), pt_type => 'BIOS_GRUB', ratio => 1, maxsize => MB(2) };
    allocatePartitions($all_hds, [ $suggest ], $hd);
}
Comment 61 Martin Whitaker 2017-02-23 09:46:40 CET
(In reply to Thierry Vignaud from comment #59)

Thanks for taking the time to review this Thierry. Some responses to your points:

> - A real issue IMHO:
>   we now abuse 'BIOS_GRUB' flag for empty partitions:
>   * "Preserve DOS "Empty" partitions instead of treating them as free space
> (mga#20074)"
>   I would prefer we fix the root issue instead of the symptoms...
>   Obviously harder so maybe for a later cleanup

I agree, but as I've mentioned before, a clean solution will touch a lot more of the code, risking introducing new bugs, so I don't want to attempt this until we've got Mageia-6 out.


> 1) About "Fix auto-allocation of BIOS boot partitions (mga#20161,
> mga#19888)": I find a little bit ugly to do a loop on disk when we really
> are looking at a disk I fear there's a risk of allocating several partitions
> if a bug would happen later.
> I think auto_allocate_bios_boot_parts() shoud first find the right device
> then allocate the partition outside the loop.

I believe there is still a potential use case where there isn't a selected disk (bug 16055 comment 12) and auto-allocation will spread across multiple disks. As in that case we don't know which disk should be the boot device, my solution was to ensure there was a BIOS boot partition on all GPT disks, so the user could have a free choice where to install the boot loader. A bit wasteful maybe, but BIOS boot partitions are very small. With your code, a BIOS boot partition won't get allocated in this case.


> 2) About "Allow a BIOS boot partition to be located on any disk
> (mga#20161)", a previous commit says is_boot_bios_part_needed()'s checks
> would be relaxed but it introduces a new predicate: "all disks are (or will
> be) GPT"
> That's too much a restriction IMHO.
> We should support a system with GPT disks & MBR disks. We just have to only
> consider GPT disks when checking whether a new ESP is needed

I think this is a misunderstanding. is_boot_bios_part_needed() will only ever return true if *all* disks are GPT - so in a mixed GPT and MBR system, the user will not be forced to create a BIOS boot partition because grub2 can be installed on the MBR disk(s).


> 4) About "Always tell the kernel about partition table changes when running
> the classic installer", we should remove the added test for the udev rule as
> it's now included in installer, so the commit should only update the comment
> but for "This file is not present in the cut-down system used to run the
> classic installer, so we always need to tell the kernel in that case."
> that's no more true

I still think the extra check should be kept for safety (in case somebody removes that rule file again in the future), but yes, the comment now needs to be updated.


> 9) One last thing though: I would prefer user/martinw/mga6 branch to be
> rebased on current master before being merged (we tend to prefer a linear
> history for drakx). You'll need to fix */NEWS conflict though...

Will do.


> Though, as noted in my above (1) point, you should run perl_checker on your
> code before committing, it would catch such things as new variables without
> declaration ("my $foo"), ...

When I run 'make test_pms' I get thousands of messages like:

File "/usr/share/perl_checker/fake_packages/Gtk3/WebKit2.pm", line 124, character 25-25
you should not have a space here

which makes it hard to see the real problems! No doubt there's a simple fix for this, but there's always been more urgent things to look at...


> Also, I should share with you the perl-Parted bindings I worked on but I
> never have time to test more of partition_table::gpt with it, so it's stupid
> it's resting in its place for 2 years...
> It would enables us to write all parts in a GPT at once w/o udev actions on
> each part....

Yes, I'll take a look at that, although again that looks like one to save for mga7.
Comment 62 Thierry Vignaud 2017-02-23 16:18:48 CET
(In reply to Martin Whitaker from comment #61)
> > 1) About "Fix auto-allocation of BIOS boot partitions (mga#20161,
> > mga#19888)": I find a little bit ugly to do a loop on disk when we really
> > are looking at a disk I fear there's a risk of allocating several partitions
> > if a bug would happen later.
> 
> I believe there is still a potential use case where there isn't a selected
> disk (bug 16055 comment 12) and auto-allocation will spread across multiple
> disks. As in that case we don't know which disk should be the boot device,
> my solution was to ensure there was a BIOS boot partition on all GPT disks,
> so the user could have a free choice where to install the boot loader. A bit
> wasteful maybe, but BIOS boot partitions are very small. With your code, a
> BIOS boot partition won't get allocated in this case.

Humm...
I don't like much creating BIOS boot partition on all GPT disks but if text mode doesn't provide a selected disk, we've no choice. And it's 1MB only.

But I'm worried about which 1Mb part grub2 will use and if the user removes the disk where resides the BIOS boot partition chosen by grub2 then there'll be some sysadmin job...

At least, fix the bugs found by perl_checker then..

> > 2) About "Allow a BIOS boot partition to be located on any disk
> > (mga#20161)", a previous commit says is_boot_bios_part_needed()'s checks
> > would be relaxed but it introduces a new predicate: "all disks are (or will
> > be) GPT"
> > That's too much a restriction IMHO.
> > We should support a system with GPT disks & MBR disks. We just have to only
> > consider GPT disks when checking whether a new ESP is needed
> 
> I think this is a misunderstanding. is_boot_bios_part_needed() will only
> ever return true if *all* disks are GPT - so in a mixed GPT and MBR system,
> the user will not be forced to create a BIOS boot partition because grub2
> can be installed on the MBR disk(s).

And I still disagree: does someone actually tested it?
B/c under UEFI, we only allow to install the bootloader... on the ESP...
Under UEFI, we don't allow the user to pick where to install the bootloader...
Thus we'll never pass eg /dev/sdb to grub2-install.
We'll just call grub2-install. Which will try to install on ESP, which means on a GPT partitioned disk, which means it may fail if there's no BOOT BIOS
How is that supposed to work if there's no ?

> I still think the extra check should be kept for safety (in case somebody
> removes that rule file again in the future), but yes, the comment now needs
> to be updated.

I disagree. This is like having both belt and suspenders.
We've to be more confident :-)
I would rather add a comment in install/share/list.xml so that it doesn't disappear :-)
I've started explained why stuff is included but it should be done too when someone adds a new element
I don't like adding random check for some stuff.
Either we check for every thing that's in list.xml or we don't.
Up to now, we properly maintained it, we don't blindly remove things, so I think we're in the clear.

> > Though, as noted in my above (1) point, you should run perl_checker on your
> > code before committing, it would catch such things as new variables without
> > declaration ("my $foo"), ...
> 
> When I run 'make test_pms' I get thousands of messages like:

This is why I advertise you use perl_checker from within your preferred "IDe" (aka emacs or vim). This way you can go directly to the right line
Else you can manually run "perl_checker --restrict-to-files"
The rule of thumb is to only look at perl_checker output regarding the file you're altering.
Also you can compare its output before/after your changes.
We can discuss perl_checker privately if you want (mail).
Comment 63 Martin Whitaker 2017-02-24 01:23:42 CET
(In reply to Thierry Vignaud from comment #62)
> > I think this is a misunderstanding. is_boot_bios_part_needed() will only
> > ever return true if *all* disks are GPT - so in a mixed GPT and MBR system,
> > the user will not be forced to create a BIOS boot partition because grub2
> > can be installed on the MBR disk(s).
> 
> And I still disagree: does someone actually tested it?

Yes, me ;-)

> B/c under UEFI, we only allow to install the bootloader... on the ESP...
> Under UEFI, we don't allow the user to pick where to install the
> bootloader...
> Thus we'll never pass eg /dev/sdb to grub2-install.
> We'll just call grub2-install. Which will try to install on ESP, which means
> on a GPT partitioned disk, which means it may fail if there's no BOOT BIOS
> How is that supposed to work if there's no ?

The BIOS boot partition is only needed for non-UEFI boot. In that case, when you get to the boot loader install screen, you can use the drop-down menu to select which disk grub2 gets installed on. And from my tests, it does appear to be clever enough to only offer you the choice of disks where it can actually be installed (either DOS partitioned or GPT partitioned with a BIOS boot partition).


> > I still think the extra check should be kept for safety (in case somebody
> > removes that rule file again in the future), but yes, the comment now needs
> > to be updated.
> 
> I disagree. This is like having both belt and suspenders.
> We've to be more confident :-)

I won't fight you on this, but I'm all for belt and braces! In my job, bugs can't be easily fixed, so I likely have a different perspective on things.
Comment 64 ben mcmonagle 2017-02-24 02:06:12 CET
for :

Mageia-6-sta2-x86_64-DVD.
(Wed Feb 22 15:48:28 CET 2017)

installed to HDD with no pre-existing partitions in legacy bios system

chose "use free space".

no partition error advised.

fixed for me
Comment 65 Martin Whitaker 2017-02-24 23:56:58 CET
Unfortunately, adding the 60-block.rules file to stage 2 does appear to have caused problems. Many testers are seeing the

"I cannot read the partition table of device sda, it's too corrupted for me:("

message when the installer first examines the disks. I see this on one of my systems here, although only intermittently (twice in eight tries). I suspect what is happening is that when the installer examines the partition table on the disk, that triggers the udev rule and causes the kernel to rescan the partition table. Unfortunately this coincides with the installer reading /proc/partitions to verify the kernel has the same view of the partitions, which can catch /proc/partitions in a partially updated state.

After patching the ISO to remove the 60-block.rules file from stage 2, I saw no failures in 40 tries of starting the installer.

Fixing this any other way will be hard. 'udevadm control --stop-exec-queue' does not stop udevd from calling the BLKRRPART ioctl.
Comment 66 Nicolas Lécureuil 2017-02-24 23:59:32 CET
maybe we can revert for sta2 and see with you and thierry what is best
Comment 67 Martin Whitaker 2017-02-25 00:11:26 CET
(In reply to Nicolas Lécureuil from comment #66)
> maybe we can revert for sta2 and see with you and thierry what is best

I agree with that.
Comment 68 Thierry Vignaud 2017-02-25 08:21:38 CET
Yeah, I'm reverting the offending commit

Martin, please:
1) your code for 'use strict' (missing "my", see perl_checker)
See http://gitweb.mageia.org/software/drakx/tree/perl-install/fsedit.pm?h=user/martinw/mga6#n558
2) then rebase over origin/master
Comment 69 Mageia Robot 2017-02-25 08:40:57 CET
commit 72f762b72f81f033a2cf999cb2353d611e4ba44d
Author: Thierry Vignaud <thierry.vignaud@...>
Date:   Sat Feb 25 08:07:36 2017 +0100

    Revert "- Add 60-block.rules in the installer (mga#20074)"
    
    This reverts commit bdac0595323a6417fb7365038c9777cde0060f58.
    
    Many testers are seeing the
    
    "I cannot read the partition table of device sda, it's too corrupted for me:("
    
    message when the installer first examines the disks.
    It's likely that when the installer examines the partition table on the
    disk, that triggers the udev rule and causes the kernel to rescan the
    partition table. Unfortunately this coincides with the installer reading
    /proc/partitions to verify the kernel has the same view of the
    partitions, which can catch /proc/partitions in a partially updated state.
    
    After patching the ISO to remove the 60-block.rules file from stage 2,
    no failures happen in the installer.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=72f762b72f81f033a2cf999cb2353d611e4ba44d
Comment 70 Mageia Robot 2017-02-25 15:15:37 CET
commit a5473711818b2519552561b33c3f181ab1bbfde6
Author: Martin Whitaker <mageia@...>
Date:   Sat Jan 21 19:38:55 2017 +0000

    Ensure the kernel doesn't rescan a partially written partition table (mga#20074).
    
    When no partitions on a DOS-partitioned disk are mounted, the kernel
    automatically rescans the partition table when the file handle to the
    raw device is released. Currently the code opens and closes the raw
    device when writing the primary partition table and when writing each
    extended partition table segment. As the extended partition table
    segments form a linked list, this allows the kernel to get in and
    rescan the table when the list is not in a coherent state. This patch
    changes the code to open the raw device before writing the primary
    partition table and to close it only after writing the last extended
    partition table segment.
    
    The behaviour for other partition table types is unchanged.
    
    v2 (tvignaud): simplify by moving copies of noop funcs into the base class
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=a5473711818b2519552561b33c3f181ab1bbfde6
Comment 71 Mageia Robot 2017-02-25 15:15:43 CET
commit 6f0880e11d1f83ca239c823cece00fc7d034dddb
Author: Martin Whitaker <mageia@...>
Date:   Sat Jan 21 20:00:20 2017 +0000

    Don't tell the kernel about partition table changes when it rescans them automatically (mga#20074).
    
    When no partitions on a DOS-partitioned disk are mounted, the kernel
    automatically rescans the partition table when it is written to disk.
    We shouldn't then try to update the kernel's view of the partition
    table, as the list of deltas we have recorded is relative to the
    previous state of the partition table, not the newly rescanned state.
    
    The behaviour for other partition table types is unchanged.
    
    v2 (tvignaud): just make base class assume the kernel doesn't reread, only mbr
    subclass overrides need_to_tell_kernel() in order to be smarter
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=6f0880e11d1f83ca239c823cece00fc7d034dddb
Comment 72 Mageia Robot 2017-02-25 15:15:52 CET
commit 0d79e34dad3062575746f97c613f0e643a2d8b29
Author: Martin Whitaker <mageia@...>
Date:   Sat Jan 21 20:23:18 2017 +0000

    Preserve DOS "Empty" partitions instead of treating them as free space (mga#20074).
    
    To minimise the changes this close to mga6 release (and until we fix
    the bug in partition auto-allocation that mistakenly creates BIOS boot
    partitions non-GPT disks), reuse the BIOS_GRUB flag for flagging Empty
    partitions, as the real partition ID (0x00) is used to flag free space.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=0d79e34dad3062575746f97c613f0e643a2d8b29
Comment 73 Mageia Robot 2017-02-25 15:15:57 CET
commit 4b3e4de0e72b4034613c9834f7a60c4822828d6b
Author: Martin Whitaker <mageia@...>
Date:   Sat Jan 21 20:33:43 2017 +0000

    Don't treat Empty or BIOS_GRUB partitions as free space (mga#20074).
    
    The subroutine isEmpty() is used to identify free space on the disk
    (not DOS "Empty" partitions). Because we use a string to flag Empty
    and BIOS_GRUB partitions, rather than a numeric value, the numeric
    equality operator gives false positives.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=4b3e4de0e72b4034613c9834f7a60c4822828d6b
Comment 74 Mageia Robot 2017-02-25 15:16:03 CET
commit 3643baad41e7237b9d6acbe524220cde85961772
Author: Martin Whitaker <mageia@...>
Date:   Sat Feb 4 17:11:10 2017 +0000

    Fix erratic behaviour when telling kernel to delete partitions (mga#20074).
    
    When telling the kernel about changes to a DOS partition table, if a
    partition was deleted on a disk that also contained an Empty partition,
    the kernel also removed the Empty partition from its cached partition
    table (and renumbered the other partitions).
    
    Experimentation showed that leaving the udev exec queue active whilst we
    were telling the kernel about the changes fixed this problem, although I
    don't have an explanation for why it does.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=3643baad41e7237b9d6acbe524220cde85961772
Comment 75 Mageia Robot 2017-02-25 15:16:22 CET
commit 37a48c57d0c295e29049d23d1d37a800e1ee24af
Author: Martin Whitaker <mageia@...>
Date:   Mon Feb 20 21:42:27 2017 +0000

    Always tell the kernel about partition table changes when running the classic installer (mga#20074).
    
    The automatic rescan of the partition table is triggered by udevd. The
    udev rule that causes this is not present on the cut-down system that
    runs the classic installer.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=37a48c57d0c295e29049d23d1d37a800e1ee24af
Comment 76 Thierry Vignaud 2017-02-27 08:31:15 CET
Closing

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 77 Mageia Robot 2017-09-07 21:15:12 CEST
commit 102339802fe667e53d517f522d24bb9e4c652019
Author: Thierry Vignaud <thierry.vignaud@...>
Date:   Sun Jul 2 16:39:00 2017 +0200

    include part of udev's hwdb (mga#20327)
    
    Let's hope it fixes touchpad in installer.
    
    Such a missing file might also explain why adding the udev rule wasn't
    enough for mga#20074.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=102339802fe667e53d517f522d24bb9e4c652019

 Bug links:
   Mageia
      https://bugs.mageia.org/20327
      https://bugs.mageia.org/20074
Comment 78 Nicolas Lécureuil 2017-09-07 22:06:11 CEST
is this valid for mageia 6 ? ( in case we do a mga 6.1 later )
Comment 79 Thierry Vignaud 2017-09-08 03:40:22 CEST
As noted, this is just a hint that a similar missing file could have helped this file. Nothing was done regarding _this_ bug report.

Note You need to log in before you can comment on or make changes to this bug.