+++ This bug was initially created as a clone of Bug #19935 +++ (Custom disk partitioning requests BIOS boot partition on non-GPT disk, creating one and continuing leads to a corrupted partition table and thus unusable disk.) Martin Whitaker discovered that the corrupted partition table is a separate issue, see: https://bugs.mageia.org/show_bug.cgi?id=19935#c27 > The real problem is that the BIOS boot partition is given the Empty > partition type. In some places Empty partitions are assigned partition > numbers, in other places they aren't, so the drakx view of partition numbers > gets out of step with the kernel. > > This doesn't just apply to drakx; when booting or when using the partx > command to reload the kernel partition table, Empty partitions get assigned > partition numbers, but when using the partprobe command to reload the kernel > partition table, they don't.
After a lot of experimentation, I think there are two distinct bugs: 1) The drakx code uses a partition type of Empty (0x00) to tag free regions (i.e. sectors that aren't in any partition. So when a real partition with a partition type of Empty is found, this gets treated as a free region, instead of being preserved as a real partition. The cleanest fix for this (IMO) would be to use a special (and non-valid) partition type to tag free regions. But this would require quite widespread changes to the code. A simpler and safer alternative is to detect Empty partitions and convert their partition type to a special value. Simplest of all is to reuse the special 'BIOS_GRUB' value, as at the moment auto-allocation still requires/creates BIOS boot partitions on non-GPT disks (which get converted to Empty partitions when the partition table is written). @Thierry, WDYT? 2) When a partition table is written to disk, something triggers the kernel to automatically reload its partition table cache (I assumed this was a udev rule, but running 'udevadm control --stop-exec-queue' doesn't seem to delay it). When one of the partitions is an Empty partition, the kernel then doesn't properly process subsequent requests to add or delete partitions from its cache (via the BLKPG ioctl). I've no idea what's really going on here (and the behaviour is quite erratic), but running 'udevadm settle' before issuing the add or delete requests seems to fix it. Of course it may just be the extra delay which is "fixing" the problem.
More experimentation shows my 'udev settle' fix for the second issue is not reliable. Unless somebody has an idea for how to fix this, I think the only safe thing is to detect Empty partitions when reading a DOS partition table and refuse to continue, and to refuse to write DOS partition tables containing Empty partitions. This would also require improvements to the partitioning wizard so that it no longer creates BIOS boot partitions for non-GPT disks.
Did you see my post Fri to <qa-discuss@ml.mageia.org> about re-starting the stage2 after|if you hit the 'BIOS boot' will allows Custom partitioning to complete wihout a BIOS boot? If you think it might offer insight I can include attachments both created during the same install: report1.bug-----created when I hit the BIOS boot error report2.bug-----created after re-starting stage2 and making it through Custom partitioning and am at the package selection stage I have tested the i586 Mga6-stg2 install in this manner 7 or 8 times and the procedure has worked every time.
(In reply to Charles Edwards from comment #3) > Did you see my post Fri to <qa-discuss@ml.mageia.org> about re-starting the > stage2 after|if you hit the 'BIOS boot' will allows Custom partitioning to > complete wihout a BIOS boot? I did, but there's an easy fix for the problem with Custom (manual) partitioning (https://bugs.mageia.org/attachment.cgi?id=8834 attached to bug 19935). The problem is that if you use any of the auto-partitioning schemes, they still create a BIOS boot partition, which gets translated into an Empty partition on a non-GPT disk. If you subsequently use diskdrake, it doesn't recognise Empty partitions, leading either to it crashing, or worse, corrupting the partition table.
(In reply to Martin Whitaker from comment #1) > 2) When a partition table is written to disk, something triggers the kernel > to automatically reload its partition table cache (I assumed this was a udev > rule, but running 'udevadm control --stop-exec-queue' doesn't seem to delay > it). When one of the partitions is an Empty partition, the kernel then > doesn't properly process subsequent requests to add or delete partitions > from its cache (via the BLKPG ioctl). I've no idea what's really going on > here (and the behaviour is quite erratic), but running 'udevadm settle' > before issuing the add or delete requests seems to fix it. Of course it may > just be the extra delay which is "fixing" the problem. What's going on here is that if none of the partitions of a DOS-partitioned disk are currently mounted, the kernel automatically rescans the partition table after it has been written. There is then a race between the kernel performing this rescan and diskdrake telling it what changes have been made. Because diskdrake describes the changes as a sequence of add or delete operations, it needs to win the race (as that sequence is relative to the initial state of the partition table, not the newly rescanned state). The automatic rescan appears to be triggered when the raw device is closed after being written to. This leads to a further problem. diskdrake currently writes the DOS partition information as a sequence of open/write/close operations. The first in the sequence writes the primary partition table, the second in the sequence writes the first extended partition table, and so on. Each step in the sequence triggers a rescan of the partition table, but because the extended partition tables form a linked list, rescanning when only part of the list has been written can lead to the kernel reading invalid information, which gets it even more confused. There doesn't appear to be any automatic rescanning performed for GPT disks, so it's only DOS-partitioned disks that suffer from this bug. Following is a sequence of patches that I believe fix all the above bugs (along with the reordering patch from bug 19935 which has already been applied). Having received no guidance, I've gone for the simplest solution for handling Empty partitions (as described in comment 1). For fixing the race, I've made diskdrake detect whether no partitions are mounted, and if so, rely on the automatic rescan rather than telling the kernel about the changes. I've done a fair amount of testing on both DOS and GPT partitioned disks in VirtualBox without seeing any problems. I've left the behaviour for BSD/Mac/Sun partitioned disks unchanged, as I know nothing about those. N.B. When testing this, people need to be aware that diskdrake preferentially displays the file system type it finds in a partition, rather than the base partition type. So if you change a partition type (or delete a partition and create a new one staring at the same sector) but don't format it, you'll still see the old file system type. This is particularly noticeable with Empty partitions, as they aren't formatted. It might be worth making diskdrake ignore any file system it finds in an Empty partition.
Keywords: (none) => PATCH
Created attachment 8880 [details] Patch to prevent kernel rescanning part way though writing the partition table
Created attachment 8881 [details] Patch to avoid race by not telling kernel about changes when it will automatically rescan the table
Created attachment 8882 [details] Patch to preserve Empty partitions rather than treating them as free space
Created attachment 8883 [details] Patch to preven Empty or BIOS_GRUB partitions being counted as free space
Comment on attachment 8880 [details] Patch to prevent kernel rescanning part way though writing the partition table It might be worth to amend your commit chlog in order to describe more the actual changes: - adding a start_write() API returning a file handle one the raw device - write() takes that file handle as parameter - end_write() closes the file handle Or even better document it as POD in partition_table.pm I just added some basic POD to it as an example
See "perldoc ./partition_table.pm" for testing As for the 2 latest patches, I disagree with them & I would prefer to fix the real bug. Where do you see that BIOS_GRUB part on MBR disk? Under UEFI with a GPT & a MBR partitionned disk?
(In reply to Thierry Vignaud from comment #10) > I just added some basic POD to it as an example Did you forget to attach this? (In reply to Thierry Vignaud from comment #11) > See "perldoc ./partition_table.pm" for testing > > As for the 2 latest patches, I disagree with them & I would prefer to fix > the real bug. There are two real bugs: 1) If an Empty partition exists on a DOS partitioned disk, diskdrake will silently delete it (and its view of the partitions then gets out of sync with the kernel's view). This should be fixed, regardless of whether or not we fix the other bug. See comment 1 for alternatives. At this stage of mga6 development, I think the cleanest fix is too big a change, but it could go on the todo list for mga7. 2) The partitioning wizard still creates BIOS boot partitions (which translate to Empty) partitions on DOS partitioned disks. This also should be fixed, but if we fix (1) then it is at least harmless. > Where do you see that BIOS_GRUB part on MBR disk? > Under UEFI with a GPT & a MBR partitionned disk? To reproduce: 1. Boot a Live DVD ISO in Virtual Box with an attached empty virtual hard disk. 2. Create and format a FAT32 partition at the beginning of the disk. 3. Run draklive-install, select custom partitioning, and click on the 'Auto allocate' button. Unless you've applied the above patches, you won't see that a BIOS boot partition has been created, but fdisk will show it, e.g. [root@linux live]# fdisk -l /dev/sda Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disklabel type: dos Disk identifier: 0x00000000 Device Boot Start End Sectors Size Id Type /dev/sda1 * 2048 8390655 8388608 4G b W95 FAT32 /dev/sda2 8392671 41929649 33536979 16G 5 Extended /dev/sda5 8392704 8401994 9291 4.5M 0 Empty /dev/sda6 8404992 36772784 28367793 13.5G 83 Linux /dev/sda7 36775936 41929649 5153714 2.5G 82 Linux swap / Solaris I would expect the same behaviour if you choose the "Use available space" option. This occurs because the is_boot_bios_part_needed() can't identify the root device until a /boot or / partition have been allocated (and defaults to assuming a GPT disk), and the BIOS boot partition is allocated first.
1) See http://gitweb.mageia.org/software/drakx/log/?id=0a19f082255e3f581074ca82d25cce826f59d370 2) So this bug is specific to draklive-install ?
(In reply to Thierry Vignaud from comment #13) > > 2) So this bug is specific to draklive-install ? No, see bug 20161 where I reproduced it twice with QA's current 32bit CI
(In reply to Marja van Waes from comment #14) > (In reply to Thierry Vignaud from comment #13) > > > > > 2) So this bug is specific to draklive-install ? > > No, see bug 20161 where I reproduced it twice with QA's current 32bit CI And the other three bugs described here, which the patches address, affect diskdrake, draklive-install, and the classic installer.
Created attachment 8901 [details] Patch to add POD documentation for new partition table write API
Created attachment 8906 [details] Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v2) I rewritten 2nd patch to be less intrusive Also I feared about LVM and the like...
Attachment 8881 is obsolete: 0 => 1
Created attachment 8907 [details] Patch to add POD documentation for new partition table write API (v2) enhanced doc
Attachment 8901 is obsolete: 0 => 1
Created attachment 8908 [details] Patch to prevent kernel rescanning part way though writing the partition table (v2) simplify by moving copies of noop funcs into the base class
Created attachment 8909 [details] Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v3) use the right parent class :-(
@ Martin, if you think the patch set looks good as is, could you try to apply it on a test ISO so that we can test how it turns out?
(In reply to Rémi Verschelde from comment #21) > @ Martin, if you think the patch set looks good as is, could you try to > apply it on a test ISO so that we can test how it turns out? I could, but don't we need to fix bug 20161 too? I could provide a patch for that as well, but really need an answer to my question (bug 20161, comment 17).
Created attachment 8926 [details] Patch to avoid race by not telling kernel about changes when it will automatically rescan the table (v4) Fix comments - the default is to assume the kernel *doesn't* reread the partition table.
Attachment 8909 is obsolete: 0 => 1 Attachment 8906 is obsolete: 0 => 1
Created attachment 8927 [details] Patch to fix another partition table sync problem Further testing using diskdrake in Vbox showed no problems when no partitions were mounted and the kernel was automatically rescanning the partition table, but showed there was still a problem when a partition was mounted and diskdrake was telling the kernel about the changes. This only manifested when there was an Empty partition present on a DOS partitioned disk. Deleting another partition would cause the kernel to also forget about the Empty partition (and renumber the remaining partitions). By experimentation I found that if I removed the udevadm ctrl --stop-exec-queue and udevadm ctrl --start-exec-queue from tell_kernel(), this bug went away. I can't explain why this works - you'll need a kernel expert for that.
(In reply to Rémi Verschelde from comment #21) > @ Martin, if you think the patch set looks good as is, could you try to > apply it on a test ISO so that we can test how it turns out? I've rebuilt the Live ISOs in the QA mageia6-pretesting rsync directory to include all the above patches.
(In reply to Martin Whitaker from comment #25) > (In reply to Rémi Verschelde from comment #21) > > @ Martin, if you think the patch set looks good as is, could you try to > > apply it on a test ISO so that we can test how it turns out? > > I've rebuilt the Live ISOs in the QA mageia6-pretesting rsync directory to > include all the above patches. Thanks, Martin. I'm in Brussels for Fosdem, now, but intend to travel home soon to test the patches later today or this evening. However, if you do already have enough feedback from other testers, then please say so :-)
With the 32bit live XFCE iso from last night: Even with two empty partitions, one of them at the beginning of the disk, the other on the extended partition, there's no sign of partition table corruption :-)
Status comment: (none) => Fixed in the pre-release QA isos from February 5
Status comment: Fixed in the pre-release QA isos from February 5 => Fixed in the pre-release QA Live isos from February 5, not yet in CI isos
All of my testing with the pre-testing iso images indicates the patches do fix the problem. Need clarification on whether the patches only apply to the installer, or whether diskdrake needs to be fixed too, and have the patches been applied in git, etc.
CC: (none) => davidwhodgins
CC: (none) => wilcal.int
(In reply to Dave Hodgins from comment #28) > Need clarification on whether the patches only apply to the installer, or > whether diskdrake needs to be fixed too, and have the patches been applied > in git, etc. The patches apply to common code shared by the installer and diskdrake, so both are fixed. The only outstanding issue with diskdrake is the need to prevent Xfce/Thunar automounting partitions whilst diskdrake is working (see bug 20247). The patches have not been applied in git. Thierry said in comment 11 that he disagreed with some of them (although he didn't explain why). Without his approval, I won't push anything. Note that the patches attached to bug 20161 and bug 20164 also need to be applied.
(In reply to Martin Whitaker from comment #29) > (In reply to Dave Hodgins from comment #28) > > Need clarification on whether the patches only apply to the installer, or > > whether diskdrake needs to be fixed too, and have the patches been applied > > in git, etc. > > The patches apply to common code shared by the installer and diskdrake, so > both are fixed. The only outstanding issue with diskdrake is the need to > prevent Xfce/Thunar automounting partitions whilst diskdrake is working (see > bug 20247). > > The patches have not been applied in git. Thierry said in comment 11 that he > disagreed with some of them (although he didn't explain why). Without his > approval, I won't push anything. > > Note that the patches attached to bug 20161 and bug 20164 also need to be > applied. @ martinw, tv, pterjan, ennael Wouldn't it be better to let Martin create a separate git branch of software/drakx, so that it is easier for ennael to use the most recent proposed patches when building classical isos, and easier for pterjan and tv to review the patches and then merge them when found to be OK? Is pushing a separate branch possible for martinw, or is sysadmin intervention needed for that? @ pterjan Since diskdrake is/was mostly your baby: would you have time to review all the patches? (Yeah, they're in several bug reports now :-( )
Status comment: Fixed in the pre-release QA Live isos from February 5, not yet in CI isos => Fixed in the pre-release QA Live isos from February 5, not yet in CI isos. Patches not in git, because they haven't been accepted
When the patches are eventually applied to git, the packages rebuilt, and the iso images regenerated, we will have to do testing to ensure that has all been done properly. Please go ahead and apply them to the main git repo, and let's get the rest of the work underway. Until that's done, sta2 is on hold, indefinitely. If problems are found at during that testing, we'll deal with those problems then.
(In reply to Dave Hodgins from comment #31) > Please go ahead and apply them to the main git repo, and let's get the rest > of the work underway. Until that's done, sta2 is on hold, indefinitely. OK, I'll do this today. I don't like bypassing the maintainers, but lacking any response, what else can we do.
or as a test you could create a temporary branch to apply all those patches. Then we can test it and at least give some feedbacks
commit 105d6bea5e69ee7011e5c468ba3cd05eb999b359 Author: Martin Whitaker <mageia@...> Date: Sat Jan 21 19:38:55 2017 +0000 Ensure the kernel doesn't rescan a partially written partition table (mga#20074). When no partitions on a DOS-partitioned disk are mounted, the kernel automatically rescans the partition table when the file handle to the raw device is released. Currently the code opens and closes the raw device when writing the primary partition table and when writing each extended partition table segment. As the extended partition table segments form a linked list, this allows the kernel to get in and rescan the table when the list is not in a coherent state. This patch changes the code to open the raw device before writing the primary partition table and to close it only after writing the last extended partition table segment. The behaviour for other partition table types is unchanged. v2 (tvignaud): simplify by moving copies of noop funcs into the base class --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=105d6bea5e69ee7011e5c468ba3cd05eb999b359
commit d274ab5b4c48d0a91a535c976a9e7a9a78b56f05 Author: Martin Whitaker <mageia@...> Date: Sat Jan 21 20:00:20 2017 +0000 Don't tell the kernel about partition table changes when it rescans them automatically (mga#20074). When no partitions on a DOS-partitioned disk are mounted, the kernel automatically rescans the partition table when it is written to disk. We shouldn't then try to update the kernel's view of the partition table, as the list of deltas we have recorded is relative to the previous state of the partition table, not the newly rescanned state. The behaviour for other partition table types is unchanged. v2 (tvignaud): just make base class assume the kernel doesn't reread, only mbr subclass overrides need_to_tell_kernel() in order to be smarter --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=d274ab5b4c48d0a91a535c976a9e7a9a78b56f05
commit 631a08ffe0557b1df214a9d08c7c01745eb18107 Author: Martin Whitaker <mageia@...> Date: Sat Jan 21 20:23:18 2017 +0000 Preserve DOS "Empty" partitions instead of treating them as free space (mga#20074). To minimise the changes this close to mga6 release (and until we fix the bug in partition auto-allocation that mistakenly creates BIOS boot partitions non-GPT disks), reuse the BIOS_GRUB flag for flagging Empty partitions, as the real partition ID (0x00) is used to flag free space. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=631a08ffe0557b1df214a9d08c7c01745eb18107
commit 9f0f880c31175e4941dde0c0facee61ca8b11ef4 Author: Martin Whitaker <mageia@...> Date: Sat Jan 21 20:33:43 2017 +0000 Don't treat Empty or BIOS_GRUB partitions as free space (mga#20074). The subroutine isEmpty() is used to identify free space on the disk (not DOS "Empty" partitions). Because we use a string to flag Empty and BIOS_GRUB partitions, rather than a numeric value, the numeric equality operator gives false positives. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=9f0f880c31175e4941dde0c0facee61ca8b11ef4
commit 3a76f08fbdd7dc83dc260e6af4f482c600774576 Author: Martin Whitaker <mageia@...> Date: Sat Feb 4 17:11:10 2017 +0000 Fix erratic behaviour when telling kernel to delete partitions (mga#20074). When telling the kernel about changes to a DOS partition table, if a partition was deleted on a disk that also contained an Empty partition, the kernel also removed the Empty partition from its cached partition table (and renumbered the other partitions). Experimentation showed that leaving the udev exec queue active whilst we were telling the kernel about the changes fixed this problem, although I don't have an explanation for why it does. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=3a76f08fbdd7dc83dc260e6af4f482c600774576
(In reply to Anne Nicolas from comment #33) > or as a test you could create a temporary branch to apply all those patches. > Then we can test it and at least give some feedbacks You're the boss ;-) I've pushed all my fixes to the branch user/martinw/mga6 in software/drakx. This includes fixes for bug 16055, bug 19888, bug 19979, bug 20161, bug 20164, and bug 20247 as well as for this bug. I've also include Thierry's patch for bug 14699 so we can test it on the next round of Live ISOs. Note that the fixes for bugs 14699 and 20247 are the only ones not on the last round of Live ISOs, but I have slightly modified one other fix because I found there was still a lingering race with the kernel updating its view of the partition table. I've tested all this locally, including tests with a mixed DOS and GPT system, and not observed any problems other than bug 20264. I've put a source RPM built from the head of my branch on rabbit in /home/martinw/drakxtools-17.72-0.mga6.src.rpm (using -0 to indicate this is a test build) Note that these bugs affect diskdrake as well as the installer, so you'll want new RPMs as well as a patched stage 2 for the classic ISOs.
Thanks a lot for all your work Martin :) I'll test it tonight in a build.
pterjan told on IRC that he is way too busy to review the patches in http://gitweb.mageia.org/software/drakx/log/?h=user/martinw/mga6 any time soon.
(In reply to Martin Whitaker from comment #5) > What's going on here is that if none of the partitions of a DOS-partitioned > disk are currently mounted, the kernel automatically rescans the partition > table after it has been written. There is then a race between the kernel > performing this rescan and diskdrake telling it what changes have been made. > Because diskdrake describes the changes as a sequence of add or delete > operations, it needs to win the race (as that sequence is relative to the > initial state of the partition table, not the newly rescanned state). I've finally tracked down what causes the kernel to rescan the partition table. Whenever udevd receives a change event for a raw disk device it is watching, it asks the kernel to reread the partition table by calling the BLKRRPART ioctl. BLKRRPART is only successful if none of the partitions on that device are currently mounted. The /lib/udev/rules.d/60-block.rules file is what causes the raw disk devices to be watched by udevd. This file is not present in the cut-down system used to run the classic installer, so my previous fix for this issue needs to be modified to not apply when running the classic installer. This accounts for the "Internal error: Unknown device sdaX" bug seen when testing the Feb 16th CI ISOs (as reported in bug 20247 comment #21). I'll push a fix for this to the user/martinw/mga6 branch as soon as I've tested it doesn't break the Live installer.
(In reply to Martin Whitaker from comment #42) > (In reply to Martin Whitaker from comment #5) > > I'll push a fix for this to the user/martinw/mga6 branch as soon as I've > tested it doesn't break the Live installer. Thanks a lot Martin. I'll pick it to rebuild test stage2 and rebuild test isos
commit ca43ff9c8b9a53cb1d4d68173794556519bac257 Author: Martin Whitaker <mageia@...> Date: Mon Feb 20 21:42:27 2017 +0000 Always tell the kernel about partition table changes when running the classic installer (mga#20074). The automatic rescan of the partition table is triggered by udevd. The udev rule that causes this is not present on the cut-down system that runs the classic installer. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=ca43ff9c8b9a53cb1d4d68173794556519bac257
Created attachment 8972 [details] report.bug Mageia-6-sta2-x86_64-DVD.iso build 20170220 12:49 still valid install to HDD with no partition table. partitioning option: "erase and use entire disk"
CC: (none) => westel
(In reply to Martin Whitaker from comment #42) Nope the right fix is to include 60-block.rules in installer. See http://gitweb.mageia.org/software/drakx/tree/perl-install/install/share/list.xml#n73
(In reply to Thierry Vignaud from comment #46) > (In reply to Martin Whitaker from comment #42) > Nope the right fix is to include 60-block.rules in installer. > See > http://gitweb.mageia.org/software/drakx/tree/perl-install/install/share/list. > xml#n73 I think the test for its existence should remain, in case someone removes it again in the future. It would be better to directly test whether udevd is watching the devices, but I can't find any way to achieve that.
(In reply to Martin Whitaker from comment #47) > I think the test for its existence should remain, in case someone removes it > again in the future. I agree. So what's left to do is to add 60-block.rules to that list.xml Thierry linked in comment 46. Could one of you do it asap so that Anne can get the updated drakx for her upcoming classical ISOs? Then let's hope the addition doesn't introduce regressions, but at least we'll have more consistency between live and classical ISOs, so that's a sane change to do anyway IMO.
If Thierry has no objections, can we also merge my branch and do a proper release?
isos are near to be available for QA with your changes, i think that if tests are OK we will be able to merge ( don't shoot me thierry ;) )
CC: (none) => mageia
(In reply to Thierry Vignaud from comment #46) > (In reply to Martin Whitaker from comment #42) > Nope the right fix is to include 60-block.rules in installer. > See > http://gitweb.mageia.org/software/drakx/tree/perl-install/install/share/list. > xml#n73 how to check if this is available or not ? i can check on latest isos
commit bdac0595323a6417fb7365038c9777cde0060f58 Author: Nicolas Lécureuil <neoclust@...> Date: Wed Feb 22 12:11:45 2017 +0100 - Add 60-block.rules in the installer (mga#20074) --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=bdac0595323a6417fb7365038c9777cde0060f58
(In reply to Mageia Robot from comment #52) > commit bdac0595323a6417fb7365038c9777cde0060f58 > Author: Nicolas Lécureuil <neoclust@...> > Date: Wed Feb 22 12:11:45 2017 +0100 > > - Add 60-block.rules in the installer (mga#20074) > --- > Commit Link: > > http://gitweb.mageia.org/software/drakx/commit/ > ?id=bdac0595323a6417fb7365038c9777cde0060f58 Shouldn't that be committed to Martin's branch, too (since that is the one used for the isos)?
(In reply to Marja van Waes from comment #53) > (In reply to Mageia Robot from comment #52) > > commit bdac0595323a6417fb7365038c9777cde0060f58 > > Author: Nicolas Lécureuil <neoclust@...> > > Date: Wed Feb 22 12:11:45 2017 +0100 > > > > - Add 60-block.rules in the installer (mga#20074) > > --- > > Commit Link: > > > > http://gitweb.mageia.org/software/drakx/commit/ > > ?id=bdac0595323a6417fb7365038c9777cde0060f58 > > Shouldn't that be committed to Martin's branch, too (since that is the one > used for the isos)? Personally I think we've given Martin's patches enough time for a preliminary review. The first tests have been successful, so I think we can merge to master, and if some of the patches need to be improved or reverted, that's not catastrophic either; commits don't cost much :P
(In reply to Rémi Verschelde from comment #54) > (In reply to Marja van Waes from comment #53) > > > > Shouldn't that be committed to Martin's branch, too (since that is the one > > used for the isos)? > > Personally I think we've given Martin's patches enough time for a > preliminary review. The first tests have been successful, so I think we can > merge to master, and if some of the patches need to be improved or reverted, > that's not catastrophic either; commits don't cost much :P I'm fine with merging now, before building new isos. That would make all translation updates available in the isos, too :-)
Agreed too
(In reply to Marja van Waes from comment #53) > (In reply to Mageia Robot from comment #52) > > commit bdac0595323a6417fb7365038c9777cde0060f58 > > Author: Nicolas Lécureuil <neoclust@...> > > Date: Wed Feb 22 12:11:45 2017 +0100 > > > > - Add 60-block.rules in the installer (mga#20074) > > --- > > Commit Link: > > > > http://gitweb.mageia.org/software/drakx/commit/ > > ?id=bdac0595323a6417fb7365038c9777cde0060f58 > > Shouldn't that be committed to Martin's branch, too (since that is the one > used for the isos)? i mailed martin about my commit :) Anne's isos are done against martin's branch ( + my commit ) ( for the next iso )
commit c1affe2ea3ec6d38f153d06ae2503a43430403f2 Author: Nicolas Lécureuil <neoclust@...> Date: Wed Feb 22 12:11:45 2017 +0100 - Add 60-block.rules in the installer (mga#20074) --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=c1affe2ea3ec6d38f153d06ae2503a43430403f2
I don't have much time (one baby and another one is coming) whereas Martin has obviously spent some time on testing and finding solutions, so I think we should merge his work. But there's a couple small things that annoy me: ================================================== - I'm a bit disappointed we've both auto_allocate_bios_boot_parts() and fs::any::is_boot_bios_part_needed() but Martin seems to have usage cases for relaxing the later so... - I don't think it's an issue to mount ESP when not in EFI (we do mount windows/... partitions so why not ESP). Just my 2 cents - A real issue IMHO: we now abuse 'BIOS_GRUB' flag for empty partitions: * "Preserve DOS "Empty" partitions instead of treating them as free space (mga#20074)" I would prefer we fix the root issue instead of the symptoms... Obviously harder so maybe for a later cleanup Then here's more real peer review: ================================== 1) About "Fix auto-allocation of BIOS boot partitions (mga#20161, mga#19888)": I find a little bit ugly to do a loop on disk when we really are looking at a disk I fear there's a risk of allocating several partitions if a bug would happen later. I think auto_allocate_bios_boot_parts() shoud first find the right device then allocate the partition outside the loop. Also I would make more obvious the test for missing parameter. Last but not least, the code doesn't pass with 'use strict' pragma (undeclared variables, check with perl_checker) So I suggests sg like this (untested): sub auto_allocate_bios_boot_parts { my ($all_hds, $o_hd) = @_; return if !$o_hd; # only proceed with the selected device: my ($hd) = grep { $o_hd->{device} ne $_->{device} } @{$all_hds->{hds}}; # only proceed if the selected device is GPT: return if ($hd->{pt_table_type} || partition_table::default_type($hd)) ne 'gpt'; # check if a BIOS boot partition already exists my @parts = map { partition_table::get_normal_parts($_) } $hd; return if any { isBIOS_GRUB($_) } @parts; # try to allocate a BIOS boot partition my $suggest = { mntpoint => "", size => MB(1), pt_type => 'BIOS_GRUB', ratio => 1, maxsize => MB(2) }; allocatePartitions($all_hds, [ $suggest ], $hd); } I think it's clearer/more obvious this way. 2) About "Allow a BIOS boot partition to be located on any disk (mga#20161)", a previous commit says is_boot_bios_part_needed()'s checks would be relaxed but it introduces a new predicate: "all disks are (or will be) GPT" That's too much a restriction IMHO. We should support a system with GPT disks & MBR disks. We just have to only consider GPT disks when checking whether a new ESP is needed 3) About "Exclude x11-driver-video-vmware from unneeded packages (mga#19979)", as I said in the bug report, we still have a very old issue b/c unlike what tmb though, this is not preventing some packages to be removed by draklive-install (whether or not they actually could be removed on the installed system). So a later urpme --auto-orphan will still offer to remove them Obviously this is not a block for this merge but in eventually draklive-install should: - tell urpmi there's packages that needs to be kept (what a manual "urpmi x11-driver-video-vmware" would do) - do not blindly keep some packages, but do so according to the installed HW See https://bugs.mageia.org/show_bug.cgi?id=19979#c4 4) About "Always tell the kernel about partition table changes when running the classic installer", we should remove the added test for the udev rule as it's now included in installer, so the commit should only update the comment but for "This file is not present in the cut-down system used to run the classic installer, so we always need to tell the kernel in that case." that's no more true 9) One last thing though: I would prefer user/martinw/mga6 branch to be rebased on current master before being merged (we tend to prefer a linear history for drakx). You'll need to fix */NEWS conflict though... ============================================ Again, thanks Martin for the hard work. Though, as noted in my above (1) point, you should run perl_checker on your code before committing, it would catch such things as new variables without declaration ("my $foo"), ... If you use emacs, look at top of /usr/share/doc/perl_checker/perl_checker.html for how to use perl_checker within emacs. If you use vim, see /usr/bin/perl_checker-vim (though I don't think anybody has used perl_checker in vim for 10 years...) See also /usr/share/doc/perl-MDK-Common/tutorial.html It really helps in finding stupid mistakes in changes before pushing though from a quick check, your other commits should be OK. Also, I should share with you the perl-Parted bindings I worked on but I never have time to test more of partition_table::gpt with it, so it's stupid it's resting in its place for 2 years... It would enables us to write all parts in a GPT at once w/o udev actions on each part....
Argh, obviously my suggestions should have been: sub auto_allocate_bios_boot_parts { my ($all_hds, $o_hd) = @_; return if !$o_hd; # only proceed with the selected device: my ($hd) = grep { $o_hd->{device} ne $_->{device} } @{$all_hds->{hds}}; return if !$hd; # only proceed if the selected device is GPT: return if ($hd->{pt_table_type} || partition_table::default_type($hd)) ne 'gpt'; # check if a BIOS boot partition already exists my @parts = map { partition_table::get_normal_parts($_) } $hd; return if any { isBIOS_GRUB($_) } @parts; # try to allocate a BIOS boot partition my $suggest = { mntpoint => "", size => MB(1), pt_type => 'BIOS_GRUB', ratio => 1, maxsize => MB(2) }; allocatePartitions($all_hds, [ $suggest ], $hd); }
(In reply to Thierry Vignaud from comment #59) Thanks for taking the time to review this Thierry. Some responses to your points: > - A real issue IMHO: > we now abuse 'BIOS_GRUB' flag for empty partitions: > * "Preserve DOS "Empty" partitions instead of treating them as free space > (mga#20074)" > I would prefer we fix the root issue instead of the symptoms... > Obviously harder so maybe for a later cleanup I agree, but as I've mentioned before, a clean solution will touch a lot more of the code, risking introducing new bugs, so I don't want to attempt this until we've got Mageia-6 out. > 1) About "Fix auto-allocation of BIOS boot partitions (mga#20161, > mga#19888)": I find a little bit ugly to do a loop on disk when we really > are looking at a disk I fear there's a risk of allocating several partitions > if a bug would happen later. > I think auto_allocate_bios_boot_parts() shoud first find the right device > then allocate the partition outside the loop. I believe there is still a potential use case where there isn't a selected disk (bug 16055 comment 12) and auto-allocation will spread across multiple disks. As in that case we don't know which disk should be the boot device, my solution was to ensure there was a BIOS boot partition on all GPT disks, so the user could have a free choice where to install the boot loader. A bit wasteful maybe, but BIOS boot partitions are very small. With your code, a BIOS boot partition won't get allocated in this case. > 2) About "Allow a BIOS boot partition to be located on any disk > (mga#20161)", a previous commit says is_boot_bios_part_needed()'s checks > would be relaxed but it introduces a new predicate: "all disks are (or will > be) GPT" > That's too much a restriction IMHO. > We should support a system with GPT disks & MBR disks. We just have to only > consider GPT disks when checking whether a new ESP is needed I think this is a misunderstanding. is_boot_bios_part_needed() will only ever return true if *all* disks are GPT - so in a mixed GPT and MBR system, the user will not be forced to create a BIOS boot partition because grub2 can be installed on the MBR disk(s). > 4) About "Always tell the kernel about partition table changes when running > the classic installer", we should remove the added test for the udev rule as > it's now included in installer, so the commit should only update the comment > but for "This file is not present in the cut-down system used to run the > classic installer, so we always need to tell the kernel in that case." > that's no more true I still think the extra check should be kept for safety (in case somebody removes that rule file again in the future), but yes, the comment now needs to be updated. > 9) One last thing though: I would prefer user/martinw/mga6 branch to be > rebased on current master before being merged (we tend to prefer a linear > history for drakx). You'll need to fix */NEWS conflict though... Will do. > Though, as noted in my above (1) point, you should run perl_checker on your > code before committing, it would catch such things as new variables without > declaration ("my $foo"), ... When I run 'make test_pms' I get thousands of messages like: File "/usr/share/perl_checker/fake_packages/Gtk3/WebKit2.pm", line 124, character 25-25 you should not have a space here which makes it hard to see the real problems! No doubt there's a simple fix for this, but there's always been more urgent things to look at... > Also, I should share with you the perl-Parted bindings I worked on but I > never have time to test more of partition_table::gpt with it, so it's stupid > it's resting in its place for 2 years... > It would enables us to write all parts in a GPT at once w/o udev actions on > each part.... Yes, I'll take a look at that, although again that looks like one to save for mga7.
(In reply to Martin Whitaker from comment #61) > > 1) About "Fix auto-allocation of BIOS boot partitions (mga#20161, > > mga#19888)": I find a little bit ugly to do a loop on disk when we really > > are looking at a disk I fear there's a risk of allocating several partitions > > if a bug would happen later. > > I believe there is still a potential use case where there isn't a selected > disk (bug 16055 comment 12) and auto-allocation will spread across multiple > disks. As in that case we don't know which disk should be the boot device, > my solution was to ensure there was a BIOS boot partition on all GPT disks, > so the user could have a free choice where to install the boot loader. A bit > wasteful maybe, but BIOS boot partitions are very small. With your code, a > BIOS boot partition won't get allocated in this case. Humm... I don't like much creating BIOS boot partition on all GPT disks but if text mode doesn't provide a selected disk, we've no choice. And it's 1MB only. But I'm worried about which 1Mb part grub2 will use and if the user removes the disk where resides the BIOS boot partition chosen by grub2 then there'll be some sysadmin job... At least, fix the bugs found by perl_checker then.. > > 2) About "Allow a BIOS boot partition to be located on any disk > > (mga#20161)", a previous commit says is_boot_bios_part_needed()'s checks > > would be relaxed but it introduces a new predicate: "all disks are (or will > > be) GPT" > > That's too much a restriction IMHO. > > We should support a system with GPT disks & MBR disks. We just have to only > > consider GPT disks when checking whether a new ESP is needed > > I think this is a misunderstanding. is_boot_bios_part_needed() will only > ever return true if *all* disks are GPT - so in a mixed GPT and MBR system, > the user will not be forced to create a BIOS boot partition because grub2 > can be installed on the MBR disk(s). And I still disagree: does someone actually tested it? B/c under UEFI, we only allow to install the bootloader... on the ESP... Under UEFI, we don't allow the user to pick where to install the bootloader... Thus we'll never pass eg /dev/sdb to grub2-install. We'll just call grub2-install. Which will try to install on ESP, which means on a GPT partitioned disk, which means it may fail if there's no BOOT BIOS How is that supposed to work if there's no ? > I still think the extra check should be kept for safety (in case somebody > removes that rule file again in the future), but yes, the comment now needs > to be updated. I disagree. This is like having both belt and suspenders. We've to be more confident :-) I would rather add a comment in install/share/list.xml so that it doesn't disappear :-) I've started explained why stuff is included but it should be done too when someone adds a new element I don't like adding random check for some stuff. Either we check for every thing that's in list.xml or we don't. Up to now, we properly maintained it, we don't blindly remove things, so I think we're in the clear. > > Though, as noted in my above (1) point, you should run perl_checker on your > > code before committing, it would catch such things as new variables without > > declaration ("my $foo"), ... > > When I run 'make test_pms' I get thousands of messages like: This is why I advertise you use perl_checker from within your preferred "IDe" (aka emacs or vim). This way you can go directly to the right line Else you can manually run "perl_checker --restrict-to-files" The rule of thumb is to only look at perl_checker output regarding the file you're altering. Also you can compare its output before/after your changes. We can discuss perl_checker privately if you want (mail).
(In reply to Thierry Vignaud from comment #62) > > I think this is a misunderstanding. is_boot_bios_part_needed() will only > > ever return true if *all* disks are GPT - so in a mixed GPT and MBR system, > > the user will not be forced to create a BIOS boot partition because grub2 > > can be installed on the MBR disk(s). > > And I still disagree: does someone actually tested it? Yes, me ;-) > B/c under UEFI, we only allow to install the bootloader... on the ESP... > Under UEFI, we don't allow the user to pick where to install the > bootloader... > Thus we'll never pass eg /dev/sdb to grub2-install. > We'll just call grub2-install. Which will try to install on ESP, which means > on a GPT partitioned disk, which means it may fail if there's no BOOT BIOS > How is that supposed to work if there's no ? The BIOS boot partition is only needed for non-UEFI boot. In that case, when you get to the boot loader install screen, you can use the drop-down menu to select which disk grub2 gets installed on. And from my tests, it does appear to be clever enough to only offer you the choice of disks where it can actually be installed (either DOS partitioned or GPT partitioned with a BIOS boot partition). > > I still think the extra check should be kept for safety (in case somebody > > removes that rule file again in the future), but yes, the comment now needs > > to be updated. > > I disagree. This is like having both belt and suspenders. > We've to be more confident :-) I won't fight you on this, but I'm all for belt and braces! In my job, bugs can't be easily fixed, so I likely have a different perspective on things.
for : Mageia-6-sta2-x86_64-DVD. (Wed Feb 22 15:48:28 CET 2017) installed to HDD with no pre-existing partitions in legacy bios system chose "use free space". no partition error advised. fixed for me
Unfortunately, adding the 60-block.rules file to stage 2 does appear to have caused problems. Many testers are seeing the "I cannot read the partition table of device sda, it's too corrupted for me:(" message when the installer first examines the disks. I see this on one of my systems here, although only intermittently (twice in eight tries). I suspect what is happening is that when the installer examines the partition table on the disk, that triggers the udev rule and causes the kernel to rescan the partition table. Unfortunately this coincides with the installer reading /proc/partitions to verify the kernel has the same view of the partitions, which can catch /proc/partitions in a partially updated state. After patching the ISO to remove the 60-block.rules file from stage 2, I saw no failures in 40 tries of starting the installer. Fixing this any other way will be hard. 'udevadm control --stop-exec-queue' does not stop udevd from calling the BLKRRPART ioctl.
maybe we can revert for sta2 and see with you and thierry what is best
(In reply to Nicolas Lécureuil from comment #66) > maybe we can revert for sta2 and see with you and thierry what is best I agree with that.
Yeah, I'm reverting the offending commit Martin, please: 1) your code for 'use strict' (missing "my", see perl_checker) See http://gitweb.mageia.org/software/drakx/tree/perl-install/fsedit.pm?h=user/martinw/mga6#n558 2) then rebase over origin/master
commit 72f762b72f81f033a2cf999cb2353d611e4ba44d Author: Thierry Vignaud <thierry.vignaud@...> Date: Sat Feb 25 08:07:36 2017 +0100 Revert "- Add 60-block.rules in the installer (mga#20074)" This reverts commit bdac0595323a6417fb7365038c9777cde0060f58. Many testers are seeing the "I cannot read the partition table of device sda, it's too corrupted for me:(" message when the installer first examines the disks. It's likely that when the installer examines the partition table on the disk, that triggers the udev rule and causes the kernel to rescan the partition table. Unfortunately this coincides with the installer reading /proc/partitions to verify the kernel has the same view of the partitions, which can catch /proc/partitions in a partially updated state. After patching the ISO to remove the 60-block.rules file from stage 2, no failures happen in the installer. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=72f762b72f81f033a2cf999cb2353d611e4ba44d
commit a5473711818b2519552561b33c3f181ab1bbfde6 Author: Martin Whitaker <mageia@...> Date: Sat Jan 21 19:38:55 2017 +0000 Ensure the kernel doesn't rescan a partially written partition table (mga#20074). When no partitions on a DOS-partitioned disk are mounted, the kernel automatically rescans the partition table when the file handle to the raw device is released. Currently the code opens and closes the raw device when writing the primary partition table and when writing each extended partition table segment. As the extended partition table segments form a linked list, this allows the kernel to get in and rescan the table when the list is not in a coherent state. This patch changes the code to open the raw device before writing the primary partition table and to close it only after writing the last extended partition table segment. The behaviour for other partition table types is unchanged. v2 (tvignaud): simplify by moving copies of noop funcs into the base class --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=a5473711818b2519552561b33c3f181ab1bbfde6
commit 6f0880e11d1f83ca239c823cece00fc7d034dddb Author: Martin Whitaker <mageia@...> Date: Sat Jan 21 20:00:20 2017 +0000 Don't tell the kernel about partition table changes when it rescans them automatically (mga#20074). When no partitions on a DOS-partitioned disk are mounted, the kernel automatically rescans the partition table when it is written to disk. We shouldn't then try to update the kernel's view of the partition table, as the list of deltas we have recorded is relative to the previous state of the partition table, not the newly rescanned state. The behaviour for other partition table types is unchanged. v2 (tvignaud): just make base class assume the kernel doesn't reread, only mbr subclass overrides need_to_tell_kernel() in order to be smarter --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=6f0880e11d1f83ca239c823cece00fc7d034dddb
commit 0d79e34dad3062575746f97c613f0e643a2d8b29 Author: Martin Whitaker <mageia@...> Date: Sat Jan 21 20:23:18 2017 +0000 Preserve DOS "Empty" partitions instead of treating them as free space (mga#20074). To minimise the changes this close to mga6 release (and until we fix the bug in partition auto-allocation that mistakenly creates BIOS boot partitions non-GPT disks), reuse the BIOS_GRUB flag for flagging Empty partitions, as the real partition ID (0x00) is used to flag free space. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=0d79e34dad3062575746f97c613f0e643a2d8b29
commit 4b3e4de0e72b4034613c9834f7a60c4822828d6b Author: Martin Whitaker <mageia@...> Date: Sat Jan 21 20:33:43 2017 +0000 Don't treat Empty or BIOS_GRUB partitions as free space (mga#20074). The subroutine isEmpty() is used to identify free space on the disk (not DOS "Empty" partitions). Because we use a string to flag Empty and BIOS_GRUB partitions, rather than a numeric value, the numeric equality operator gives false positives. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=4b3e4de0e72b4034613c9834f7a60c4822828d6b
commit 3643baad41e7237b9d6acbe524220cde85961772 Author: Martin Whitaker <mageia@...> Date: Sat Feb 4 17:11:10 2017 +0000 Fix erratic behaviour when telling kernel to delete partitions (mga#20074). When telling the kernel about changes to a DOS partition table, if a partition was deleted on a disk that also contained an Empty partition, the kernel also removed the Empty partition from its cached partition table (and renumbered the other partitions). Experimentation showed that leaving the udev exec queue active whilst we were telling the kernel about the changes fixed this problem, although I don't have an explanation for why it does. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=3643baad41e7237b9d6acbe524220cde85961772
commit 37a48c57d0c295e29049d23d1d37a800e1ee24af Author: Martin Whitaker <mageia@...> Date: Mon Feb 20 21:42:27 2017 +0000 Always tell the kernel about partition table changes when running the classic installer (mga#20074). The automatic rescan of the partition table is triggered by udevd. The udev rule that causes this is not present on the cut-down system that runs the classic installer. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=37a48c57d0c295e29049d23d1d37a800e1ee24af
Closing
Status: NEW => RESOLVEDResolution: (none) => FIXED
commit 102339802fe667e53d517f522d24bb9e4c652019 Author: Thierry Vignaud <thierry.vignaud@...> Date: Sun Jul 2 16:39:00 2017 +0200 include part of udev's hwdb (mga#20327) Let's hope it fixes touchpad in installer. Such a missing file might also explain why adding the udev rule wasn't enough for mga#20074. --- Commit Link: http://gitweb.mageia.org/software/drakx/commit/?id=102339802fe667e53d517f522d24bb9e4c652019 Bug links: Mageia https://bugs.mageia.org/20327 https://bugs.mageia.org/20074
is this valid for mageia 6 ? ( in case we do a mga 6.1 later )
As noted, this is just a hint that a similar missing file could have helped this file. Nothing was done regarding _this_ bug report.