Bug 33787 - Diskdrake does not handle swap fully - it should, or warn and explain. (critical)
Summary: Diskdrake does not handle swap fully - it should, or warn and explain. (critical)
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: High critical
Target Milestone: ---
Assignee: Mageia tools maintainers
QA Contact:
URL: https://forums.mageia.org/en/viewtopi...
Whiteboard: MGA9TOO
Keywords: FOR_ERRATA9
Duplicates: 22748
Depends on:
Blocks:
 
Reported: 2024-11-22 00:52 CET by Morgan Leijström
Modified: 2024-12-19 10:22 CET
CC List: 3 users

See Also:
Source RPM: drakxtools-18.65-1.mga9
CVE:
Status comment:



Description Morgan Leijström 2024-11-22 00:52:58 CET
Description of problem:
diskdrake does not maintain VERY important aspects of the swap configuration.
That easily leads to an unbootable system.

It should either warn the user before performing the action, or implement the required handling.

Note that the installer does set everything correctly, even for hibernate and restore.


Version: drakxtools-18.65-1.mga9 and earlier, for years
I have hit this several times, but only now have I understood the mechanism.


How reproducible: always.
Examples here: https://forums.mageia.org/en/viewtopic.php?t=15513#p90696
and two in Bug 22748, which I am closing now because this report is better and fresh.


Steps to Reproduce:

One scenario:
You already have swap, e.g. as set up by the installer.
Then you want the swap on another disk, so you run diskdrake,
delete the swap, and create swap on the other disk.
-> Next boot ends at the dracut prompt. THIS IS SEVERE
I.e. not even a maintenance prompt - hard to fix even for a somewhat advanced user.
Reason: /etc/dracut.conf.d/51-mageia-resume.conf still points to the removed swap.

It would be good if someone could describe a good way to fix this for average users.
I think the easiest is to boot a Mageia Live and simply delete 51-mageia-resume.conf, so the main system can boot to desktop and continue fixing from there?
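
A stale entry of this kind can be spotted before rebooting. The following is only a minimal sketch, not something diskdrake provides: the function name stale_resume_devices is hypothetical, and it assumes the file holds add_device+=" ... " lines naming device paths or UUID=/LABEL= specifications.

```shell
# Hypothetical helper (not part of drakxtools): print every device named in
# add_device+=" ... " lines of a dracut config file that does not exist now.
stale_resume_devices() {
    local conf="$1" line dev path
    [ -r "$conf" ] || return 0        # no file: nothing to check
    while IFS= read -r line; do
        case $line in
            add_device+=*) ;;         # only look at add_device lines
            *) continue ;;
        esac
        line=${line#*\"}              # drop everything up to the opening quote
        line=${line%\"*}              # drop the closing quote and what follows
        for dev in $line; do          # unquoted on purpose: split on whitespace
            case $dev in
                UUID=*)  path=/dev/disk/by-uuid/${dev#UUID=} ;;
                LABEL=*) path=/dev/disk/by-label/${dev#LABEL=} ;;
                *)       path=$dev ;;
            esac
            [ -e "$path" ] || printf '%s\n' "$dev"
        done
    done < "$conf"
}

# Example call:
#   stale_resume_devices /etc/dracut.conf.d/51-mageia-resume.conf
```

Any output means the listed device is named in the file but is missing from the running system, which is the condition described in the scenario above.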

Other scenario:
You do not have swap.
You use diskdrake to create swap.
It seems to work, and e.g. Plasma even offers a hibernation option.
Going into hibernation, the session is saved.
Problem: it is not restored, as the file /etc/dracut.conf.d/51-mageia-resume.conf does not exist.  -> The user loses that session.

Workaround: 
1) manually create that file with correct content
  On the system I write this on (using LVM) it is one line:
add_device+=" /dev/vg-republic/lv_swap "
  and then run: dracut -f --regenerate-all
2) update the relevant line in /etc/default/grub. Example:
GRUB_CMDLINE_LINUX_DEFAULT="splash quiet noiswmd resume=/dev/vg-republic/lv_swap audit=0 vga=791"
  and then run: update-grub2
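
The two manual edits in this workaround can be previewed with a short sketch like the one below. The function names resume_conf_line and grub_with_resume are hypothetical; both only print text and change nothing on disk, and the second assumes that /etc/default/grub already contains a resume= entry on its GRUB_CMDLINE_LINUX_DEFAULT line.

```shell
# Hypothetical helpers; they only print text and modify no files.

# Print the dracut config line for the given swap device.
resume_conf_line() {
    printf 'add_device+=" %s "\n' "$1"
}

# Print the given grub defaults file with the resume= value on the
# GRUB_CMDLINE_LINUX_DEFAULT line replaced by the new device.
# A line without an existing resume= entry is printed unchanged.
grub_with_resume() {
    local file="$1" dev="$2"
    sed -E "/^GRUB_CMDLINE_LINUX_DEFAULT=/ s|resume=[^ \"]*|resume=${dev}|" "$file"
}

# Example calls, using the device from the workaround above:
#   resume_conf_line /dev/vg-republic/lv_swap
#   grub_with_resume /etc/default/grub /dev/vg-republic/lv_swap
```

After reviewing the printed text and writing it to the two files as root, the steps from the workaround still apply: dracut -f --regenerate-all and update-grub2.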

---

Complementary fix: is there some way to tweak the kernel (or whatever) so that the system skips trying to use swap instead of aborting to the dracut prompt?
Morgan Leijström 2024-11-22 00:55:14 CET

Priority: Normal => High
Keywords: (none) => FOR_ERRATA9
Whiteboard: (none) => MGA9TOO

Comment 1 Morgan Leijström 2024-11-22 00:56:27 CET
*** Bug 22748 has been marked as a duplicate of this bug. ***
Florian Hubold 2024-11-22 01:28:20 CET

CC: (none) => doktor5000

Stephen Germany 2024-11-22 01:52:32 CET

CC: (none) => stephengermany

Comment 2 Florian Hubold 2024-11-22 02:05:56 CET
(In reply to Morgan Leijström from comment #0)

> Complementary fixing: Is there some way to tweak kernel/whatever so the
> system skip trying to use swap instead of aborting to dracut prompt?

Those are two issues AFAICT. The kernel's handling of the resume= cmdline parameter should still continue booting as if hibernation data were not present or corrupt, like a regular fresh boot.

For the initrd, if a device passed with add_device is not present during boot, I believe there is no equivalent of the "nofail" option (used in /etc/fstab for filesystems that are not essential to the system and should not fail the boot).

Also dracut itself is only used to build the initrd, it is not involved during boot.

Apart from that see bug 12305 for some more context information.

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=12305

Comment 3 Lewis Smith 2024-11-24 21:39:42 CET
 https://forums.mageia.org/en/viewtopic.php?t=15513#p90692

morgano wrote:
    I fail to find where, except fstab (which is correct), the path to swap is recorded.
    And how to correct it.
    After correcting that, I guess one needs to run dracut -f to fix the kernel command line parameter resume= ?

doktor5000 replied:

There are basically two other places apart from fstab. One is the UUID in /etc/dracut.conf.d/51-mageia-resume.conf where it also needs to be updated manually, afterwards you need to run:
    dracut -f --regenerate-all

The other place is in /etc/default/grub where the resume= option is normally set to swap by default and it needs to be updated there as well, afterwards
run update-grub2 to actually write the bootloader config, as otherwise you'd still be able to boot but not be able to resume
-----------------------------------------------
It seems that these issues are what diskdrake should address.
-------------------------------------------------------------

Messing with swap on a running system is dicey.
Safer ? :
End all applications.

Re-boot.
 # swapoff [-a] [specialfile...]
disables swapping on the specified devices and files.
The device or file used is given by the specialfile parameter. It may
be of the form -L label or -U uuid to indicate a device by label or uuid.
When the -a flag is given, swapping is disabled on all known swap devices and
files (as found in /proc/swaps or /etc/fstab)

Use diskdrake to delete the old swap partition
[? Re-boot]
Use diskdrake to define the new swap partition.
[?  # swapon]
Re-boot.
Check fstab, /etc/dracut.conf.d/51-mageia-resume.conf, /etc/default/grub
We now know that the last two may not have been updated, so apply doktor5000's fixes; and
Re-boot
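
The "check" step above can be sketched as a small read-only helper that prints the swap references from the three files side by side. The function name swap_references and its optional root-prefix argument are hypothetical; the file paths are the ones named in this report.

```shell
# Hypothetical read-only helper: list where swap is referenced in the
# three files named above. An optional prefix allows checking a system
# mounted elsewhere, e.g. from a Live session.
swap_references() {
    local root="${1:-}"
    local fstab="$root/etc/fstab"
    local conf="$root/etc/dracut.conf.d/51-mageia-resume.conf"
    local grub="$root/etc/default/grub"

    if [ -r "$fstab" ]; then
        # Non-comment fstab entries whose filesystem type is swap
        awk '$1 !~ /^#/ && $3 == "swap" { print "fstab: " $1 }' "$fstab"
    fi
    if [ -r "$conf" ]; then
        grep '^add_device' "$conf" | sed 's/^/dracut: /'
    fi
    if [ -r "$grub" ]; then
        grep '^GRUB_CMDLINE_LINUX' "$grub" | grep -o 'resume=[^ "]*' | sed 's/^/grub: /'
    fi
    return 0
}

# Example calls:
#   swap_references          # the running system
#   swap_references /mnt     # a root filesystem mounted at /mnt
```

If the three printed entries do not name the same device, the dracut and grub entries are the ones to correct by hand as described above.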

Assigning this to mageiatools.

Assignee: bugsquad => mageiatools

Comment 4 Morgan Leijström 2024-11-25 13:42:17 CET
Adding docteam so that some kind of heads-up can be added to the diskdrake documentation in case the problem is not solved for Mageia 10.

CC: (none) => doc-bugs

Comment 5 Morgan Leijström 2024-12-15 21:56:17 CET
Another variant of this:
I had a system with two swap partitions.
Using diskdrake, I removed the one *not* named after "resume=" on the kernel command line.
I checked that /etc/fstab was correctly updated.

But booting drops to the dracut shell, saying it cannot find that removed partition.

(Workaround: boot a live system and recreate that partition with the same name/UUID, whichever was used.  Then I booted the installed system, ran diskdrake, removed the partition again, and executed dracut -f --regenerate-all)

This was an extra swap partition in LVM on a LUKS-encrypted PV, all set up by installer+diskdrake.

I wonder if this also hits when removing other kinds of partitions.
Comment 6 Morgan Leijström 2024-12-19 10:22:45 CET
(In reply to Florian Hubold from comment #2)
> kernel handling resume= cmdline parameter should
> still continue booting as if hibernation data is not present or corrupt,
> like for a regular fresh boot.

It seems this was fixed and tested on regular partitions in Bug 12305 Comment 39, and pushed, but not tested with LVM, RAID or encryption.
