Description of problem: 1) About to hibernate, it decides there is not enough room - But there really should be! 2) When hibernating fail, system usually returns to desktop, but that fails here, seem to be amdgpu problems. feb 09 22:06:10 hp1.tribun kernel: PM: hibernation: Creating image: feb 09 22:06:10 hp1.tribun kernel: PM: hibernation: Need to copy 939437 pages feb 09 22:06:10 hp1.tribun kernel: PM: hibernation: Normal pages needed: 939437 + 1024, available pages: 873467 feb 09 22:06:10 hp1.tribun kernel: PM: hibernation: Not enough free memory feb 09 22:06:10 hp1.tribun kernel: PM: hibernation: Error -12 creating image Strange it thought it was too little space, I had only Plasma and small apps up, RAM 8GB (minus some for GPU), 16GB swap, confirmed with swapon. feb 09 22:06:10 hp1.tribun systemd-sleep[6506]: Failed to put system to sleep. System resumed again: Cannot allocate memory feb 09 22:06:10 hp1.tribun kernel: PM: hibernation: hibernation exit feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6 feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706 feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. feb 09 22:06:10 hp1.tribun systemd[1]: Started Load/Save RF Kill Switch Status. feb 09 22:06:10 hp1.tribun systemd[1]: Reached target Bluetooth Support. feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6 feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706 feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. 
feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. feb 09 22:06:10 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. feb 09 22:06:11 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6 feb 09 22:06:11 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706 feb 09 22:06:11 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6 feb 09 22:06:11 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. feb 09 22:06:11 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706 feb 09 22:06:11 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32774. feb 09 22:06:11 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32786. feb 09 22:06:11 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: TLB flush failed for PASID 32786. etc, etc many lines feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card0/device/devcoredump/data feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx timeout, signaled seq=70285, emitted seq=70288 feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: Process Xwayland pid 2751 thread Xwayland:cs0 pid 2769 ... feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume feb 09 22:06:21 hp1.tribun kernel: [drm] PCIE GART of 1024M enabled. 
feb 09 22:06:21 hp1.tribun kernel: [drm] PTB located at 0x000000F43FC00000 feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: VRAM is lost due to GPU reset! feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: PSP is resuming... feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: reserve 0x400000 from 0xf43f800000 for PSP TMR feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available feb 09 22:06:21 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:153 vmid:0 pasid:0) feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x00001080c0032000 from IH client 0x1b (UTCL2) feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000D32 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPG (0x6) feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110) feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: resume of IP block <gfx_v9_0> failed -110 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = 
-110 feb 09 22:06:22 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: GPU Recovery Failed: -110 ... feb 09 22:06:32 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State feb 09 22:06:32 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed feb 09 22:06:32 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created feb 09 22:06:32 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card0/device/devcoredump/data feb 09 22:06:32 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx timeout, signaled seq=70288, emitted seq=70288 feb 09 22:06:32 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: Process Xwayland pid 2751 thread Xwayland:cs0 pid 2769 feb 09 22:06:32 hp1.tribun kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!. Source: 1 ... it never gets up. I used REISUB to shut down in blind. How reproducible: Always on this system. The only system so far I have installed mga10 on. Mga 10 installed a couple days ago, fully updated. This is a new machine to me, it have not seen another Linux. Steps to Reproduce: 1. Boot to desktop 2. Tell it to hibernate (I used Plasma menu) 3. black screen, user can not save = work lost Suspend-resume and shutting down works. 
[morgan@hp1 ~]$ inxi -SMCG System: Host: hp1.tribun Kernel: 6.18.9-desktop-1.mga10 arch: x86_64 bits: 64 Desktop: KDE Plasma v: 6.5.5 Distro: Mageia 10 Machine: Type: Laptop System: HP product: HP Pavilion Laptop 15-cw0xxx v: N/A serial: <superuser required> Mobo: HP model: 84E7 v: 99.30 serial: <superuser required> Firmware: UEFI vendor: AMI v: F.50 date: 11/18/2022 CPU: Info: quad core model: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx bits: 64 type: MT MCP cache: L2: 2 MiB Speed (MHz): avg: 1617 min/max: 1600/2000 cores: 1: 1617 2: 1617 3: 1617 4: 1617 5: 1617 6: 1617 7: 1617 8: 1617 Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Mobile Series] driver: amdgpu v: kernel Device-2: Lite-On HP Wide Vision HD Camera driver: uvcvideo type: USB Display: wayland server: X.org v: 1.21.1.21 with: Xwayland v: 24.1.9 compositor: kwin_wayland driver: X: loaded: amdgpu,v4l dri: radeonsi gpu: amdgpu resolution: 1920x1080~60Hz API: EGL v: 1.5 drivers: kms_swrast,radeonsi,swrast platforms: gbm,wayland,x11,surfaceless,device API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 25.3.5 renderer: AMD Radeon Vega 8 Graphics (radeonsi raven ACO DRM 3.64 6.18.9-desktop-1.mga10) API: Vulkan v: 1.4.335 drivers: radv,llvmpipe surfaces: N/A Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo de: kscreen-console,kscreen-doctor wl: wayland-info x11: xdpyinfo, xprop, xrandr
Maybe the space is not contiguous. Was 6.18.8 affected as well?
CC: (none) => ghibomgx
Can't the kernel handle hibernating to several small swap partitions/files?

Anyhow, this swap is all one large swap partition, as usual on my systems, set up by diskdrake as one of the LVs in LVM together with /home and /. The VG uses only one PV, encrypted by LUKS.

Yes, same problem with 6.18.8. Only tested the desktop flavour. Going to try the linus flavour.

I thought of trying kernel 6.12, but it seems to have been removed from the repos. BTW, if it is intended to be cleaned out completely, the virtualbox-kernel and xtables-addons packages for 6.12 should be removed too!

BTW, it seems the first commands of REISUB work, all the way to Sync. Unmount I don't know, but it does not Reboot at R.
(In reply to Morgan Leijström from comment #2)
> Cant kernel handle hibernating to several small swap partitions/files?
>
> Anyhow, this swap is all one large swap partition, as usual in my systems
> set up by diskdrake as one of the lv in LVM together with /home and /, the
> vg using only one pv, encrupted by LUKS.
>
> Yes same problem with 6.18.8. Only tested desktop flavour.
> Going to try linus flavour.
>
> Thought of trying kernel 6.12 but it seems removed from repos.
> BTW If it is intended to be cleaned completely also the virtualbox-kernel
> and xtables-addons for 6.12 should be removed!

I didn't ask for 6.12 to be cleaned out, but it was (probably by some mistake in the complex list), so I can't upload it anymore.

Regarding the swap partition: you can have as many swap partitions as you want, but they won't be combined into a single huge "virtual swap partition" across which the hibernation image is distributed. On hibernation only one is used as the candidate to hibernate to. It is chosen from the list of swap areas as one with enough free space to hold the RAM, or it is the one specified with "resume=" on the boot cmdline. If a swap area is already partially used (i.e. your system is already swapping), that reduces the space available for hibernation, and when there is not enough, hibernation will fail.

There is no way to reserve a swap partition exclusively for hibernation. I thought that was weird, but it seems so: by definition there is no "safe" way to do it. You can "preserve" a swap partition large enough and add it to the system (using swapon) in the last phase before hibernation, but if the system is fast enough to swap there and fill that new swap area before you can issue the hibernation, it will fail too.

In the past there was an external patchset called "tux-on-ice" where the hibernation partition could be specified as exclusive (and not used as swap), but it has not been maintained for several years.
Thank you for the explanation on swap.

So what goes wrong here?
"Normal pages needed: 939437 + 1024, available pages: 873467"
This is even shortly after booting to the desktop, having only launched Firefox, konsole, Mageiawelcome, nextcloud-client, KDEconnect and whatever the normal parts of Plasma are. RAM: 8G, swap partition: 16G.

Next I will try to make an obscenely big swap.

---

(In reply to Morgan Leijström from comment #2)
> BTW, it seem the first commnads of REISUB works, all way to Sync, Unmount i
> dont know, but it do not Reboot at R.

Typo: I should have written "Reboot at B".

But what is really going on? Testing REISUB while observing "journalctl -f" run as root in konsole on the desktop, for every letter in REISUB I get:
kernel: sysrq: This sysrq operation is disabled.
The exception is "S", where it confirms that it performs an Emergency Sync.

Why are the other commands blocked? How is REISUB to be performed?
For the swap, you might create, for instance, a 32GB swap that is unlisted or commented out in /etc/fstab but listed in resume= on the boot cmdline (also check that the initramfs doesn't have hardcoded resume lines pointing at different partitions).

For the magic sysrq, I think it's enabled.

zcat /proc/config.gz | grep MAGIC_SYSRQ

shows it's enabled in the kernel. The current value can be read with:

sysctl kernel.sysrq

or

cat /proc/sys/kernel/sysrq

You might need to set it to 1, e.g. with sysctl -w kernel.sysrq=1. If it has a different value, then it is being overwritten from somewhere else, e.g. by a newer systemd default, but that's another bug.
(In reply to Giuseppe Ghibò from comment #5)
> For the swap, you might create for instance a 32GB swap, unlisted or
> commented in /etc/fstab, but listed in resume= at boot cmdline (check also
> initramfs don't have hardcoded resume lines with different partitions).
>

Of course, that 32GB swap should be enabled on demand using swapon <swap partition>.
Yep, kernel.sysrq was set to 16 -> Bug 35116 - Magic Sysrq commands are mostly disabled (REISUB)
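A value of 16 would fit the journal observation that only "S" was accepted. Here is a minimal sketch for decoding such a value, assuming the bit assignments from the kernel's sysrq admin guide (4 = keyboard control, 16 = sync, 32 = remount read-only, 64 = signalling processes, 128 = reboot/poweroff, and 1 = everything enabled). The function name reisub_allowed is made up for illustration.

```shell
#!/bin/sh
# Report which REISUB keys a given kernel.sysrq bitmask permits.
# Assumed bit values (kernel sysrq admin guide):
#   4 = keyboard control (R), 16 = sync (S), 32 = remount read-only (U),
#   64 = signal processes (E, I), 128 = reboot/poweroff (B).
reisub_allowed() {
    mask=$1
    # A value of exactly 1 means "all functions enabled".
    if [ "$mask" -eq 1 ]; then
        echo "R E I S U B"
        return
    fi
    out=""
    for pair in R:4 E:64 I:64 S:16 U:32 B:128; do
        key=${pair%%:*}
        bit=${pair##*:}
        if [ $(( mask & bit )) -ne 0 ]; then
            out="$out $key"
        fi
    done
    # Strip the leading space before printing.
    echo "${out# }"
}

reisub_allowed 16    # prints: S
```

On a live system the current value could be passed in with reisub_allowed "$(cat /proc/sys/kernel/sysrq)".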
BTW, there is also a 6.18.9-2.2 here, which contains some of the fixes from the upcoming 6.18.10: https://download.copr.fedorainfracloud.org/results/ghibo/mageia10-bonus/mageia-cauldron-x86_64/10112137-kernel/
I used diskdrake to create an additional 32GB swap, not listed in fstab, and added it to the kernel boot command line. Result: same failure as before, with a slightly different number of available pages (lower!?).

Then I ran swapoff -a, used diskdrake to remove both swap partitions, rebooted, and tried to make a fresh giant 100 GB swap using diskdrake. It failed:

feb 10 22:42:13 hp1.tribun diskdrake[4638]: running: lvm2 lvcreate --size 103729152k -n lv_swap100 vg-hp1
feb 10 22:42:13 hp1.tribun drakconf[4858]: Rounding up size to full physical extent <98,93 GiB
feb 10 22:42:13 hp1.tribun diskdrake[4638]: error: lvcreate failed: Aborting. Failed to wipe start of new LV.

...whatever that means, apart from leaving me with no swap at all until it is fixed. -> For another day...
Now it is past midnight... ;-) [root@hp1 ~]# lvm2 lvcreate --verbose --size 103729152k -n lv_swap100 vg-hp1 Rounding up size to full physical extent <98,93 GiB Creating logical volume lv_swap100 Archiving volume group "vg-hp1" metadata (seqno 19). Activating logical volume vg-hp1/lv_swap100. activation/volume_list configuration setting not defined: Checking only host tags for vg-hp1/lv_swap100. Creating vg--hp1-lv_swap100 Loading table for vg--hp1-lv_swap100 (253:3). Resuming vg--hp1-lv_swap100 (253:3). Wiping known signatures on logical volume vg-hp1/lv_swap100. Found existing signature on /dev/vg-hp1/lv_swap100 at offset 4086: LABEL="(null)" UUID="f2f09f30-1d7d-431e-a50c-dae48ff00009" TYPE="swap" USAGE="other" WARNING: swap signature detected on /dev/vg-hp1/lv_swap100 at offset 4086. Wipe it? [y/n]: y Accepted input: [y] Wiping swap signature on /dev/vg-hp1/lv_swap100. Initializing 4,00 KiB of logical volume vg-hp1/lv_swap100 with value 0. Logical volume "lv_swap100" created. Creating volume group backup "/etc/lvm/backup/vg-hp1" (seqno 20). Wonder why signature got bad :( The M.2 SSD is almost new, and I had also recently performed RAM check. I have in the past had problems when removing swap using diskdrake (reported several years ago), so maybe this is another variant. OOPS that made an ext4 filesystem... Fired up diskdrake, removed it and it could create the 100 GB swap. Rebooted. Realise i have to edit kernel boot command line, did so, rebooted. 
Back to desktop, verify kernel boot command line: BOOT_IMAGE=/vmlinuz-6.18.9-desktop-1.mga10 root=/dev/mapper/vg--hp1-lv_hp1root ro noiswmd audit=0 resume=/dev/vg-hp1/lv_swap100 vga=791 Verified swap is big and active, and not used at all: [root@hp1 ~]# swapon NAME TYPE SIZE USED PRIO /dev/dm-3 partition 102,5G 0B -2 [root@hp1 ~]# LC_ALL=C free total used free shared buff/cache available Mem: 6993600 2521500 3589920 59568 1185740 4472100 Swap: 107519996 0 107519996 So less than 3 GB RAM used running: Plasma, Firefox, gkrellm, kwrite, konsole, kdeconnect, nextcloud-client Fail during hibernation like before: feb 10 23:56:46 hp1.tribun kernel: PM: hibernation: Need to copy 1052576 pages feb 10 23:56:46 hp1.tribun kernel: PM: hibernation: Normal pages needed: 1052576 + 1024, available pages: 760325 feb 10 23:56:46 hp1.tribun kernel: PM: hibernation: Not enough free memory It must calculate pages wrong.
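To make the "verify kernel boot command line" step less error-prone, a small sketch that pulls the resume= value out of a command line string. The function name resume_arg is hypothetical; the sample string is the command line quoted above.

```shell
#!/bin/sh
# Print the value of every resume= argument found in a kernel
# command line string (normally there is exactly one).
resume_arg() {
    # Intentionally unquoted: split the command line on whitespace.
    for word in $1; do
        case $word in
            resume=*) echo "${word#resume=}" ;;
        esac
    done
}

resume_arg "BOOT_IMAGE=/vmlinuz-6.18.9-desktop-1.mga10 root=/dev/mapper/vg--hp1-lv_hp1root ro noiswmd audit=0 resume=/dev/vg-hp1/lv_swap100 vga=791"
# prints: /dev/vg-hp1/lv_swap100
```

On the running system this would be called as resume_arg "$(cat /proc/cmdline)", and the result compared by eye with the device shown by swapon.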
Same fail with similar page counts for both 6.18.9-1 linus flavour, and kernel-server-6.18.9-2.2.mga10-1-1.mga10
Just to try something, I changed resume=/dev/vg-hp1/lv_swap100 on the kernel boot line to resume=/dev/mapper/vg--hp1-lv_swap100 -> no change in result.

I guess the page size is 4K.

Needed: (1052576 + 1024)×4096/1024^3 = 4.0 GiB
I guess that includes video RAM and maybe more, in addition to the 2.5G of used RAM that the free command reports. But still, isn't the image supposed to be compressed?

Available: 760325×4096/1024^3 = 2.9 GiB
Active swap is over 100 GB and not a byte of it was used before hibernation...
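The page arithmetic above can be reproduced with a short helper. This is only a sketch, assuming 4 KiB pages (getconf PAGESIZE reports the real value); the function name pages_to_gib is made up for illustration.

```shell
#!/bin/sh
# Convert a page count from the "PM: hibernation" log lines to GiB.
# Second argument is the page size in bytes; defaults to an assumed 4096.
pages_to_gib() {
    pages=$1
    page_size=${2:-4096}
    awk -v p="$pages" -v s="$page_size" \
        'BEGIN { printf "%.2f\n", p * s / (1024 * 1024 * 1024) }'
}

pages_to_gib $((1052576 + 1024))   # needed:    prints 4.02
pages_to_gib 760325                # available: prints 2.90
```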
Comment 0:  16 GB unused swap  -> available pages: 873467
Comment 10: 102 GB unused swap -> available pages: 760325

What, where, and how does it measure available pages? Is there another way, such as a command, to check available pages?

Apart from the swap partition, this system has these partitions:

/dev/mapper/vg--hp1-lv_hp1root   29G   22G  5,9G  79% /
/dev/sda2                       458M  285M  144M  67% /boot
/dev/mapper/vg--hp1-lv_hp1home   96G  612M   95G   1% /home
/dev/sda1                        98M  162K   98M   1% /boot/EFI
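On the "how does it measure" question: as far as I can tell from enough_free_mem() in kernel/power/snapshot.c, the "available pages" figure counts free RAM pages in the normal zones (plus pages already set aside for the snapshot), not swap space, and the image is only created if available > needed + reserve. That reading is an assumption on my part, but it would fit the number barely changing between a 16 GB and a 102 GB swap. A sketch that extracts the margin from such a log line (hib_margin is a made-up name):

```shell
#!/bin/sh
# Parse a "Normal pages needed: N + R, available pages: A" log line and
# print the margin A - N - R in pages. A negative margin corresponds to
# the "Not enough free memory" failure (assumed check: A > N + R).
hib_margin() {
    echo "$1" |
    sed -n 's/.*Normal pages needed: \([0-9]*\) + \([0-9]*\), available pages: \([0-9]*\).*/\1 \2 \3/p' |
    while read -r need reserve avail; do
        echo $(( avail - need - reserve ))
    done
}

# Failing case from comment 0:
hib_margin "PM: hibernation: Normal pages needed: 939437 + 1024, available pages: 873467"
# prints: -66994
```

For a rough live figure, grep nr_free_pages /proc/vmstat shows the current system-wide count of free pages, though it will differ from the value at snapshot time.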
What if the swap partition is not in LVM?
I "always" use this setup with one encrypted PV, used for LVM containing /, /home and swap. And hibernation works with it on Mageia 9 using the backport kernel 6.18 on two other laptops I have.

It would be interesting to try swap outside LVM, but there is no space left on the drive outside LVM. I could maybe use a fast USB drive just to try, but I borked the system and got tired of this temporarily: I tried to investigate further by changing to a ridiculously small swap to see what happens (available pages should then be small, or else the kernel really measures wrong), but diskdrake fouled up so the system is not bootable. (It now looks for the old name and times out to a debug shell; maybe it fouled up fstab, or LVM.) I have reported various similar diskdrake problems before, but with no developer handling them it is futile.

I can probably fix it if I try (LUKS and LVM complicate it a bit; I can boot from a USB drive to work on it), but I got tired of this for now and gotta $work...

Anyhow, I am thinking we have two indications:
1) approximately the same number of pages available regardless of swap size
2) after removing swap, it cannot be recreated because of "debris"

I am thinking either:
A) There is a physical problem at some point in the swap partition.
B) diskdrake and the tools together set it up wrongly, i.e. do not format it correctly.

I have no more time for this now. I plan to wait for the next beta ISO and then make a fresh install (to test the installer), and that time also create the swap during install. (For this bug I had created the swap after install, as at install time I thought maybe I should make a swap file instead.) Maybe I will also tell it to check blocks on the disk. Or I first check the whole disk while booted from USB.

But I also want to test the swap once it is created. Is there some tool to exercise the swap partition? Could I unmount it and perform a full read-write check, in a similar way to how the kernel uses it? How? (If I just try to launch many apps I will have other problems when nearing full, such as the OOM killer.)

If hibernation then works, good.
But it also means I will not test the amdgpu problem. (I will NOT play with changing swap then; this machine is intended for field work in a couple of weeks.)
(In reply to Morgan Leijström from comment #15)
> I "always" use this setup with one encrypted pv, used for LVM containing /,
> /home, swap. And it works also so hibernating Mageia 9 using backport
> kernel 6.18 on two other laptops i have.
>

I was just wondering whether this could be an mga10 setup where LVM was disabled at boot (e.g. rd.lvm=0 on the cmdline).

BTW, here is the twin of the 6.18.9-2.2 for mga9: https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/10112138-kernel/
OK test kernel desktop 6.18.9-2.2 mga9 on my Thinkpad T510 Mageia 9 system Partitioning is similar, swap in LVM on LUKS, notable difference is this swap is smaller than RAM [ettan@localhost ~]$ LC_ALL=C free -h total used free shared buff/cache available Mem: 7,6Gi 2,2Gi 4,0Gi 38Mi 1,5Gi 5,2Gi Swap: 5,9Gi 0B 5,9Gi Part of journal when hibernating: feb 12 15:12:59 localhost kernel: PM: hibernation: Creating image: feb 12 15:12:59 localhost kernel: PM: hibernation: Need to copy 841018 pages feb 12 15:12:59 localhost kernel: PM: hibernation: Normal pages needed: 841018 + 1024, available pages: 1219966 And it hibernates and restores successfully. --- Pondering if maybe I should try to repair this system in this bug (broke in Comment 15) for continued testing about what really goes wrong. --- @Giuseppe, I note the naming scheme of that mga9 kernel is same as mga9 standard kernel. Do you intend to release a 6.18 as normal update on mga9?
(In reply to Morgan Leijström from comment #17)
> OK test kernel desktop 6.18.9-2.2 mga9 on my Thinkpad T510 Mageia 9 system
> Partitioning is similar, swap in LVM on LUKS, notable difference is this
> swap is smaller than RAM
>
> [ettan@localhost ~]$ LC_ALL=C free -h
>                total        used        free      shared  buff/cache   available
> Mem:           7,6Gi       2,2Gi       4,0Gi        38Mi       1,5Gi       5,2Gi
> Swap:          5,9Gi          0B       5,9Gi
>
> Part of journal when hibernating:
> feb 12 15:12:59 localhost kernel: PM: hibernation: Creating image:
> feb 12 15:12:59 localhost kernel: PM: hibernation: Need to copy 841018 pages
> feb 12 15:12:59 localhost kernel: PM: hibernation: Normal pages needed:
> 841018 + 1024, available pages: 1219966
>
> And it hibernates and restores successfully.

So, could it be that the LVM modules are not available at boot in your mga10 setup? What is your boot cmdline in mga10 (cat /proc/cmdline)?

>
> ---
>
> Pondering if maybe I should try to repair this system in this bug (broke in
> Comment 15) for continued testing about what really goes wrong.
>
> ---
>
> @Giuseppe,
> I note the naming scheme of that mga9 kernel is same as mga9 standard kernel.
> Do you intend to release a 6.18 as normal update on mga9?

Not at the moment; there should be a 6.6.124|5 at some point. The naming scheme of the 6.18.9-2.2.mga9 was kept the same as in mga10 (i.e. the old naming scheme where you can quickly install|remove up and down, smoothly, which is different from 6.6.120). It was a quick build, and I have not renamed it to -stabletesting, nor had time to adapt a few things specifically for mga9 (e.g. mga9 and mga10 differ on "dmesg": in mga9 "dmesg" works as a plain user, in mga10 it requires root privileges, bug #37771).
(In reply to Giuseppe Ghibò from comment #18)
> So, it could be that the lvm modules is not available at boot in your mga10
> setup?

It is the hibernation that fails, because for some unknown reason it thinks there is too little space. All partitions (/, /home, swap) are in LVM and working before I attempt to hibernate. A normal boot with / in encrypted LVM works without problems.

I'll try to repair the system using the 10beta1 installer ISO and remove and recreate swap, keeping the rest, and see how that goes.

---

Any idea about the other part of the problem: amdgpu not returning to a usable state after the kernel decides it cannot hibernate?
Description
> Description -> Comment 0
I ended up reinstalling... (First I tried to reuse partitions but diskdrake stopped while creating swap after i deleted the existing one...) * So: this is now a full fresh install using 10b1round1 with updates over wifi during install, Plasma, all default packages * Same machine. __Fresh partitioning: /boot/EFI /boot LUKS-LVM-{/, /home, swap} [ettan@hp1 ~]$ LC_ALL=C free -h total used free shared buff/cache available Mem: 6.7Gi 2.3Gi 3.5Gi 55Mi 1.2Gi 4.4Gi Swap: 11Gi 0B 11Gi Now there is another fault: it saves no image! then shuts off, loosing session. feb 12 22:06:10 hp1.tribun systemd-logind[1208]: The system will hibernate now! feb 12 22:06:10 hp1.tribun kwin_wayland[2615]: Failed to delay sleep: The operation inhibition has been requested for is already running feb 12 22:06:11 hp1.tribun plasmashell[2828]: error: "Misslyckades ansluta till PipeWire-sammanhang" 0 feb 12 22:06:11 hp1.tribun systemd[1]: Reached target Sleep. feb 12 22:06:11 hp1.tribun systemd[1]: Starting System Hibernate... feb 12 22:06:11 hp1.tribun systemd[1]: session-c2.scope: Unit now frozen-by-parent. feb 12 22:06:11 hp1.tribun systemd[1]: user.slice: Unit now frozen. feb 12 22:06:11 hp1.tribun systemd[1]: user-1000.slice: Unit now frozen-by-parent. feb 12 22:06:11 hp1.tribun systemd-sleep[4204]: Successfully froze unit 'user.slice'. feb 12 22:06:11 hp1.tribun systemd[1]: user@1000.service: Unit now frozen-by-parent. feb 12 22:06:11 hp1.tribun systemd-sleep[4204]: Performing sleep operation 'hibernate'... 
feb 12 22:06:11 hp1.tribun kernel: PM: Image not found (code -16) feb 12 22:06:11 hp1.tribun kernel: PM: hibernation: hibernation entry -- Boot a9c65d032c87428ca808fbe6ae46720b -- feb 12 22:09:15 hp1.tribun kernel: Linux version 6.18.9-desktop-1.mga10 (iurt@ecosse.mageia.org) (gcc (Mageia 15.2.0-1.mga10) 15.2.0, GNU ld (GNU Binutils) 2.45.1) #1 SMP PREE> feb 12 22:09:15 hp1.tribun kernel: Command line: BOOT_IMAGE=/vmlinuz-6.18.9-desktop-1.mga10 root=/dev/mapper/vg--hp1-lv_root ro splash quiet noiswmd resume=/dev/vg-hp1/lv_swap>
(In reply to Morgan Leijström from comment #21) > (First I tried to reuse partitions but diskdrake stopped while creating swap Bug 16117 - Installer and diskdrake fail adding swap in existing LVM after removing earlier swap there
If you have an external disk, you might make an attempt "on the fly" without LVM:
a) create a swap partition with mkswap /dev/sdX1
b) swapon /dev/sdX1
c) add resume=/dev/sdX1 to the boot cmdline
Don't I need to add it to fstab too?

--

Now tested the same on my to-be new workstation (I bought my son's old gaming rig):
AMD Ryzen 5 2600X, Nvidia GTX 1070Ti, 32G RAM, 2TB m.2
mga10b1r1 CI 64 installer, updates during install, Plasma
Same partitioning scheme (just bigger partitions)

Hibernate -> exact same last lines in the journal, such as:
feb 13 09:42:41 hallen.tribun systemd-sleep[3793]: Performing sleep operation 'hibernate'...
feb 13 09:42:41 hallen.tribun kernel: PM: Image not found (code -16)

So it is not problematic hardware.

I see no trace of it even trying to create an image. There are no lines like
... kernel: PM: hibernation: Creating image:
... kernel: PM: hibernation: Need to copy xxxxx pages
... kernel: PM: hibernation: Normal pages needed: xxxxx + xxxx, available pages: xxxxx

[ettan@hallen ~]$ LC_ALL=C free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       2.5Gi        27Gi        74Mi       1.3Gi        28Gi
Swap:           39Gi          0B        39Gi
[ettan@hallen ~]$

Next, I will reinstall and make the swap outside the encrypted LVM. (It is quick to reinstall on this machine, and it is only used for testing so far.)
(In reply to Morgan Leijström from comment #24)
> Dont I need to add it to fstab too?

Use fstab if you want to make it permanent. Otherwise:

resume=/dev/... (or resume=UUID) tells the kernel where to resume from once hibernated.

swapon /dev/... enables the swap "on the fly", which can be done even just 1 second before hibernation.
Same machine as Comment 24 Tried one fresh install with similar partitioning but swap as plain partition in disk (outside LVM and LUKS) And then also one fresh install with * Installer default partitioning * using whole drive. (it stupidly set 4 GB swap when i have 32 G RAM and 2TB drive, but whatever it does not matter for this test, but it do need another bug to be opened.) Hibernation is trigged by the selection in launch (Mageia logo) menu in Plasma. In both cases same net result as in Comment 24: it do not even try to make an image, then do not find it and just shuts down, session lost. We cant release this. Apart from hibernating, it works, the little i have used on this machine. For reference: [root@hallen ~]# inxi -SMCGN System: Host: hallen.tribun Kernel: 6.18.9-desktop-1.mga10 arch: x86_64 bits: 64 Console: pty pts/0 Distro: Mageia 10 Machine: Type: Desktop Mobo: Micro-Star model: B450 TOMAHAWK (MS-7C02) v: 1.0 serial: I716329884 Firmware: UEFI vendor: American Megatrends LLC. 
v: 1.J8 date: 09/10/2025 CPU: Info: 6-core model: AMD Ryzen 5 2600X bits: 64 type: MT MCP cache: L2: 3 MiB Speed (MHz): avg: 3200 min/max: 2200/3600 cores: 1: 3200 2: 3200 3: 3200 4: 3200 5: 3200 6: 3200 7: 3200 8: 3200 9: 3200 10: 3200 11: 3200 12: 3200 Graphics: Device-1: NVIDIA GP104 [GeForce GTX 1070 Ti] driver: nvidia v: 580.126.09 Display: unspecified server: X.org v: 1.21.1.21 with: Xwayland v: 24.1.9 driver: X: loaded: nvidia,v4l gpu: nvidia,nvidia-nvswitch resolution: 3840x2160~60Hz API: EGL v: 1.5 drivers: nvidia,swrast platforms: gbm,x11,surfaceless,device API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 580.126.09 renderer: NVIDIA GeForce GTX 1070 Ti/PCIe/SSE2 API: Vulkan v: 1.4.335 drivers: nvidia,llvmpipe surfaces: N/A Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo de: kscreen-console,kscreen-doctor gpu: nvidia-settings,nvidia-smi wl: wayland-info x11: xdpyinfo, xprop, xrandr Network: Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet driver: r8169
Priority: Normal => release_blocker
Summary: Fail hibernate (no space but there is), and then fail reverting to desktop (amdgpu fail) => Fail hibernating (two ways), also fail reverting to desktop (amdgpu fail)
Further testing on the same machine as in Comment 26. (It is now becoming my new workstation, still running mga9, but I swap drives for testing this bug.)

I freed up a used SSD, so I replaced the m.2 drive previously used in testing. Still the same problem: it seems to forget to create the image, subsequently "kernel: PM: Image not found (code -16)" appears in the journal, it powers off and the session is lost.

I also did this without updates (just the plain 10b1r1 installer ISO), setting it up manually and simply: /boot/ESP: 100MB, /: 18GB ext4, swap: 40GB. No LVM nor LUKS. And that for one Plasma-only and one Xfce-only install.

It always ends in "kernel: PM: Image not found (code -16)". The real problem is that it never tries to prepare that image. Why does it "forget" to do that?

So, how does Mageia 9 work on this machine then?

Plain install from the mga9 CI ISO without updates, Xfce: at hibernation it stops with a black screen and blinking LEDs (a kernel panic, I think that means). After the next boot I look in the journal and find "kernel: PM: Image not found (code -22)" (note 22, not 16 as in mga10), and here too there is no trace that it ever tried making that image. So the problem is not new, but different.

Next I performed a full update of Mageia 9.

* Now hibernation works :-) *

So this is how it is supposed to look (I replaced several similar lines with "..." to shorten it):

feb 14 09:42:52 localhost systemd-logind[891]: The system will hibernate now!
feb 14 09:42:57 localhost systemd-logind[891]: Delay lock is active (UID 1000/ettan, PID 13390/light-locker) but inhibitor timeout is reached.
feb 14 09:42:57 localhost systemd[1]: Reached target sleep.target.
feb 14 09:42:57 localhost systemd[1]: Starting systemd-hibernate.service...
feb 14 09:42:57 localhost systemd-sleep[13888]: Entering sleep state 'hibernate'...
feb 14 09:42:57 localhost kernel: PM: hibernation: hibernation entry feb 14 09:43:52 localhost kernel: Filesystems sync: 0.022 seconds feb 14 09:43:52 localhost kernel: Freezing user space processes feb 14 09:43:52 localhost kernel: Freezing user space processes completed (elapsed 0.001 seconds) feb 14 09:43:52 localhost kernel: OOM killer disabled. feb 14 09:43:52 localhost kernel: PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff] ... feb 14 09:43:52 localhost kernel: PM: hibernation: Marking nosave pages: [mem 0xbf000000-0xffffffff] feb 14 09:43:52 localhost kernel: PM: hibernation: Basic memory bitmaps created feb 14 09:43:52 localhost kernel: PM: hibernation: Preallocating image memory feb 14 09:43:52 localhost kernel: PM: hibernation: Allocated 497808 pages for snapshot feb 14 09:43:52 localhost kernel: PM: hibernation: Allocated 1991232 kbytes in 1.26 seconds (1580.34 MB/s) feb 14 09:43:52 localhost kernel: Freezing remaining freezable tasks feb 14 09:43:52 localhost kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds) feb 14 09:43:52 localhost kernel: printk: Suspending console(s) (use no_console_suspend to debug) feb 14 09:43:52 localhost kernel: serial 00:04: disabled feb 14 09:43:52 localhost kernel: r8169 0000:22:00.0 enp34s0: Link is Down feb 14 09:43:52 localhost kernel: ACPI: PM: Preparing to enter system sleep state S4 feb 14 09:43:52 localhost kernel: ACPI: PM: Saving platform NVS memory feb 14 09:43:52 localhost kernel: Disabling non-boot CPUs ... feb 14 09:43:52 localhost kernel: smpboot: CPU 1 is now offline ... 
feb 14 09:43:52 localhost kernel: smpboot: CPU 11 is now offline feb 14 09:43:52 localhost kernel: PM: hibernation: Creating image: feb 14 09:43:52 localhost kernel: PM: hibernation: Need to copy 632744 pages feb 14 09:43:52 localhost kernel: PM: hibernation: Normal pages needed: 632744 + 1024, available pages: 7735915 This is the point where I believe logging stops: it saves the image to disk, then it powers off. The following lines are from after power-on, restoring the system. They have the same timestamps as the last shutdown time; that is normal. feb 14 09:43:52 localhost kernel: ACPI: PM: Restoring platform NVS memory feb 14 09:43:52 localhost kernel: AMD-Vi: Virtual APIC enabled feb 14 09:43:52 localhost kernel: AMD-Vi: Virtual APIC enabled feb 14 09:43:52 localhost kernel: Enabling non-boot CPUs ... feb 14 09:43:52 localhost kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2 feb 14 09:43:52 localhost kernel: ACPI: \_PR_.C002: Found 2 idle states feb 14 09:43:52 localhost kernel: CPU1 is up ... feb 14 09:43:52 localhost kernel: smpboot: Booting Node 0 Processor 11 APIC 0xd feb 14 09:43:52 localhost kernel: ACPI: \_PR_.C00B: Found 2 idle states feb 14 09:43:52 localhost kernel: CPU11 is up feb 14 09:43:52 localhost kernel: ACPI: PM: Waking up from system sleep state S4 feb 14 09:43:52 localhost kernel: usb usb1: root hub lost power or was reset feb 14 09:43:52 localhost kernel: usb usb2: root hub lost power or was reset feb 14 09:43:52 localhost kernel: usb usb3: root hub lost power or was reset feb 14 09:43:52 localhost kernel: usb usb4: root hub lost power or was reset feb 14 09:43:52 localhost kernel: serial 00:04: activated feb 14 09:43:52 localhost kernel: nouveau 0000:26:00.0: fifo: fault 01 [WRITE] at 0000000000327000 engine 05 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -> ... ( Crap, nouveau fouls up ) ... 
feb 14 09:43:52 localhost kernel: nouveau 0000:26:00.0: fifo: PBDMA0: 80064000 [GPPTR PBPTR PBENTRY SIGNATURE] ch 2 [01ffb9e000 Xorg[13099]] subc 0 mthd 0000 data 00000000 feb 14 09:43:52 localhost kernel: nouveau 0000:26:00.0: fifo:000000:0002:0002:[Xorg[13099]] errored - disabling channel feb 14 09:43:52 localhost kernel: nouveau 0000:26:00.0: Xorg[13099]: channel 2 killed! feb 14 09:43:52 localhost kernel: nouveau 0000:26:00.0: fifo: fault 01 [WRITE] at 000000ff00070000 engine 1f [PHYSICAL] client 07 [HUB/HOST_CPU] reason 0d [REGION_VIOLATION> feb 14 09:43:52 localhost kernel: nouveau 0000:26:00.0: fifo: PBDMA0: 80040000 [PBENTRY SIGNATURE] ch 3 [01ff693000 xfwm4[13336]] subc 0 mthd 0000 data 00000000 feb 14 09:43:52 localhost kernel: nouveau 0000:26:00.0: fifo:000000:0003:0003:[xfwm4[13336]] errored - disabling channel feb 14 09:43:52 localhost kernel: nouveau 0000:26:00.0: Xorg[13099]: channel 3 killed! feb 14 09:43:52 localhost kernel: r8169 0000:22:00.0 enp34s0: Link is Down feb 14 09:43:52 localhost kernel: ata6: SATA link down (SStatus 0 SControl 330) feb 14 09:43:52 localhost kernel: ata9: SATA link down (SStatus 0 SControl 300) feb 14 09:43:52 localhost kernel: ata5: SATA link down (SStatus 0 SControl 330) feb 14 09:43:52 localhost kernel: ata2: SATA link down (SStatus 0 SControl 300) feb 14 09:43:52 localhost kernel: usb 1-6: reset full-speed USB device number 2 using xhci_hcd feb 14 09:43:52 localhost kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) feb 14 09:43:52 localhost kernel: ata1.00: Entering active power mode feb 14 09:43:52 localhost kernel: ata1.00: supports DRM functions and may not be fully accessible feb 14 09:43:52 localhost kernel: sd 0:0:0:0: [sda] Starting disk feb 14 09:43:52 localhost kernel: ata1.00: supports DRM functions and may not be fully accessible feb 14 09:43:52 localhost kernel: ata1.00: configured for UDMA/133 feb 14 09:43:52 localhost kernel: ata1.00: Enabling discard_zeroes_data feb 14 09:43:52 localhost 
kernel: usb 1-8: reset full-speed USB device number 3 using xhci_hcd feb 14 09:43:52 localhost kernel: PM: hibernation: Basic memory bitmaps freed feb 14 09:43:52 localhost kernel: OOM killer enabled. feb 14 09:43:52 localhost kernel: Restarting tasks ... done. feb 14 09:43:52 localhost kernel: PM: hibernation: hibernation exit feb 14 09:43:52 localhost systemd-sleep[13888]: System returned from sleep state. feb 14 09:43:52 localhost systemd[1]: systemd-hibernate.service: Deactivated successfully. feb 14 09:43:52 localhost systemd[1]: Finished systemd-hibernate.service. feb 14 09:43:52 localhost systemd[1]: systemd-hibernate.service: Consumed 1.538s CPU time. feb 14 09:43:52 localhost systemd[1]: Reached target hibernate.target. feb 14 09:43:52 localhost systemd[1]: Stopped target sleep.target. feb 14 09:43:52 localhost systemd-logind[891]: Operation 'sleep' finished. feb 14 09:43:52 localhost systemd[1]: Stopped target hibernate.target. feb 14 09:43:52 localhost systemd[1]: Started systemd-coredump@1-13931-0.service. feb 14 09:43:52 localhost ifplugd(enp34s0)[1231]: Link beat lost. feb 14 09:43:53 localhost systemd-coredump[13933]: [🡕] Process 13099 (Xorg) of user 0 dumped core. ... So: restoring basically works too - but Xorg session dumps core due to nouveau, and user session is lost then :-( User logs in to a non-restored Xfce desktop. Anyway so far, Mageia 10 is a regression from Mageia 9 regarding hibernation here. So, next try is switching graphics driver so this Mageia9 system works on this machine. Once Mageia 9 works to hibernate & restore, i will upgrade it to Mageia 10, stay tuned...
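The "Normal pages needed" lines can be compared mechanically between the failing run in the description and the good run above. Below is a minimal sketch in Python. It assumes 4 KiB pages (consistent with the log, where 497808 pages are reported as 1991232 kbytes) and assumes the kernel's test is essentially "available pages must exceed needed plus reserve". The function name check_image_fits is made up for illustration.

```python
import re

# Assumption: 4 KiB pages, which matches "Allocated 497808 pages" being
# reported as "1991232 kbytes" in the journal.
PAGE_KIB = 4

_NEEDED = re.compile(
    r"Normal pages needed: (\d+) \+ (\d+), available pages: (\d+)"
)


def check_image_fits(log_line):
    """Return (fits, needed_mib, available_mib) for a journal line, or None.

    Hypothetical helper: it only re-does the arithmetic visible in the log.
    """
    match = _NEEDED.search(log_line)
    if match is None:
        return None
    needed, reserve, available = (int(g) for g in match.groups())
    # Assumed rule: the snapshot fits only if the free pages exceed
    # the pages to copy plus the I/O reserve.
    fits = available > needed + reserve
    needed_mib = (needed + reserve) * PAGE_KIB / 1024
    available_mib = available * PAGE_KIB / 1024
    return fits, needed_mib, available_mib


if __name__ == "__main__":
    failing = "PM: hibernation: Normal pages needed: 939437 + 1024, available pages: 873467"
    working = "PM: hibernation: Normal pages needed: 632744 + 1024, available pages: 7735915"
    for line in (failing, working):
        print(check_image_fits(line))
```

Fed the line from the description, this reports that the image does not fit; fed the line from the good mga9 run, it reports that it fits with a wide margin.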
BTW, that failure in Comment 27 on nouveau (at least as seen here on Mageia 9 for this card, kernel desktop 6.6.120) should be fixed. Do we have a bug on that? I see the system uses modesetting, set up automatically at install. [ettan@localhost ~]$ inxi -SMCGa System: Host: localhost Kernel: 6.6.120-desktop-1.mga9 arch: x86_64 bits: 64 compiler: gcc v: 12.3.0 clocksource: tsc avail: hpet,acpi_pm parameters: BOOT_IMAGE=/boot/vmlinuz-6.6.120-desktop-1.mga9 root=UUID=8b0d0611-9295-480a-8637-378dfe8cf4b9 ro splash quiet noiswmd resume=UUID=6bba3648-8a45-42d9-8094-a59f8f4c3b73 audit=0 vga=791 Desktop: Xfce v: 4.18.1 tk: Gtk v: 3.24.36 wm: xfwm4 v: 4.18.0 with: xfce4-panel tools: light-locker vt: 1 dm: LightDM v: 1.32.0 Distro: Mageia 9 Machine: Type: Desktop Mobo: Micro-Star model: B450 TOMAHAWK (MS-7C02) v: 1.0 serial: <superuser required> uuid: <superuser required> UEFI: American Megatrends LLC. v: 1.J8 date: 09/10/2025 CPU: Info: model: AMD Ryzen 5 2600X bits: 64 type: MT MCP arch: Zen+ gen: 1+ level: v3 note: check built: 2018-21 process: GF 12nm family: 0x17 (23) model-id: 8 stepping: 2 microcode: 0x800820E Topology: cpus: 1x dies: 1 clusters: 1 cores: 6 threads: 12 tpc: 2 smt: enabled cache: L1: 576 KiB desc: d-6x32 KiB; i-6x64 KiB L2: 3 MiB desc: 6x512 KiB L3: 16 MiB desc: 2x8 MiB Speed (MHz): avg: 2404 min/max: 2200/3600 boost: enabled scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 2404 2: 2404 3: 2404 4: 2404 5: 2404 6: 2404 7: 2404 8: 2404 9: 2404 10: 2404 11: 2404 12: 2404 bogomips: 86403 Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm Vulnerabilities: Type: gather_data_sampling status: Not affected Type: indirect_target_selection status: Not affected Type: itlb_multihit status: Not affected Type: l1tf status: Not affected Type: mds status: Not affected Type: meltdown status: Not affected Type: mmio_stale_data status: Not affected Type: reg_file_data_sampling status: Not affected Type: retbleed mitigation: untrained 
return thunk; SMT vulnerable Type: spec_rstack_overflow mitigation: Safe RET Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization Type: spectre_v2 mitigation: Retpolines; IBPB: conditional; STIBP: disabled; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected Type: srbds status: Not affected Type: tsa status: Not affected Type: tsx_async_abort status: Not affected Type: vmscape mitigation: IBPB before exit to userspace Graphics: Device-1: NVIDIA GP104 [GeForce GTX 1070 Ti] vendor: ASUSTeK driver: nouveau v: kernel alternate: nvidia_drm,nvidia_current non-free: 550.xx+ status: current (as of 2024-09; EOL~2026-12-xx) arch: Pascal code: GP10x process: TSMC 16nm built: 2016-2021 pcie: gen: 1 speed: 2.5 GT/s lanes: 16 link-max: gen: 3 speed: 8 GT/s ports: active: DP-1 empty: DP-2, DVI-D-1, HDMI-A-1, HDMI-A-2 bus-ID: 26:00.0 chip-ID: 10de:1b82 class-ID: 0300 temp: 38.0 C Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 compositor: xfwm4 v: 4.18.0 driver: X: loaded: modesetting,v4l dri: nouveau gpu: nouveau display-ID: :0.0 screens: 1 Screen-1: 0 s-res: 3840x2160 s-dpi: 96 s-size: 1016x571mm (40.00x22.48") s-diag: 1165mm (45.88") Monitor-1: DP-1 model: Philips PHL 436M6VBP serial: 7968 built: 2018 res: 3840x2160 hz: 60 dpi: 104 gamma: 1.2 size: 941x529mm (37.05x20.83") diag: 1080mm (42.5") ratio: 16:9 modes: max: 3840x2160 min: 640x480 API: OpenGL v: 4.3 vendor: mesa v: 25.0.7 glx-v: 1.4 es-v: 3.2 direct-render: yes renderer: NV134 device-ID: 10de:1b82 memory: 7.78 GiB unified: no API: EGL Message: EGL data requires eglinfo. Check --recommends.
(In reply to Morgan Leijström from comment #28) > BTW that fail that fail in Comment 27 on nouveau (at least seen here on on > Mageia 9 for this card, kernel desktop 6.6.120) should be fixed. Do we have > a bug on that? If that's an old card for nouveau, you probably need the extra nv firmware from older drivers, to be extracted with the script from GitHub.
Only old in Nvidia's business eyes. That card is new in my philosophy. If there are more problems, I may swap in the AMD card from my old workstation.
Strangely, after switching to the nvidia proprietary driver, the system has forgotten how to hibernate: feb 14 11:08:41 localhost systemd-logind[1029]: The system will hibernate now! feb 14 11:08:46 localhost systemd-logind[1029]: Delay lock is active (UID 1000/ettan, PID 2380/light-locker) but inhibitor timeout is reached. feb 14 11:08:46 localhost systemd[1]: Reached target sleep.target. feb 14 11:08:46 localhost systemd[1]: Starting systemd-hibernate.service... feb 14 11:08:46 localhost systemd-sleep[2800]: Entering sleep state 'hibernate'... feb 14 11:08:46 localhost kernel: PM: hibernation: hibernation entry feb 14 11:08:46 localhost acpid[997]: client 1610[0:0] has disconnected feb 14 11:08:46 localhost acpid[997]: client 1610[0:0] has disconnected 1) It does not create an image. 2) It does not detect that the image is missing. And when I powered it on, it booted to a black screen after grub. I shut it down with REISUB. The journal snippet above is from the very end of "# journalctl -b-1". Next I will put this drive in my old workstation; will see after lunch and a walk with my wife in the sun :-)
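To compare runs like the one above with a good cycle, it helps to see how far the kernel got before the log stops. The following is a rough Python sketch, not a definitive tool: the marker strings are copied from the journal excerpts in this bug, while the milestone names and the function furthest_milestone are invented here for illustration.

```python
# Marker strings are taken from the journal excerpts in this bug report;
# the short milestone names are made up for this sketch.
MILESTONES = [
    ("entry", "PM: hibernation: hibernation entry"),
    ("frozen", "Freezing user space processes completed"),
    ("preallocating", "PM: hibernation: Preallocating image memory"),
    ("creating_image", "PM: hibernation: Creating image"),
]


def furthest_milestone(journal_text):
    """Return the name of the furthest milestone reached in order, or None.

    Stops at the first marker that is missing, so a later marker left over
    from some other cycle in the same text is not counted.
    """
    reached = None
    for name, marker in MILESTONES:
        if marker not in journal_text:
            break
        reached = name
    return reached


if __name__ == "__main__":
    stalled = (
        "systemd-sleep[2800]: Entering sleep state 'hibernate'...\n"
        "kernel: PM: hibernation: hibernation entry\n"
        "acpid[997]: client 1610[0:0] has disconnected\n"
    )
    print(furthest_milestone(stalled))
```

For the snippet in this comment the sketch stops at "entry", which matches the observation that the image is never created.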
(In reply to Giuseppe Ghibò from comment #29) > (In reply to Morgan Leijström from comment #28) > > > BTW that fail that fail in Comment 27 on nouveau (at least seen here on on > > Mageia 9 for this card, kernel desktop 6.6.120) should be fixed. Do we have > > a bug on that? > > if that's old card for nouveau you probably need probably the extra nv > firmware from older drivers, to be extracted with a the script from github. That firmware is only for very old cards, even older than those supported by the nvidia470 driver.
(In reply to Morgan Leijström from comment #31) > Strangely, after having switched to nvidia proprietary, the system have > forgot how to hibernate: There are three further options in the nvidia proprietary config, /etc/modprobe.d/display-driver.conf: #options nvidia-current NVreg_DynamicPowerManagement=0x02 #options nvidia-current NVreg_PreserveVideoMemoryAllocations=1 #options nvidia-current NVreg_TemporaryFilePath=/var/tmp which you may uncomment to see whether power management gets better (or worse).
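When reporting which of these options are actually in effect, it is easy to confuse commented and uncommented lines. Here is a small Python sketch that splits the "options" lines of such a file into active and commented-out settings. It only reads text in the format quoted above; the function name parse_modprobe_options is hypothetical, and it does not ask the loaded kernel module what it is really using.

```python
def parse_modprobe_options(conf_text, module="nvidia-current"):
    """Split 'options <module> key=value' lines into (active, commented) dicts.

    Hypothetical helper for illustration: a line starting with '#' counts as
    commented out, and explanatory comment lines are ignored.
    """
    active, commented = {}, {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        target = active
        if line.startswith("#"):
            line = line.lstrip("#").strip()
            target = commented
        parts = line.split()
        # Keep only "options <module> ..." lines for the requested module.
        if len(parts) < 3 or parts[0] != "options" or parts[1] != module:
            continue
        for assignment in parts[2:]:
            key, sep, value = assignment.partition("=")
            if sep:
                target[key] = value
    return active, commented


if __name__ == "__main__":
    sample = (
        "# Uncomment the following line to enable complete power management\n"
        "#options nvidia-current NVreg_DynamicPowerManagement=0x02\n"
        "#options nvidia-current NVreg_PreserveVideoMemoryAllocations=1\n"
        "#options nvidia-current NVreg_TemporaryFilePath=/var/tmp\n"
    )
    active, commented = parse_modprobe_options(sample)
    print("active:", active)
    print("commented:", commented)
```

Running it on the file before and after editing gives a compact summary to paste into a comment.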
(In reply to Giuseppe Ghibò from comment #33) OK, I uncommented the three options you suggested, and now for the first time my session did not get lost. More specifically, it decided not to cut power and instead returned to the desktop, successfully: feb 14 14:10:52 localhost kernel: PM: hibernation: Marking nosave pages: [mem 0xbf000000-0xffffffff] feb 14 14:10:52 localhost kernel: PM: hibernation: Basic memory bitmaps created feb 14 14:10:52 localhost kernel: PM: hibernation: Preallocating image memory feb 14 14:10:52 localhost kernel: PM: hibernation: Allocated 558628 pages for snapshot feb 14 14:10:52 localhost kernel: PM: hibernation: Allocated 2234512 kbytes in 1.28 seconds (1745.71 MB/s) feb 14 14:10:52 localhost kernel: Freezing remaining freezable tasks feb 14 14:10:52 localhost kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds) feb 14 14:10:52 localhost kernel: printk: Suspending console(s) (use no_console_suspend to debug) feb 14 14:10:52 localhost kernel: serial 00:04: disabled feb 14 14:10:52 localhost kernel: r8169 0000:22:00.0 enp34s0: Link is Down feb 14 14:10:52 localhost kernel: NVRM: GPU 0000:26:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README. 
feb 14 14:10:52 localhost kernel: nvidia 0000:26:00.0: PM: pci_pm_freeze(): nv_pmops_freeze+0x0/0x40 [nvidia] returns -5 feb 14 14:10:52 localhost kernel: nvidia 0000:26:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xe0 returns -5 feb 14 14:10:52 localhost kernel: nvidia 0000:26:00.0: PM: failed to freeze async: error -5 feb 14 14:10:52 localhost kernel: serial 00:04: activated feb 14 14:10:52 localhost kernel: r8169 0000:22:00.0 enp34s0: Link is Down feb 14 14:10:52 localhost kernel: ata6: SATA link down (SStatus 0 SControl 330) feb 14 14:10:52 localhost kernel: ata2: SATA link down (SStatus 0 SControl 300) feb 14 14:10:52 localhost kernel: ata5: SATA link down (SStatus 0 SControl 330) feb 14 14:10:52 localhost kernel: ata9: SATA link down (SStatus 0 SControl 300) feb 14 14:10:52 localhost kernel: fbcon: Taking over console feb 14 14:10:52 localhost kernel: PM: hibernation: Basic memory bitmaps freed feb 14 14:10:52 localhost kernel: OOM killer enabled. feb 14 14:10:52 localhost kernel: Restarting tasks ... feb 14 14:10:52 localhost kernel: Console: switching to colour frame buffer device 128x48 feb 14 14:10:52 localhost kernel: done. feb 14 14:10:52 localhost kernel: PM: hibernation: hibernation exit feb 14 14:10:52 localhost systemd-sleep[2465]: Failed to put system to sleep. System resumed again: Input/output error feb 14 14:10:52 localhost rtkit-daemon[1027]: Supervising 2 threads of 1 processes of 1 users. feb 14 14:10:52 localhost acpid[1000]: client connected from 1597[0:0] feb 14 14:10:52 localhost acpid[1000]: 1 client rule loaded feb 14 14:10:52 localhost rtkit-daemon[1027]: Successfully made thread 2509 of process 2113 owned by '1000' RT at priority 5. feb 14 14:10:52 localhost rtkit-daemon[1027]: Supervising 3 threads of 1 processes of 1 users. 
feb 14 14:10:52 localhost acpid[1000]: client connected from 1597[0:0] feb 14 14:10:52 localhost acpid[1000]: 1 client rule loaded feb 14 14:10:52 localhost rtkit-daemon[1027]: Supervising 2 threads of 1 processes of 1 users. feb 14 14:10:52 localhost rtkit-daemon[1027]: Successfully made thread 2510 of process 2113 owned by '1000' RT at priority 5. feb 14 14:10:52 localhost rtkit-daemon[1027]: Supervising 3 threads of 1 processes of 1 users. feb 14 14:10:52 localhost kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) feb 14 14:10:52 localhost kernel: ata1.00: Entering active power mode feb 14 14:10:52 localhost kernel: ata1.00: supports DRM functions and may not be fully accessible feb 14 14:10:52 localhost kernel: sd 0:0:0:0: [sda] Starting disk feb 14 14:10:52 localhost kernel: ata1.00: supports DRM functions and may not be fully accessible feb 14 14:10:52 localhost kernel: ata1.00: configured for UDMA/133 feb 14 14:10:52 localhost kernel: ata1.00: Enabling discard_zeroes_data feb 14 14:10:52 localhost systemd[1]: systemd-hibernate.service: Main process exited, code=exited, status=1/FAILURE feb 14 14:10:52 localhost systemd[1]: systemd-hibernate.service: Failed with result 'exit-code'. feb 14 14:10:52 localhost systemd[1]: Failed to start systemd-hibernate.service. feb 14 14:10:52 localhost systemd[1]: Dependency failed for hibernate.target. feb 14 14:10:52 localhost systemd[1]: hibernate.target: Job hibernate.target/start failed with result 'dependency'. feb 14 14:10:52 localhost systemd[1]: systemd-hibernate.service: Consumed 1.773s CPU time. feb 14 14:10:52 localhost systemd-logind[1038]: Operation 'sleep' finished. feb 14 14:10:52 localhost systemd[1]: Stopped target sleep.target. feb 14 14:10:52 localhost ifplugd(enp34s0)[1328]: Link beat lost. feb 14 14:10:53 localhost kernel: [drm:drm_new_set_master [drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002600] Failed to grab modeset ownership Desktop is back OK. 
So at least it does not throw away the user's work, but it also does not hibernate. I note in the above: "System Power Management attempted without driver procfs suspend interface." Is that something we can fix? Now I'll try whether nvidia470 works better here on this GTX 1070. --- > (In reply to Giuseppe Ghibò from comment #29) > > if that's old card for nouveau you probably need probably the extra nv > > firmware from older drivers, to be extracted with a the script from github. > > That firmware only for very old card, even older than those supported by the > nvidia470 driver. I believe you are thinking of my Thinkpad T510, or the card that died in my workstation some months ago. My now former workstation has an AMD Navi 24 [Radeon RX 6400]
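The negative numbers scattered through these logs ("returns -5", "Error -12 creating image", "Image not found (code -16)", "(code -22)") can be translated to names. Below is a minimal Python sketch that assumes they are ordinary Linux errno values; that assumption fits the journal, where systemd-sleep reports "Input/output error" next to -5 and "Cannot allocate memory" next to -12. The function name decode_kernel_errors is made up here.

```python
import errno
import re

# Matches the negative codes as they appear in this bug's journal excerpts:
# "returns -5", "error -5", "Error -12 creating image", "(code -16)".
_CODE = re.compile(r"(?:returns|[Ee]rror|code) -(\d+)")


def decode_kernel_errors(journal_text):
    """Return {negative_code: errno_name} for codes found in the text.

    Assumption: the numbers are standard errno values; unknown numbers are
    reported as "unknown" rather than guessed.
    """
    found = {int(number) for number in _CODE.findall(journal_text)}
    return {-n: errno.errorcode.get(n, "unknown") for n in sorted(found)}


if __name__ == "__main__":
    excerpt = (
        "nvidia 0000:26:00.0: PM: pci_pm_freeze(): nv_pmops_freeze+0x0/0x40 [nvidia] returns -5\n"
        "nvidia 0000:26:00.0: PM: failed to freeze async: error -5\n"
        "kernel: PM: Image not found (code -16)\n"
    )
    for code, name in decode_kernel_errors(excerpt).items():
        print(code, name)
```

This only names the codes; it says nothing about why the driver returned them.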
HOORAY! First hibernate-restore cycle in this bug. So nvidia470 is better for GTX 1070 Ti at least for hibernation. For reference, below a good cycle: feb 14 14:38:21 localhost systemd-logind[1053]: The system will hibernate now! feb 14 14:38:26 localhost systemd-logind[1053]: Delay lock is active (UID 1000/ettan, PID 2110/light-locker) but inhibitor timeout is reached. feb 14 14:38:26 localhost systemd[1]: Reached target sleep.target. feb 14 14:38:26 localhost systemd[1]: Starting systemd-hibernate.service... feb 14 14:38:26 localhost systemd-sleep[2471]: Entering sleep state 'hibernate'... feb 14 14:38:26 localhost kernel: PM: hibernation: hibernation entry feb 14 14:38:26 localhost acpid[994]: client 1600[0:0] has disconnected feb 14 14:38:26 localhost acpid[994]: client 1600[0:0] has disconnected feb 14 14:38:26 localhost kernel: Filesystems sync: 0.023 seconds feb 14 14:39:22 localhost kernel: Freezing user space processes feb 14 14:39:22 localhost kernel: Freezing user space processes completed (elapsed 0.001 seconds) feb 14 14:39:22 localhost kernel: OOM killer disabled. feb 14 14:39:22 localhost kernel: PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff] ... 
feb 14 14:39:22 localhost kernel: PM: hibernation: Marking nosave pages: [mem 0xbf000000-0xffffffff] feb 14 14:39:22 localhost kernel: PM: hibernation: Basic memory bitmaps created feb 14 14:39:22 localhost kernel: PM: hibernation: Preallocating image memory feb 14 14:39:22 localhost kernel: PM: hibernation: Allocated 504024 pages for snapshot feb 14 14:39:22 localhost kernel: PM: hibernation: Allocated 2016096 kbytes in 1.26 seconds (1600.07 MB/s) feb 14 14:39:22 localhost kernel: Freezing remaining freezable tasks feb 14 14:39:22 localhost kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds) feb 14 14:39:22 localhost kernel: printk: Suspending console(s) (use no_console_suspend to debug) feb 14 14:39:22 localhost kernel: serial 00:04: disabled feb 14 14:39:22 localhost kernel: r8169 0000:22:00.0 enp34s0: Link is Down feb 14 14:39:22 localhost kernel: ACPI: PM: Preparing to enter system sleep state S4 feb 14 14:39:22 localhost kernel: ACPI: PM: Saving platform NVS memory feb 14 14:39:22 localhost kernel: Disabling non-boot CPUs ... feb 14 14:39:22 localhost kernel: smpboot: CPU 1 is now offline ... feb 14 14:39:22 localhost kernel: smpboot: CPU 11 is now offline feb 14 14:39:22 localhost kernel: PM: hibernation: Creating image: feb 14 14:39:22 localhost kernel: PM: hibernation: Need to copy 497498 pages feb 14 14:39:22 localhost kernel: PM: hibernation: Normal pages needed: 497498 + 1024, available pages: 7871161 --- It powered off, Then I powered it up, grub menu, resuming is fast: --- feb 14 14:39:22 localhost kernel: ACPI: PM: Restoring platform NVS memory feb 14 14:39:22 localhost kernel: AMD-Vi: Virtual APIC enabled feb 14 14:39:22 localhost kernel: AMD-Vi: Virtual APIC enabled feb 14 14:39:22 localhost kernel: Enabling non-boot CPUs ... feb 14 14:39:22 localhost kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2 feb 14 14:39:22 localhost kernel: ACPI: \_PR_.C002: Found 2 idle states feb 14 14:39:22 localhost kernel: CPU1 is up ... 
feb 14 14:39:22 localhost kernel: smpboot: Booting Node 0 Processor 11 APIC 0xd feb 14 14:39:22 localhost kernel: ACPI: \_PR_.C00B: Found 2 idle states feb 14 14:39:22 localhost kernel: CPU11 is up feb 14 14:39:22 localhost kernel: ACPI: PM: Waking up from system sleep state S4 feb 14 14:39:22 localhost kernel: usb usb1: root hub lost power or was reset feb 14 14:39:22 localhost kernel: usb usb2: root hub lost power or was reset feb 14 14:39:22 localhost kernel: usb usb3: root hub lost power or was reset feb 14 14:39:22 localhost kernel: usb usb4: root hub lost power or was reset feb 14 14:39:22 localhost kernel: serial 00:04: activated feb 14 14:39:22 localhost kernel: r8169 0000:22:00.0 enp34s0: Link is Down feb 14 14:39:22 localhost kernel: ata6: SATA link down (SStatus 0 SControl 330) feb 14 14:39:22 localhost kernel: ata9: SATA link down (SStatus 0 SControl 300) feb 14 14:39:22 localhost kernel: ata5: SATA link down (SStatus 0 SControl 330) feb 14 14:39:22 localhost kernel: ata2: SATA link down (SStatus 0 SControl 300) feb 14 14:39:22 localhost kernel: usb 1-6: reset full-speed USB device number 2 using xhci_hcd feb 14 14:39:22 localhost kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) feb 14 14:39:22 localhost kernel: ata1.00: Entering active power mode feb 14 14:39:22 localhost kernel: ata1.00: supports DRM functions and may not be fully accessible feb 14 14:39:22 localhost kernel: sd 0:0:0:0: [sda] Starting disk feb 14 14:39:22 localhost kernel: ata1.00: supports DRM functions and may not be fully accessible feb 14 14:39:22 localhost kernel: ata1.00: configured for UDMA/133 feb 14 14:39:22 localhost kernel: ata1.00: Enabling discard_zeroes_data feb 14 14:39:22 localhost kernel: usb 1-8: reset full-speed USB device number 3 using xhci_hcd feb 14 14:39:22 localhost kernel: PM: hibernation: Basic memory bitmaps freed feb 14 14:39:22 localhost kernel: OOM killer enabled. feb 14 14:39:22 localhost kernel: Restarting tasks ... done. 
feb 14 14:39:22 localhost kernel: PM: hibernation: hibernation exit feb 14 14:39:22 localhost systemd-sleep[2471]: System returned from sleep state. feb 14 14:39:22 localhost acpid[994]: client connected from 1600[0:0] feb 14 14:39:22 localhost acpid[994]: 1 client rule loaded feb 14 14:39:22 localhost systemd[1]: systemd-hibernate.service: Deactivated successfully. feb 14 14:39:22 localhost systemd[1]: Finished systemd-hibernate.service. feb 14 14:39:22 localhost systemd[1]: systemd-hibernate.service: Consumed 1.549s CPU time. feb 14 14:39:22 localhost systemd[1]: Reached target hibernate.target. feb 14 14:39:22 localhost systemd[1]: Stopped target sleep.target. feb 14 14:39:22 localhost systemd[1]: Stopped target hibernate.target. feb 14 14:39:22 localhost systemd-logind[1053]: Operation 'sleep' finished. feb 14 14:39:22 localhost rtkit-daemon[1010]: Supervising 2 threads of 1 processes of 1 users. feb 14 14:39:22 localhost rtkit-daemon[1010]: Successfully made thread 2549 of process 2105 owned by '1000' RT at priority 5. feb 14 14:39:22 localhost rtkit-daemon[1010]: Supervising 3 threads of 1 processes of 1 users. feb 14 14:39:22 localhost ifplugd(enp34s0)[1324]: Link beat lost. feb 14 14:39:22 localhost acpid[994]: client connected from 1600[0:0] feb 14 14:39:22 localhost acpid[994]: 1 client rule loaded feb 14 14:39:22 localhost rtkit-daemon[1010]: Supervising 2 threads of 1 processes of 1 users. feb 14 14:39:22 localhost rtkit-daemon[1010]: Successfully made thread 2550 of process 2105 owned by '1000' RT at priority 5. feb 14 14:39:22 localhost rtkit-daemon[1010]: Supervising 3 threads of 1 processes of 1 users. feb 14 14:39:23 localhost acpid[994]: client connected from 2562[0:0] ... feb 14 14:39:24 localhost systemd-logind[1053]: New session 5 of user lightdm. ... feb 14 14:39:24 localhost systemd[2586]: Startup finished in 117ms. Desktop is restored, working great. 
Next I will try reverting the changes made per Comment 33. Then, if all is OK, I will try upgrading to mga10.
> > I believe you are thinking of my Thinkpad T510, or the card that died in my > workstation some months ago. > My now former workstation have an AMD Navi 24 [Radeon RX 6400] Yes, maybe we need a hw table somewhere for a quick look. And if you use NVreg_PreserveVideoMemoryAllocations=0 ?
systemctl list-unit-files | grep nvidia ?
(In reply to Morgan Leijström from comment #35) > HOORAY! First hibernate-restore cycle in this bug. > > So nvidia470 is better for GTX 1070 Ti at least for hibernation. ... > Next I will try to reset the changes per Comment 33 Yep, this works too :-) (In reply to Giuseppe Ghibò from comment #37) After the successful hibernate-restore: [ettan@localhost ~]$ systemctl list-unit-files | grep -e nvidia -e "UNIT FILE" UNIT FILE STATE PRESET nvidia-hibernate.service disabled disabled nvidia-resume.service disabled disabled nvidia-suspend.service disabled disabled --- (In reply to Giuseppe Ghibò from comment #36) > > > > I believe you are thinking of my Thinkpad T510, or the card that died in my > > workstation some months ago. > > My now former workstation have an AMD Navi 24 [Radeon RX 6400] > > Yes, maybe we need a hw table somewhere for a quick look. We have this: https://wiki.mageia.org/en/QA_iso_hardware_list But most of us are too lazy to update it. Most of it is probably not relevant today. Maybe we should make a new one, with the same name except omitting "_iso", to use for all testing. It would be helpful if we had a script that generates a line of the table to paste. --- (In reply to Giuseppe Ghibò from comment #36) > And if you use NVreg_PreserveVideoMemoryAllocations=0 ? You mean to try to mitigate the problem when using nvidia 580? Together with uncommenting the lines per Comment 33?
(In reply to Morgan Leijström from comment #38) > (In reply to Morgan Leijström from comment #35) > > HOORAY! First hibernate-restore cycle in this bug. > > > > So nvidia470 is better for GTX 1070 Ti at least for hibernation. > ... > > Next I will try to reset the changes per Comment 33 > > Yep, this works too :-) > > > (In reply to Giuseppe Ghibò from comment #37) > > After the successful hibernate-restore: > > [ettan@localhost ~]$ systemctl list-unit-files | grep -e nvidia -e "UNIT > FILE" > UNIT FILE STATE PRESET > nvidia-hibernate.service disabled disabled > nvidia-resume.service disabled disabled > nvidia-suspend.service disabled disabled Probably those need to be enabled and started. When using NVreg_PreserveVideoMemoryAllocations=1, it should also preserve the nvidia GPU memory, which will be written to some extra file in the path indicated by NVreg_TemporaryFilePath=/var/tmp, and thus, I think, not to the main system swap used by hibernation; so that path should be writable, persistent, and have enough space. On resume I think it will do the opposite: restore from the system swap hibernation image to RAM, and after that load the preserved GPU files from the above path to the GPU.
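For comparing machines, the "systemctl list-unit-files" output quoted above can be reduced to the nvidia units that are not enabled. The following is a small Python sketch under the assumption that the output keeps the three-column "UNIT FILE STATE PRESET" layout shown in this bug; the function name nvidia_units_not_enabled is hypothetical.

```python
def nvidia_units_not_enabled(list_unit_files_output):
    """Return nvidia-*.service units whose STATE column is not 'enabled'.

    Assumes the whitespace-separated layout shown in this bug:
    unit name first, state second, preset third.
    """
    result = []
    for line in list_unit_files_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        unit, state = fields[0], fields[1]
        if unit.startswith("nvidia-") and unit.endswith(".service"):
            if state != "enabled":
                result.append(unit)
    return result


if __name__ == "__main__":
    sample = (
        "UNIT FILE                STATE    PRESET\n"
        "nvidia-hibernate.service disabled disabled\n"
        "nvidia-resume.service    disabled disabled\n"
        "nvidia-suspend.service   disabled disabled\n"
    )
    print(nvidia_units_not_enabled(sample))
```

An empty list would mean every listed nvidia service is enabled; it does not show whether the services actually ran during a hibernate attempt.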
(In reply to Morgan Leijström from comment #38) > > We have this: https://wiki.mageia.org/en/QA_iso_hardware_list > But most of us are too lazy to update it. Most of it is probably not > relevant today. > Maybe we should make a new one, same name except omit "_iso", use for all > testing use. > It would be helpful if we had some script that generate a line of the table > to paste. Maybe it should be a quick table, with CPU, GPU, and identification, even an .md file. It can then be refined later with all the details that could matter, just to know at a glance what we are talking about. E.g. morgan.navi6400 -> Ryzen+......, morgan.1070ti -> Ryzen+NV... > > --- > > (In reply to Giuseppe Ghibò from comment #36) > > And if you use NVreg_PreserveVideoMemoryAllocations=0 ? > > You mean to try to mitigate the problem when using nvidia 580? > Together with uncommenting lines per Comment 33? Yes, basically like this: options nvidia-current NVreg_DynamicPowerManagement=0x02 options nvidia-current NVreg_PreserveVideoMemoryAllocations=0 options nvidia-current NVreg_TemporaryFilePath=/var/tmp Of course the last one probably wouldn't matter: with NVreg_PreserveVideoMemoryAllocations=0, it shouldn't write anything to /var/tmp.
I will take on Comment 39 and 40 soon... For reference, I have now booted my new workstation (Ryzen, GTX 1070 Ti) using my production workstation disks, mga9. I note it has two more nvidia services set up but not active: [morgan@svarten ~]$ systemctl list-unit-files | grep -e nvidia -e "UNIT FILE" UNIT FILE STATE PRESET nvidia-hibernate.service disabled disabled nvidia-powerd.service disabled disabled nvidia-resume.service disabled disabled nvidia-suspend-then-hibernate.service disabled disabled nvidia-suspend.service disabled disabled I have not touched /etc/modprobe.d/display-driver.conf [morgan@svarten ~]$ inxi -G Graphics: Device-1: NVIDIA GP104 [GeForce GTX 1070 Ti] driver: nvidia v: 580.126.09 Display: x11 server: X.org v: 1.21.1.21 with: Xwayland v: 22.1.9 driver: X: loaded: nvidia,v4l gpu: nvidia,nvidia-nvswitch resolution: 3840x2160~60Hz API: EGL v: 1.5 drivers: nvidia,swrast platforms: gbm,x11,surfaceless,device API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 580.126.09 renderer: NVIDIA GeForce GTX 1070 Ti/PCIe/SSE2 Hibernating seems to work, but restoring fails: I see from the text on screen that it is loading the image (the kernel command line does not have splash quiet) - and then a black screen; after a while the monitor goes to standby. I took the system down using REISUB. fsck was run for / and /home at the next boot, so either it did not really hibernate correctly, or it messed up during the failed restore. There is no log saved from the failed restore. This system uses LUKS and LVM. Now I change back from the production disks to the test disk...
Summary: Fail hibernating (two ways), also fail reverting to desktop (amdgpu fail) => Fail hibernating several ways, amdgpu also fail reverting to desktop at hib fail. Nvidia problems too.
I upgraded the test system to Mageia 10, kernel-desktop-6.18.10-1. During the upgrade it forgot kernel-devel (I will open a bug), so at first boot it failed and changed driver automatically; next boot was OK using nouveau, kernel mode setting. As before, hibernation fails: it does not even try to make an image, then "kernel: PM: Image not found (code -16)" and shutdown, session lost. I installed kernel-devel and selected nvidia 580. Strange, it (re)built the kmod at next boot. I edited display-driver.conf per Comment 40: -------8<------- [ettan@hallen ~]$ cat /etc/modprobe.d/display-driver.conf install nvidia /sbin/modprobe nvidia-current $CMDLINE_OPTS alias nvidia nvidia-current # Uncomment the following line to enable complete power management options nvidia-current NVreg_DynamicPowerManagement=0x02 # Other options that might be useful for power management in case of problems options nvidia-current NVreg_PreserveVideoMemoryAllocations=0 options nvidia-current NVreg_TemporaryFilePath=/var/tmp # Comment the following line to disable kernel modesetting support options nvidia-drm modeset=1 # Create a framebuffer device #options nvidia-drm fbdev=1 blacklist nouveau options nouveau modeset=0 blacklist nova_core blacklist nova_drm ------->8-------- And rebooted, and back on the Xfce desktop I commanded hibernation. Next boot, after grub just a black screen and after some seconds the monitor suspends. Rebooted it using REISUB. Journal: feb 15 20:53:21 hallen.tribun systemd-logind[1068]: The system will hibernate now! feb 15 20:53:26 hallen.tribun systemd-logind[1068]: Delay lock is active (UID 1000/ettan, PID 2994/light-locker) but inhibitor timeout is reached. feb 15 20:53:26 hallen.tribun systemd[1]: Reached target Sleep. feb 15 20:53:26 hallen.tribun systemd[1]: Starting System Hibernate... feb 15 20:53:27 hallen.tribun systemd[1]: session-3.scope: Unit now frozen-by-parent. feb 15 20:53:27 hallen.tribun systemd[1]: user.slice: Unit now frozen. 
feb 15 20:53:27 hallen.tribun systemd-sleep[10740]: Successfully froze unit 'user.slice'.
feb 15 20:53:27 hallen.tribun systemd[1]: user-1000.slice: Unit now frozen-by-parent.
feb 15 20:53:27 hallen.tribun systemd[1]: user@1000.service: Unit now frozen-by-parent.
feb 15 20:53:27 hallen.tribun systemd-sleep[10740]: Performing sleep operation 'hibernate'...
feb 15 20:53:27 hallen.tribun kernel: PM: hibernation: hibernation entry
feb 15 20:53:27 hallen.tribun acpid[1061]: client 1807[0:0] has disconnected
feb 15 20:53:27 hallen.tribun acpid[1061]: client 1807[0:0] has disconnected

So this is terrible: it does not try to make an image, and does not care at all.
There is no log from the subsequently failed restore.

Next, I used MCC to enable the nvidia services at start:

[ettan@hallen ~]$ systemctl list-unit-files | grep -e nvidia -e "UNIT FILE"
UNIT FILE                              STATE    PRESET
nvidia-hibernate.service               enabled  disabled
nvidia-powerd.service                  enabled  disabled
nvidia-resume.service                  enabled  disabled
nvidia-suspend-then-hibernate.service  enabled  disabled
nvidia-suspend.service                 enabled  disabled

And rebooted, and back on the Xfce desktop I commanded hibernation.
Next boot, after grub just a black screen, and after some seconds the monitor suspends.
Rebooted it using REISUB.

Journal:
feb 15 21:04:49 hallen.tribun systemd-logind[1070]: The system will hibernate now!
feb 15 21:04:54 hallen.tribun systemd-logind[1070]: Delay lock is active (UID 1000/ettan, PID 3120/light-locker) but inhibitor timeout is reached.
feb 15 21:04:54 hallen.tribun systemd[1]: Reached target Sleep.
feb 15 21:04:54 hallen.tribun systemd[1]: Starting NVIDIA system hibernate actions...
feb 15 21:04:54 hallen.tribun hibernate[4489]: nvidia-hibernate.service
feb 15 21:04:54 hallen.tribun logger[4489]: <13>Feb 15 21:04:54 hibernate: nvidia-hibernate.service
feb 15 21:04:55 hallen.tribun kernel: snd_hda_codec_nvhdmi hdaudioC0D0: HDMI: invalid ELD data byte 22
feb 15 21:04:55 hallen.tribun systemd[1]: systemd-hostnamed.service: Deactivated successfully.
feb 15 21:04:55 hallen.tribun systemd[2374]: Reached target Sound Card.
feb 15 21:04:55 hallen.tribun systemd[1]: nvidia-hibernate.service: Deactivated successfully.
feb 15 21:04:55 hallen.tribun systemd[1]: Finished NVIDIA system hibernate actions.
feb 15 21:04:55 hallen.tribun systemd[1]: Starting System Hibernate...
feb 15 21:04:55 hallen.tribun systemd[1]: user@1000.service: Unit now frozen-by-parent.
feb 15 21:04:55 hallen.tribun systemd[1]: session-3.scope: Unit now frozen-by-parent.
feb 15 21:04:55 hallen.tribun systemd[1]: user-1000.slice: Unit now frozen-by-parent.
feb 15 21:04:55 hallen.tribun systemd[1]: user.slice: Unit now frozen.
feb 15 21:04:55 hallen.tribun systemd-sleep[4517]: Successfully froze unit 'user.slice'.
feb 15 21:04:55 hallen.tribun systemd-sleep[4517]: Performing sleep operation 'hibernate'...
feb 15 21:04:55 hallen.tribun kernel: PM: hibernation: hibernation entry
feb 15 21:05:53 hallen.tribun kernel: Filesystems sync: 0.029 seconds
feb 15 21:05:53 hallen.tribun kernel: Freezing user space processes
feb 15 21:05:53 hallen.tribun acpid[1054]: client 1827[0:0] has disconnected
feb 15 21:05:53 hallen.tribun kernel: Freezing user space processes completed (elapsed 0.001 seconds)
feb 15 21:05:53 hallen.tribun acpid[1054]: client 1827[0:0] has disconnected
feb 15 21:05:53 hallen.tribun kernel: OOM killer disabled.
feb 15 21:05:53 hallen.tribun dhcpcd[1537]: enp34s0: carrier lost
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff]
...
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Marking nosave pages: [mem 0xbf000000-0xffffffff]
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Basic memory bitmaps created
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Preallocating image memory
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Allocated 844566 pages for snapshot
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Allocated 3378264 kbytes in 1.53 seconds (2208.01 MB/s)
feb 15 21:05:53 hallen.tribun kernel: Freezing remaining freezable tasks
feb 15 21:05:53 hallen.tribun kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
feb 15 21:05:53 hallen.tribun kernel: printk: Suspending console(s) (use no_console_suspend to debug)
feb 15 21:05:53 hallen.tribun kernel: serial 00:04: disabled
feb 15 21:05:53 hallen.tribun kernel: r8169 0000:22:00.0 enp34s0: Link is Down
feb 15 21:05:53 hallen.tribun kernel: ACPI: PM: Preparing to enter system sleep state S4
feb 15 21:05:53 hallen.tribun kernel: ACPI: PM: Saving platform NVS memory
feb 15 21:05:53 hallen.tribun kernel: Disabling non-boot CPUs ...
feb 15 21:05:53 hallen.tribun kernel: smpboot: CPU 11 is now offline
...
feb 15 21:05:53 hallen.tribun kernel: smpboot: CPU 1 is now offline
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Creating image:
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Need to copy 810642 pages
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Normal pages needed: 810642 + 1024, available pages: 7558013
feb 15 21:05:53 hallen.tribun kernel: ACPI: PM: Restoring platform NVS memory
feb 15 21:05:53 hallen.tribun kernel: AMD-Vi: Virtual APIC enabled
feb 15 21:05:53 hallen.tribun kernel: AMD-Vi: Virtual APIC enabled
feb 15 21:05:53 hallen.tribun kernel: Enabling non-boot CPUs ...
feb 15 21:05:53 hallen.tribun kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
feb 15 21:05:53 hallen.tribun kernel: CPU1 is up
...
feb 15 21:05:53 hallen.tribun kernel: CPU11 is up
feb 15 21:05:53 hallen.tribun kernel: ACPI: PM: Waking up from system sleep state S4
feb 15 21:05:53 hallen.tribun kernel: usb usb1: root hub lost power or was reset
feb 15 21:05:53 hallen.tribun kernel: usb usb2: root hub lost power or was reset
feb 15 21:05:53 hallen.tribun kernel: usb usb3: root hub lost power or was reset
feb 15 21:05:53 hallen.tribun kernel: usb usb4: root hub lost power or was reset
feb 15 21:05:53 hallen.tribun kernel: serial 00:04: activated
feb 15 21:05:53 hallen.tribun kernel: r8169 0000:22:00.0 enp34s0: Link is Down
feb 15 21:05:53 hallen.tribun kernel: usb 1-6: WARN: invalid context state for evaluate context command.
feb 15 21:05:53 hallen.tribun kernel: ata5: SATA link down (SStatus 0 SControl 330)
feb 15 21:05:53 hallen.tribun kernel: ata9: SATA link down (SStatus 0 SControl 300)
feb 15 21:05:53 hallen.tribun kernel: ata2: SATA link down (SStatus 0 SControl 300)
feb 15 21:05:53 hallen.tribun kernel: ata6: SATA link down (SStatus 0 SControl 330)
feb 15 21:05:53 hallen.tribun kernel: usb 1-6: reset full-speed USB device number 2 using xhci_hcd
feb 15 21:05:53 hallen.tribun kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
feb 15 21:05:53 hallen.tribun kernel: ata1.00: supports DRM functions and may not be fully accessible
feb 15 21:05:53 hallen.tribun kernel: sd 0:0:0:0: [sda] Starting disk
feb 15 21:05:53 hallen.tribun kernel: ata1.00: supports DRM functions and may not be fully accessible
feb 15 21:05:53 hallen.tribun kernel: ata1.00: configured for UDMA/133
feb 15 21:05:53 hallen.tribun kernel: ata1.00: Enabling discard_zeroes_data
feb 15 21:05:53 hallen.tribun kernel: usb 1-8: WARN: invalid context state for evaluate context command.
feb 15 21:05:53 hallen.tribun kernel: usb 1-8: reset full-speed USB device number 3 using xhci_hcd
feb 15 21:05:53 hallen.tribun systemd-sleep[4517]: System returned from sleep operation 'hibernate'.
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: Basic memory bitmaps freed
feb 15 21:05:53 hallen.tribun kernel: OOM killer enabled.
feb 15 21:05:53 hallen.tribun kernel: Restarting tasks: Starting
feb 15 21:05:53 hallen.tribun kernel: Restarting tasks: Done
feb 15 21:05:53 hallen.tribun kernel: efivarfs: resyncing variable state
feb 15 21:05:53 hallen.tribun kernel: efivarfs: finished resyncing variable state
feb 15 21:05:53 hallen.tribun kernel: PM: hibernation: hibernation exit
feb 15 21:05:53 hallen.tribun dhcpcd[1537]: enp34s0: deleting route to 192.168.68.0/24
feb 15 21:05:53 hallen.tribun dhcpcd[1537]: enp34s0: deleting default route via 192.168.68.1
feb 15 21:05:54 hallen.tribun ifplugd(enp34s0)[1404]: Link beat lost.
feb 15 21:05:55 hallen.tribun dhcpcd[1537]: enp34s0: carrier acquired
feb 15 21:05:55 hallen.tribun kernel: r8169 0000:22:00.0 enp34s0: Link is Up - 1Gbps/Full - flow control rx/tx
feb 15 21:05:55 hallen.tribun dhcpcd[1537]: enp34s0: IAID 23:e1:5b:1e
feb 15 21:05:56 hallen.tribun ifplugd(enp34s0)[1404]: Link beat detected.
feb 15 21:05:56 hallen.tribun kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
feb 15 21:05:56 hallen.tribun dhcpcd[1537]: enp34s0: soliciting an IPv6 router
feb 15 21:05:57 hallen.tribun dhcpcd[1537]: enp34s0: rebinding lease of 192.168.68.111
feb 15 21:05:57 hallen.tribun dhcpcd[1537]: enp34s0: probing address 192.168.68.111/24
feb 15 21:06:03 hallen.tribun dhcpcd[1537]: enp34s0: leased 192.168.68.111 for 7200 seconds
feb 15 21:06:03 hallen.tribun dhcpcd[1537]: enp34s0: adding route to 192.168.68.0/24
feb 15 21:06:03 hallen.tribun dhcpcd[1537]: enp34s0: adding default route via 192.168.68.1
feb 15 21:06:08 hallen.tribun dhcpcd[1537]: enp34s0: no IPv6 Routers available
feb 15 21:06:44 hallen.tribun kernel: sysrq: Keyboard mode set to system default
feb 15 21:06:44 hallen.tribun systemd-journald[627]: Journal stopped
feb 15 21:06:44 hallen.tribun kernel: sysrq: Terminate All Tasks
...
(As the monitor had suspended, I rebooted it using REISUB)

I note above: "kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification"

---

Another problem: I think nvidia 580.126.09 is not fully compatible with GTX 1070 Ti
Sometimes when I scroll in the terminal in command journalctl, the whole terminal window flickers, showing the desktop background.
Also, response from some windows is occasionally sluggish, even for something as simple as typing in a terminal.
glmark2 gives very good results anyway, about 11k FPS in the first test.

To sum it up:
§ nvidia580 has ugly glitches, some transient delays, and problems with both hibernation and restoring.
§ nvidia470 had no problems with glitches, hibernation or restoring when I tested some comments above, and glmark2 reports similar speed as 580.
§ nouveau is terribly slow.
§ modesetting shows no glitches and decent speed (well, glmark2 says a tenth of the speed of proprietary, but no problem at all in desktop usage), but it crashes during restore, and the desktop session is lost.

So only 470 is working good, but deprecated security wise...
Should we try the new free nvidia drivers?
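On the memory figures in the journal above: a minimal sketch, assuming 4 KiB pages (the log supports this, since 844566 pages were reported as 3378264 kbytes), of how the "Normal pages needed: N + R, available pages: A" line can be read. The function names `pages_to_mib` and `image_fits` are made up for illustration. As far as I understand it, the kernel compares against free RAM pages for the in-memory snapshot rather than against swap size, so treat that interpretation as an assumption.

```shell
#!/bin/sh
# Hypothetical helpers for reading "PM: hibernation" page counts.

# Convert a page count to MiB, assuming 4 KiB pages (integer division).
pages_to_mib() {
    echo $(( $1 * 4 / 1024 ))
}

# Compare "needed + reserve" against "available", all in pages,
# mirroring the three numbers printed in the "Normal pages needed" line.
image_fits() {
    needed=$1
    reserve=$2
    available=$3
    total=$(( needed + reserve ))
    if [ "$total" -le "$available" ]; then
        echo "fits"
    else
        echo "too big by $(( total - available )) pages"
    fi
}
```

With the figures from this run, `image_fits 810642 1024 7558013` prints `fits`, while the figures from comment 0 (939437 + 1024 needed, 873467 available) come out 66994 pages short.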
---

Also, is it possible to enable more logging of the hibernation process?
Why does it sometimes not even try to make an image?
Why does it not try to return to the desktop when it can't make an image?
Why does it fail to find enough space though there is plenty?
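On the logging question: a sketch, assuming the kernel was built with sleep debugging support so that `/sys/power/pm_debug_messages` and `/sys/power/pm_print_times` exist. The helper name `enable_pm_debug` and its optional directory argument are only for illustration (the argument lets it be tried against a scratch directory); on a real system it must run as root and uses `/sys/power`. The journal above also points at the `no_console_suspend` kernel parameter, which can be added to the boot command line to keep console output during the sleep transition.

```shell
#!/bin/sh
# Hypothetical helper: switch on extra power-management logging.
# Knobs that are missing or not writable are skipped, not treated as errors.

enable_pm_debug() {
    power_dir="${1:-/sys/power}"
    for knob in pm_debug_messages pm_print_times; do
        if [ -w "$power_dir/$knob" ]; then
            # Writing 1 enables the extra messages for this boot only.
            echo 1 > "$power_dir/$knob"
            echo "enabled $knob"
        else
            echo "skipped $knob (not present or not writable)"
        fi
    done
}

# Typical use on the test machine, as root:
#   enable_pm_debug
#   systemctl hibernate
#   journalctl -k -b -1 | grep -e 'PM:' -e nvidia   # after the next boot
```

The settings do not survive a reboot, so they have to be applied again before each hibernation attempt. If the machine returns to the desktop instead of powering off, the same `journalctl -k` query with `-b 0` shows the current boot.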
Summary: Fail hibernating several ways, amdgpu also fail reverting to desktop at hib fail. Nvidia problems too. => Fail hibernating several ways, amdgpu fail rev to desktop at hib fail. Problems at restore using modesetting and nvidia580.
(In reply to Morgan Leijström from comment #42)
>
> So only 470 is working good, but deprecated security wise...
> Should we try the new free nvidia drivers?

If by free you mean the recent ones with -wopengpu, those have the same user-space part as 580.126.09, but with open source kernel modules. Those kernel modules work only on GPU architectures from Turing onward. Your GTX 1070 Ti is a Pascal card, so those kernel modules aren't compatible with it.
OK, bummer then. The problem then is that only the security-wise deprecated 470 seems to be working nicely...

I will soon try again if hibernate-restore works with the modesetting driver on mga10. (It failed restoring on a fresh install of updated mga9, comment 27)
(In reply to Morgan Leijström from comment #42)
> I upgraded the test system to Mageia 10.

> Another problem: I think nvidia 580.126.09 is not fully compatible with GTX
> 1070 Ti

It could be; maybe the next one will be better. 580.126.09 is also in updates_testing of mga9, does it show the same there?
This GTX 1070 machine - and me - will need to be dedicated to $work for a few days.
So it is now running on my production workstation disks, Mageia 9. nvidia470 works perfectly including hibernation & restore. Backport kernel.

---

Maybe I will go back to my HP Pavilion Laptop 15-cw, Comment 0, and try if Mageia 9 works. Or another distro to compare.

Later I also have another "new" laptop, a workstation class HP ZBook 17", to upgrade from W11 when I have finished the current work on it.
Did you see any improvement with 6.18.13-1.mga10 and nvidia 580.126.18?
More testing on that machine has to wait.
Side note: I have now switched my mga9 work system on that machine to KMS due to two problems noted with the otherwise well working nvidia470:
§ Firefox has problems rendering (massive flickering) one or two pdf files.
§ Booting fails: SDDM found no screen - I don't understand how that suddenly happened. I tried both backport kernels, server and desktop, and 6.6.120 desktop. Also tried switching DM to LightDM, but when switching from nvidia470 to KMS it works now.