| Summary: | MGA Applet still bring the 'can't open the rpm database' at each boot of the system when trying to check updates... | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Jean-François Bachelet <jfbachelet> |
| Component: | RPM Packages | Assignee: | Mageia tools maintainers <mageiatools> |
| Status: | REOPENED --- | QA Contact: | |
| Severity: | major | ||
| Priority: | Normal | CC: | davidwhodgins, fri, jfbachelet, neoser10, ouaurelien, rolfpedersen |
| Version: | 8 | ||
| Target Milestone: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | CVE: | ||
| Status comment: | |||
| Attachments: |
ps results after 'crash'
demsg and journal proof journal -b -2 new boot of today and crash journal good :) |
||
|
Description
Jean-François Bachelet
2021-04-18 16:41:16 CEST
Jean-François Bachelet
2021-04-18 16:41:23 CEST
CC:
(none) =>
jfbachelet Hi, thanks reporting this. Sadly, I can't really reproduce this whereas I have urpmi and dnf installed on my system. If mgaapplet complains about RPM database locked, it is because an other process involving RPM is running. It could be PackageKit with DNF-backend which computes updates also at the same time. It could be also a leading mgaapplet which is running for an other user. To try really fixing this, 1- Quit mgaapplet by right-click on it and choose quit. 2- Remove .mgaapplet folder in your user home directory. 3- Do you also use DNF? If you don't use it, Remove dnf-makecache.timer with the following command as root: systemctl stop dnf-makecache.timer systemctl mask dnf-makecache.timer You can also remove DNF from your system if you only use Urpmi. 4- If you are sure there is no process involved with RPM, rerun mgaapplet in a terminal as user: mgaapplet & See after 5 minutes if it correctly computes looking for updates. CC:
(none) =>
ouaurelien It's been long enough since I've seen the problem and it wasn't handled automatically I had to dig through my old messages to find it. The last time I replied to a person with a corrupted rpm db was in 2014. The fix is to run rm -f /var/lib/rpm/__db.00* rpm --rebuilddb The typical causes of any database corruption are things like hard poweroff, or i/o errors (bad sectors, intermittent connections due to faulty cable or connectors, etc). In most cases rpm can now handle it without having to manually run the above commands. If it recurs after running those commands, the cause of the db corruption needs to be identified and fixed. CC:
(none) =>
davidwhodgins (In reply to Aurelien Oudelet from comment #1) > To try really fixing this, > > 1- Quit mgaapplet by right-click on it and choose quit. > 2- Remove .mgaapplet folder in your user home directory. > > 3- Do you also use DNF? If you don't use it, > Remove dnf-makecache.timer with the following command as root: > systemctl stop dnf-makecache.timer > systemctl mask dnf-makecache.timer > > You can also remove DNF from your system if you only use Urpmi. > > 4- If you are sure there is no process involved with RPM, rerun mgaapplet in > a terminal as user: > mgaapplet & > > See after 5 minutes if it correctly computes looking for updates. Done all that and it seems to have cured the problem. ^^) BTW, I've never used anything else than rpm. so it would be cool to note somewhere to ask the user at install time for what update software it want to use, this should prevent such a problem in the future. Thanks Aurélien ^^) (In reply to Dave Hodgins from comment #2) > The fix is to run > rm -f /var/lib/rpm/__db.00* > rpm --rebuilddb > Hum, that's what I do with 'rpm -vv --rebuildb'... advantage is I don't have to erase anything before ^^) > The typical causes of any database corruption are things like hard poweroff, > or i/o errors (bad sectors, intermittent connections due to faulty cable or > connectors, etc). well, I got that problem at each install I've done on numerous machines since Mageia 4, old and brand new... > In most cases rpm can now handle it without having to manually run the above > commands. If it recurs after running those commands, the cause of the db > corruption needs to be identified and fixed. yeah, I've seen the new 'auto rpm db rebuild' switch in the install software, I've added it but got the problem as if not there... Let's see if the solution from Aurélien works well in long run. However, this can be closed as resolved for now... Status:
NEW =>
RESOLVED (In reply to Jean-François Bachelet from comment #3) > > Thanks Aurélien ^^) If this works, this is due to a race between dnf-makecache.timer which is triggered on this system at the same time mgaapplet is looking for update... Normally, - mgaapplet looks for updates 5 minutes after user logs on AND there is a network connection. - dnf-makecache.timer is OnBootSec=10min OnUnitInactiveSec=1h RandomizedDelaySec=60m Unit=dnf-makecache.service dnf-makecache should be triggerred 10 minutes after boot, then every 1 hour. not 5 minutes. Or, you did make mgaapplet to look for updates 10 minutes after boot in MCC... Note if you have removed dnf, Plasma Discover or Gnome Software applications will not be able to install new rpms from Mageia Repos. You should now use Rpmdrake. (In reply to Dave Hodgins from comment #2) > It's been long enough since I've seen the problem and it wasn't handled > automatically I had to dig through my old messages to find it. The last > time I replied to a person with a corrupted rpm db was in 2014. > > The fix is to run > rm -f /var/lib/rpm/__db.00* > rpm --rebuilddb > It's a long time for me, also, and I have to search the web. It must be more than 20 years since I was first aware of this sequence of commands, in Mandrake, when the need was more common. (In reply to Jean-François Bachelet from comment #4) > (In reply to Dave Hodgins from comment #2) > > > The fix is to run > > rm -f /var/lib/rpm/__db.00* > > rpm --rebuilddb > > > Hum, that's what I do with 'rpm -vv --rebuildb'... > advantage is I don't have to erase anything before ^^) > I don't see where you use > rm -f /var/lib/rpm/__db.00* or, do you say 'rpm -vv --rebuildb' includes it or makes it unnecessary? I tried # rpm -vv --rebuilddb 2>&1 | tee SomeFile.txt and looked through the text file for some evidence of this function but the file is long and I am not so expert. Not knowing any better, I'd suggest that not issuing the commands as Dave wrote would prevent the database rebuild from fixing the problem. CC:
(none) =>
rolfpedersen My understanding is that without first deleting /var/lib/rpm/__db.00* the rebuild, even with the verbose option specified twice, will do nothing to fix a corrupt database. arg. that's not enough, sadly. I got the bug again after two succesful checks the third brang the bug... I completely killed dnf, so what can trigger that again ??? Resolution:
FIXED =>
(none) (In reply to Jean-François Bachelet from comment #8) > arg. that's not enough, sadly. > > I got the bug again after two succesful checks the third brang the bug... > > I completely killed dnf, so what can trigger that again ??? Perhaps bad harddrive sectors... And near faulty device...? Also, you can copy paste here the process list while mgaapplet complains with: $ ps aux >> ~/process_running.txt And we try to see what's wrong on your system. Check the output of dmesg and "journalctl --no-hostname -b" for any obvious errors. Also "smartctl -a /dev/sda" (run smartctl for each hard drive). (In reply to Dave Hodgins from comment #10) > Check the output of dmesg and "journalctl --no-hostname -b" for any obvious > errors. Also "smartctl -a /dev/sda" (run smartctl for each hard drive). smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.30-desktop-1.mga8] (local build) Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Western Digital Blue Device Model: WDC WD10EZEX-00MFCA0 Serial Number: WD-WCC6Y2KV1XRC LU WWN Device Id: 5 0014ee 2b801e360 Firmware Version: 01.01A01 User Capacity: 1 000 204 886 016 bytes [1,00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 7200 rpm Form Factor: 3.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-3 T13/2161-D revision 3b SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Mon Apr 19 22:09:47 2021 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (12240) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 127) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0 3 Spin_Up_Time 0x0027 171 168 021 Pre-fail Always - 2408 4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1340 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 079 079 000 Old_age Always - 15490 10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1340 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 35 193 Load_Cycle_Count 0x0032 198 198 000 Old_age Always - 6689 194 Temperature_Celsius 0x0022 111 095 000 Old_age Always - 32 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. (In reply to Aurelien Oudelet from comment #9) > (In reply to Jean-François Bachelet from comment #8) > > arg. that's not enough, sadly. > > > > I got the bug again after two succesful checks the third brang the bug... > > > > I completely killed dnf, so what can trigger that again ??? > > Perhaps bad harddrive sectors... And near faulty device...? > > Also, you can copy paste here the process list while mgaapplet complains > with: > $ ps aux >> ~/process_running.txt > > And we try to see what's wrong on your system. As said it's not only My system, I get that bug on ALL systems I've installed since Mageia v4... and it where nearly all different. BTW, where I find the result of this command ? in my /home ? (In reply to Jean-François Bachelet from comment #12) > > > > Also, you can copy paste here the process list while mgaapplet complains > > with: > > $ ps aux > ~/process_running.txt > BTW, where I find the result of this command ? in my /home ? With the above command, you will find the process_running.txt file at the root of YOUR user directory. Please make the command with the less processes running. Close all applications like firefox, KMail or other before, so we will find the culprit, eventually. Created attachment 12659 [details] ps results after 'crash' (In reply to Aurelien Oudelet from comment #13) > (In reply to Jean-François Bachelet from comment #12) > > > > > > Also, you can copy paste here the process list while mgaapplet complains > > > with: > > > $ ps aux > ~/process_running.txt > Please make the command with the less processes running. > Close all applications like firefox, KMail or other before, so we will find > the culprit, eventually. here are the results : hello Aurelien, did you see something in the textfile that can be causing that havoc ? (In reply to Jean-François Bachelet from comment #15) > hello Aurelien, did you see something in the textfile that can be causing > that havoc ? Sadly NO. I see mgaapplet, but no other program that can lock the RPM database. Is /var directory on other partition than / (system root) ? Also please post the output of (run as root) "df -h|grep -v tmpfs" and "df -hi|grep -v tmpfs". (In reply to Aurelien Oudelet from comment #16) > Is /var directory on other partition than / (system root) ? nope :( only /, /home and /usr/local have their own partitions. (In reply to Dave Hodgins from comment #17) > Also please post the output of (run as root) "df -h|grep -v tmpfs" Sys. de fichiers Taille Utilisé Dispo Uti% Monté sur /dev/sda2 32G 15G 16G 50% / /dev/sdi1 9,1T 7,3T 1,3T 85% /stk03 /dev/sdj1 2,7T 2,6T 39G 99% /sve0 /dev/sdh1 2,7T 2,4T 178G 94% /sve1 /dev/sdg1 2,7T 1,4T 1,2T 54% /stk1 /dev/sde1 2,7T 2,1T 535G 80% /sve2 /dev/sdf1 2,7T 18G 2,6T 1% /sve3 /dev/sda4 63G 175M 59G 1% /usr/local /dev/sda5 807G 641G 166G 80% /home and > "df -hi|grep -v tmpfs". Sys. de fichiers Inœuds IUtil. ILibre IUti% Monté sur /dev/sda2 2,0M 379K 1,7M 19% / /dev/sdi1 292M 4,6K 292M 1% /stk03 /dev/sdj1 175M 1,3M 174M 1% /sve0 /dev/sdh1 175M 41K 175M 1% /sve1 /dev/sdg1 175M 583K 175M 1% /stk1 /dev/sde1 175M 604K 175M 1% /sve2 /dev/sdf1 175M 32 175M 1% /sve3 /dev/sda4 4,0M 4,4K 4,0M 1% /usr/local /dev/sda5 52M 287K 51M 1% /home one ecerpt of the 'journalctl --pager-end --catalog' command ░░ L'unité (unit) UNIT s'est arrêtée et a consommé les ressources indiquées. avril 25 17:15:39 localhost.localdomain mgaapplet-update-checker[518285]: updating inactive backport media Core Backports, Nonfree Backports, Tainted Backports, Core 32bit Backports, Nonfree 32bit Backports, Tainted 32bit Backports avril 25 17:15:39 localhost.localdomain mgaapplet-update-checker[518285]: running: urpmi.update Core Backports avril 25 17:15:39 localhost.localdomain pkexec[519099]: pam_unix(polkit-1:session): session opened for user root by (uid=1000) avril 25 17:15:39 localhost.localdomain pkexec[519099]: voxel: Executing command [USER=root] [TTY=unknown] [CWD=/] [COMMAND=/usr/libexec/urpmi.update Core Backports] avril 25 17:15:40 localhost.localdomain mgaapplet-update-checker[518285]: running: urpmi.update Nonfree Backports avril 25 17:15:40 localhost.localdomain pkexec[519113]: pam_unix(polkit-1:session): session opened for user root by (uid=1000) avril 25 17:15:40 localhost.localdomain pkexec[519113]: voxel: Executing command [USER=root] [TTY=unknown] [CWD=/] [COMMAND=/usr/libexec/urpmi.update Nonfree Backports] avril 25 17:15:40 localhost.localdomain mgaapplet-update-checker[518285]: running: urpmi.update Tainted Backports avril 25 17:15:40 localhost.localdomain pkexec[519139]: pam_unix(polkit-1:session): session opened for user root by (uid=1000) avril 25 17:15:40 localhost.localdomain pkexec[519139]: voxel: Executing command [USER=root] [TTY=unknown] [CWD=/] [COMMAND=/usr/libexec/urpmi.update Tainted Backports] avril 25 17:15:40 localhost.localdomain vnstatd[4287]: Info: Monitoring (1): eno1 (1000 Mbit) avril 25 17:15:41 localhost.localdomain mgaapplet-update-checker[518285]: running: urpmi.update Core 32bit Backports avril 25 17:15:41 localhost.localdomain pkexec[519207]: pam_unix(polkit-1:session): session opened for user root by (uid=1000) avril 25 17:15:41 localhost.localdomain pkexec[519207]: voxel: Executing command [USER=root] [TTY=unknown] [CWD=/] [COMMAND=/usr/libexec/urpmi.update Core 32bit Backports] avril 25 17:15:41 localhost.localdomain mgaapplet-update-checker[518285]: running: urpmi.update Nonfree 32bit Backports avril 25 17:15:41 localhost.localdomain pkexec[519220]: pam_unix(polkit-1:session): session opened for user root by (uid=1000) avril 25 17:15:41 localhost.localdomain pkexec[519220]: voxel: Executing command [USER=root] [TTY=unknown] [CWD=/] [COMMAND=/usr/libexec/urpmi.update Nonfree 32bit Backports] avril 25 17:15:41 localhost.localdomain mgaapplet-update-checker[518285]: running: urpmi.update Tainted 32bit Backports avril 25 17:15:42 localhost.localdomain pkexec[519229]: pam_unix(polkit-1:session): session opened for user root by (uid=1000) avril 25 17:15:42 localhost.localdomain pkexec[519229]: voxel: Executing command [USER=root] [TTY=unknown] [CWD=/] [COMMAND=/usr/libexec/urpmi.update Tainted 32bit Backports] avril 25 17:15:44 localhost.localdomain mgaapplet[7450]: Failed to open urpmi database I haven't seen other thing in connexion with mgaapplet there, btw this comes each time it tries to check updates. and rebuilding the database just works for the one or two next checks in the day, after a reboot the problem resurfaces... it's really #!?@ !!! How do you normally shut down the system? What's the output of "urpmq --list-url |grep core/release" and "journalctl --no-hostname -b -1 |tail -n 20"? (In reply to Dave Hodgins from comment #21) > How do you normally shut down the system? > always use the gui shutdown button. > What's the output of "urpmq --list-url |grep core/release" # urpmq --list-url | grep core/release Core Release http://magie.remi.lu/distrib/8/x86_64/media/core/release and > "journalctl --no-hostname -b -1 |tail -n 20"? # journalctl --no-hostname -b -l | tail -n 20 avril 25 18:39:53 systemd[4637]: Started Firefox - Navigateur Web. avril 25 18:39:53 plasmashell[7308]: file:///usr/share/plasma/plasmoids/org.kde.plasma.taskmanager/contents/ui/Task.qml:379: Unable to assign [undefined] to QString avril 25 18:39:54 mgaapplet[7450]: Failed to open urpmi database avril 25 18:39:56 rtkit-daemon[963]: Supervising 7 threads of 3 processes of 1 users. avril 25 18:39:56 rtkit-daemon[963]: Supervising 7 threads of 3 processes of 1 users. avril 25 18:39:56 rtkit-daemon[963]: Supervising 7 threads of 3 processes of 1 users. avril 25 18:39:56 rtkit-daemon[963]: Supervising 7 threads of 3 processes of 1 users. avril 25 18:39:56 rtkit-daemon[963]: Supervising 7 threads of 3 processes of 1 users. avril 25 18:39:56 rtkit-daemon[963]: Supervising 7 threads of 3 processes of 1 users. avril 25 18:39:56 rtkit-daemon[963]: Successfully made thread 893457 of process 893280 owned by '1000' RT at priority 10. avril 25 18:39:56 rtkit-daemon[963]: Supervising 8 threads of 4 processes of 1 users. avril 25 18:39:56 rtkit-daemon[963]: Supervising 8 threads of 4 processes of 1 users. avril 25 18:39:56 rtkit-daemon[963]: Supervising 8 threads of 4 processes of 1 users. avril 25 18:39:57 rtkit-daemon[963]: Supervising 8 threads of 4 processes of 1 users. avril 25 18:39:57 rtkit-daemon[963]: Supervising 8 threads of 4 processes of 1 users. avril 25 18:40:00 vnstatd[4287]: Info: Monitoring (1): eno1 (1000 Mbit) avril 25 18:41:19 plasmashell[7308]: file:///usr/share/plasma/plasmoids/org.kde.plasma.kicker/contents/ui/ItemListDialog.qml:51: TypeError: Cannot read property 'separatorCount' of null avril 25 18:41:19 plasmashell[7308]: trying to show an empty dialog journalctl --no-hostname -b -1 |tail -n 20 Note, number one, not letter l. The purpose is to see if there were any errors in the prior shutdown. Use -2 etc., if you've rebooted more than once since the last good urpmi db open. (In reply to Dave Hodgins from comment #23) > journalctl --no-hostname -b -1 |tail -n 20 > > Note, number one, not letter l. The purpose is to see if there were any > errors in the prior shutdown. Use -2 etc., if you've rebooted more than > once since the last good urpmi db open. oups ! well I've gone back to -10 and the only error I see there is about the watchdog that sometimes does not stop before the full shutdown... # journalctl --no-hostname -b -9 | tail -n 20 avril 21 14:34:30 systemd[1]: systemd-remount-fs.service: Succeeded. avril 21 14:34:30 systemd[1]: Stopped Remount Root and Kernel File Systems. avril 21 14:34:30 systemd[1]: systemd-fsck-root.service: Succeeded. avril 21 14:34:30 systemd[1]: Stopped File System Check on Root Device. avril 21 14:34:30 systemd[1]: systemd-tmpfiles-setup-dev.service: Succeeded. avril 21 14:34:30 systemd[1]: Stopped Create Static Device Nodes in /dev. avril 21 14:34:30 systemd[1]: lvm2-monitor.service: Succeeded. avril 21 14:34:30 systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling. avril 21 14:34:30 systemd[1]: Reached target Shutdown. avril 21 14:34:30 systemd[1]: Reached target Final Step. avril 21 14:34:30 systemd[1]: systemd-reboot.service: Succeeded. avril 21 14:34:30 systemd[1]: Finished Reboot. avril 21 14:34:30 systemd[1]: Reached target Reboot. avril 21 14:34:30 systemd[1]: Shutting down. avril 21 14:34:30 systemd[1]: Hardware watchdog 'iTCO_wdt', version 0 avril 21 14:34:30 systemd[1]: Set hardware watchdog to 10min. avril 21 14:34:30 kernel: watchdog: watchdog0: watchdog did not stop! avril 21 14:34:31 systemd-shutdown[1]: Syncing filesystems and block devices. avril 21 14:34:31 systemd-shutdown[1]: Sending SIGTERM to remaining processes... avril 21 14:34:31 systemd-journald[595]: Journal stopped # journalctl --no-hostname -b -10 | tail -n 20 avril 21 11:09:44 systemd[1]: Removed slice system-systemd\x2dfsck.slice. avril 21 11:09:44 systemd[1]: Stopped target Local File Systems (Pre). avril 21 11:09:44 systemd[1]: Stopping Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling... avril 21 11:09:44 systemd[1]: systemd-remount-fs.service: Succeeded. avril 21 11:09:44 systemd[1]: Stopped Remount Root and Kernel File Systems. avril 21 11:09:44 systemd[1]: systemd-fsck-root.service: Succeeded. avril 21 11:09:44 systemd[1]: Stopped File System Check on Root Device. avril 21 11:09:44 systemd[1]: systemd-tmpfiles-setup-dev.service: Succeeded. avril 21 11:09:44 systemd[1]: Stopped Create Static Device Nodes in /dev. avril 21 11:09:44 systemd[1]: lvm2-monitor.service: Succeeded. avril 21 11:09:44 systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling. avril 21 11:09:44 systemd[1]: Reached target Shutdown. avril 21 11:09:44 systemd[1]: Reached target Final Step. avril 21 11:09:44 systemd[1]: systemd-poweroff.service: Succeeded. avril 21 11:09:44 systemd[1]: Finished Power-Off. avril 21 11:09:44 systemd[1]: Reached target Power-Off. avril 21 11:09:44 systemd[1]: Shutting down. avril 21 11:09:44 systemd-shutdown[1]: Syncing filesystems and block devices. avril 21 11:09:44 systemd-shutdown[1]: Sending SIGTERM to remaining processes... avril 21 11:09:44 systemd-journald[601]: Journal stopped At this point I have no idea why the local rpm database is becoming corrupted to the point it can not be recovered automatically. My only suggestion is to ensure it's working now , rebuild the db as per comment 2 and the next time it fails immediately save the output of the commands "dmesg", "journalctl --no-hostname -b" and "journalctl --no-hostname -b -1" for analysis. There has to be some sort of file conflict or io error triggering the database corruption, which should be shown by messages in those log outputs. Created attachment 12679 [details] demsg and journal (In reply to Dave Hodgins from comment #25) > At this point I have no idea why the local rpm database is becoming corrupted > to the point it can not be recovered automatically. > > My only suggestion is to ensure it's working now , rebuild the db as per > comment 2 and the next time it fails immediately save the output of the > commands > "dmesg", in the attachement > "journalctl --no-hostname -b" and is way tooooo big > "journalctl --no-hostname -b -1" in the attachment > for > analysis. There has to be some sort of file conflict or io error triggering > the database corruption, which should be shown by messages in those log > outputs. a little note : I've rebuilded the databases then rebooted, waited for the auto update check, it worked fine, and waited for the next update check that failed. another note : when I dan't need to go to internet I cut the ethernet link (disconnect) until I need it again. in that cases, as soon as I reconnect mgaapplet send me the 'can't open the rpm databases' error... so there is a bug there as well... (In reply to Jean-François Bachelet from comment #27) > a little note : I've rebuilded the databases.... After claiming (In reply to Jean-François Bachelet from comment #4) > (In reply to Dave Hodgins from comment #2) > > > The fix is to run > > rm -f /var/lib/rpm/__db.00* > > rpm --rebuilddb > > > Hum, that's what I do with 'rpm -vv --rebuildb'... > advantage is I don't have to erase anything before ^^) Nowhere do you acknowledge the requirement to first run 'rm -f /var/lib/rpm/__db.00*' as the first step in rebuilding the database. Could you confirm that you are now following these *two* steps Dave supplied in comment #2 please? Thanks. Created attachment 12680 [details] proof (In reply to Rolf Pedersen from comment #29) > (In reply to Jean-François Bachelet from comment #27) > > a little note : I've rebuilded the databases.... > > Could you confirm that you are now following these *two* steps Dave supplied > in comment #2 please? Thanks. YES I have done it as said in comment 2 ! here is the content of the /var/lib/rmp directory after the rebuilding today at 13h32 : in attachment Good! The dmesg output doesn't show what ever is going wrong. What the "journalctl --no-hostname -b -1" output shows is that at 12:12:46 mgaapplet failed at 12:12:57 su to root was run, presumably to fix the db at 12:13:18 mgaapplet succeeded at 12:32:37 the system journal ends as part of the reboot If you still have the output of the "journalctl --no-hostname -b" command, run the command $ grep -e 'Packages are up to date' -e 'Failed to open urpmi database' journal.txt note: the above grep command is all one line. Replace journal.txt with the file name where the journal output is stored. That will show what time the last "Packages are up to date" message was displayed before the error was detected. Delete all lines in that journal file from before the last line with the mgaapplet "Packages are up to date" message. Delete all lines in the journal file from after the line with "Failed to open urpmi database" message. Compress the remaining lines. Is that still too big? Created attachment 12681 [details]
journal -b -2
first time I've tried it was printing more than 12000 lines...
this one is 2500 line, btw I hab to go back to -2 to get the two messages.
That one again shows mgaapplet failing, the db being fixed and then mgaapplet working. What's needed are the journal entries where in between when it's working and when it first fails. (In reply to Dave Hodgins from comment #34) > What's needed are the journal entries where in between when it's working and > when it first fails. I do it as soon as it breaks again ^^) Created attachment 12683 [details]
new boot of today and crash
Hello ^^)
well look at the attachment, in fact, I rebuild the database when mgaapplet throw me the eroor, then test for updates, then continue to work and then shutdown properly the machine when going to bed, in between mgaapplet worked fine.
then the next day I boot my machine and after 3 minutes get mgaapplet throwing the error again....
(In reply to Dave Hodgins from comment #25) > At this point I have no idea why the local rpm database is becoming corrupted > to the point it can not be recovered automatically. > > My only suggestion is to ensure it's working now , rebuild the db as per > comment 2 and the next time it fails immediately save the output of the > commands > "dmesg", "journalctl --no-hostname -b" and "journalctl --no-hostname -b -1" > for > analysis. There has to be some sort of file conflict or io error triggering > the database corruption, which should be shown by messages in those log > outputs. Hello Dave ^^) Little question, how can I ask for a 'journalctl....' with auto skipping of 'msec' messages ? (more than 300000 lines -and I stopped way before the end- is wayyyy tooo long ;)) Thanks by advance. Jeff Created attachment 12687 [details]
journal good :)
Hello ^^)
this time this should be good to see what's wrong with mgaapplet failing reppeatidly...
Thanks. Looking through this, the only obvious problem I see are the lines with smartd[1015]: Device: /dev/sdj [SAT], 1 Currently unreadable (pending) sectors If sdj is not where the database is stored, that is likely not relevant. I look through it more after I get some sleep. There's also ...
Device: /dev/sdg [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 113 to 114
I'm surprised there is no device error message associated with the msg
mgaapplet[8882]: Failed to open urpmi database
While sdg and sdj look somewhat questionable (likely to fail soon), there
is no indication of any problems with sda, where the rpm db is stored.
I see nothing in the log to indicate why the database is failing to open.
I'm at a loss for suggestions on how to debug this further.
Assigning to the kernel team who may have some idea or see something I've missed.
in attachment 12687 [details]Assignee:
bugsquad =>
kernel I had this issue too 1. When i configured the paranoic security level and after that the msec polices 2. After a few restars of the VM In the first case, i did not pay attention to it, because my mods But second case, i (without knowledge of this 28801), deleted the mageia mirror listing, because that mirror was slow to download packages and the mgaapplet worked again If i hit the issue again i will load the journal... CCing to me to keep as a tracker for me CC:
(none) =>
neoser10 Hello folks ^^) well another new machines (four) set up with new disks and new mageia 8 latests installs and... still that bug ! these time I have disabled the install CDrom 'core' and 'Core update' repos at install time, so ONLY the online repos are used when installing and for the later updates. that changes nothing, the bug still resurface ever now and then, that works after a db rebuild for one-two times then re-bug... that's getting very unpleasant. DNF is NOT installed nor running. what I've seen is that after each use of the MCC for whatever I need to do, after quitting it the rpm database stays locked and the __db_00* files are let in the rpm directory... so could it be the auto-update of repos that bring the bug ? Can you see in the journal, the last operation before you experience the problem, if/how it fail? > so could it be the auto-update of repos that bring the bug ? Wonder if this could be an effect of downloader failing. https://wiki.mageia.org/en/Mageia_8_Errata#Downloading_software So for a shot in the dark, change downloader to wget. CC:
(none) =>
fri Hello ^^) here are my findings from searching what happened before the mgaaplet fail : 1 / each time you use the MCC it locks the rpm database and create the __db.00x files, and when you exit it it let that files in the the /var/lib/rpm directory :( so when mgaaplet starts it find the __db.00x files and fail... (I have constated that when it find the xxx.lock files but no __db.00x files that doesn't stop it to make its job fine) --> possible cure 1 : force MCC delete that __db.00x files before exiting. 2 / each time MSEC starts, it soon run scripts from the /usr/share/msec/scripts directory, one of these scripts is devoted to rpm ==> 03_rpm.sh. this script upon run create the __db.00x files too and let it in the /var/lib/rpm directory after its job just like the MCC :( so when mgaaplet start it find that __db.00x files and fail again :( --> possible cure 2 : modify the 03_rpm.sh script so it delete the __db.00x files from the /var/lib/rpm directory before it exit running. for now I've not found other possible stuff that could create that ?!#*! __db.00x files... what I can understand is that that __db.00x files are useful for MCC and the 03_rpm.sh script only when that two are running and serve nothing and no-one more after. and when running 'rpm -vv --rebuilddb' it seems to ignore and then erase that __db.00x and xxx.lock files upon exiting... So, from that findings I think it will be wise to make the two modifications I suggest to MCC and 03_rpm.sh script to possibly cure the mgaaplet failing with 'unable to open the rpm database' error for good. another way of doing that will be to NOT store that __db.00x files on disk but make MCC and 03_rpm.sh script keep it in memory or /tmp, so it will vanish from itself when the two exits and not interfere with mgaaplet job... what do you think of that Mr Devs ? This really is for our tools experts Assignee:
kernel =>
mageiatools |