Bug 16197 - RPM database corrupted, reason unknown
Summary: RPM database corrupted, reason unknown
Status: RESOLVED OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 5
Hardware: i586 Linux
Priority: Normal critical
Target Milestone: ---
Assignee: RPM stack maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-24 13:51 CEST by Luis Menina
Modified: 2018-08-09 16:12 CEST
CC List: 4 users

See Also:
Source RPM: rpm
CVE:
Status comment:


Attachments
journalctl log (7.55 KB, text/plain)
2015-06-25 00:50 CEST, Luis Menina
journalctl -a output since last boot (16.47 KB, application/x-xz)
2015-06-30 17:41 CEST, Marja Van Waes
dmesg (41.03 KB, text/plain)
2015-06-30 17:42 CEST, Marja Van Waes
result of: journalctl -a | grep -v gnome-session (233.13 KB, application/octet-stream)
2015-07-01 00:40 CEST, Luis Menina

Description Luis Menina 2015-06-24 13:51:47 CEST
Description of problem:
When trying to install the gparted package using urpmi in the command line, I had a message saying that the RPM db is corrupted.

Version-Release number of selected component (if applicable):
How can I know since the db is corrupted ? I did no updates, so I'm pretty sure it's the rpm version of the Mageia release.

How reproducible:
Happened once, after one week of light use.

Steps to Reproduce:
I don't know at which point the db was corrupted, I just noticed it when trying to install a new package.

Comment 1 Luis Menina 2015-06-24 13:53:24 CEST
I know the bug report is incomplete, I wanted to know which files I need to attach to it so you can investigate the problem. I'll attach them with the exact error message tonight.
Comment 2 Marja Van Waes 2015-06-24 14:05:18 CEST
Hi Luis,


This happens more often, as you can read in our forums, e.g. here:
https://forums.mageia.org/en/viewtopic.php?f=15&t=1646

It is usually not a bug that causes this, as explained here:
http://www.oldrpm.org/hintskinks/repairdb/

To fix it:

Run the following commands as root:

  rm -f /var/lib/rpm/__db*
  rpm -vv --rebuilddb

If that doesn't fix it, then please do it again, and *attach* the output of both commands
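
As a minimal sketch (not part of the original advice), the two commands above can be preceded by a copy of the database directory, so that the corrupted files stay available for analysis. The helper names `backup_rpmdb` and `clear_db_env` and the variables `RPMDB_DIR` and `BACKUP_ROOT` are hypothetical; only `rm -f /var/lib/rpm/__db*` and `rpm -vv --rebuilddb` come from this comment.

```shell
#!/bin/sh
# Sketch only: RPMDB_DIR and BACKUP_ROOT are assumed defaults, adjust as needed.
RPMDB_DIR="${RPMDB_DIR:-/var/lib/rpm}"
BACKUP_ROOT="${BACKUP_ROOT:-/root}"

# Copy the whole database directory to a timestamped location and print
# the destination path, so the corrupted state can still be inspected later.
backup_rpmdb() {
    dest="$BACKUP_ROOT/rpmdb-backup-$(date +%Y%m%d-%H%M%S)"
    cp -a "$RPMDB_DIR" "$dest" && echo "$dest"
}

# Remove only the Berkeley DB environment files (__db*), leaving the
# Packages file and the indexes in place.
clear_db_env() {
    rm -f "$RPMDB_DIR"/__db*
}
```

Run as root, it could be used as `backup_rpmdb && clear_db_env && rpm -vv --rebuilddb 2>&1 | tee rebuilddb.log`, which also saves the rebuild output in a file that can be attached here.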

Keywords: (none) => NEEDINFO
CC: (none) => marja11

Comment 3 Luis Menina 2015-06-24 17:52:40 CEST
Hi Marja, thanks for the fixing options, but my concern is that it's not the kind of user experience that can be expected. I'll be able to sort things out, but many won't.

I've been using Mandrake/Mandriva/Mageia for the past 10 years, and RPM db corruption has only happened to me a couple of times. I fail to see how this "is usually not a bug", especially based on 10+-year-old mail threads. The machine had *no* unexpected reboots, hangs or similar, and AFAIK no interrupted package management action, so there was no event that made this behavior expected.

I'll back up the files in /var/lib/rpm in case they're needed for analysis, but if there's something else to back up, then please tell me before that data is lost.
Comment 4 Marja Van Waes 2015-06-24 18:25:57 CEST
(In reply to Luis Menina from comment #3)
> The machine
> had *no* unexpected reboots, hangs or similar, and AFAIK no interrupted
> package management action, so there was no event that made this behavior
> expected.
> 
> I'll back up the files in /var/lib/rpm in case they're needed for analysis,
> but if there's something else to back up, then please tell me before that
> data is lost.

Can you attach the output of the following command, run as root? Please adjust the first date and time to when you last started urpmi without getting an error, and the last date and time to when you first got the error when using urpmi:

journalctl -a --since "2015-06-22 18:50:24" --until "2015-06-24 10:00:18"
Comment 5 Luis Menina 2015-06-25 00:50:28 CEST
Created attachment 6775 [details]
journalctl log

Last successfully installed package is libvdpau-driver-r300. I discovered the bug while trying to install gparted. The log is filtered to keep only information about rpm tools. Please tell me if you need non-filtered logs.
Comment 6 Luis Menina 2015-06-25 00:57:14 CEST
BTW, the commands you gave me in comment 2 fixed the db, thanks :)
Comment 7 Luis Menina 2015-06-30 00:23:35 CEST
Happened again... There is a serious problem here.

Keywords: NEEDINFO => (none)

Marja Van Waes 2015-06-30 00:29:32 CEST

Attachment 6775 mime type: text/x-log => text/plain

Comment 8 Marja Van Waes 2015-06-30 00:39:47 CEST
(In reply to Luis Menina from comment #7)
> Happened again... There is a serious problem here.

CC'ing maintainer

CC: (none) => thierry.vignaud
Summary: RPM database corrupted => RPM database corrupted 2x without obvious reason

Comment 9 Thierry Vignaud 2015-06-30 09:01:50 CEST
There are no errors in your logs...
Comment 10 Marja Van Waes 2015-06-30 12:17:23 CEST
Comment on attachment 6775 [details]
journalctl log

(In reply to Thierry Vignaud from comment #9)
> There are no errors in your logs...

so they should have been there (I wasn't sure)

@ Luis

please do as root

journalctl -a > journalctl-a.txt

xz journalctl-a.txt

and then attach journalctl-a.txt.xzg

Attachment 6775 is obsolete: 0 => 1

Marja Van Waes 2015-06-30 12:17:41 CEST

Keywords: (none) => NEEDINFO

Comment 11 Marja Van Waes 2015-06-30 12:18:37 CEST
minus the "g":
attach journalctl-a.txt.xzhhhhhhhhhhhh
Comment 12 Marja Van Waes 2015-06-30 12:19:26 CEST
what is going on with my keyboard? :-(
Comment 13 Marja Van Waes 2015-06-30 16:11:13 CEST
(In reply to Marja van Waes from comment #10)
> Comment on attachment 6775 [details]
> journalctl log
> 
> (In reply to Thierry Vignaud from comment #9)
> > There are no errors in your logs...
> 
> so they should have been there (I wasn't sure)
> 

wrong

I just hit the issue, too, on a fresh 32bit Mga5 install, and there's nothing in the logs.

I hit it after using drakrpm-editmedia to add core_updates/testing

[root@Mga5_32bit marja]# urpmi --auto-update
medium "Core Release (BlueHD1)" is up-to-date
medium "Core Updates (BlueHD3)" is up-to-date
medium "Core Updates Testing (BlueHD5)" is up-to-date
medium "Nonfree Release (BlueHD11)" is up-to-date
medium "Nonfree Updates (BlueHD13)" is up-to-date
error: rpmdb: BDB0113 Thread/process 9314/3069204224 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
could not open rpmdb
[root@Mga5_32bit marja]#

There is nothing in the journal but:

jun 30 16:08:29 Mga5_32bit urpmi[8471]: called with: --auto-update
Comment 14 Thierry Vignaud 2015-06-30 16:35:51 CEST
So the reason _is_ obvious: a thread died while keeping rpmdb open...
You need to identify what on your machine accesses it and which process crashed.

Summary: RPM database corrupted 2x without obvious reason => RPM database corrupted 2x

Comment 15 Marja Van Waes 2015-06-30 17:05:20 CEST
(In reply to Thierry Vignaud from comment #14)
> So the reason _is_ obvious: a thread died while keeping rpmdb open...

What's obvious to you is a mystery to others ;-)

Thx for the comment, though. Now I understand that "Thread died in Berkeley DB library" was about an earlier thread that caused the issue, not about a thread that died now because of it.

> You need to identify what on your machine accesses it and which process crashed.

I'll google for information on how to do that.
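
One possible starting point, as a sketch only: filter the journal and the kernel log for lines that usually accompany a crashed process. The function name `find_crashes` and the chosen patterns are assumptions, and a process that died without logging anything will not show up this way.

```shell
#!/bin/sh
# Sketch only: print the lines from standard input that look like crash reports.
# The patterns are guesses at what matters here: kernel "segfault" messages,
# core dump notices, and Berkeley DB error codes such as BDB0113.
find_crashes() {
    grep -Ei 'segfault|dumped core|core dumped|BDB[0-9]{4}'
}
```

Typical use, as root, would be `journalctl -a -b | find_crashes` for the current boot and `dmesg | find_crashes` for the kernel ring buffer.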
Comment 16 Thierry Vignaud 2015-06-30 17:20:32 CEST
For the record, are those fresh installs, not updated systems?
There's nothing in dmesg or journalctl about any crash, segfault, ...
Comment 17 Marja Van Waes 2015-06-30 17:41:29 CEST
Created attachment 6792 [details]
journalctl -a output since last boot

(In reply to Thierry Vignaud from comment #16)
> For the record, are those fresh installs, not updated systems?
For me - because burning a more recent DVD in cauldron was impossible with all tools I tried - it was a fresh install with the QA 32bits DVD from May 27.

I've updated after installing, so I should have the same packages as were on the final iso.

> There's nothing in dmesg or journalctl about any crash, segfault, ...

I'm well known for my talent for being blind to what's right in front of me, so feel free to check the output
Comment 18 Marja Van Waes 2015-06-30 17:42:03 CEST
Created attachment 6793 [details]
dmesg
Comment 19 Marja Van Waes 2015-06-30 17:43:11 CEST
(In reply to Thierry Vignaud from comment #16)
> For the record, are those fresh installs, not updated systems?
> There's nothing in dmesg or journalctl about any crash, segfault, ...

@ Luis

Please answer those questions, too
Comment 20 Luis Menina 2015-06-30 23:41:27 CEST
Fresh install 32 bits too.

Forgot to attach the error message last time, as I thought the logs were enough. That's a thread dying too:

error: rpmdb: BDB0113 Thread/process 2002/3072952064 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
could not open rpmdb
Comment 21 Luis Menina 2015-07-01 00:37:03 CEST
I'll check dmesg next time I have the problem.

As for journalctl, I found nothing about rpm error... I'm attaching a new log, I just removed gnome-session traces (for privacy).
Comment 22 Luis Menina 2015-07-01 00:40:45 CEST
Created attachment 6794 [details]
result of: journalctl -a | grep -v gnome-session
Comment 23 Thierry Vignaud 2015-07-01 06:49:08 CEST
Hmm, in Marja's case, sd_festival segfaulted, but I don't think it should interfere with rpmdb.
You can try uninstalling speech-dispatcher though.

In Luis's case, WebKitPluginPro & NetworkManager segfaulted, but that was not recent.
You didn't say at what time you got the errors.
Comment 24 Luis Menina 2015-07-01 12:30:34 CEST
That's because it's hard to guess. Did the corruption happen just after I successfully installed my last package? Or just before the next time I tried to install something? I don't know. And I don't install software every day.

Maybe I could call a script every 5 minutes that would check the state of the db, so that I catch the exact moment things go wrong. Is there a command to verify the state of the rpm db? Or would the exit status of a simple rpm query be enough?

About when I had the issues, the first time it was between installation of libvdpau-driver-r300 (which worked) and installation of gparted (which didn't work). A grep on urpmi commands in the journalctl log will give you the time span. I don't remember the span for the second occurrence of the bug, though.
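
The periodic check mentioned above could look roughly like this. It is a sketch under assumptions: that a failed query such as `rpm -q rpm` exits non-zero when the database cannot be opened, and that the names `check_rpmdb`, `CHECK_CMD` and `LOG_FILE` (all hypothetical) suit the system. Note that the check is itself one more process opening the database.

```shell
#!/bin/sh
# Sketch only: CHECK_CMD and LOG_FILE are assumed defaults.
CHECK_CMD="${CHECK_CMD:-rpm -q rpm}"
LOG_FILE="${LOG_FILE:-/var/log/rpmdb-check.log}"

# Run the query once and append a timestamped line with the result.
# The query output is kept so that any error text ends up in the log.
# Returns 0 if the query succeeded, non-zero otherwise.
check_rpmdb() {
    if output=$($CHECK_CMD 2>&1); then
        status=ok
    else
        status=FAILED
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$output" >> "$LOG_FILE"
    [ "$status" = ok ]
}
```

Saved as a script that ends with a call to `check_rpmdb` and scheduled from cron every 5 minutes, the first `FAILED` line in the log would narrow the corruption down to a 5-minute window that can be passed to `journalctl --since ... --until ...`.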
Samuel Verschelde 2015-07-06 17:23:01 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=16203

Samuel Verschelde 2015-07-06 17:23:16 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=13835

Samuel Verschelde 2015-07-06 17:24:10 CEST

Summary: RPM database corrupted 2x => RPM database corrupted, reason unknown

Comment 25 Florian Hubold 2015-07-07 08:50:59 CEST
@Thierry: Just a shot in the dark - maybe something is accessing the RPM DB in parallel, like apper or packagekit? You know, as some media players have packagekit integration and such stuff, they might run anytime ...
IIRC they don't apply the same thorough locking mechanisms as urpmi does, so might that explain why you see no errors or crashes from urpmi in the logs?

I've not played with a fresh mga5 install yet, so I can't tell whether apper or packagekit are enabled by default and what they do.
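
To test this guess, one could list which processes currently hold files under /var/lib/rpm open. The following is a sketch that only reads /proc; the function name `rpmdb_users` is hypothetical. It shows open file descriptors only (not memory mappings), needs root to see other users' processes, and cannot show a process that has already exited.

```shell
#!/bin/sh
# Sketch only: print "PID COMMAND FILE" for every open file descriptor that
# points below the given directory (default: /var/lib/rpm).
rpmdb_users() {
    dir="${1:-/var/lib/rpm}"
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors can vanish or be unreadable; skip those quietly.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$dir"/*)
                pid=${fd#/proc/}
                pid=${pid%%/*}
                printf '%s %s %s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)" "$target"
                ;;
        esac
    done
}
```

Running `rpmdb_users` as root while the desktop session is idle would show whether something other than urpmi or rpm (packagekitd, for instance) keeps the database open.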

CC: (none) => doktor5000

Comment 26 André DESMOTTES 2015-11-19 16:14:36 CET
Luis is here at the Paris Open Source Summit and asks why the status is NEEDINFO.
Thanks

CC: (none) => lebarhon

Comment 27 Samuel Verschelde 2015-11-19 16:17:53 CET
I suppose he and we forgot to remove the NEEDINFO keyword once the information had been given. Removing it.

Keywords: NEEDINFO => (none)

Comment 28 Marja Van Waes 2016-10-16 17:13:30 CEST
Btw, is it true, as suggested in chapter 4.4.3.2 of https://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/pdf/RPM_Guide/Fedora_Draft_Documentation-0.1-RPM_Guide-en-US.pdf 
that one should always make a backup of the rpm database (or of at least /var/lib/rpm/Packages) before and after updating/installing/removing packages?

In other words: would it make sense to ask to add an auto-rpm_db-backup_&_restore feature to dnf/dnfdragora and urpmi/rpmdrake/rpmdragora?
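
For illustration only, such a feature could be approximated outside the package tools with a small wrapper that copies /var/lib/rpm/Packages before and after a command and keeps only the most recent copies. The names `snapshot_packages` and `with_snapshots` and the variables `RPMDB_DIR`, `SNAP_DIR` and `KEEP` are hypothetical, and a copy taken while another process is writing to the database may itself be inconsistent.

```shell
#!/bin/sh
# Sketch only: all three defaults below are assumptions.
RPMDB_DIR="${RPMDB_DIR:-/var/lib/rpm}"
SNAP_DIR="${SNAP_DIR:-/var/backups/rpmdb}"
KEEP="${KEEP:-6}"

# Copy the Packages file to SNAP_DIR under a timestamped name, then delete
# all but the $KEEP newest snapshots (names sort chronologically).
snapshot_packages() {
    label=$1
    mkdir -p "$SNAP_DIR" || return 1
    cp "$RPMDB_DIR/Packages" "$SNAP_DIR/Packages.$(date +%Y%m%d-%H%M%S).$label" || return 1
    ls -1 "$SNAP_DIR"/Packages.* | sort -r | tail -n +"$((KEEP + 1))" |
    while IFS= read -r old; do
        rm -f "$old"
    done
}

# Run the given command between a "before" and an "after" snapshot and
# return the command's own exit status.
with_snapshots() {
    snapshot_packages 0-before || return 1
    "$@"
    rc=$?
    snapshot_packages 1-after
    return "$rc"
}
```

It would be called as root, for example `with_snapshots urpmi gparted`.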

Assignee: bugsquad => rpmstack

Comment 29 Luis Menina 2017-04-18 14:04:11 CEST
(In reply to Marja van Waes from comment #28)
> In other words: would it make sense to ask to add an
> auto-rpm_db-backup_&_restore feature to dnf/dnfdragora and
> urpmi/rpmdrake/rpmdragora?

Not really, as it would slow down installation/removal, and the rpm db can be reconstructed. Maybe the reconstruction could be made easier, but the real issue is avoiding the corruption in the first place.

As a side note, my main hard drive crashed on this system last week, and I installed Mageia 6 sta2 on a new drive, so I basically won't be able to check this anymore. It's a bit sad, as I had this problem very frequently for the whole Mageia 5 life (I basically had to regenerate the RPM database on each session I wanted to install some software). I just hope not many newcomers were bitten by this bug.
Comment 30 Marja Van Waes 2018-07-25 13:46:11 CEST
(In reply to Luis Menina from comment #29)

> 
> As a side note, my main hard drive crashed on this system last week, and I
> installed Mageia 6 sta2 on a new drive, so I basically won't be able to
> check this anymore. It's a bit sad, as I had this problem very frequently
> for the whole Mageia 5 life (I basically had to regenerate the RPM database
> on each session I wanted to install some software). I just hope not many
> newcomers were bitten by this bug.

No one replied here to say they hit this issue, too, and I don't think you or I hit it in Mageia 6 or cauldron, so let's close this report as OLD (since Mageia 5 is no longer officially maintained).

Feel free to reopen if needed, and adjust the "Version:" in the upper left of this report.

Resolution: (none) => OLD
Status: NEW => RESOLVED

Comment 31 Luis Menina 2018-08-09 16:12:51 CEST
(In reply to Marja Van Waes from comment #30)
> No one replied here to say they hit this issue, too, and I don't think you
> or I hit it in Mageia 6 or cauldron

Yeah, never had this problem with Mageia 6.
