Bug 26827 - urpmi fails due to erroneous downloads in cache, normal users get stuck
Summary: urpmi fails due to erroneous downloads in cache, normal users get stuck
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: High enhancement
Target Milestone: ---
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard: MGA7TOO
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-19 17:02 CEST by Morgan Leijström
Modified: 2020-09-19 18:08 CEST
CC List: 8 users

See Also:
Source RPM: perl-URPM
CVE:
Status comment:


Attachments

Description Morgan Leijström 2020-06-19 17:02:09 CEST
Created from https://ml.mageia.org/l/arc/dev/2020-06/msg00309.html


Description of problem:
When an update fails, users start the update again and hit the same error.
They conclude that a reinstall is necessary.
Nothing hints to the user how to solve it.
"package xxx does not verify: Payload SHA256 digest: BAD" is not much help...


How reproducible, Steps to Reproduce:
This can happen when the network connection breaks during a download.


Manual workaround:
Running "urpmi --clean", which empties the rpm cache, is enough to get rid of the bad rpms.

   * BUT NORMAL USERS DO NOT KNOW *  and think the system is corrupt.


Suggestion:
If urpmi detects that rpms in the cache are bad, it should delete them and retry.

If an automatic clean can be done without drawbacks, please implement it.

Another way is to ask the user, on error, whether they would like to retry with a clean cache.

A third way is to offer a "clean the cache" checkbox.


Examples from Bugzilla that were mistaken for other faults
 - even we more advanced users waste time on this trap:
Bug 26159
Bug 26323
Bug 25949
Bug 25650
Bug 25111
Bug 25647


Side note from the mail thread:
It seems that bad rpms in the cache are much more frequent than in the past. The main reason surely deserves to be investigated.
Morgan Leijström 2020-06-19 17:06:02 CEST

Assignee: bugsquad => mageiatools
Priority: Normal => High
Whiteboard: (none) => MGA7TOO

Comment 1 Morgan Leijström 2020-06-19 18:28:29 CEST
Also from that mail thread:

"
One way to avoid network errors is to put «download-all» on a single line by itself inside the top curly brackets in /etc/urpmi/urpmi.cfg.
Then the installation won't start until all rpms are downloaded.
This presumes that the cache directory has enough free space to hold all the needed rpms in one go.

I.e.
{
download-all
}

  "


Maybe that could be made a checkbox in one of the GUIs?
Initialize the checkbox from the config file, then let the user toggle it.
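Before offering such a toggle, a GUI could check the free-space presumption quoted above. The C sketch below is illustrative only: urpmi itself is written in Perl, `cache_can_hold()` is a hypothetical helper, and the total download size is assumed to be known in bytes. It relies on the POSIX `statvfs()` call applied to the cache directory (for example /var/cache/urpmi).

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper: true if the filesystem holding `cachedir` has at
 * least `needed_bytes` available. Any error counts as "not enough room". */
bool cache_can_hold(const char *cachedir, uint64_t needed_bytes)
{
    struct statvfs st;

    if (statvfs(cachedir, &st) != 0)
        return false; /* cannot tell: keep download-all disabled */

    /* f_bavail excludes root-reserved blocks; f_frsize is the block unit. */
    uint64_t avail = (uint64_t)st.f_bavail * (uint64_t)st.f_frsize;
    return avail >= needed_bytes;
}
```

Using `f_bavail` rather than `f_bfree` ignores the blocks reserved for root, so the check errs on the side of leaving download-all off.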
Comment 2 Lewis Smith 2020-06-19 21:47:56 CEST
We have had similar bugs before, where package installations or updates repeatedly failed because of a corrupted download that stayed in the rpm cache; as Morgan notes, they were resolved by clearing the cache:
 # urpmi --clean
and restarting the operation.
Comment 3 Pierre Jarillon 2020-06-20 00:56:05 CEST
# urpmi --clean is the solution, but how could a new user guess that?
If a corrupt package is found, it should be deleted. Otherwise every subsequent update stays blocked.

CC: (none) => jarillon

Comment 4 Dave Hodgins 2020-06-20 01:44:31 CEST
The problem is that urpmi doesn't know if the package is corrupt because the
download was interrupted (so not all there), corrupted during download, or if
the package on the mirror is corrupt.

We don't want to repeatedly download the package if it's corrupt on the mirror,
as that could cost people money for exceeding data caps. That's one of the
major problems I've seen with Windows.

If the download was interrupted, the choice should be to resume it. If it was
corrupted during download, the only option is to clean. If it is corrupt on the
mirror, it should be reported, and then we wait until it's fixed.

We need a better way to ensure users are informed of the situation and the
options.

Perhaps add a file with the explanation, and alter the message to display that
file, either on the terminal output or in a gui dialog with the option to
either try resuming the download or to clean the cache, and explaining that
if the error occurs again, it should be reported.

CC: (none) => davidwhodgins

aguador 2020-06-20 08:25:05 CEST

CC: (none) => waterbearer54

Comment 5 Bernard SIAUD 2020-06-20 11:10:36 CEST
With my Cauldron system, I sometimes have a similar problem.
When an rpm is bad, or when some dependencies are missing, nothing gets installed. It would be good to install the other rpms anyway.

CC: (none) => liste

Comment 6 papoteur 2020-06-20 11:47:42 CEST
From Liam Quin:
>
>> This happens on a network break.
>
> One way to avoid network errors is to put «download-all» on a single
> line by itself inside the top curly brackets in /etc/urpmi/urpmi.cfg
> Then the installation won't start until all rpms are downloaded.
> This presumes that the cache directory has enough free space to hold
> all the needed rpms in one go.

I don't think that's a good presumption, and if it is wrong, a full /
partition is also a difficult problem for a new user.

A laptop, often with a small root partition, might travel a lot,
perhaps being restarted several times a day for a student.

The answer is that if an rpm file is detected as corrupted, then
(1) it's incomplete, should be in the "partial" folder, and could be
resumed automatically, or
(2) it's corrupted and should be deleted, or
(3) you need to upgrade rpm itself as there was a backwards
incompatibility introduced, or
(4) it's corrupted on the mirror(s).

In cases (3) and (4) it is necessary to skip the file and move on.
In case (1) urpmi should resume automatically.
Detecting the difference between case (2), local corruption, and case
(4), server-side corruption, maybe means checking the checksum from the
server first and trying again only if they differ, right?
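The case analysis above can be sketched as a small decision function. This is a hypothetical C illustration, not urpmi code: it assumes a whole-file SHA256 for the package can be obtained from the mirror before re-downloading, as suggested above, and it does not model case (3), since an rpm too old to read the package cannot be detected by comparing checksums.

```c
#include <stdbool.h>
#include <string.h>

/* Cases from the list above (case (3) is not modeled). */
enum bad_rpm_case {
    BAD_RPM_INCOMPLETE,       /* (1) still in partial/: download was cut */
    BAD_RPM_LOCAL_CORRUPTION, /* (2) local copy differs from the mirror's */
    BAD_RPM_MIRROR_CORRUPTION /* (4) local copy matches a bad mirror file */
};

enum bad_rpm_action {
    ACTION_RESUME_DOWNLOAD,  /* continue the partial download */
    ACTION_DELETE_AND_RETRY, /* remove the cached file, fetch it again */
    ACTION_SKIP_AND_REPORT   /* do not re-download; ask the user to report */
};

/* in_partial: the file sits in partial/ rather than rpms/.
 * local_sha256, mirror_sha256: lowercase hex digests of the whole file;
 * the mirror one is an assumption (fetched cheaply before any retry). */
enum bad_rpm_case classify_bad_rpm(bool in_partial,
                                   const char *local_sha256,
                                   const char *mirror_sha256)
{
    if (in_partial)
        return BAD_RPM_INCOMPLETE;
    if (strcmp(local_sha256, mirror_sha256) != 0)
        return BAD_RPM_LOCAL_CORRUPTION;
    /* Same bytes as the mirror, yet verification failed: mirror is bad. */
    return BAD_RPM_MIRROR_CORRUPTION;
}

enum bad_rpm_action action_for(enum bad_rpm_case c)
{
    switch (c) {
    case BAD_RPM_INCOMPLETE:
        return ACTION_RESUME_DOWNLOAD;
    case BAD_RPM_LOCAL_CORRUPTION:
        return ACTION_DELETE_AND_RETRY;
    case BAD_RPM_MIRROR_CORRUPTION:
    default:
        return ACTION_SKIP_AND_REPORT;
    }
}
```

Only the mirror-corruption case avoids a second full download, which is what keeps the data-cap cost bounded.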

CC: (none) => yves.brungard_mageia

Comment 7 Nicolas Lécureuil 2020-06-20 23:45:55 CEST
Way to reproduce:

Pick a package you don't have installed. I chose 0ad-data because it was at the top of the list (and I have a local repo, so I don't care how big it is). Then:

# urpmi --noinstall 0ad-data
# echo bad >> /var/cache/urpmi/rpms/0ad-data-0.0.23b-1.mga7.noarch.rpm
# urpmi --test 0ad-data
installing 0ad-data-0.0.23b-1.mga7.noarch.rpm from /var/cache/urpmi/rpms
Preparing... #########################################################################################################################################
Installation failed:    package 0ad-data-1:0.0.23b-1.mga7.noarch does not verify: Payload SHA256 digest: BAD (Expected 9d1d05c953161efc409127674c852bf2fd2d30c9285eb6d7fe9129e7198e5035 != dfaceab15dcd59f2f91c9b2730d57c8487b2e6bd0585966783ced62eb5e372cf)

CC: (none) => mageia

Marc Mascré 2020-06-21 10:26:10 CEST

CC: (none) => marc

Comment 8 Lewis Smith 2020-06-22 21:41:06 CEST
Well, Morgan has certainly opened a can of worms!
I think Papoteur's comment 6 gets nearest. The root of the problem is that when urpmi detects a bad checksum, it just gives up. We know that this is most common with bad downloads, and the neat solution is to delete the offending package from the cache and re-download it.
[--clean is a sledgehammer to crack a nut, re-downloading a lot of OK stuff].

If the re-download yields the same failure, what then?
Remove the package from the list (which should take any dependent packages with it), inform the user, and install the others.
The user should then inform Mageia about the bad package on the mirror.
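The proposal in this comment amounts to a retry-once policy per package: the first verification failure triggers a delete and re-download, and a second one means giving up on that package and asking the user to report it. Capping the retries also answers the data-cap concern from comment 4. Below is a minimal C sketch of that bookkeeping; the names are hypothetical, and removing dependent packages from the transaction is not modeled.

```c
/* What to do after a package from the cache fails verification. */
enum retry_decision {
    RETRY_DELETE_AND_REDOWNLOAD, /* early failure: assume a bad download */
    RETRY_GIVE_UP_AND_REPORT     /* repeated failure: likely bad on the mirror */
};

/* Hypothetical per-package state kept for the duration of one urpmi run. */
struct pkg_attempts {
    unsigned failures;         /* verification failures seen so far */
    unsigned max_redownloads;  /* 1 in the proposal above */
};

/* Record one verification failure and decide the next step.
 * The cap prevents fetching a broken mirror file over and over. */
enum retry_decision on_verify_failure(struct pkg_attempts *p)
{
    p->failures++;
    if (p->failures <= p->max_redownloads)
        return RETRY_DELETE_AND_REDOWNLOAD;
    return RETRY_GIVE_UP_AND_REPORT;
}
```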
Comment 9 papoteur 2020-07-02 09:39:51 CEST
(In reply to Lewis Smith from comment #8)
> I think Papoteur's comment 6 gets nearest.
I just reported what Liam Quin commented.
Comment 10 papoteur 2020-08-28 11:01:39 CEST
Hello,
Any ideas for progress on this issue?
Comment 11 Dave Hodgins 2020-08-28 21:03:08 CEST
No change. Comment 4 still applies
Comment 12 Pascal Terjan 2020-08-28 21:31:13 CEST
The second part of the proposal in comment #8 seems tricky: urpmi first decides what to install, then groups things into transactions, then, for each transaction in turn, downloads and installs its packages.

"Remove the package from the list (which should take with it any dependent pkgs), inform the user, and do the others." wouldn't fit that model so it would require some sort of restart of urpmi with that package excluded.

Urpmi does verify the checksum when the download completes, before moving the package from /var/cache/urpmi/partial to /var/cache/urpmi/rpms. So case #4 in comment #6 would not lead to this problem, but to an earlier error during the download that would make it explicit that the problem is on the mirror. It should therefore be safe to delete the file and download it again when this problem happens.

So I would vote for deleting the file and still failing, so that when the user tries again running urpmi it gets downloaded again and should work.

CC: (none) => pterjan

Comment 13 Pascal Terjan 2020-08-28 21:32:00 CEST
(In reply to Dave Hodgins from comment #4)
> The problem is that urpmi doesn't know if the package is corrupt because the
> download was interrupted (so not all there), corrupted during download, or if
> the package on the mirror is corrupt.

It does: in the first case, the file would be in the partial/ directory rather than in rpms/.
Comment 14 Pascal Terjan 2020-09-09 18:39:05 CEST
There is already some code to delete bad rpms from the cache, but it only triggers if adding the package to the transaction fails, which doesn't happen for corrupted files (so I'm not sure it ever triggers).

in install.pm:_schedule_packages

            if ([...] $trans->add($true_pkg || $pkg, update => $update,...) {

            } else {
                $urpm->{error}(N("unable to install package %s", $mode->{$_}));
                my $cachefile = "$urpm->{cachedir}/rpms/" . $pkg->filename;
                if (-e $cachefile) {
                    $urpm->{error}(N("removing bad rpm (%s) from %s", $pkg->name, "$urpm->{cachedir}/rpms"));
                    unlink $cachefile or $urpm->{fatal}(1, N("removing %s failed: %s", $cachefile, $!));
                }
            }
Comment 15 Pascal Terjan 2020-09-09 21:41:03 CEST
I created a corrupted package. I think the problem is that URPM::verify_signature succeeds on such a package, which has a correct header but a corrupted payload:

$ perl -e 'use URPM; print URPM::verify_signature("/var/cache/urpmi/rpms/ruby-glib2-3.4.1-1.mga8.x86_64.rpm");'
OK (RSA/SHA256, Mon 02 Mar 2020 23:52:39 UTC, Key ID b742fa8b80420f66)

$ rpmkeys -Kv /var/cache/urpmi/rpms/ruby-glib2-3.4.1-1.mga8.x86_64.rpm 
/var/cache/urpmi/rpms/ruby-glib2-3.4.1-1.mga8.x86_64.rpm:
    Header V4 RSA/SHA256 Signature, key ID 80420f66: OK
    Header SHA256 digest: OK
    Header SHA1 digest: OK
    Payload SHA256 digest: BAD (Expected c748148ffe7dc294ddd21612cfe272354e6465e98fe6d6538cc427cfc91c1139 != fa5689a6fc10ee60fc39f530fd2a7b3c00e5343c7618c0f959fcade9f07eef15)
    Payload SHA256 ALT digest: NOTFOUND
    V4 RSA/SHA256 Signature, key ID 80420f66: BAD
    MD5 digest: BAD (Expected ffbfa82fce224c6e2e154456622f4bc2 != ce63fffb1e47a24050d6472c5a9db0cb)
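This output shows the pattern behind the bug: all Header checks pass while the Payload digest is BAD, so any check limited to the header reports OK. As a stopgap, a tool could detect that pattern in `rpmkeys -Kv` output. The C sketch below is hypothetical (`header_ok_payload_bad()` is not an existing function) and relies only on the line format shown above (`<label>: OK` or `<label>: BAD ...`), which may change between rpm versions. Lines such as "V4 RSA/SHA256 Signature" and "MD5 digest" cover the whole package rather than the header or payload alone, so they are deliberately ignored.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `report` (text printed by `rpmkeys -Kv FILE`) has no BAD
 * "Header ..." check but at least one BAD "Payload ..." check, i.e. the
 * header is intact while the package body is corrupted or truncated. */
bool header_ok_payload_bad(const char *report)
{
    bool header_bad = false;
    bool payload_bad = false;
    const char *line = report;

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Skip the indentation rpmkeys puts before each check. */
        while (len > 0 && (*line == ' ' || *line == '\t')) {
            line++;
            len--;
        }

        /* Does this line (and only this line) contain ": BAD"? */
        bool bad = false;
        for (size_t i = 0; i + 5 <= len; i++) {
            if (memcmp(line + i, ": BAD", 5) == 0) {
                bad = true;
                break;
            }
        }

        if (bad && len >= 6 && strncmp(line, "Header", 6) == 0)
            header_bad = true;
        if (bad && len >= 7 && strncmp(line, "Payload", 7) == 0)
            payload_bad = true;

        line = end ? end + 1 : NULL;
    }
    return payload_bad && !header_bad;
}
```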
Pascal Terjan 2020-09-09 21:41:16 CEST

Source RPM: urpmi => perl-URPM

Comment 16 Pascal Terjan 2020-09-09 22:25:00 CEST
So the problem seems to be the reliance on rpmReadPackageFile, which is intended to read the header and verify its signatures and digests; the payload is not checked.

The function to use seems to be rpmpkgVerifySigs.
Comment 17 Pascal Terjan 2020-09-09 22:46:35 CEST
rpmpkgVerifySigs is static :(

Simple hacky attempt using rpmVerifySignatures detects a problem:

[pterjan@mageia perl-URPM]$ git diff URPM.xs
diff --git a/URPM.xs b/URPM.xs
index 0b31798..1c43ef9 100644
--- a/URPM.xs
+++ b/URPM.xs
@@ -3201,7 +3201,8 @@ Urpm_verify_signature(filename, prefix=NULL)
   char result[1024];
   rpmRC rc;
   FD_t fd;
-  Header h;
+  Header h = NULL;
+  struct rpmQVKArguments_s unused_qva;
   CODE:
   fd = Fopen(filename, "r");
   if (fd == NULL)
@@ -3212,7 +3213,7 @@ Urpm_verify_signature(filename, prefix=NULL)
     rpmtsSetRootDir(ts, prefix);
     rpmtsOpenDB(ts, O_RDONLY);
     rpmtsSetVSFlags(ts, RPMVSF_DEFAULT);
-    rc = rpmReadPackageFile(ts, fd, filename, &h);
+    rc = rpmVerifySignatures(&unused_qva, ts, fd, filename);
     Fclose(fd);
     *result = '\0';
     switch(rc) {
[pterjan@mageia perl-URPM]$ perl -Iblib/arch -Iblib/lib -e 'use URPM; print URPM::verify_signature("/var/cache/urpmi/rpms/ruby-glib2-3.4.1-1.mga8.x86_64.rpm"); print "\n";'
/var/cache/urpmi/rpms/ruby-glib2-3.4.1-1.mga8.x86_64.rpm: DIGESTS SIGNATURES NOT OK
NOT OK (signature not found): (no error)

I guess one way to do it would be to do the same as rpmpkgVerifySigs (call rpmpkgRead + rpmvsVerify)...
Comment 18 Aurelien Oudelet 2020-09-19 18:08:58 CEST
Hi,
This is a High priority bug for a good reason.

Making Mageia even better than ever is the best direction.
In order to do the right thing, this bug should be examined and fixed as soon as possible.

Packagers, please set the status to Assigned when you are working on this.
Feel free to reassign the bug if it was mis-triaged. Also, if the bug is old, please close it.

On October 1st 2020, we will lower the priority to Normal.
