Bug 5312

Summary: need to make sure faulty files are removed from urpmi-proxy cache
Product: Mageia Reporter: Morgan Leijström <fri>
Component: RPM PackagesAssignee: AL13N <alien>
Status: NEW --- QA Contact:
Severity: normal    
Priority: Low CC: mageiatools, marja11
Version: Cauldron   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard: OK
Source RPM: urpmi-proxy-0.4.0-4.mga5.src.rpm CVE:
Status comment:
Attachments: ls -l of the 0.something files.
tgz of excerpt from urpmi-proxy_log and access_log that time on 2014-10-24
file 0.30156800460918418144

Description Morgan Leijström 2012-04-09 21:40:24 CEST
Description of problem:
I found files in the cache that had zero length.
But real files on the repo urpmi fetch from are normal.

Symptom:
Clients say they failed to retrieve packages, while urpmi-proxy log say the files were sent from cache. 

Workaround:
Delete the bad files.

Version-Release number of selected component (if applicable):
urpmi-proxy-0.2.4-3.mga2.src.rpm

Steps to Reproduce:
Probably it was caused because the disk got full at about the same time... i can try dig into the log if you want.
(It may also be induced by my bad connection and rebooting router, or some transient fault at repo i connect to?)

However it was caused maybe there should be arranged a function that removes files if there was problem to create them?
Morgan Leijström 2012-04-09 21:48:25 CEST

Assignee: bugsquad => alien

Comment 1 AL13N 2012-04-09 22:38:05 CEST
well, if you look at the logs, it checks if the file exists and is new enough, so i have way of knowing if that file is correct or not. (i check the modify date).

the clients fetching them, don't tell the host that the file is invalid...

maybe i need better error checking to remove incompletely stored files (which is likely how it happend)

i could check for zero-length files, but that doesn't solve all such problems...

WDYT?
AL13N 2012-04-09 22:50:09 CEST

Summary: Erroneously zero length cached files => need to make sure faulty files are removed

Comment 2 Morgan Leijström 2012-04-11 15:02:03 CEST
Well it is a start.

I just checked one of the rpms (radeon-firmware), and it got downloaded successfully by a client the first time, but when later another client should get it it was served the zero length.
Comment 3 AL13N 2012-04-11 19:19:39 CEST
it is possible that there was not enough space to store the file completely at the time.

even if this happens, the client can still get the file properly. because during transfer (in-memory), it gets duplicated to output stream (in memory), and at the same time also to cache storage (for each small part)...

this means that if the storage failed, the files should get through anyway.

alas, it seems they also store zero length files...

so, i think it's possible if i fix this, that this will not be a problem anymore.

i'll make sure to test on low-disk size conditions
Comment 4 Morgan Leijström 2012-04-11 20:41:23 CEST
Yes. As i said in #0 the disk got full.

Being at that subject... tell me when you have some function cleaning up old files ready to be tested :)
Comment 5 AL13N 2012-04-11 23:05:46 CEST
will do, but that'll likely not be before mga3 release...
Comment 6 Marja Van Waes 2012-05-26 13:05:22 CEST
Hi,

This bug was filed against cauldron, but we do not have cauldron at the moment.

Please report whether this bug is still valid for Mageia 2.

Thanks :)

Cheers,
marja

Keywords: (none) => NEEDINFO

Comment 7 Marja Van Waes 2012-07-06 15:05:59 CEST
Please look at the bottom of this mail to see whether you're the assignee of this  bug, if you don't already know whether you are.


If you're the assignee:

We'd like to know for sure whether this bug was assigned correctly. Please change status to ASSIGNED if it is, or put OK on the whiteboard instead.

If you don't have a clue and don't see a way to find out, then please put NEEDHELP on the whiteboard.

Please assign back to Bug Squad or to the correct person to solve this bug if we were wrong to assign it to you, and explain why.

Thanks :)

**************************** 

@ the reporter and persons in the cc of this bug:

If you have any new information that wasn't given before (like this bug being valid for another version of Mageia, too, or it being solved) please tell us.

@ the reporter of this bug

If you didn't reply yet to a request for more information, please do so within two weeks from now.

Thanks all :-D
AL13N 2012-07-06 18:00:31 CEST

Keywords: NEEDINFO => (none)
Priority: Normal => Low
Whiteboard: (none) => OK

Comment 8 Morgan Leijström 2014-11-17 19:00:37 CET
IIRC i had similar problems again a couple months ago; clients was served zero byte files.
I found files of zero length, and had to delete them.
Comment 9 AL13N 2014-11-17 20:17:42 CET
by any chance do you have some logs of urpmi-proxy that generates those zero-sized files?

i could program not not have zero-sized files, but that's kind of wrong... otoh, if somehow you have not enough disk space or for whatever reason your server had to stop/be rebooted, there could be files that are incomplete...

i can try to have some kind of code that delete those if an error happened during transfer or something...? i have seen this problem myself, but i can't exactly reproduce it with the necessary logs to see where exactly the problem is...
Comment 10 Morgan Leijström 2014-11-18 05:48:21 CET
I´ll try to grab some log next time

Yes there may be incomplete files of length >0 too.

I *think* i have deleted soem such too from the cache when clients did not accept it.

Yes i think a good measure would be to delete the downloaded file if something went wrong during the download OR file save.

Either by some error state, or by checking its md5sum if possible.

Getting rid of the noarch duplets (Bug 14587) and rinse old files (Bug 14588) would help keping the cache size sane.  I think a good idea would be for the sysop to set up the system to check free capacity and and send warning email. (not to be part of urpmi-proxy)
Comment 11 AL13N 2014-11-18 11:30:51 CET
checksum is not really possible, this is not information you have when trying to check if it's there.

maybe when temp storing, i could use old date and update the date later.

the http protocol uses date to see if it's new or not.

if i use anything else, it'd download everything and then the usefullness would be gone...
Comment 12 Morgan Leijström 2014-11-18 14:59:39 CET
IIRC rpm files have built-in checksums. Can we use
  rpm --checksig package_name.rpm  ?
But maybe that eat too much CPU to be used generally on each package.

Can we just simply check the file length?  I guess that is retrievavble using http without downloading, but this is not my cup of tea...

Then is teh question when to check.  Not every time a client asks for it.  But after each time a file is saved.  You already got info on size when starting file download?  So then check if saved file match.  ... or could it be buffered so it looks like it is ok but is not saved to disk?
Morgan Leijström 2014-11-18 15:13:52 CET

Source RPM: urpmi-proxy-0.2.4-3.mga2.src.rpm => urpmi-proxy-0.4.0-4.mga5.src.rpm

Comment 13 AL13N 2014-11-18 19:15:55 CET
the biggest problem here is probably:

 - what if apache is force stopped or a thread is killed... i can't help that, and the file is half done
 - same when running out of disk space or a power failure...
Comment 14 Morgan Leijström 2014-11-18 22:11:37 CET
Yes there may be many reasons files in cache are not complete.

Maybe it would be best and not too camplicated to save file with extra extension added to the name, and when that file is downloaded and checked for some kind of consitency (probably size wil be enough) it is renamed to its real name.
Comment 15 AL13N 2014-11-18 23:15:50 CET
while that may be true, the problem is that another path IS used (the /tmp ) partition, to store it before it's ready... which should have solved all these cases.

in other words, i don't understand why and so i need full logs of whatever happened that resulted in such a zero-length file...
Comment 16 Morgan Leijström 2014-11-19 00:33:44 CET
I have /tmp on another partition than /var

So maybe it is downloaded OK on /tmp, then the problem happens when moved to /var ?

Or vice versa for that matter... the largest rpm is 1.1 GB (game data) and i often make the /tmp 1 GB ...

Maybe it would be better to store where it is supposed to end up - at least in the same filsystem. (but other dir, or (i suggest) other name same dir)

Then there is only one filesystem full issue instead of two.
Also less disk writes as the file is not moved then, just renamed (if you save in same dir under other name) or pointers on disk change (file moved within same filesystem)

I did not know it was shoving everything trough /tmp ... which brings up another (minor) issue: i have /tmp on a SSD but maybe I should move /tmp to the spinning disc (where i have /var) to lower the wear on the SSD...


I will bring logs when/if it happens again.
Comment 17 AL13N 2014-11-19 11:02:19 CET
it's possible that it happens like this, but it's advised to configure the tmp path at the same partition as the destination path...

you can configure this in /etc/urpmi-proxy.conf
Comment 18 Morgan Leijström 2014-11-19 18:04:25 CET
I had not configured it so it was actually defaulted to /var/tmp/urpmi-proxy/
So it *is* in the same file system, good.

It contain many files named 0.xxx where xxx is a long number, dated 2014-10-26 and recognised by dolphin as rpm files. The majority of them are 16 KiB.

I guess i was wrong in comment 11, it was then probably two *weeks* ago i had problems.

BTW, is there a reason the files are given a number and not the original file name?  That would help to clean it up when something messes up.
Comment 19 Morgan Leijström 2014-11-19 18:17:24 CET
i meant comment 8 not 11
Comment 20 Morgan Leijström 2014-11-19 22:35:11 CET
Created attachment 5613 [details]
ls -l of the 0.something files.

And now when i got a date i will attach corresponding parts of urpmi-proxy and access_log too.

Unfortunately I do not have that old error_log from apache.
Of the access-log i have old saved versions all way back to ending in .5 
But error log only a .1 which starts the day after the havoc.  Maybe they got automatically deleted by the system as they are huge, about a gigabyte as i have much logging and as said /var got full... bad luck...  Since that i have a new disk and /var have 30 G free now :)  So now i can catch it.  Er... but now it will probably not happen for the same reason :^/
Comment 21 Morgan Leijström 2014-11-19 22:43:57 CET
Created attachment 5614 [details]
tgz of excerpt from urpmi-proxy_log and access_log that time on 2014-10-24
Comment 22 AL13N 2014-11-20 00:06:18 CET
well, you should first configure $debug = 1;

otherwise it's probably not that useful...

i'll take a look in any case...

normally those get moved to the real location when it's done (and there's no errors)...

i'll have to reread the code a bit to see how these 0-length files came to be...

btw: did you configure urpmi-proxy with your favorite mirrors, or did you leave it default to "urpmi" ?
Comment 23 AL13N 2014-11-20 00:15:31 CET
everything in the logs seems ok, nicely status code 200 ...

most of the files left behind seem small-ish ...

maybe they are symlinks or directories or something...

maybe you could inspect a few of them with hexdump or something?



i'll really need logs with debug=1 ... perhaps even debug apache error logs...

it *is* so, that out of disk space is a way to get these errors.
Comment 24 Morgan Leijström 2014-11-20 08:57:25 CET
I already have a line in urpmi-proxy.conf
  $debug = 1
Maybe there is another level, how?

I think i already earlier have set logging high in apache (forgot how) as the error_log already for november is 1,5 GiB. Unfortunately the oldest start the day after the havoc (strange, maybe it got deleted when disk got full?  well it may even have been me then I was in desperate quest for more disk space and i do not remember exactly what i did, but it is possible i did not think and deleted the log...)

Yes I use the most reliable close mirror http:/ftp.acc.umu.se

I will attach one of the 0.* files.  Dolphin recognise it as a .rpm and double cklicking it you even get a dialog saying what rpm it is (it apparently at that time have not checked it is complete)  If you chose to open with Ark it apparently do check it as it say libarchive could not open it.

Having these files is probably not a problem - unless their existence blocks future downloads of that same file - if urpmi-proxy really do try to replace a bad file existing in the cache.

The bad problem is that if a file have got cached but not complete, it is not deleted nor replaced so next client also get that truncated file even if there then is disk space.

Maybe you can detect if a file move to the target place from tmp to real cache fails, and then have it delete it - both from the source and target dirs?
Comment 25 Morgan Leijström 2014-11-20 08:58:27 CET
Created attachment 5615 [details]
file 0.30156800460918418144
Marja Van Waes 2017-12-02 11:51:43 CET

Summary: need to make sure faulty files are removed => need to make sure faulty files are removed from urpmi-proxy cache
CC: (none) => mageiatools, marja11