Bug 17057 - Request for cmocka, drpm, and createrepo_c built for infra_5 and createrepo_c metadata generation for Cauldron
Status: RESOLVED FIXED
Alias: None
Product: Infrastructure
Classification: Unclassified
Component: BuildSystem
Version: unspecified
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Sysadmin Team
QA Contact:
URL: https://wiki.mageia.org/en/Feature:Ad...
Whiteboard:
Keywords:
Depends on:
Blocks: 17400
Reported: 2015-10-31 20:03 CET by Neal Gompa
Modified: 2017-01-17 10:29 CET

See Also:
Source RPM: cmocka, drpm, createrepo_c
CVE:
Status comment:


Attachments
Python script to run createrepo_c to generate rpm-md data for Mageia releases (3.61 KB, text/x-python)
2016-03-12 16:40 CET, Neal Gompa

Description Neal Gompa 2015-10-31 20:03:56 CET
Description of request:
As part of the work to enable DNF on Mageia[0], I've added createrepo_c, its required dependency (drpm), and drpm's required dependency (cmocka) to the infra_5 repository.

Please build them in this particular order: cmocka->drpm->createrepo_c

Also, please start generating metadata using createrepo_c for Cauldron. Based on my tests, I recommend the following command:

createrepo_c --no-database --update --workers=10 --general-compress-type=xz <path/to/directory/of/rpms>

The above command uses 10 workers to read through the RPMs (the number can be anything from 1 to 100, though I saw no reason to go outside the 10-20 range). It generates XZ-compressed metadata without the SQLite database version of the metadata (which is really only useful for Yum anyway), and it reuses metadata that doesn't need to change (based on the existing metadata and the RPM data), which saves time and I/O on subsequent runs. More details can be found in createrepo_c(8).

The first metadata generation will take about 3 minutes, and subsequent metadata updates should take seconds (if any time at all), because not all the packages change every time.
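
For anyone scripting this, here is a minimal Python sketch of how the recommended invocation could be assembled. The function name `build_createrepo_cmd` and its parameters are hypothetical; only the createrepo_c options themselves come from the command above.

```python
from typing import List


def build_createrepo_cmd(repo_dir: str, workers: int = 10,
                         compress_type: str = "xz") -> List[str]:
    """Assemble the createrepo_c argument list recommended above.

    The helper name and signature are illustrative assumptions; the
    options are the ones given in the request.
    """
    if not 1 <= workers <= 100:
        raise ValueError("workers must be between 1 and 100")
    return [
        "createrepo_c",
        "--no-database",          # skip the SQLite copies of the metadata
        "--update",               # reuse existing metadata for unchanged RPMs
        f"--workers={workers}",   # number of workers reading the RPMs
        f"--general-compress-type={compress_type}",
        repo_dir,
    ]


if __name__ == "__main__":
    # Only print the command, so this runs without createrepo_c installed;
    # hand the list to subprocess.run() to actually execute it.
    print(" ".join(build_createrepo_cmd("/path/to/directory/of/rpms")))
```

Building the command as a list rather than a single string avoids shell quoting problems if a repository path ever contains spaces.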

[0]: https://wiki.mageia.org/en/Feature:Add_DNF_as_Alternate_Repository_Manager

Neal Gompa 2015-11-01 03:47:35 CET

URL: (none) => https://wiki.mageia.org/en/Feature:Add_DNF_as_Alternate_Repository_Manager

Comment 1 Thomas Backlund 2015-11-01 10:43:56 CET
Um,

AFAIK it's not an accepted feature yet, so it won't happen.

Status: NEW => RESOLVED
CC: (none) => tmb
Resolution: (none) => WONTFIX

Comment 2 David Walser 2015-11-01 14:17:36 CET
Why mark as WONTFIX just because it hasn't been accepted *yet*?
Comment 3 Nicolas Lécureuil 2015-11-07 00:03:56 CET
@David: because this is for infra_5

CC: (none) => mageia

Comment 4 Neal Gompa 2015-11-11 23:30:12 CET
Per the feature review meeting today, the DNF feature has been accepted[0].

Consequently, I'm re-opening this ticket.

[0]: http://meetbot.mageia.org/mageia-dev/2015/mageia-dev.2015-11-11-20.09.html

Status: RESOLVED => REOPENED
Resolution: WONTFIX => (none)

Neal Gompa 2015-12-14 21:58:45 CET

Whiteboard: (none) => 6dev1

Neal Gompa 2015-12-14 21:59:46 CET

Whiteboard: 6dev1 => (none)

Neal Gompa 2015-12-14 22:02:25 CET

Blocks: (none) => 15527

Neal Gompa 2015-12-25 20:38:15 CET

Blocks: (none) => 17400

Comment 5 Thomas Backlund 2016-02-09 09:02:56 CET
Ok, so primary buildsystem server is now finally upgraded, so we should start planning for this...

Before I start reading up on it, are there any restrictions on where repo metadata is stored?
Where should we place repo data on the mirrors?

Urpmi has its data in:
<arch>/media/media_info/

Should we do something like:
<arch>/media/metadata/

?
Comment 6 Neal Gompa 2016-02-09 09:26:45 CET
createrepo_c automatically creates the metadata in the /repodata subfolder of whatever location you point it to. It will happily coexist in the same parent directory where hdlist2 data exists.

However, you'll need to point createrepo_c at the individual repo areas. Generally, createrepo_c would be pointed to <arch>/media/<submedia>/<channel>/, where a /repodata subfolder would be created to sit alongside /media_info.

It would also be run on SRPMS/<submedia>/<channel> as well, since DNF supports pulling down source RPMs, too.

For example, for my local tests, I run createrepo_c on my local mirror of the repodata, which is stored in /srv/repos/mageia/cauldron/x86_64/media/. I have a script that iterates through core, nonfree, tainted, and the ones for debug as well, since debug packages aren't a subfolder in the main <submedia>/<channel> folders.

Here are example commands that I've run from my script:
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/core/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/nonfree/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/tainted/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/debug/core/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/debug/nonfree/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/debug/tainted/release

Obviously, Mageia's setup would need to handle all the media/submedia/channel sets instead of just the restricted set I use for my testing.

You'd also want the metadata generation to occur after the packages are signed, since the checksums of the packages in the rpm-md data should account for the signing data.
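
As a rough illustration of the layout described above, the sketch below lists the directories createrepo_c would be pointed at for a set of architectures, including the debug and SRPMS trees. The names `repo_dirs`, `SUBMEDIA` and `CHANNELS` are hypothetical, and the channel list only contains `release` because that is the only channel shown in the examples; a real setup would extend it.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

SUBMEDIA = ("core", "nonfree", "tainted")
# Assumption: only "release" appears in the examples above; add the other
# channels carried by the mirror as needed.
CHANNELS = ("release",)


def repo_dirs(root: str, arches: Iterable[str]) -> List[str]:
    """Return every directory that should get its own repodata subfolder."""
    base = PurePosixPath(root)
    dirs = []
    for submedia in SUBMEDIA:
        for channel in CHANNELS:
            # Source RPMs are shared by all architectures.
            dirs.append(base / "SRPMS" / submedia / channel)
            for arch in arches:
                media = base / arch / "media"
                dirs.append(media / submedia / channel)
                # Debug packages live under media/debug/<submedia>/<channel>,
                # not inside the main <submedia>/<channel> folders.
                dirs.append(media / "debug" / submedia / channel)
    return [str(d) for d in dirs]


if __name__ == "__main__":
    for path in repo_dirs("/srv/repos/mageia/cauldron", ["x86_64"]):
        print(path)
```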
Comment 7 Neal Gompa 2016-02-09 09:45:08 CET
If we decide to also implement comps group metadata (to enable dnf group {list,info,install,remove,upgrade}), we'll have to add --groupfile=<path/to/comps-groups.xml>. However, I'd rather tackle that a bit later (after we figure out if we want to do it, and how those groups would be defined).
Florian Hubold 2016-02-19 17:01:12 CET

CC: (none) => doktor5000

Comment 8 Pascal Terjan 2016-03-10 23:51:17 CET
[schedbot@duvel ~]$ time createrepo_c --no-database --update --workers=10 --general-compress-type=xz /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/
Directory walk started
Directory walk done - 26159 packages
Loaded information about 0 packages
Temporary output repo path: /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/.repodata/
Pool started (with 10 workers)
Pool finished

real	10m50.914s
user	8m23.850s
sys	0m40.340s

[schedbot@duvel ~]$ time createrepo_c --no-database --update --workers=10 --general-compress-type=xz /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/
Directory walk started
Directory walk done - 26159 packages
Loaded information about 26159 packages
Temporary output repo path: /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/.repodata/
Pool started (with 10 workers)
Pool finished

real	1m34.283s
user	2m21.830s
sys	0m5.370s

I think it is currently too slow to enable it. Uploading one package will typically mean running it twice per architecture (main directory + debug) + once for src.rpm, so that would add about 15 minutes to uploads...
I guess we really want SSDs...

CC: (none) => pterjan

Comment 9 Pascal Terjan 2016-03-10 23:54:21 CET
(well we can have it for Mageia 6 anyway, but for cauldron we probably don't want to run it after each upload)
Comment 10 Thomas Backlund 2016-03-11 08:13:17 CET
Ooh, that's long...

but we should at least run it "sometime" to be able to test the feature before mga6 is out...

What if we run it as post upload only on distrib tree...

that way buildsystem can keep its speed against bootstrap and we get working repodata too...

of course it wouldn't be perfect as it would block a little if repodata is still being generated while we try to trigger bootstrap -> distrib sync
Comment 11 Pascal Terjan 2016-03-11 10:32:38 CET
I did run it yesterday on armv5tl/i586/x86_64/SRPMS core/release directories which are the bigger ones, and will run it on the other directories soon, so it can already be tested.
Comment 12 Pascal Terjan 2016-03-11 11:01:34 CET
All directories are done.
I can probably set some cron for now so that they do not get too much out of date.
Comment 13 Pascal Terjan 2016-03-11 11:16:17 CET
time for d in /distrib/bootstrap/distrib/cauldron/SRPMS/*/*/ /distrib/bootstrap/distrib/cauldron/*/media/*/*/; do createrepo_c --skip-stat --no-database --update --workers=10 --general-compress-type=xz $d; done
[...]
real    7m17.773s
user    10m58.880s
sys     0m23.890s
Comment 14 Neal Gompa 2016-03-12 16:40:05 CET
Created attachment 7564
Python script to run createrepo_c to generate rpm-md data for Mageia releases

If it helps any, here's a *lightly* modified version of my script that I used locally to generate rpm-md repodata in all the same places that mdkrepo data exists.

It's meant to be run as part of a cron job or a systemd timer service (since it will iterate through everything specified).

I usually run "python3 genrpmmd-mga.py -a x86_64 -r cauldron" right after I rsync'd from a remote mirror.

You'd probably want to do "python3 genrpmmd-mga.py -a x86_64 -a i586 -a armv5tl -r cauldron".

If you want to see the commands it will run, tack on "-d" to make it print out the command array it will run instead of actually running it.
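
The attachment itself is not reproduced in this thread, so the following is only a guess at what a command-line interface matching the invocations above could look like. The short flags `-a`, `-r` and `-d` are the ones mentioned; the long option names and the `parse_args` helper are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the -a/-r/-d flags described above (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        description="Generate rpm-md repodata for a Mageia release (sketch)")
    parser.add_argument("-a", "--arch", action="append", required=True,
                        help="architecture to process; may be repeated")
    parser.add_argument("-r", "--release", required=True,
                        help="release tree to process, e.g. cauldron")
    parser.add_argument("-d", "--dry-run", action="store_true",
                        help="print the commands instead of running them")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["-a", "x86_64", "-a", "i586", "-r", "cauldron", "-d"])
    print(args.arch, args.release, args.dry_run)
```

Using `action="append"` is what lets `-a` be given several times, as in the three-architecture example.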
Comment 15 Neal Gompa 2016-03-12 17:17:29 CET
Also, I noticed that the metadata for the debug information was generated a couple of levels up from where it should be, leading to an enormous rpm-md repo size for debug data. My script will generate that properly, as I added a special case for debug data.

That said, you'd probably want to purge the rpm-md data for debug and start over.
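
One way such misplaced debug metadata can arise (an assumption here, not something confirmed in this thread) is a fixed-depth glob like the one in Comment 13: `*/media/*/*/` reaches `core/release` but stops at `debug/core`, one level short of the channel directory. The sketch below reproduces that on a throwaway directory tree; `make_tree` and `fixed_depth_matches` are hypothetical helper names.

```python
import glob
import os
import tempfile
from typing import List


def make_tree(root: str) -> None:
    """Create a tiny stand-in for the <arch>/media layout."""
    for rel in ("x86_64/media/core/release",
                "x86_64/media/debug/core/release"):
        os.makedirs(os.path.join(root, rel))


def fixed_depth_matches(root: str) -> List[str]:
    """Expand <root>/*/media/*/*/ and return the matches relative to root."""
    # The trailing empty component makes the pattern end in a separator,
    # so only directories match.
    pattern = os.path.join(root, "*", "media", "*", "*", "")
    return sorted(os.path.relpath(p, root) for p in glob.glob(pattern))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_tree(root)
        for match in fixed_depth_matches(root):
            print(match)
```

The debug entry that comes back is `x86_64/media/debug/core`, not `x86_64/media/debug/core/release`, so metadata generated there would sit above the channel directories.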
Comment 16 Pascal Terjan 2016-03-13 20:16:00 CET
It is much faster when not using xz:

Directory walk started
Directory walk done - 26269 packages
Loaded information about 26264 packages
Temporary output repo path: /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/.repodata/
Pool started (with 10 workers)
Pool finished

real	0m28.606s
user	0m43.110s
sys	0m2.680s
Comment 17 Neal Gompa 2016-03-13 20:18:10 CET
Is it acceptably fast to run after every build like genhdlist2 is?

The default is to use gz instead of xz. I figured you'd prefer xz compression, but if the speed is worth it and the (slightly) larger archived xml files are okay, then I'm fine with it.
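
For a feel of the size and speed trade-off, here is a small sketch using Python's standard `gzip` and `lzma` modules on synthetic XML. The data is made up and far smaller than real rpm-md metadata, and createrepo_c calls the compression libraries directly rather than going through Python, so treat the numbers as indicative only. `sample_xml` and `compare` are hypothetical helpers.

```python
import gzip
import lzma
import time
from typing import Dict


def sample_xml(packages: int = 2000) -> bytes:
    """Build synthetic, repetitive XML standing in for rpm-md package data."""
    entry = ('<package type="rpm"><name>pkg{0}</name><arch>x86_64</arch>'
             '<version epoch="0" ver="1.{0}" rel="1.mga6"/></package>\n')
    return "".join(entry.format(i) for i in range(packages)).encode("utf-8")


def compare(data: bytes) -> Dict[str, Dict[str, float]]:
    """Compress the same data with gzip and xz, recording size and time."""
    results = {}
    for name, compress in (("gz", gzip.compress), ("xz", lzma.compress)):
        start = time.perf_counter()
        blob = compress(data)
        results[name] = {"bytes": len(blob),
                         "seconds": time.perf_counter() - start}
    return results


if __name__ == "__main__":
    for name, stats in compare(sample_xml()).items():
        print(f"{name}: {stats['bytes']} bytes in {stats['seconds']:.3f}s")
```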
Comment 18 Pascal Terjan 2016-03-13 20:34:23 CET
Yes, the size seems fine to me with gz, and I think it is fast enough to run each time.
Comment 19 Neal Gompa 2016-03-13 20:36:13 CET
Also, is --skip-stat still necessary for speed if you switch to default compression? I would think it'd be desirable to allow createrepo_c to verify the packages if the increased time isn't too bad.
Comment 20 Pascal Terjan 2016-03-13 20:49:57 CET
I hope we never replace a package with one with the same name, but yes, it should be fine to drop that option:

Directory walk started
Directory walk done - 26253 packages
Loaded information about 26252 packages
Temporary output repo path: /distrib/bootstrap/distrib/cauldron/i586/media/core/release/.repodata/
Pool started (with 10 workers)
Pool finished

real	0m30.418s
user	0m46.550s
sys	0m2.990s

Total over all media:

real	2m15.254s
user	3m27.120s
sys	0m14.450s
Comment 21 Neal Gompa 2016-03-13 20:53:42 CET
Awesome! Is the build system hooked up to do this now?
Comment 22 Pascal Terjan 2016-03-13 20:58:34 CET
Not yet
Comment 23 Thomas Backlund 2016-03-13 21:01:49 CET
Hm, since we now run mga5 on master, for anything xz related we should start using threaded mode...

for example, on hdlist:

ll hdlist*
-rw-rw-r-- 1 tmb tmb 721742976 mar 13 21:47 hdlist1
-rw-rw-r-- 1 tmb tmb 721742976 mar 13 21:48 hdlist2

[tmb@tmb ~]$ time xz hdlist1

real    4m2.987s
user    4m2.500s
sys     0m0.440s

[tmb@tmb ~]$ time xz -T0 hdlist2

real    0m57.511s
user    7m1.430s
sys     0m1.060s


the "-T0" tells it to use all available cores/threads, which in the above case was a quad-core i7 with HT


the -T | --threads support got added upstream in 5.2.0, which we have in mga5
(note, the help / man talks about multithreaded de-compression support, but it's actually compression (fixed in 5.2.2 documentation))

Can you test something like "xz -T 10" on duvel for the repodata?
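
To put numbers on the comparison above, the following sketch converts the two `time` readings to seconds and derives the ratios. The helper name `to_seconds` is made up for illustration; the figures are the ones quoted in this comment.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a shell `time` value such as '4m2.987s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Readings from the plain "xz" run and the "xz -T0" run above.
single = {"real": to_seconds("4m2.987s"), "user": to_seconds("4m2.500s")}
threaded = {"real": to_seconds("0m57.511s"), "user": to_seconds("7m1.430s")}

speedup = single["real"] / threaded["real"]        # wall-clock gain
cpu_overhead = threaded["user"] / single["user"]   # extra CPU time spent

print(f"wall-clock speedup: {speedup:.2f}x")
print(f"user CPU time ratio: {cpu_overhead:.2f}x")
```

On these figures the threaded run finishes roughly 4.2 times sooner in wall-clock terms while spending about 1.7 times the user CPU time.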
Comment 24 Thomas Backlund 2016-03-13 21:09:04 CET
and for gzip stuff, I guess we should switch to pigz for parallel gzip compression...
Comment 25 Pascal Terjan 2016-03-14 00:17:31 CET
This is now run on upload:
http://pkgsubmit.mageia.org/uploads/done/cauldron/core/release/20160313230603.spuhler.duvel.26176.youri

Thomas, I don't think this is possible as createrepo_c uses the library and not a call to the xz command, but I updated the config for genhdlist2 to use xz -T4 instead of lzma -7 for xml-info files with great success.
Comment 26 Neal Gompa 2016-03-14 03:52:51 CET
Fantastic!

As for createrepo_c, I don't see any evidence of threading on compression[0], but that said, it might be worth talking to tmlcoch (Tomas Mlcoch, the author of createrepo_c) on #yum on Freenode about it. I've filed an issue on the createrepo_c GitHub issue tracker as well[1], and I would appreciate it if you guys could add any useful information to the ticket.

[0]: https://github.com/rpm-software-management/createrepo_c/blob/master/src/compression_wrapper.c

[1]: https://github.com/rpm-software-management/createrepo_c/issues/53
Comment 27 Thomas Backlund 2016-03-14 08:00:35 CET
I don't have a GitHub account, and have no plans to create one, so ...

here is a reference example doc for multithreaded compression:

http://git.tukaani.org/?p=xz.git;a=blob_plain;f=doc/examples/04_compress_easy_mt.c;hb=HEAD
Comment 28 Neal Gompa 2016-03-19 20:41:10 CET
@Pascal:

There seem to be old, large repodata folders for i586 debug from before you fixed the repodata generation. Can you clear those out?

I'm seeing these old things in:

http://mirrors.kernel.org/mageia/distrib/cauldron/i586/media/debug/core/repodata/
http://mirrors.kernel.org/mageia/distrib/cauldron/i586/media/debug/nonfree/repodata/
http://mirrors.kernel.org/mageia/distrib/cauldron/i586/media/debug/tainted/repodata/

We already have proper repodata for debug in debug/core/<repo> as we do in x86_64, so these are just hanging around for no purpose. It's safe to delete them.

These probably exist in the (currently hidden) armv5tl tree, too.
Comment 29 Neal Gompa 2016-04-16 15:27:05 CEST
We've been generating rpm-md repodata properly for a month now with no observable issues, so I'm marking this bug as fixed. The issue of having the metalinks is being tracked in bug#17400, anyway.

If there's an issue, please re-open it.

Status: REOPENED => RESOLVED
Resolution: (none) => FIXED

Samuel Verschelde 2017-01-17 10:29:39 CET

Blocks: 15527 => (none)

