Description of request: As part of the work to enable DNF on Mageia[0], I've added createrepo_c, its required dependency (drpm), and that package's required dependency (cmocka) to the infra_5 repository. Please build them in this particular order: cmocka -> drpm -> createrepo_c.

Also, please start generating metadata using createrepo_c for Cauldron. Based on my tests, I recommend the following command:

createrepo_c --no-database --update --workers=10 --general-compress-type=xz <path/to/directory/of/rpms>

The above command will use 10 workers (the number can be anything from 1 to 100, though I didn't see any reason to do more or fewer than 10-20) to read through the RPMs, generate XZ-compressed metadata without the SQLite database version of the metadata (which is really only useful for Yum anyway), and will reuse metadata that doesn't need to change (based on info from the existing metadata and the RPM data), which saves time and I/O on subsequent runs. More details can be found in createrepo_c(8).

The first metadata generation will take about 3 minutes, and subsequent metadata updates should take seconds (if any time at all), because not all the packages change every time.

[0]: https://wiki.mageia.org/en/Feature:Add_DNF_as_Alternate_Repository_Manager
URL: (none) => https://wiki.mageia.org/en/Feature:Add_DNF_as_Alternate_Repository_Manager
Um, AFAIK it's not an accepted feature yet, so it won't happen.
Status: NEW => RESOLVED
CC: (none) => tmb
Resolution: (none) => WONTFIX
Why mark as WONTFIX just because it hasn't been accepted *yet*?
@David: because this is for infra_5
CC: (none) => mageia
Per the feature review meeting today, the DNF feature has been accepted[0]. Consequently, I'm re-opening this ticket. [0]: http://meetbot.mageia.org/mageia-dev/2015/mageia-dev.2015-11-11-20.09.html
Status: RESOLVED => REOPENED
Resolution: WONTFIX => (none)
Whiteboard: (none) => 6dev1
Whiteboard: 6dev1 => (none)
Blocks: (none) => 15527
Blocks: (none) => 17400
Ok, so the primary buildsystem server is now finally upgraded, so we should start planning for this...

Before I start reading up on it: are there any restrictions on where repo metadata is stored? Where should we place repo data on the mirrors?

Urpmi has its data in: <arch>/media/media_info/

Should we do something like: <arch>/media/metadata/ ?
createrepo_c automatically creates the metadata in the /repodata subfolder of whatever location you point it to. It will happily coexist in the same parent directory where hdlist2 data exists. However, you'll need to point createrepo_c at the individual repo areas.

Generally, createrepo_c would be pointed at <arch>/media/<submedia>/<channel>/, where a /repodata subfolder would be created to sit alongside /media_info. It would also be run on SRPMS/<submedia>/<channel> as well, since DNF supports pulling down source RPMs, too.

For example, for my local tests, I run createrepo_c on my local mirror of the repo data, which is stored in /srv/repos/mageia/cauldron/x86_64/media/. I have a script that iterates through core, nonfree, tainted, and the debug ones as well, since debug packages aren't a subfolder in the main <submedia>/<channel> folders. Here are example commands that I've run from my script:

createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/core/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/nonfree/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/tainted/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/debug/core/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/debug/nonfree/release
createrepo_c --no-database --update --workers=10 --general-compress-type=xz /srv/repos/mageia/cauldron/x86_64/media/debug/tainted/release

Obviously, Mageia's setup would need to handle all the media/submedia/channel sets instead of just the restricted set I use for my testing.
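The six explicit commands above can be generalized into a loop. Here's a hedged dry-run sketch: the ROOT path and the submedia list are assumptions taken from my local layout, and the commands are only collected and printed, not executed, so it's safe to run anywhere.

```shell
# Dry-run sketch: build the createrepo_c invocation for each repo area.
# ROOT and the core/nonfree/tainted list are assumptions from a local mirror.
ROOT=/srv/repos/mageia/cauldron/x86_64/media
cmds=()
for submedia in core nonfree tainted; do
  for prefix in "" "debug/"; do
    cmds+=("createrepo_c --no-database --update --workers=10 --general-compress-type=xz $ROOT/${prefix}${submedia}/release")
  done
done
printf '%s\n' "${cmds[@]}"   # print instead of executing
```

A real deployment would also loop over the architectures and the SRPMS tree, as noted above.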
You'd also want the metadata generation to occur after the packages are signed, since the checksums of the packages in the rpm-md data should account for the signing data.
If we decide to also implement comps group metadata (to enable dnf group {list,info,install,remove,upgrade}), we'll have to add --groupfile=<path/to/comps-groups.xml>. However, I'd rather tackle that a bit later (after we figure out whether we want to do it, and how those groups would be defined).
CC: (none) => doktor5000
[schedbot@duvel ~]$ time createrepo_c --no-database --update --workers=10 --general-compress-type=xz /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/
Directory walk started
Directory walk done - 26159 packages
Loaded information about 0 packages
Temporary output repo path: /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/.repodata/
Pool started (with 10 workers)
Pool finished

real 10m50.914s
user 8m23.850s
sys 0m40.340s

[schedbot@duvel ~]$ time createrepo_c --no-database --update --workers=10 --general-compress-type=xz /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/
Directory walk started
Directory walk done - 26159 packages
Loaded information about 26159 packages
Temporary output repo path: /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/.repodata/
Pool started (with 10 workers)
Pool finished

real 1m34.283s
user 2m21.830s
sys 0m5.370s

I think it is currently too slow to enable. Uploading one package will typically mean running it twice per architecture (main directory + debug) plus once for the src.rpm, so that would add about 15 minutes to uploads... I guess we really want SSDs...
CC: (none) => pterjan
(well we can have it for Mageia 6 anyway, but for cauldron we probably don't want to run it after each upload)
Ooh, that's long... but we should at least run it "sometime" to be able to test the feature before mga6 is out... What if we run it post-upload only on the distrib tree? That way the buildsystem can keep its speed against bootstrap, and we get working repodata too... Of course it wouldn't be perfect, as it would block a little if repodata is still being generated while we try to trigger the bootstrap -> distrib sync.
I did run it yesterday on armv5tl/i586/x86_64/SRPMS core/release directories which are the bigger ones, and will run it on the other directories soon, so it can already be tested.
All directories are done. I can probably set some cron for now so that they do not get too much out of date.
time for d in /distrib/bootstrap/distrib/cauldron/SRPMS/*/*/ /distrib/bootstrap/distrib/cauldron/*/media/*/*/; do createrepo_c --skip-stat --no-database --update --workers=10 --general-compress-type=xz $d; done
[...]

real 7m17.773s
user 10m58.880s
sys 0m23.890s
Created attachment 7564 [details] Python script to run createrepo_c to generate rpm-md data for Mageia releases If it helps any, here's a *lightly* modified version of my script that I used locally to generate rpm-md repodata in all the same places that mdkrepo data exists. It's meant to be run as part of a cron job or a systemd timer service (since it will iterate through everything specified). I usually run "python3 genrpmmd-mga.py -a x86_64 -r cauldron" right after I rsync'd from a remote mirror. You'd probably want to do "python3 genrpmmd-mga.py -a x86_64 -a i586 -a armv5tl -r cauldron". If you want to see the commands it will run, tack on "-d" to make it print out the command array it will run instead of actually running it.
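For reference, the cron-style invocation described above might look like this. The schedule and the install path are hypothetical assumptions; only the script name and its arguments come from the comment above.

```shell
# Hypothetical crontab entry for the attached script; the */15 schedule and
# /usr/local/bin path are assumptions, not the real infra configuration.
CRON_LINE='*/15 * * * * python3 /usr/local/bin/genrpmmd-mga.py -a x86_64 -a i586 -a armv5tl -r cauldron'
echo "$CRON_LINE"   # e.g. add to the appropriate user's crontab with crontab -e
```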
Also, I noticed that the metadata for the debug information was generated a couple of levels up from where it should be, leading to an enormous rpm-md repo size for debug data. My script will generate that properly, as I added a special case for debug data. That said, you'd probably want to purge the rpm-md data for debug and start over.
It is much faster when not using xz:

Directory walk started
Directory walk done - 26269 packages
Loaded information about 26264 packages
Temporary output repo path: /distrib/bootstrap/distrib/cauldron/x86_64/media/core/release/.repodata/
Pool started (with 10 workers)
Pool finished

real 0m28.606s
user 0m43.110s
sys 0m2.680s
Is it acceptably fast to run after every build like genhdlist2 is? The default is to use gz instead of xz. I figured you'd prefer xz compression, but if the speed is worth it and the (slightly) larger archived xml files are okay, then I'm fine with it.
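To make the size/speed trade-off concrete, here's a hedged sketch comparing gzip and xz on synthetic XML-like data. The sample file and its contents are assumptions for illustration, not real Mageia metadata, so real compression ratios will differ.

```shell
# Generate a repetitive XML-ish sample, then compress one copy with gzip
# and one with xz. /tmp paths and the sample contents are illustrative only.
yes '<package name="example" arch="x86_64"/>' | head -n 50000 > /tmp/md-sample-gz.xml
cp /tmp/md-sample-gz.xml /tmp/md-sample-xz.xml
gzip -f /tmp/md-sample-gz.xml     # faster, slightly larger output
xz -f /tmp/md-sample-xz.xml       # slower, smaller output
ls -l /tmp/md-sample-gz.xml.gz /tmp/md-sample-xz.xml.xz
```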
Yes, the size seems fine to me with gz, and I think it is fast enough to run each time.
Also, is --skip-stat still necessary for speed if you switch to default compression? I would think it'd be desirable to allow createrepo_c to verify the packages if the increased time isn't too bad.
I hope we never replace a package with one with the same name, but yes, it should be fine to drop that option:

Directory walk started
Directory walk done - 26253 packages
Loaded information about 26252 packages
Temporary output repo path: /distrib/bootstrap/distrib/cauldron/i586/media/core/release/.repodata/
Pool started (with 10 workers)
Pool finished

real 0m30.418s
user 0m46.550s
sys 0m2.990s

Total over all media:

real 2m15.254s
user 3m27.120s
sys 0m14.450s
Awesome! Is the build system hooked up to do this now?
Not yet
Hm, since we now run mga5 on master, we should start using threaded mode for anything xz-related... For example, on hdlist:

ll hdlist*
-rw-rw-r-- 1 tmb tmb 721742976 mar 13 21:47 hdlist1
-rw-rw-r-- 1 tmb tmb 721742976 mar 13 21:48 hdlist2

[tmb@tmb ~]$ time xz hdlist1

real 4m2.987s
user 4m2.500s
sys 0m0.440s

[tmb@tmb ~]$ time xz -T0 hdlist2

real 0m57.511s
user 7m1.430s
sys 0m1.060s

The "-T0" tells it to use all available cores/threads, which in the above case was a quad-core i7 + HT. The -T | --threads support was added upstream in 5.2.0, which we have in mga5. (Note: the help/man page talks about multithreaded decompression support, but it's actually compression; fixed in the 5.2.2 documentation.)

Can you test something like "xz -T 10" on duvel for the repodata?
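The threaded behaviour above can be reproduced on any machine with xz >= 5.2.0. Here's a minimal hedged sketch: the sample file under /tmp is an assumption, and actual timings will vary with the data and the core count.

```shell
# Create a compressible sample and compress it with all available threads.
# The /tmp path and sample size are assumptions; -T0 autodetects the core
# count (requires xz >= 5.2.0).
head -c 4000000 /dev/zero > /tmp/hdlist-sample
time xz -f -T0 /tmp/hdlist-sample
ls -l /tmp/hdlist-sample.xz
```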
And for gzip stuff, I guess we should switch to pigz for parallel gzip compression...
This is now run on upload: http://pkgsubmit.mageia.org/uploads/done/cauldron/core/release/20160313230603.spuhler.duvel.26176.youri

Thomas, I don't think this is possible, as createrepo_c uses the library and not a call to the xz command, but I updated the config for genhdlist2 to use xz -T4 instead of lzma -7 for xml-info files, with great success.
Fantastic! As for createrepo_c, I don't see any evidence of threading on compression[0], but that said, it might be worth talking to tmlcoch (Tomas Mlcoch, the author of createrepo_c) on #yum on Freenode about it. I've filed an issue on the createrepo_c GitHub issue tracker as well[1], and I would appreciate it if you guys could add any useful information to the ticket. [0]: https://github.com/rpm-software-management/createrepo_c/blob/master/src/compression_wrapper.c [1]: https://github.com/rpm-software-management/createrepo_c/issues/53
I don't have a GitHub account, and I haven't planned to add one, so... here is a reference example doc for multithreaded compression: http://git.tukaani.org/?p=xz.git;a=blob_plain;f=doc/examples/04_compress_easy_mt.c;hb=HEAD
@Pascal: There seem to be old, large repodata folders for i586 debug from before you fixed the repodata generation. Can you clear those out? I'm seeing these old things in:

http://mirrors.kernel.org/mageia/distrib/cauldron/i586/media/debug/core/repodata/
http://mirrors.kernel.org/mageia/distrib/cauldron/i586/media/debug/nonfree/repodata/
http://mirrors.kernel.org/mageia/distrib/cauldron/i586/media/debug/tainted/repodata/

We already have proper repodata for debug in debug/core/<repo>, as we do in x86_64, so these are just hanging around for no purpose. It's safe to delete them. These probably exist in the (currently hidden) armv5tl tree, too.
We've been generating rpm-md repodata properly for a month now with no observable issues, so I'm marking this bug as fixed. The issue of having the metalinks is being tracked in bug#17400, anyway. If there's an issue, please re-open it.
Status: REOPENED => RESOLVED
Resolution: (none) => FIXED
Blocks: 15527 => (none)