Description of problem:

GNOME Software, Plasma Discover, and other application-centric PackageKit frontends depend on AppStream metadata to function properly. Without that data, they don't really show anything useful. Since we switched PackageKit to the Hif backend, we now have a way to provide useful AppStream data that can be processed by PackageKit frontends.

After speaking to Richard Hughes on #yum about it, he suggested using appstream-builder to build the AppStream metadata and modifyrepo_c to merge it into the repodata. He's blogged about this approach[1] and it's probably the best way to go.

I've not yet tested how long it takes to generate the metadata; however, I do not think we need to generate it as often as the regular rpm-md repodata.

There are two possible approaches to selectively doing this: by detecting the Provides in the built RPM, or purely time based.

If we do it based on the Provides detected in the built RPM, then we would only kick off this task when "appdata()" is one of the generated Provides in the RPM. This is really only viable if the time it takes to actually generate the metadata is small. It also may be overkill, as the AppStream data may not be changing very often.

If we follow a purely time-based approach, we could generate it once a week and update the repodata then. This is probably the more sensible approach for Cauldron, and after release, updating the metadata as packages are updated will probably be much less painful.

Richard suggests that at least 8 threads might make this faster, though we should probably do some benchmarking to pick the right balance between speed and not causing the computer to grind to a halt. :)

These are the commands currently proposed:

appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/tmp/appstream-md --packages-dir=</path/to/packages> --temp-dir=/tmp/appstream-tempdata
modifyrepo_c --no-compress /tmp/appstream-md/appstream.xml.gz </path/to/packages>/repodata/
modifyrepo_c --no-compress /tmp/appstream-md/appstream-icons.tar.gz </path/to/packages>/repodata/

This will likely require a backport of appstream-util from Cauldron to infra_5.

[1]: https://blogs.gnome.org/hughsie/2016/04/27/3rd-party-fedora-repositories-and-appstream/
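For the Provides-based approach, the check itself could be quite small. The following is only a sketch: the helper names has_appdata_provides and rpm_needs_appstream are made up for illustration, and it assumes the generated Provides appear one per line in the form "appdata(...)" as printed by rpm -qp --provides.

```shell
#!/bin/sh
# Sketch only: decide whether a freshly built RPM should trigger an
# AppStream metadata rebuild. Function names are hypothetical.

# Reads a list of Provides on stdin (one per line) and succeeds if any
# of them is an appdata() provide.
has_appdata_provides() {
    grep -q '^appdata('
}

# Assumption: the RPM path is passed as $1 and rpm is available on the
# build host. Succeeds when the package carries an appdata() provide.
rpm_needs_appstream() {
    rpm -qp --provides "$1" 2>/dev/null | has_appdata_provides
}
```

A build hook could then call rpm_needs_appstream on each uploaded package and only queue the appstream-builder run when it succeeds.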
Blocks: (none) => 15527
The backport work is ready in infra_5. The packages need to be built in the following order:

libsoup -> json-glib -> glib2.0 -> appstream-glib

The built appstream-util package includes the appstream-builder tool.
Target Milestone: --- => Mageia 6
Blocks: (none) => 13452
I've updated the backported version to be in sync with Cauldron (appstream-glib 0.6.3).
Priority: Normal => release_blocker
@ Sysadmins: Could we have a status on this? What's left to be done to solve this issue?
Status comment: (none) => Waiting for sysadmins(tm)
QA Contact: (none) => ngompa13
Neal: Are those packages submitted to infra_5 already, or not yet? If not, go ahead and submit them, so that sysadmins just need to add the necessary commands for metadata generation.
Status comment: Waiting for sysadmins(tm) => tmb is going to work on it
Assignee: sysadmin-bugs => tmb
So I'll build the packages and do a trial-run on a local repo to see what kind of time/cpu it takes
(In reply to Rémi Verschelde from comment #4)
> Neal: Are those packages submitted to infra_5 already, or not yet? If not,
> go ahead and submit them, so that sysadmins just need to add the necessary
> commands for metadata generation.

I cannot, since when I try, it refuses to use the infra_5 copies of the package sources in SVN, which leads to most of them failing (I did have to make changes for it to work on mga5).
With some help from Rémi and Nicolas Lécureuil, I've submitted the packages to infra_5. Nicolas was able to install appstream-util successfully, so it's ready to be used.
What is the next step?
CC: (none) => mageia
We need to figure out how fast we can make appstream-builder go to generate the metadata. The sample command I have below is a good place to start:

> appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/tmp/appstream-md --packages-dir=</path/to/packages> --temp-dir=/tmp/appstream-tempdata

The two things to tweak are --max-threads and --packages-dir. The maximum threads in the example is 8, but we probably want even more, since the server should be capable of more. The packages dir needs to be pointed to the directory where the packages reside.

On the other hand, we may want to run the builds in parallel, one per repository. Given that for Cauldron, we have three repositories to concern ourselves with, maybe something along the lines of this would work?

appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache-$rel-$sect-$repo-$arch --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/var/tmp/appstream-md-$rel-$sect-$repo-$arch --packages-dir=/distrib/bootstrap/$rel/$arch/media/$sect/$repo/ --temp-dir=/tmp/appstream-tempdata-$rel-$sect-$repo-$arch
modifyrepo_c --no-compress /var/tmp/appstream-md-$rel-$sect-$repo-$arch/appstream.xml.gz /distrib/bootstrap/$rel/$arch/media/$sect/$repo/repodata/
modifyrepo_c --no-compress /var/tmp/appstream-md-$rel-$sect-$repo-$arch/appstream-icons.tar.gz /distrib/bootstrap/$rel/$arch/media/$sect/$repo/repodata/

Note the vars:

$rel = distro release: "6" or "cauldron", for example
$sect = distro section: "core", "nonfree", "tainted"
$repo = repo per section: "release", "updates", "updates_testing", "backports", "backports_testing"
$arch = architecture: "i586", "x86_64", "armv5tl", "armv7hl"

The modifyrepo_c commands need to be run after every createrepo task regardless, because successive createrepo_c runs will otherwise delete the appended AppStream data from the repodata folder. So the generated AppStream metadata in /var/tmp/appstream-md-$rel-$sect-$repo-$arch (or wherever you put it) should stick around to be reused when it's not being regenerated.

Though, if it is quick enough, we could regenerate the AppStream data together with the rpm-md data. That would simplify things considerably. I'm just not sure how long the process is.
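To make the variable expansion concrete, here is a rough dry-run sketch that only prints the three commands for each repository instead of running them. The function name print_commands is hypothetical, the paths are the ones proposed above, and the loop values (Cauldron, the three sections, release, two architectures) are just an assumed example selection.

```shell
#!/bin/sh
# Sketch only: print (not run) the per-repository commands so the
# expansion of $rel/$sect/$repo/$arch can be reviewed first.
# print_commands is a hypothetical helper name.
print_commands() {
    p_rel=$1; p_sect=$2; p_repo=$3; p_arch=$4
    tag="$p_rel-$p_sect-$p_repo-$p_arch"
    pkgs="/distrib/bootstrap/$p_rel/$p_arch/media/$p_sect/$p_repo"
    out="/var/tmp/appstream-md-$tag"
    echo "appstream-builder --origin=Mageia.Org --basename=appstream --cache-dir=/tmp/appstream-cache-$tag --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=$out --packages-dir=$pkgs/ --temp-dir=/tmp/appstream-tempdata-$tag"
    echo "modifyrepo_c --no-compress $out/appstream.xml.gz $pkgs/repodata/"
    echo "modifyrepo_c --no-compress $out/appstream-icons.tar.gz $pkgs/repodata/"
}

# Assumed example: the release repo of each Cauldron section, two arches.
for sect in core nonfree tainted; do
    for arch in i586 x86_64; do
        print_commands cauldron "$sect" release "$arch"
    done
done
```

Piping the output through sh, or backgrounding one invocation per repository, would be one way to actually run them; whether running them in parallel is safe on the server is untested.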
BTW, on IRC Thomas mentioned that if the process takes too long and would slow down the BS noticeably, we could also have a cron job regenerate the repodata regularly.
(In reply to Rémi Verschelde from comment #10)
> BTW, on IRC Thomas mentioned that if the process takes too long and would
> slow down the BS noticeably, we could also have a cron job regenerate the
> repodata regularly.

Yes, that's why I mentioned that we want to keep the generated output around and just re-append it on createrepo tasks in between AppStream regeneration tasks if we go that route. Appending to the repodata takes almost no time, since it just amends repomd.xml with new information and copies the data files into the repodata folder.

However, if it is fast enough, we can just generate it as part of the createrepo task.
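The re-append step after each createrepo run could look roughly like this. It is a sketch under assumptions: reappend_appstream and the MODIFYREPO override are names invented here (the override only exists so the step can be dry-run with echo), and the two file names are the ones appstream-builder is expected to produce with --basename=appstream.

```shell
#!/bin/sh
# Sketch only: re-append previously generated AppStream files to a
# repodata directory after createrepo_c has rebuilt it.
# MODIFYREPO is a hypothetical override for dry runs (e.g. MODIFYREPO=echo).
MODIFYREPO=${MODIFYREPO:-modifyrepo_c}

# Usage: reappend_appstream <cached-metadata-dir> <repodata-dir>
reappend_appstream() {
    md_dir=$1
    repodata=$2
    # Refuse to touch the repodata unless both cached files exist.
    for f in appstream.xml.gz appstream-icons.tar.gz; do
        if [ ! -f "$md_dir/$f" ]; then
            echo "missing cached file $md_dir/$f; regenerate first" >&2
            return 1
        fi
    done
    for f in appstream.xml.gz appstream-icons.tar.gz; do
        $MODIFYREPO --no-compress "$md_dir/$f" "$repodata" || return 1
    done
}
```

Checking for both files up front avoids ending up with repodata that references the XML but not the icons.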
@Thomas, Any progress on this?
(In reply to Neal Gompa from comment #12)
> @Thomas,
>
> Any progress on this?

Please Thomas and other sysadmins, can you give status and ETA for this blocker?
Blocks: 15527 => (none)
CC: (none) => mageia, pterjan
Pascal, can you look into this?
Status comment: tmb is going to work on it => Waiting for sysadmins(tm)
Now that we're in Release Freeze, can we *please* get this done? It will make it so that GNOME Software and Plasma Discover actually have something to show users...
Taking this bug off the release blocker list, as discussed in council meeting.
CC: (none) => marja11
Priority: release_blocker => High
Can we please have the AppStream metadata generated for the release repos for Mageia 6? I really want to put that into the release notes, as it's a huge deal that we'll properly support them so that things like GNOME Software and Plasma Discover work properly.
I've rebased the backported appstream-glib for mga5 infra and submitted it to be built.
Created attachment 9468 [details] appstream-builder segfault, can also be reproduced on current cauldron
CC: (none) => tmb
Hm, it seems not to be thread-safe :/ Dropping down to 2 threads makes it not segfault... but that makes the initial run take 15 minutes :/
Hmm, interestingly, I can't reproduce on my system with the 8 thread command (admittedly, I have only a 4 core system), but I'm going to try to find a more powerful machine to see if I can reproduce it. I've also filed a bug upstream about it, to notify Richard Hughes about the issue.
See Also: (none) => https://github.com/hughsie/appstream-glib/issues/177
Ok, in order to get it done I had to drop down to a single thread :/

So it took about 16 minutes for core/release with its ~28000 rpms...

Now that is not a problem for a stable release, as /release repos only have to be generated once, so the time is not an issue.

And updates trees are way smaller than /release, so they will be easier... but we do need to figure out why it breaks down so easily...

Anyway, there is now appstream metadata generated for i586 and x86_64 being mirrored out, so you can check if it behaves as you want...
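Until the crash is understood, one possible workaround is to retry with fewer threads automatically instead of picking a thread count by hand. This is only a sketch: build_with_fallback and the BUILDER override are names invented for illustration, and halving the thread count on each failure (8, 4, 2, 1) is an arbitrary choice, not something upstream recommends.

```shell
#!/bin/sh
# Sketch only: run the builder with a given thread count and halve it on
# every failure (for example a segfault) until one run succeeds.
# BUILDER is a hypothetical override so the logic can be tested without
# appstream-builder installed.
BUILDER=${BUILDER:-appstream-builder}

# Usage: build_with_fallback <max-threads> [other appstream-builder options...]
build_with_fallback() {
    threads=$1
    shift
    while [ "$threads" -ge 1 ]; do
        if $BUILDER --max-threads="$threads" "$@"; then
            echo "succeeded with $threads thread(s)"
            return 0
        fi
        echo "failed with $threads thread(s); retrying with fewer" >&2
        threads=$((threads / 2))
    done
    return 1
}
```

Each failed attempt costs time, so this mainly makes sense for the rare full regeneration of a /release tree, where a long run is acceptable.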
Created attachment 9471 [details] Screenshot of Plasma Discover showing AppStream information (Lugaru HD)
So far, verified that Plasma Discover works. Setting up a GNOME environment to check GNOME Software.
Created attachment 9472 [details] Screenshot of GNOME Software showing AppStream information (Lugaru HD)
After installing GNOME Software, letting it download metadata, and then restarting the GNOME Software service, the AppStream metadata showed up!
Created attachment 9473 [details] Screenshot of GNOME Shell search showing AppStream information (Lugaru HD)
And it seems GNOME Shell application search is now fully functional!
It looks like everything is working. Both appstreamcli (appstream) and appstream-util (appstream-glib) can also search the local AppStream cache. Thomas, thanks for everything! As long as we keep it up, we should be in great shape!
Status: NEW => RESOLVED
Resolution: (none) => FIXED
See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=21207
(In reply to Thomas Backlund from comment #22)
> Ok, in order to get it done I had to drop down to a single thread :/
>
> So it took about ~16 minutes for core/release with its ~28000 rpms...
>
> Now that is not a problem for a stable release as /release repos only have
> to be generated once, so the time is not an issue.
>
> And updates trees are way smaller than /release so they will be easier...
> but we do need to figure out why it breaks down so easily...
>
> Anyway, there is now appstream metadata generated for i586 and x86_64 being
> mirrored out so you can check if it behaves as you want...

On Cauldron, maybe we can generate this "by hand"? It would be an issue if we needed to wait 16 more minutes for the hdlist to be available (I'm thinking of a KF5 update, for example :) ).
(In reply to Nicolas Lécureuil from comment #27)
>
> on cauldron maybe we can generate this "by hand" this is an issue if we need
> to wait 16mins more for hdlist to be available ( i think about kf5 update
> for ex :) ).

Neal mentioned that on Fedora they only update the AppStream metadata via a weekly cron job. We could likely do something similar, 15 min once a week in the night should not be a big issue.
(In reply to Rémi Verschelde from comment #28)
> (In reply to Nicolas Lécureuil from comment #27)
> >
> > on cauldron maybe we can generate this "by hand" this is an issue if we need
> > to wait 16mins more for hdlist to be available ( i think about kf5 update
> > for ex :) ).
>
> Neal mentioned that on Fedora they only update the AppStream metadata via a
> weekly cron job. We could likely do something similar, 15 min once a week in
> the night should not be a big issue.

Yes, this seems a lot saner :)
(In reply to Nicolas Lécureuil from comment #29)
> (In reply to Rémi Verschelde from comment #28)
> > (In reply to Nicolas Lécureuil from comment #27)
> > >
> > > on cauldron maybe we can generate this "by hand" this is an issue if we need
> > > to wait 16mins more for hdlist to be available ( i think about kf5 update
> > > for ex :) ).
> >
> > Neal mentioned that on Fedora they only update the AppStream metadata via a
> > weekly cron job. We could likely do something similar, 15 min once a week in
> > the night should not be a big issue.
>
> yes this seems a lot saner :)

That means we need to cache the results and re-append them each time the rpm-md repodata is regenerated.
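One way to combine the weekly regeneration with the per-createrepo re-append is to let the createrepo hook decide based on the age of the cached file. A minimal sketch, assuming the cache lives in a per-repo directory containing appstream.xml.gz; is_stale and plan_action are hypothetical helper names, and "-mtime +6" (seven or more whole days old) is just one way to express "about a week".

```shell
#!/bin/sh
# Sketch only: decide whether the cached AppStream metadata should be
# regenerated or simply re-appended after a createrepo run.

# Usage: is_stale <file> <days>
# Succeeds when the file is missing or matches find's "-mtime +<days>".
is_stale() {
    if [ ! -f "$1" ]; then
        return 0
    fi
    [ -n "$(find "$1" -mtime +"$2" 2>/dev/null)" ]
}

# Usage: plan_action <cached-metadata-dir>
# Prints "regenerate" or "reappend"; the caller maps that to the
# appstream-builder run or to the modifyrepo_c calls.
plan_action() {
    if is_stale "$1/appstream.xml.gz" 6; then
        echo regenerate
    else
        echo reappend
    fi
}
```

With this, a separate weekly cron entry is optional: the first createrepo run after the cache turns a week old would trigger the regeneration, at the cost of that one run taking longer.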