Description of problem:
GNOME Software, Plasma Discover, and other application-centric PackageKit frontends depend on AppStream metadata to function properly. Without this data, they don't really show anything useful.
Since we switched the PackageKit backend to the Hif backend, we now have a way to provide useful AppStream data that can be processed by PackageKit frontends.
After speaking to Richard Hughes on #yum about it, he suggested using appstream-builder to generate the metadata and modifyrepo_c to merge it into the repodata. He's blogged about this approach, and it's probably the best way to go.
I've not yet tested how long it takes to generate the repodata; however, I don't think we need to regenerate it as often as the regular rpm-md repodata.
There are two possible approaches to doing this selectively: detecting the Provides in the built RPM, or a purely time-based schedule.
If we base it on the Provides detected in the built RPM, we would only kick off this task when "appdata()" appears among the generated Provides. This is really only viable if generating the repodata is fast, and it may be overkill anyway, since the AppStream data may not change very often.
If we follow a purely time-based approach, we could generate it once a week and update the repodata then. This is probably the more sensible approach for Cauldron, and after release, updating the metadata as packages are updated will probably be much less painful.
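If the weekly cadence is adopted, the scheduling side could be as simple as a cron entry; the script name, user, and time here are placeholders for illustration, not anything decided in this thread:

```shell
# Hypothetical /etc/cron.d entry: regenerate the AppStream metadata once a
# week (Sunday 03:00), niced down so it doesn't starve the build system.
# The "mirror" user and the script path are placeholders.
0 3 * * 0  mirror  ionice -c3 nice -n19 /usr/local/bin/update-appstream-metadata
```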
Richard suggests that at least 8 threads should make this faster, though we should do some benchmarking to pick the right balance between speed and not grinding the server to a halt. :)
These are the commands currently proposed:
appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/tmp/appstream-md --packages-dir=</path/to/packages> --temp-dir=/tmp/appstream-tempdata
modifyrepo_c --no-compress /tmp/appstream-md/appstream.xml.gz </path/to/packages>/repodata/
modifyrepo_c --no-compress /tmp/appstream-md/appstream-icons.tar.gz </path/to/packages>/repodata/
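As a quick way to verify the merge worked: modifyrepo_c derives the record type from the file's base name, so repomd.xml should end up referencing "appstream" and "appstream-icons" data records. A small sanity-check sketch (the helper name is made up):

```shell
# Sanity check after the two modifyrepo_c runs: repomd.xml should now list
# an "appstream" and an "appstream-icons" data record.
check_appstream_merged() {
    repodata=$1    # the repodata/ directory that was just modified
    grep -q 'type="appstream"' "$repodata/repomd.xml" &&
    grep -q 'type="appstream-icons"' "$repodata/repomd.xml"
}
```

Returns success only if both records are present.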
This will likely require a backport of appstream-util from Cauldron to infra_5.
The backport work is ready in infra_5.
The packages need to be built in the following order: libsoup -> json-glib -> glib2.0 -> appstream-glib
The built appstream-util package includes the appstream-builder tool.
I've updated the backported version to be in sync with Cauldron (appstream-glib 0.6.3).
@ Sysadmins: Could we have a status update on this? What's left to do to resolve this issue?
Neal: Are those packages submitted to infra_5 already, or not yet? If not, go ahead and submit them, so that sysadmins just need to add the necessary commands for metadata generation.
So I'll build the packages and do a trial run on a local repo to see what kind of time/CPU it takes.
(In reply to Rémi Verschelde from comment #4)
> Neal: Are those packages submitted to infra_5 already, or not yet? If not,
> go ahead and submit them, so that sysadmins just need to add the necessary
> commands for metadata generation.
I cannot, because when I try, it refuses to use the infra_5 copies of the package sources in SVN, which makes most of them fail (I did have to make changes for them to build on mga5).
With some help from Rémi and Nicholas Lécureuil, I've submitted the packages to infra_5.
Nicholas was able to install appstream-util successfully, so it's ready to be used.
What is the next step?
We need to figure out how fast we can make appstream-builder generate the metadata.
The sample command I have below is a good place to start:
> appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/tmp/appstream-md --packages-dir=</path/to/packages> --temp-dir=/tmp/appstream-tempdata
The two things to tweak are --max-threads and --packages-dir. The example uses a maximum of 8 threads, but we probably want even more, since the server should be capable of it. --packages-dir needs to point to the directory where the packages reside. We may also want to run the builds for each repository in parallel.
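To get actual numbers for the thread-count question, a small harness like the following could help. It is only a sketch: BUILDER and PKGS are assumptions that can be overridden, and each run starts from a cold cache so timings stay comparable.

```shell
# Hypothetical benchmark harness: time one appstream-builder run per
# thread count, wiping the cache each time for a fair comparison.
# BUILDER and PKGS are assumptions; override them as needed.
BUILDER=${BUILDER:-appstream-builder}
PKGS=${PKGS:-/distrib/bootstrap/cauldron/x86_64/media/core/release/}

bench() {
    threads=$1
    rm -rf /tmp/appstream-bench          # start from a cold cache
    mkdir -p /tmp/appstream-bench
    echo "== --max-threads=$threads =="
    time "$BUILDER" --origin="Mageia.Org" --basename=appstream \
        --cache-dir=/tmp/appstream-bench/cache --enable-hidpi \
        --max-threads="$threads" --min-icon-size=32 \
        --output-dir=/tmp/appstream-bench/md \
        --packages-dir="$PKGS" \
        --temp-dir=/tmp/appstream-bench/tmp
}
```

Then something like `for t in 4 8 16; do bench $t; done` and compare the wall-clock times.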
Given that for Cauldron we have three repositories to concern ourselves with, maybe something along these lines would work?
appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache-$rel-$sect-$repo-$arch --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/var/tmp/appstream-md-$rel-$sect-$repo-$arch --packages-dir=/distrib/bootstrap/$rel/$arch/media/$sect/$repo/ --temp-dir=/tmp/appstream-tempdata-$rel-$sect-$repo-$arch
modifyrepo_c --no-compress /tmp/appstream-md-$rel-$sect-$repo-$arch/appstream.xml.gz /distrib/bootstrap/$rel/$arch/media/$sect/$repo/repodata/
modifyrepo_c --no-compress /tmp/appstream-md-$rel-$sect-$repo-$arch/appstream-icons.tar.gz /distrib/bootstrap/$rel/$arch/media/$sect/$repo/repodata/
Note the vars:
$rel = distro release: "6" or "cauldron", for example
$sect = distro section: "core", "nonfree", "tainted"
$repo = repo per section: "release", "updates", "updates_testing", "backports", "backports_testing"
$arch = architecture: "i586", "x86_64", "armv5tl", "armv7hl"
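Putting the templates and variables together, the whole sweep could be driven by one script. This is a sketch under the assumptions above (I've trimmed the repo and arch lists for brevity), with a DRY_RUN switch so it can be previewed before touching anything:

```shell
# Hypothetical driver looping over the variables described above.
# With DRY_RUN=1 (the default here) it only prints the commands it
# would run instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

generate_all() {
    for rel in cauldron; do                  # trimmed lists for brevity
      for arch in i586 x86_64; do
        for sect in core nonfree tainted; do
          for repo in release updates updates_testing; do
            pkgs=/distrib/bootstrap/$rel/$arch/media/$sect/$repo
            md=/var/tmp/appstream-md-$rel-$sect-$repo-$arch
            run appstream-builder --origin="Mageia.Org" --basename=appstream \
                --cache-dir=/tmp/appstream-cache-$rel-$sect-$repo-$arch \
                --enable-hidpi --max-threads=8 --min-icon-size=32 \
                --output-dir="$md" --packages-dir="$pkgs/" \
                --temp-dir=/tmp/appstream-tempdata-$rel-$sect-$repo-$arch
            run modifyrepo_c --no-compress "$md/appstream.xml.gz" "$pkgs/repodata/"
            run modifyrepo_c --no-compress "$md/appstream-icons.tar.gz" "$pkgs/repodata/"
          done
        done
      done
    done
}
```

A real run would use DRY_RUN=0 and the full repo/arch lists.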
The modifyrepo_c commands need to be run on every createrepo task regardless, because successive createrepo_c runs delete the appended AppStream data from the repodata/ folder. So the generated AppStream metadata in /var/tmp/appstream-md-$rel-$sect-$repo-$arch (or wherever you put it) should stick around to be re-appended whenever it isn't being regenerated.
Though, if it is quick enough, we could regenerate the AppStream data together with the rpm-md data. That would simplify things considerably; I'm just not sure how long the process takes.
BTW, on IRC Thomas mentioned that if the process takes too long and would slow down the BS noticeably, we could also have a cron job regenerate the repodata regularly.
(In reply to Rémi Verschelde from comment #10)
> BTW, on IRC Thomas mentioned that if the process takes too long and would
> slow down the BS noticeably, we could also have a cron job regenerate the
> repodata regularly.
Yes, that's why I mentioned that we want to keep the generated output around and just re-append it on the createrepo tasks that run between AppStream repodata regenerations, if we go that route. Appending to the repodata takes practically no time, since it just amends repomd.xml with the new entries and copies the data files into the repodata folder.
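Concretely, each createrepo task could end with a small hook along these lines. This is a sketch: MODIFYREPO is parameterized only so the snippet can be dry-run, and the paths are the ones assumed earlier in this thread.

```shell
# Hypothetical end-of-createrepo hook: if previously generated AppStream
# metadata exists for this repo, re-append it, since createrepo_c rewrites
# repodata/ and would otherwise drop it.
MODIFYREPO=${MODIFYREPO:-modifyrepo_c}

reappend_appstream() {
    md=$1        # e.g. /var/tmp/appstream-md-$rel-$sect-$repo-$arch
    repodata=$2  # the freshly rewritten repodata/ directory
    for f in "$md/appstream.xml.gz" "$md/appstream-icons.tar.gz"; do
        if [ -f "$f" ]; then
            "$MODIFYREPO" --no-compress "$f" "$repodata"
        fi
    done
}
```

If the cached metadata is missing (e.g. first run), the hook simply does nothing for that repo.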
However, if it is fast enough, we can just generate it as part of the createrepo task.
Any progress on this?
(In reply to Neal Gompa from comment #12)
> Any progress on this?
Please, Thomas and other sysadmins, can you give a status and ETA for this blocker?
Pascal, can you look into this?