Bug 18669 - Generate and provide AppStream repodata for GNOME Software and Plasma Discover
Status: NEW
Alias: None
Product: Infrastructure
Classification: Unclassified
Component: BuildSystem
Version: unspecified
Hardware: All Linux
Priority/Severity: High major
Target Milestone: Mageia 6
Assignee: Thomas Backlund
QA Contact: Neal Gompa
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 13452
Reported: 2016-06-09 16:30 CEST by Neal Gompa
Modified: 2017-06-20 22:10 CEST

See Also:
Source RPM:
CVE:
Status comment: Waiting for sysadmins(tm)


Description Neal Gompa 2016-06-09 16:30:42 CEST
Description of problem:
GNOME Software, Plasma Discover, and other application-centric PackageKit frontends depend on AppStream metadata to function properly. Without that data, they don't really show anything useful.

Since we switched the PackageKit backend to the Hif backend, we now have a way to provide useful AppStream data that can be processed by PackageKit frontends.

After speaking to Richard Hughes on #yum about it, he suggested using appstream-builder to build the AppStream data and modifyrepo_c to merge it into the repodata. He's blogged about this approach[1] and it's probably the best way to go.

I've not yet tested how long it takes to generate the repodata. However, I do not think we need to generate it as often as the regular rpm-md repodata.

There are two possible approaches to selectively doing this: by detecting the Provides in the built RPM, or purely time based.

If we base it on the Provides detected in the built RPM, we would only kick off this task when "appdata()" is among the generated Provides. This is really only viable if generating the repodata takes little time. It may also be overkill, as the AppStream data may not change very often.
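
A minimal sketch of the Provides-based check, assuming the build hook can call rpm on the freshly built package. The function names has_appdata_provide and needs_appstream_refresh are hypothetical and not part of any existing build system script; only rpm -qp --provides is a real command here.

```shell
#!/bin/sh
# Sketch only: decide whether a built RPM should trigger AppStream regeneration.

# Reads a list of Provides (one per line) on stdin and succeeds
# if any of them starts with "appdata(".
has_appdata_provide() {
    grep -q '^appdata('
}

# Hypothetical hook: needs_appstream_refresh /path/to/package.rpm
# Succeeds only when the package carries an appdata() Provide.
needs_appstream_refresh() {
    rpm -qp --provides "$1" 2>/dev/null | has_appdata_provide
}
```

A caller could then run the regeneration only when needs_appstream_refresh succeeds for at least one package in the upload.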

If we follow a purely time-based approach, we could generate it once a week and update the repodata then. This is probably the more sensible approach for Cauldron, and after release, updating the metadata as packages are updated will probably be much less painful.
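
For the time-based approach, one possible shape is a stamp file that is touched after each successful generation and checked before the next one. This is only a sketch: appstream_is_stale and the stamp file location are assumptions for illustration, and the 7-day default simply mirrors the "once a week" idea above.

```shell
#!/bin/sh
# Sketch only: report whether the AppStream data is due for regeneration.
# Usage: appstream_is_stale STAMP_FILE [DAYS]   (DAYS defaults to 7)
appstream_is_stale() {
    stamp=$1
    days=${2:-7}
    # Never generated before: treat as stale.
    if [ ! -e "$stamp" ]; then
        return 0
    fi
    # find prints the file only if it was modified at least $days days ago.
    [ -n "$(find "$stamp" -mtime +"$((days - 1))")" ]
}
```

The task that generates the metadata would touch the stamp file when it finishes, so a frequent cron job can call this check cheaply and do nothing most of the time.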

Richard suggests that at least 8 threads might make this faster, though we should probably do some benchmarking for this to pick the right balance of speed and not causing the computer to grind to a halt. :)
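
A rough way to do that benchmarking could look like the sketch below. It reuses the appstream-builder options proposed in this report and only varies --max-threads. The bench_threads helper and the BUILDER override are hypothetical. Each run gets a fresh cache directory so that a warm cache does not skew the comparison.

```shell
#!/bin/sh
# Sketch only: time appstream-builder with different thread counts.
# BUILDER can be overridden for testing; it defaults to the real tool.
BUILDER=${BUILDER:-appstream-builder}

# Usage: bench_threads /path/to/packages 2 4 8 16
bench_threads() {
    pkgdir=$1
    shift
    for n in "$@"; do
        work=$(mktemp -d) || return 1
        start=$(date +%s)
        $BUILDER --origin="Mageia.Org" --basename=appstream \
            --cache-dir="$work/cache" --enable-hidpi --max-threads="$n" \
            --min-icon-size=32 --output-dir="$work/md" \
            --packages-dir="$pkgdir" --temp-dir="$work/tmp" >/dev/null 2>&1
        status=$?
        end=$(date +%s)
        # One result line per thread count, with wall-clock seconds.
        echo "threads=$n seconds=$((end - start)) exit=$status"
        rm -rf "$work"
    done
}
```

Watching the load on the machine while this runs would show where more threads stop helping.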

These are the commands currently proposed:

appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/tmp/appstream-md --packages-dir=</path/to/packages> --temp-dir=/tmp/appstream-tempdata

modifyrepo_c --no-compress /tmp/appstream-md/appstream.xml.gz </path/to/packages>/repodata/

modifyrepo_c --no-compress /tmp/appstream-md/appstream-icons.tar.gz </path/to/packages>/repodata/

This will likely require a backport of appstream-util from Cauldron to infra_5.

[1]: https://blogs.gnome.org/hughsie/2016/04/27/3rd-party-fedora-repositories-and-appstream/
Comment 1 Neal Gompa 2016-08-07 02:36:01 CEST
The backport work is ready in infra_5.

The packages need to be built in the following order: libsoup -> json-glib -> glib2.0 -> appstream-glib

The built appstream-util package includes the appstream-builder tool.
Comment 2 Neal Gompa 2016-10-11 01:58:24 CEST
I've updated the backported version to be in sync with Cauldron (appstream-glib 0.6.3).
Comment 3 Rémi Verschelde 2016-10-17 12:00:40 CEST
@ Sysadmins: Could we have a status on this? What's left to be done to solve this issue?
Comment 4 Rémi Verschelde 2016-11-23 21:20:51 CET
Neal: Are those packages submitted to infra_5 already, or not yet? If not, go ahead and submit them, so that sysadmins just need to add the necessary commands for metadata generation.
Comment 5 Thomas Backlund 2016-11-23 21:25:08 CET
So I'll build the packages and do a trial-run on a local repo to see what kind of time/cpu it takes
Comment 6 Neal Gompa 2016-11-23 23:50:45 CET
(In reply to Rémi Verschelde from comment #4)
> Neal: Are those packages submitted to infra_5 already, or not yet? If not,
> go ahead and submit them, so that sysadmins just need to add the necessary
> commands for metadata generation.

I cannot, since when I try, it refuses to use the infra_5 copies of the package sources in SVN, which leads to most of them failing (I did have to make changes for it to work on mga5).
Comment 7 Neal Gompa 2016-11-24 11:16:33 CET
With some help from Rémi and Nicolas Lécureuil, I've submitted the packages to infra_5.

Nicolas was able to install appstream-util successfully, so it's ready to be used.
Comment 8 Nicolas Lécureuil 2016-11-24 14:08:13 CET
What is the next step?
Comment 9 Neal Gompa 2016-11-24 14:35:45 CET
We need to figure out how fast we can make appstream-builder go to generate the metadata.

The sample command I have below is a good place to start:
> appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/tmp/appstream-md --packages-dir=</path/to/packages> --temp-dir=/tmp/appstream-tempdata

The two things to tweak are --max-threads and --packages-dir. The example uses a maximum of 8 threads, but we probably want even more, since the server should be capable of more. The packages dir needs to point to the directory where the packages reside. Alternatively, we may want to run the builder in parallel, once for each repository.

Given that for Cauldron, we have three repositories to concern ourselves with, maybe something along the lines of this would work?

appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache-$rel-$sect-$repo-$arch --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/var/tmp/appstream-md-$rel-$sect-$repo-$arch --packages-dir=/distrib/bootstrap/$rel/$arch/media/$sect/$repo/ --temp-dir=/tmp/appstream-tempdata-$rel-$sect-$repo-$arch

modifyrepo_c --no-compress /var/tmp/appstream-md-$rel-$sect-$repo-$arch/appstream.xml.gz /distrib/bootstrap/$rel/$arch/media/$sect/$repo/repodata/

modifyrepo_c --no-compress /var/tmp/appstream-md-$rel-$sect-$repo-$arch/appstream-icons.tar.gz /distrib/bootstrap/$rel/$arch/media/$sect/$repo/repodata/

Note the vars:
$rel = distro release: "6" or "cauldron", for example
$sect = distro section: "core", "nonfree", "tainted"
$repo = repo per section: "release", "updates", "updates_testing", "backports", "backports_testing"
$arch = architecture: "i586", "x86_64", "armv5tl", "armv7hl"
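
To illustrate how those variables expand, here is a dry-run sketch that only prints the commands for each repository instead of running them. The helper name print_appstream_cmds is hypothetical. The modifyrepo_c lines read from the same /var/tmp directory that --output-dir writes to, and the example loop covers just the three Cauldron sections for one repo and one architecture.

```shell
#!/bin/sh
# Sketch only: print (not run) the proposed commands for one repository.
# Usage: print_appstream_cmds REL ARCH SECT REPO
print_appstream_cmds() {
    rel=$1 arch=$2 sect=$3 repo=$4
    tag="$rel-$sect-$repo-$arch"
    pkgdir="/distrib/bootstrap/$rel/$arch/media/$sect/$repo"
    outdir="/var/tmp/appstream-md-$tag"
    printf '%s\n' \
        "appstream-builder --origin=\"Mageia.Org\" --basename=appstream --cache-dir=/tmp/appstream-cache-$tag --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=$outdir --packages-dir=$pkgdir/ --temp-dir=/tmp/appstream-tempdata-$tag" \
        "modifyrepo_c --no-compress $outdir/appstream.xml.gz $pkgdir/repodata/" \
        "modifyrepo_c --no-compress $outdir/appstream-icons.tar.gz $pkgdir/repodata/"
}

# Example: the three Cauldron sections, release repo, x86_64 only.
for sect in core nonfree tainted; do
    print_appstream_cmds cauldron x86_64 "$sect" release
done
```

Printing the commands first makes it easy to review the expanded paths before anything touches the real repodata.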

The modifyrepo_c commands need to be run after every createrepo task regardless, because each successive createrepo_c run deletes the appended AppStream data from the repodata folder. So the generated AppStream metadata in /var/tmp/appstream-md-$rel-$sect-$repo-$arch (or wherever you put it) should stick around so it can be reused when it's not being regenerated.
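
The re-append step could be wrapped in a small helper that the createrepo task calls once it finishes. This is a sketch under assumptions: reappend_appstream and the MODIFYREPO override are hypothetical names, and skipping files that have not been generated yet is a design choice, not something decided in this report.

```shell
#!/bin/sh
# Sketch only: merge previously generated AppStream files back into repodata.
# MODIFYREPO can be overridden for testing; it defaults to the real tool.
MODIFYREPO=${MODIFYREPO:-modifyrepo_c}

# Usage: reappend_appstream /var/tmp/appstream-md-... /path/to/repo/repodata/
reappend_appstream() {
    mddir=$1
    repodata=$2
    for f in appstream.xml.gz appstream-icons.tar.gz; do
        if [ -f "$mddir/$f" ]; then
            # The files are already compressed, hence --no-compress.
            $MODIFYREPO --no-compress "$mddir/$f" "$repodata" || return 1
        else
            echo "skipping $f: not generated yet" >&2
        fi
    done
}
```

Because it only reads the kept output directory, this can run on every createrepo task without waiting for a new appstream-builder run.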

Though, if it is quick enough, we could regenerate the AppStream data together with the rpm-md data. That would simplify things considerably. I'm just not sure how long the process takes.
Comment 10 Rémi Verschelde 2016-11-24 14:39:41 CET
BTW, on IRC Thomas mentioned that if the process takes too long and would slow down the BS noticeably, we could also have a cron job regenerate the repodata regularly.
Comment 11 Neal Gompa 2016-11-24 15:26:08 CET
(In reply to Rémi Verschelde from comment #10)
> BTW, on IRC Thomas mentioned that if the process takes too long and would
> slow down the BS noticeably, we could also have a cron job regenerate the
> repodata regularly.

Yes, that's why I mentioned that we want to keep around the generated output and just re-append it on createrepo tasks in between appstream repodata regeneration tasks if we go that route. Appending to the repodata takes zero time, since it just amends repomd.xml with new information and copies the data files into the repodata folder.

However, if it is fast enough, we can just generate it as part of the createrepo task.
Comment 12 Neal Gompa 2016-12-12 22:00:41 CET
@Thomas,

Any progress on this?
Comment 13 Samuel Verschelde 2017-01-09 14:45:45 CET
(In reply to Neal Gompa from comment #12)
> @Thomas,
> 
> Any progress on this?

Please Thomas and other sysadmins, can you give status and ETA for this blocker?
Comment 14 Nicolas Lécureuil 2017-03-22 22:45:32 CET
Pascal, can you look into this?
Comment 15 Neal Gompa 2017-06-05 21:32:34 CEST
Now that we're in Release Freeze, can we *please* get this done? It will make it so that GNOME Software and Plasma Discover actually have something to show users...
Comment 16 Marja van Waes 2017-06-20 22:10:25 CEST
Taking this bug off the release blocker list, as discussed in council meeting.
