Bug 18669 - Generate and provide AppStream repodata for GNOME Software and Plasma Discover
Summary: Generate and provide AppStream repodata for GNOME Software and Plasma Discover
Status: RESOLVED FIXED
Alias: None
Product: Infrastructure
Classification: Unclassified
Component: BuildSystem
Version: unspecified
Hardware: All Linux
Priority: High major
Target Milestone: Mageia 6
Assignee: Thomas Backlund
QA Contact: Neal Gompa
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 13452
 
Reported: 2016-06-09 16:30 CEST by Neal Gompa
Modified: 2017-07-09 12:29 CEST
CC List: 6 users

See Also:
Source RPM:
CVE:
Status comment: Waiting for sysadmins(tm)


Attachments
appstream-builder segfault, can also be reproduced on current cauldron (49.72 KB, text/plain)
2017-07-08 21:42 CEST, Thomas Backlund
Screenshot of Plasma Discover showing AppStream information (Lugaru HD) (398.09 KB, image/png)
2017-07-09 00:51 CEST, Neal Gompa
Screenshot of GNOME Software showing AppStream information (Lugaru HD) (478.60 KB, image/png)
2017-07-09 01:21 CEST, Neal Gompa
Screenshot of GNOME Shell search showing AppStream information (Lugaru HD) (176.15 KB, image/png)
2017-07-09 01:22 CEST, Neal Gompa

Description Neal Gompa 2016-06-09 16:30:42 CEST
Description of problem:
GNOME Software, Plasma Discover, and other application-centric PackageKit frontends depend on AppStream metadata to function properly. Without that data, they don't really show anything useful.

Since we switched the PackageKit backend to the Hif backend, we now have a way to provide useful AppStream data that can be processed by PackageKit frontends.

After speaking to Richard Hughes on #yum about it, he suggested using appstream-builder to build the AppStream data and modifyrepo_c to merge it into the repodata. He's blogged about this approach[1] and it's probably the best way to go.

I've not yet tested how long it takes to generate the AppStream data. However, I do not think we need to generate it as often as we do the regular rpm-md repodata.

There are two possible approaches to selectively doing this: by detecting the Provides in the built RPM, or purely time based.

If we do it based on what the detected Provides in the built RPM says, then we would only kick off this task when "appdata()" is detected as one of the generated Provides in the RPM. This is really only viable if the time it takes to actually generate the repodata is small. It also may be overkill as the AppStream data may not be changing very often.

If we follow a purely time-based approach, we could generate it once a week and update the repodata then. This is probably the more sensible approach for Cauldron, and after release, updating the metadata as packages are updated will probably be much less painful.
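
A minimal sketch of the Provides-based check, assuming the generated Provides show up as lines starting with "appdata(" in the output of `rpm -qp --provides`. The helper name `provides_appdata` is made up for illustration, and the package path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Succeeds if the Provides list on stdin contains an "appdata(...)" entry.
# Hypothetical helper: expects one Provides entry per line.
provides_appdata() {
    grep -q '^appdata('
}

# Intended use against a freshly built package (placeholder path):
#   rpm -qp --provides /path/to/built.rpm | provides_appdata \
#       && echo "AppStream data should be regenerated"
```

Reading the list from stdin keeps the check independent of how the build system obtains the Provides, so it could equally be fed from a build log.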

Richard suggests that at least 8 threads might make this faster, though we should probably do some benchmarking for this to pick the right balance of speed and not causing the computer to grind to a halt. :)

The following commands are currently proposed:

appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/tmp/appstream-md --packages-dir=</path/to/packages> --temp-dir=/tmp/appstream-tempdata

modifyrepo_c --no-compress /tmp/appstream-md/appstream.xml.gz </path/to/packages>/repodata/

modifyrepo_c --no-compress /tmp/appstream-md/appstream-icons.tar.gz </path/to/packages>/repodata/

This will likely require a backport of appstream-util from Cauldron to infra_5.

[1]: https://blogs.gnome.org/hughsie/2016/04/27/3rd-party-fedora-repositories-and-appstream/
Neal Gompa 2016-06-09 16:33:02 CEST

Blocks: (none) => 15527

Comment 1 Neal Gompa 2016-08-07 02:36:01 CEST
The backport work is ready in infra_5.

The packages need to be built in the following order: libsoup -> json-glib -> glib2.0 -> appstream-glib

The built appstream-util package includes the appstream-builder tool.
Samuel Verschelde 2016-09-22 11:04:53 CEST

Target Milestone: --- => Mageia 6

Neal Gompa 2016-10-11 01:37:16 CEST

Blocks: (none) => 13452

Comment 2 Neal Gompa 2016-10-11 01:58:24 CEST
I've updated the backported version to be in sync with Cauldron (appstream-glib 0.6.3).
Neal Gompa 2016-10-17 11:12:07 CEST

Priority: Normal => release_blocker

Comment 3 Rémi Verschelde 2016-10-17 12:00:40 CEST
@ Sysadmins: Could we have a status on this? What's left to be done to solve this issue?

Status comment: (none) => Waiting for sysadmins(tm)

Samuel Verschelde 2016-11-08 12:02:33 CET

QA Contact: (none) => ngompa13

Comment 4 Rémi Verschelde 2016-11-23 21:20:51 CET
Neal: Are those packages submitted to infra_5 already, or not yet? If not, go ahead and submit them, so that sysadmins just need to add the necessary commands for metadata generation.
Samuel Verschelde 2016-11-23 21:24:26 CET

Status comment: Waiting for sysadmins(tm) => tmb is going to work on it
Assignee: sysadmin-bugs => tmb

Comment 5 Thomas Backlund 2016-11-23 21:25:08 CET
So I'll build the packages and do a trial run on a local repo to see what kind of time/CPU it takes.
Comment 6 Neal Gompa 2016-11-23 23:50:45 CET
(In reply to Rémi Verschelde from comment #4)
> Neal: Are those packages submitted to infra_5 already, or not yet? If not,
> go ahead and submit them, so that sysadmins just need to add the necessary
> commands for metadata generation.

I cannot, since when I try, it refuses to use the infra_5 copies of the package sources in SVN, which leads to most of them failing (I did have to make changes for it to work on mga5).
Comment 7 Neal Gompa 2016-11-24 11:16:33 CET
With some help from Rémi and Nicolas Lécureuil, I've submitted the packages to infra_5.

Nicolas was able to install appstream-util successfully, so it's ready to be used.
Comment 8 Nicolas Lécureuil 2016-11-24 14:08:13 CET
What is the next step?

CC: (none) => mageia

Comment 9 Neal Gompa 2016-11-24 14:35:45 CET
We need to figure out how fast we can make appstream-builder go to generate the metadata.

The sample command I have below is a good place to start:
> appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/tmp/appstream-md --packages-dir=</path/to/packages> --temp-dir=/tmp/appstream-tempdata

The two things to tweak are --max-threads and --packages-dir. The maximum threads in the example is 8, but we probably want even more, since the server should be capable of more. The packages dir needs to be pointed to the directory where the packages reside. On the other hand, we may want to do them in parallel for each repository.

Given that for Cauldron, we have three repositories to concern ourselves with, maybe something along the lines of this would work?

appstream-builder --origin="Mageia.Org" --basename=appstream --cache-dir=/tmp/appstream-cache-$rel-$sect-$repo-$arch --enable-hidpi --max-threads=8 --min-icon-size=32 --output-dir=/var/tmp/appstream-md-$rel-$sect-$repo-$arch --packages-dir=/distrib/bootstrap/$rel/$arch/media/$sect/$repo/ --temp-dir=/tmp/appstream-tempdata-$rel-$sect-$repo-$arch

modifyrepo_c --no-compress /var/tmp/appstream-md-$rel-$sect-$repo-$arch/appstream.xml.gz /distrib/bootstrap/$rel/$arch/media/$sect/$repo/repodata/

modifyrepo_c --no-compress /var/tmp/appstream-md-$rel-$sect-$repo-$arch/appstream-icons.tar.gz /distrib/bootstrap/$rel/$arch/media/$sect/$repo/repodata/

Note the vars:
$rel = distro release: "6" or "cauldron", for example
$sect = distro section: "core", "nonfree", "tainted"
$repo = repo per section: "release", "updates", "updates_testing", "backports", "backports_testing"
$arch = architecture: "i586", "x86_64", "armv5tl", "armv7hl"

The modifyrepo_c commands need to be run after every createrepo task regardless, because otherwise each successive createrepo_c run will delete the appended AppStream data from the repodata folder. So the generated AppStream metadata in /var/tmp/appstream-md-$rel-$sect-$repo-$arch (or wherever you put it) should stick around to be reused when it's not being regenerated.

Though, if it is quick enough, we could regenerate the AppStream data together with the rpm-md data. That would simplify things considerably. I'm just not sure how long the process takes.
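
A rough sketch of how the per-repository commands in this comment could be driven from a loop. The `RUN` variable and the function name `gen_appstream` are made up for illustration: by default `RUN` is `echo`, so the commands are only printed (a dry run). The paths and flags are the ones proposed in this comment, with modifyrepo_c reading from the same /var/tmp directory that is passed to --output-dir.

```shell
#!/bin/sh
# Prefix for every command. "echo" (the default) only prints the commands;
# export RUN= (empty) to really execute them. RUN is a hypothetical knob.
RUN=${RUN-echo}

# Regenerate AppStream data for one repository and merge it into its repodata.
# Arguments: release, section, repo, arch (values as listed in the comment).
gen_appstream() {
    local rel="$1" sect="$2" repo="$3" arch="$4"
    local tag="$rel-$sect-$repo-$arch"
    local pkgs="/distrib/bootstrap/$rel/$arch/media/$sect/$repo"
    local out="/var/tmp/appstream-md-$tag"

    $RUN appstream-builder --origin="Mageia.Org" --basename=appstream \
        --cache-dir="/tmp/appstream-cache-$tag" --enable-hidpi \
        --max-threads=8 --min-icon-size=32 --output-dir="$out" \
        --packages-dir="$pkgs/" --temp-dir="/tmp/appstream-tempdata-$tag" &&
    $RUN modifyrepo_c --no-compress "$out/appstream.xml.gz" "$pkgs/repodata/" &&
    $RUN modifyrepo_c --no-compress "$out/appstream-icons.tar.gz" "$pkgs/repodata/"
}

# Example: the three Cauldron release repositories for x86_64.
for section in core nonfree tainted; do
    gen_appstream cauldron "$section" release x86_64
done
```

Running the repositories in parallel would only need a `&` after each `gen_appstream` call plus a final `wait`, at the cost of more load on the server.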
Comment 10 Rémi Verschelde 2016-11-24 14:39:41 CET
BTW, on IRC Thomas mentioned that if the process takes too long and would slow down the BS noticeably, we could also have a cron job regenerate the repodata regularly.
Comment 11 Neal Gompa 2016-11-24 15:26:08 CET
(In reply to Rémi Verschelde from comment #10)
> BTW, on IRC Thomas mentioned that if the process takes too long and would
> slow down the BS noticeably, we could also have a cron job regenerate the
> repodata regularly.

Yes, that's why I mentioned that we want to keep the generated output around and just re-append it on createrepo tasks in between AppStream regeneration tasks, if we go that route. Appending to the repodata takes almost no time, since it just amends repomd.xml with the new information and copies the data files into the repodata folder.

However, if it is fast enough, we can just generate it as part of the createrepo task.
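
A small sketch of the "keep the output and re-append it" step, assuming the cached files are named appstream.xml.gz and appstream-icons.tar.gz as in the proposed commands. The function name `reappend_appstream` and the `MODIFYREPO` override are hypothetical; the override only exists so the sketch can be tried without touching a real repository.

```shell
#!/bin/sh
# Re-append previously generated AppStream files to a repository's repodata.
# $1 = directory holding the cached appstream-builder output
# $2 = repository directory that contains repodata/
# MODIFYREPO (made-up hook) replaces the real modifyrepo_c when set.
reappend_appstream() {
    local cache="$1" repo="$2" f
    # Refuse to run if nothing has been generated yet.
    for f in appstream.xml.gz appstream-icons.tar.gz; do
        [ -f "$cache/$f" ] || { echo "missing $cache/$f" >&2; return 1; }
    done
    for f in appstream.xml.gz appstream-icons.tar.gz; do
        "${MODIFYREPO:-modifyrepo_c}" --no-compress "$cache/$f" "$repo/repodata/" || return 1
    done
}

# Usage after each createrepo run (placeholder paths):
#   reappend_appstream /var/tmp/appstream-md /path/to/packages
```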
Comment 12 Neal Gompa 2016-12-12 22:00:41 CET
@Thomas,

Any progress on this?
Comment 13 Samuel Verschelde 2017-01-09 14:45:45 CET
(In reply to Neal Gompa from comment #12)
> @Thomas,
> 
> Any progress on this?

Thomas and other sysadmins, can you please give a status and an ETA for this blocker?
Samuel Verschelde 2017-01-17 10:29:39 CET

Blocks: 15527 => (none)

Nicolas Lécureuil 2017-03-07 00:56:16 CET

CC: (none) => mageia, pterjan

Comment 14 Nicolas Lécureuil 2017-03-22 22:45:32 CET
Pascal, can you look into this?
Rémi Verschelde 2017-04-04 11:52:37 CEST

Status comment: tmb is going to work on it => Waiting for sysadmins(tm)

Comment 15 Neal Gompa 2017-06-05 21:32:34 CEST
Now that we're in Release Freeze, can we *please* get this done? It will make it so that GNOME Software and Plasma Discover actually have something to show users...
Comment 16 Marja van Waes 2017-06-20 22:10:25 CEST
Taking this bug off the release blocker list, as discussed in council meeting.

CC: (none) => marja11
Priority: release_blocker => High

Comment 17 Neal Gompa 2017-07-07 13:53:08 CEST
Can we please have the AppStream metadata generated for the release repos for Mageia 6? I really want to put that into the release notes, as it's a huge deal that we'll properly support them so that things like GNOME Software and Plasma Discover work properly.
Comment 18 Neal Gompa 2017-07-07 14:11:48 CEST
I've rebased the backported appstream-glib for mga5 infra and submitted it to be built.
Comment 19 Thomas Backlund 2017-07-08 21:42:51 CEST
Created attachment 9468
appstream-builder segfault, can also be reproduced on current cauldron

CC: (none) => tmb

Comment 20 Thomas Backlund 2017-07-08 22:22:42 CEST
Hm, it seems not to be thread-safe :/

Dropping down to 2 threads makes it not segfault ... but that makes the initial run take 15 minutes :/
Comment 21 Neal Gompa 2017-07-08 22:48:35 CEST
Hmm, interestingly, I can't reproduce on my system with the 8 thread command (admittedly, I have only a 4 core system), but I'm going to try to find a more powerful machine to see if I can reproduce it.

I've also filed a bug upstream about it, to notify Richard Hughes about the issue.

See Also: (none) => https://github.com/hughsie/appstream-glib/issues/177

Comment 22 Thomas Backlund 2017-07-09 00:23:06 CEST
Ok, in order to get it done I had to drop down to a single thread :/

So it took ~16 minutes for core/release with its ~28000 rpms...

Now that is not a problem for a stable release as /release repos only have to be generated once, so the time is not an issue.

And updates trees are way smaller than /release so they will be easier... but we do need to figure out why it breaks down so easily...

Anyway, there is now appstream metadata generated for i586 and x86_64 being mirrored out so you can check if it behaves as you want...
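
The manual fallback described here (8 threads, then 2, then 1) could be automated along these lines. This is only a sketch of one possible approach, not what was actually run: `try_thread_counts`, `build_appstream` and `PKGDIR` are hypothetical names, and a crash such as a segfault is detected simply through the non-zero exit status.

```shell
#!/bin/sh
# Try a command with each thread count in turn until one run succeeds.
# $1 = command or function that takes the thread count as its only argument
# remaining arguments = thread counts to try, in order
# Prints the thread count that worked; fails if every attempt failed.
try_thread_counts() {
    local cmd="$1" n
    shift
    for n in "$@"; do
        # Send the command's own output to stderr so stdout carries only
        # the winning thread count.
        if "$cmd" "$n" >&2; then
            echo "$n"
            return 0
        fi
        echo "run with $n thread(s) failed" >&2
    done
    return 1
}

# Hypothetical wrapper around the builder invocation proposed in this bug;
# PKGDIR is a placeholder for the packages directory.
build_appstream() {
    appstream-builder --origin="Mageia.Org" --basename=appstream \
        --cache-dir=/tmp/appstream-cache --enable-hidpi \
        --max-threads="$1" --min-icon-size=32 \
        --output-dir=/tmp/appstream-md --packages-dir="$PKGDIR" \
        --temp-dir=/tmp/appstream-tempdata
}

# Usage: PKGDIR=/path/to/packages try_thread_counts build_appstream 8 2 1
```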
Comment 23 Neal Gompa 2017-07-09 00:51:55 CEST
Created attachment 9471
Screenshot of Plasma Discover showing AppStream information (Lugaru HD)

So far, verified that Plasma Discover works. Setting up a GNOME environment to check GNOME Software.
Comment 24 Neal Gompa 2017-07-09 01:21:11 CEST
Created attachment 9472
Screenshot of GNOME Software showing AppStream information (Lugaru HD)

After installing GNOME Software, letting it download metadata, and then restarting the GNOME Software service, the AppStream metadata showed up!
Comment 25 Neal Gompa 2017-07-09 01:22:10 CEST
Created attachment 9473
Screenshot of GNOME Shell search showing AppStream information (Lugaru HD)

And it seems GNOME Shell application search is now fully functional!
Comment 26 Neal Gompa 2017-07-09 01:25:52 CEST
It looks like everything is working. Both appstreamcli (appstream) and appstream-util (appstream-glib) can also search the local AppStream cache.

Thomas, thanks for everything! As long as we keep it up, we should be in great shape!

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Neal Gompa 2017-07-09 01:39:01 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=21207

Comment 27 Nicolas Lécureuil 2017-07-09 08:09:23 CEST
(In reply to Thomas Backlund from comment #22)
> Ok, in order to get it done I had to drop down to a single thread :/
> 
> So it took ~16 minutes for core/release with its ~28000 rpms...
> 
> Now that is not a problem for a stable release as /release repos only have
> to be generated once, so the time is not an issue.
> 
> And updates trees are way smaller than /release so they will be easier...
> but we do need to figure out why it breaks down so easily...
> 
> Anyway, there is now appstream metadata generated for i586 and x86_64 being
> mirrored out so you can check if it behaves as you want...

On Cauldron, maybe we can generate this "by hand"? It would be an issue if we had to wait 16 more minutes for the hdlist to be available (I'm thinking of a KF5 update, for example :) ).
Comment 28 Rémi Verschelde 2017-07-09 08:12:03 CEST
(In reply to Nicolas Lécureuil from comment #27)
> 
> On Cauldron, maybe we can generate this "by hand"? It would be an issue if we
> had to wait 16 more minutes for the hdlist to be available (I'm thinking of a
> KF5 update, for example :) ).

Neal mentioned that on Fedora they only update the AppStream metadata via a weekly cron job. We could likely do something similar, 15 min once a week in the night should not be a big issue.
Comment 29 Nicolas Lécureuil 2017-07-09 08:13:40 CEST
(In reply to Rémi Verschelde from comment #28)
> (In reply to Nicolas Lécureuil from comment #27)
> > 
> > On Cauldron, maybe we can generate this "by hand"? It would be an issue if we
> > had to wait 16 more minutes for the hdlist to be available (I'm thinking of a
> > KF5 update, for example :) ).
> 
> Neal mentioned that on Fedora they only update the AppStream metadata via a
> weekly cron job. We could likely do something similar, 15 min once a week in
> the night should not be a big issue.


yes this seems a lot saner :)
Comment 30 Neal Gompa 2017-07-09 12:29:55 CEST
(In reply to Nicolas Lécureuil from comment #29)
> (In reply to Rémi Verschelde from comment #28)
> > (In reply to Nicolas Lécureuil from comment #27)
> > > 
> > > On Cauldron, maybe we can generate this "by hand"? It would be an issue if we
> > > had to wait 16 more minutes for the hdlist to be available (I'm thinking of a
> > > KF5 update, for example :) ).
> > 
> > Neal mentioned that on Fedora they only update the AppStream metadata via a
> > weekly cron job. We could likely do something similar, 15 min once a week in
> > the night should not be a big issue.
> 
> 
> yes this seems a lot saner :)

That means we need to cache the results and re-append them each time the rpm-md repodata is regenerated.
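
One way to decide between "regenerate" and "just re-append the cache" is to look at the age of the cached file. This is a sketch under assumptions: the function name `appstream_cache_stale` is made up, the 7-day default mirrors the weekly schedule mentioned above, and the cached file is assumed to be named appstream.xml.gz as in the proposed commands.

```shell
#!/bin/sh
# Succeeds if the cached AppStream XML is missing or at least $2 days old.
# $1 = cache directory, $2 = maximum age in days (default 7, must be >= 1)
appstream_cache_stale() {
    local file="$1/appstream.xml.gz" days="${2:-7}"
    [ -f "$file" ] || return 0
    # "-mtime +N" matches files whose age, in whole 24-hour periods, is
    # greater than N, so N = days - 1 means "at least $days days old".
    [ -n "$(find "$file" -mtime +"$((days - 1))")" ]
}

# Possible use in the repodata regeneration step (placeholder path; the
# regeneration and re-append commands themselves are left out here):
#   if appstream_cache_stale /var/tmp/appstream-md 7; then
#       echo "regenerate AppStream data, then append it"
#   else
#       echo "re-append the cached AppStream data"
#   fi
```

A fixed weekly cron job would work just as well; the age check only avoids having to coordinate the cron schedule with the createrepo runs.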
