Bug 2330 - Downloads stats needed
Summary: Downloads stats needed
Status: RESOLVED OLD
Alias: None
Product: Infrastructure
Classification: Unclassified
Component: Others
Version: unspecified
Hardware: i586 Linux
Priority: Normal enhancement
Target Milestone: ---
Assignee: Romain d'Alverny
QA Contact:
URL:
Whiteboard:
Keywords: Atelier
Depends on: 4034
Blocks: 859 1308
Reported: 2011-07-29 17:51 CEST by Romain d'Alverny
Modified: 2013-12-16 15:15 CET
CC List: 2 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments

Description Romain d'Alverny 2011-07-29 17:51:29 CEST
It will help to know what is downloaded, how many times, how much, and from where; and report this somewhere public (see bug 1308). Today we don't know.

It's not to gather and munch figures just for the sake of it, but to know and report what's happening and if there's some pattern of interest in that (per ISO product, package, package media, installs, updates, mirror location, whole download size or pattern, etc.).

What is downloaded today that interests us:
 - ISO images (through http://www.mageia.org/downloads/ , other index pages or directly from mirrors).
 - packages (for install or updates; through web indexes, direct links or directly from mirrors; manually or by system tools).

We have two main sources to know that:
 - our own www downloads page, which redirects to a mirror; but this is limited to ISOs and to visitors going through mageia.org;
 - download mirrors themselves, which actually distribute the files.

We need to find a way to gather sanitized/normalized logs [1] in a single place [2] for later queries and reports [3].

[1] depends on download mirrors software logging formats
[2] need to merge logs in a single format?
[3] what good tool for that can we have? (related to above)
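
To make [1] and [2] more concrete, here is a minimal sketch of what normalizing one mirror log line into a common record could look like. It assumes the mirror logs in the Apache/nginx "combined" format, which is only a guess since the actual formats depend on each mirror's software. The function name `normalize` and the record keys are hypothetical.

```python
import re
from datetime import datetime

# Apache/nginx "combined" log format. This is an assumption: real mirrors
# may use other formats and would each need their own pattern.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def normalize(line, source):
    """Turn one raw log line into a plain dict, or None if it does not parse."""
    m = COMBINED.match(line)
    if m is None:
        return None
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    size = 0 if m["size"] == "-" else int(m["size"])
    return {
        "date": when.date().isoformat(),
        "source": source,          # which mirror (or www) the line came from
        "ip": m["ip"],             # still raw here; to be sanitized later
        "path": m["path"],
        "status": int(m["status"]),
        "bytes": size,
        "referrer": m["referrer"],
    }
```

Lines that do not match are returned as None rather than raising, so a merge job could count and report them instead of stopping.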

The goal of this bug is to list what's available, what are the things to take care of [4], and how we can do that.

[4] for instance:
 * privacy, by translating IP addresses into broad geo info, useful enough (city/country?);
 * some mirrors won't send us their logs, some would, some may have requirements or need packaged tools for that; and because of that, stats will stay indicative until we have enough mirrors joining in this.
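
The privacy point above could be sketched as follows: replace the IP address with broad geo info and drop the address itself before anything is stored or published. The lookup table here is a toy stand-in for a real GeoIP database, and all names (`GEO_TABLE`, `lookup_geo`, `sanitize`) are hypothetical; a record is assumed to be a plain dict with an `ip` key.

```python
import ipaddress

# Toy table standing in for a real GeoIP database (hypothetical data;
# 192.0.2.0/24 is a documentation-only address range).
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): ("EU", "FR", "Paris"),
}
UNKNOWN = ("??", "??", "unknown")

def lookup_geo(ip):
    """Return (continent, country, city) for an IP, or UNKNOWN."""
    addr = ipaddress.ip_address(ip)
    for network, place in GEO_TABLE.items():
        if addr in network:
            return place
    return UNKNOWN

def sanitize(record, lookup=lookup_geo):
    """Copy the record without its IP, adding coarse location fields."""
    clean = {k: v for k, v in record.items() if k != "ip"}
    clean["continent"], clean["country"], clean["city"] = lookup(record["ip"])
    return clean
```

Passing the lookup as a parameter keeps the sanitizing step independent of whichever geo database ends up being used.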

This relates to the mirrors admins mailing-list and db, so we can start to know:
 - which ones would be interested in this;
 - what software they use, and if they log downloads, what the log format is and how sanitizing/retrieving this log could be done.
Romain d'Alverny 2011-07-29 17:51:46 CEST

Blocks: (none) => 1308

Comment 1 Marja Van Waes 2011-10-30 17:10:59 CET
ping?

CC: (none) => marja11

Comment 2 Romain d'Alverny 2011-10-31 17:40:00 CET
Bug 3166 could help regarding stats (ISOs and individual packages, depending on how it's implemented).

Nicolas Vigier 2012-01-13 23:18:30 CET

Status: NEW => ASSIGNED

Nicolas Vigier 2012-01-13 23:19:37 CET

Blocks: (none) => 859

Romain d'Alverny 2012-05-24 23:12:42 CEST

Depends on: (none) => 4034

Romain d'Alverny 2012-07-04 13:58:45 CEST

Keywords: (none) => Atelier

Comment 3 Romain d'Alverny 2012-07-27 14:51:25 CEST
Ok, here's a first, limited proposal for what we have today:

 - daily fetch httpd logs for www.m.o/*/downloads/get for the day before
 - resolve IPs to continent/country/city
 - push resulting/filtered logs in a public repository (see bug 4034)
 - such a log would consist of text files, one download query per line:

    date       source version arch continent country city  count
    ------------------------------------------------------------
    2012-04-01 www    1       i586 EU        FR      Paris 124

 - use these logs to build at least a few meaningful reports
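
As an illustration of the log format proposed above, here is a minimal sketch that groups already-sanitized download records and emits one line per distinct combination with its count. The function name `daily_report` and the dict-based input are assumptions, not the actual script; replacing spaces in values with underscores is also just one hypothetical way to keep the columns whitespace-separated.

```python
from collections import Counter

# Column order taken from the proposed format; "count" is appended last.
FIELDS = ("date", "source", "version", "arch", "continent", "country", "city")

def daily_report(downloads):
    """Aggregate download records (dicts) into sorted text lines."""
    counts = Counter(tuple(str(d[f]) for f in FIELDS) for d in downloads)
    lines = []
    for key in sorted(counts):
        # Avoid spaces inside a value so columns stay unambiguous.
        columns = [value.replace(" ", "_") for value in key]
        lines.append(" ".join(columns + [str(counts[key])]))
    return lines
```

Since each line is plain text with a fixed column order, the resulting files can be diffed, concatenated across days, and read by standard text tools.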

This will be limited to www downloads of course but:
 - it will give a realistic view of the trend,
 - it may be later augmented with mirror logs if provided on a regular basis (where we can discriminate downloads initiated from www.m.o from those with a distinct referrer).
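
Telling www-initiated downloads apart from others in mirror logs could look roughly like this, assuming the mirror log keeps the HTTP referrer. The host set and the three labels are hypothetical choices for the sketch.

```python
from urllib.parse import urlparse

# Hosts counted as "our own www"; an assumed set for illustration.
WWW_HOSTS = {"www.mageia.org", "mageia.org"}

def download_origin(referrer):
    """Classify a download by its referrer: 'www', 'other' or 'direct'."""
    if not referrer or referrer == "-":
        return "direct"   # no referrer: tool, direct link, or stripped header
    host = urlparse(referrer).hostname
    return "www" if host in WWW_HOSTS else "other"
```

Referrers can be missing or stripped by clients, so the "direct" bucket would only be indicative.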
Romain d'Alverny 2012-07-27 14:51:39 CEST

Assignee: sysadmin-bugs => rdalverny

Comment 4 Romain d'Alverny 2012-08-06 20:52:17 CEST
See https://ml.mageia.org/l/arc/council/2012-07/msg00104.html for current progress, with a working script (for downloads and daily pings). Now it needs to be hosted and cron'd.

Comment 5 Romain d'Alverny 2013-12-16 15:15:23 CET
I won't have time to work on this. Old issue.

Status: ASSIGNED => RESOLVED
Resolution: (none) => OLD
