Bug 1879 - Some custom attributes in LDAP
Summary: Some custom attributes in LDAP
Status: RESOLVED OLD
Alias: None
Product: Infrastructure
Classification: Unclassified
Component: Others
Version: unspecified
Hardware: i586 Linux
Priority: Normal enhancement
Target Milestone: ---
Assignee: Sysadmin Team
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 1308
 
Reported: 2011-06-21 15:41 CEST by Romain d'Alverny
Modified: 2014-05-08 18:05 CEST
CC list: 5 users

See Also:
Source RPM:
CVE:
Status comment:


Description Romain d'Alverny 2011-06-21 15:41:35 CEST
This is related to bug 1045 in that these data would be used in the project dashboard. Of course, it relies on bug 452 (privacy policy) and it will need explicit input from each user to be used.

This is in no way urgent, nor a requirement; I would just like to start collecting points of view.

To get some separate statistics on the whole project, the wishes would be:
 * gender
 * birthday
 * size
 * shoe size
 * eye colour

These may sound pointless as such, and they partly are; but they are also useful to show the diversity/distribution of profiles of people participating in the project, in a demographics part of the dashboard. The only purpose of these stats is to be aggregated into overall stats for the group, not to be displayed on individual user pages or queried; so actually, maybe there's a better way than storing them in LDAP (a separate database?).

Plus:
 * alt languages, short bio, several URLs (linking one's profile to other personal pages on the net)
 * countryName and localityName (see bug 998).
 * is there a timestamp of when the account was created?
Comment 1 Michael Scherer 2011-06-21 20:25:09 CEST
While I agree that LDAP is maybe not the best way to do this, the technical implications of having such information in a db are not to be taken lightly:
- storing gender would put us in a different category regarding the CNIL (not sure, need to check again), and could open a can of worms (http://blog.xkcd.com/2010/05/06/sex-and-gender/) later, depending on how we store it.

- birthday is valuable information since it can be used for identity theft or impersonation (while anything can, a birthday is likely more useful for that than a shoe size).

So if we store that in a db, we will have to secure access to that db. So far, we have treated LDAP as very important, and thus valstar is not accessible to anyone besides the sysadmins. Adding another db would require us to apply the same level of scrutiny to that database server.

But since the goal is aggregation rather than individual information, I guess we could find a system where the link between the data and the user is not stored (something like what is done for epoll). For example, each person could fill in a form, and we would only store the fact that the user has already filled in the form, with the raw data stored separately?

Then it would be quite hard to find out who entered what, provided enough people enter data (because of course, with only 3 people, that's trivial)?

This way, we would not need to secure this information much further, and it would be more respectful of privacy while still being useful for the project.
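A minimal sketch of the form idea above, in Python with SQLite. The schema (`participants`, `responses`), the columns and the minimum group size of 5 are all hypothetical: who has answered and what was answered live in separate tables with no column linking them, and small groups are hidden when aggregating, since with 3 people the answers are trivial to attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema. WITHOUT ROWID means participants are stored by uid,
# with no insertion order that could be lined up against response rowids.
conn.executescript("""
    CREATE TABLE participants (uid TEXT PRIMARY KEY) WITHOUT ROWID;
    CREATE TABLE responses (eye_colour TEXT, shoe_size INTEGER);
""")

MIN_GROUP_SIZE = 5  # assumed threshold; smaller groups are not published


def submit(uid, eye_colour, shoe_size):
    """Record that `uid` answered, and store the answers with no user link."""
    with conn:  # one transaction: both rows are written, or neither
        # Raises sqlite3.IntegrityError if this uid has already answered.
        conn.execute("INSERT INTO participants (uid) VALUES (?)", (uid,))
        conn.execute(
            "INSERT INTO responses (eye_colour, shoe_size) VALUES (?, ?)",
            (eye_colour, shoe_size),
        )


def eye_colour_stats():
    """Counts per eye colour, dropping any group below MIN_GROUP_SIZE."""
    rows = conn.execute(
        "SELECT eye_colour, COUNT(*) FROM responses GROUP BY eye_colour"
    ).fetchall()
    return {colour: n for colour, n in rows if n >= MIN_GROUP_SIZE}
```

One caveat the sketch leaves open: `responses` still keeps its own insertion order, so someone who also knows the order in which people submitted could narrow things down; batching or shuffling writes would reduce that.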

And we do have a timestamp of the account creation in LDAP: createTimestamp
For example, here is mine:

createTimestamp: 20101105155859Z

This should be 2010-11-05 at 15:58:59, with Z indicating universal time, but I didn't find the specification for the format :/
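For reference, createTimestamp is an operational attribute (RFC 4512) with the LDAP GeneralizedTime syntax (RFC 4517), which is where the `YYYYMMDDHHMMSS` plus `Z` layout comes from. A minimal Python sketch for the form seen above; it deliberately accepts only whole seconds in UTC and rejects the fractional-second and numeric-offset variants the syntax also permits:

```python
from datetime import datetime, timezone


def parse_generalized_time(value: str) -> datetime:
    """Parse the common 'YYYYMMDDHHMMSSZ' form of an LDAP GeneralizedTime.

    Only whole seconds in UTC ('Z' suffix) are handled; other valid forms
    (fractions of a second, offsets such as '+0200') raise ValueError.
    """
    if len(value) != 15 or not value.endswith("Z"):
        raise ValueError(f"unsupported GeneralizedTime form: {value!r}")
    naive = datetime.strptime(value, "%Y%m%d%H%M%SZ")
    return naive.replace(tzinfo=timezone.utc)


created = parse_generalized_time("20101105155859Z")
print(created.isoformat())  # 2010-11-05T15:58:59+00:00
print(created.date())       # 2010-11-05
```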

CC: (none) => misc

Comment 2 Romain d'Alverny 2011-06-21 22:34:44 CEST
(In reply to comment #1)
> - storing gender would put us in a different category regarding the CNIL
> (not sure, need to check again),

> and could open a can of worms

Ah yes, good point. Well, stats would be interesting regarding both sex and gender. But at first, sex (male/female/not sure) would be used. That's perhaps over-simplifying, but we won't get everything complete at once. Anyway, help/insight is welcome.

> - birthday is valuable information since it can be used for identity theft or
> impersonation (while anything can, a birthday is likely more useful for that
> than a shoe size).

For birthdays, we need year and month at most. The day would be nice if we wanted to wish every contributor a happy birthday, and suggest special shoes matching their eye colour, on the very day, but... we could do that for a whole month as well :-p

Anyway, that's not a requirement; the primary intended use is aggregate stats.

> But since the goal is aggregation rather than individual information, I guess
> we could find a system where the link between the data and the user is not
> stored (something like what is done for epoll). For example, each person
> could fill in a form, and we would only store the fact that the user has
> already filled in the form, with the raw data stored separately?

Or we could perhaps store these in a separate db (actually, an SQL db at least makes more sense to me for this), and have a one-way scheme for linking this data set to the user, activated only at the user's request:
 - the stats-gathering db would not hold any upstream (i.e. LDAP) identifier of the actual user, but a hash (of the uid or another non-public id strictly attached to the user account, and/or some salt?)
 - at the user's or an admin's request upstream, we can query this data and either update or remove it (it's good to keep this possibility).

It's a quick idea, so there is certainly a better solution, but this one would at least make the stats data less attractive for an attacker if she doesn't have access to the LDAP data.
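A minimal sketch of this one-way link, assuming a keyed hash (HMAC-SHA256) in place of a plain salted hash: since uids are public, a plain hash of a uid could be reversed by hashing every known uid, whereas an HMAC cannot be recomputed without the secret key. The key handling and the in-memory dict standing in for the separate stats db are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret kept outside the stats db (e.g. only on the host that
# talks to LDAP). Without it, stored keys cannot be recomputed from uids.
SECRET_KEY = b"replace-with-a-long-random-secret"

# Stand-in for the separate stats database: pseudonymous key -> answers.
stats_db: dict[str, dict] = {}


def stats_key(user_id: str) -> str:
    """One-way pseudonymous key for an account id (HMAC-SHA256, hex)."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def save_answers(user_id: str, answers: dict) -> None:
    """Create or replace the user's record, at their request."""
    stats_db[stats_key(user_id)] = answers


def forget(user_id: str) -> bool:
    """Remove the user's record; returns False if there was none."""
    return stats_db.pop(stats_key(user_id), None) is not None
```

The stats db on its own only ever sees the hex keys, so leaking it does not reveal which account owns which record; the account holder (or an admin) can still reach their own record through `stats_key`.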

Still, location/language/timestamp are pieces of information that should be kept closely tied to the user, in LDAP or outside of it.

> And we have a timestamp in ldap of the account creation :  createTimestamp

Excellent, thanks!
Comment 3 Michael Scherer 2011-06-22 00:28:20 CEST
Maybe a structured db is not the proper tool; wouldn't some schemaless NoSQL database be a better idea?
Comment 4 Romain d'Alverny 2011-06-23 15:23:29 CEST
(In reply to comment #3)
> Maybe a structured db  is not the proper tool, wouldn't some schemaless nosql
> databases be a better idea ?

 a) Maybe yes. CouchDB or MongoDB would do fine here I guess.

But I'd like views too on:
 b) the "one-way-only" identification of data records ownership (and how we compute this hash key); hash(uid + createTimestamp + salt) ? or other method?

 c) and how we dispatch data, between LDAP and this separate db:
   - countryName, localityName (LDAP would be best, used for the users map)
   - alt languages, bio and links (LDAP too?)
   - birth year & month
   - male/female/not sure (separate db?)
   - size, shoe size, eye colour, etc. (separate db)
   - day(createTimestamp) (separate db)
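One hypothetical way to express this split in code. Only countryName and localityName are standard LDAP attribute names; the others are made up for illustration, and sending birth year/month to the stats db (the list leaves that open) is an assumption:

```python
# Hypothetical field routing following the list above.
LDAP_FIELDS = {"countryName", "localityName", "altLanguages", "bio", "links"}
# Assumption: birth year/month go to the stats db.
STATS_FIELDS = {
    "birthYear", "birthMonth", "sex", "size", "shoeSize", "eyeColour", "createdDay",
}


def dispatch(profile: dict) -> tuple[dict, dict, list]:
    """Split a submitted profile into (LDAP part, stats part, unknown names).

    Fields left empty (None) are skipped, assuming every value is optional.
    """
    ldap_part, stats_part, unknown = {}, {}, []
    for name, value in profile.items():
        if value is None:
            continue
        if name in LDAP_FIELDS:
            ldap_part[name] = value
        elif name in STATS_FIELDS:
            stats_part[name] = value
        else:
            unknown.append(name)  # not routed anywhere; left to the caller
    return ldap_part, stats_part, unknown
```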
Comment 5 Nicolas Vigier 2011-06-23 15:50:59 CEST
What would be the use of such a db/such stats?

If we only want some stats, I think we could create an anonymous feedback webpage where we ask users various questions, and store the results in an SQL database. Do we need more than this?

CC: (none) => boklm

Comment 6 Romain d'Alverny 2011-06-23 16:15:42 CEST
(In reply to comment #5)
> What would be the use of such db/stats ?

As said above:
 - location is for contributors map (see bug 998)
 - languages, bio and links are to enhance users/groups pages (see bug 1045, making people and activities more visible in the project)
 - all the rest is for various demographics stuff.

> If we only want some stats, I think we could create an anonymous feedback
> webpage, where we ask various questions about users, and store the results in a
> sql database. Do we need more than this ?

This is OK if you want a fixed picture at a single point in time, but you have to ask everyone to fill in the feedback form each time you want a new picture.

Having a distinct record that is somehow tied to an existing account (without knowing exactly whose; that's my point b) above) allows several pictures at several points in time, with more consistency, and it needs people to answer only once.
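To illustrate the point about several pictures over time, a minimal sketch assuming each persistent record carries the day of the account's createTimestamp (the day(createTimestamp) item from comment 4): a snapshot for any date is computed from the same answers, given once. The record layout is hypothetical, and accounts that later leave the project are not handled.

```python
from collections import Counter
from datetime import date


def snapshot(records: list[dict], as_of: date) -> Counter:
    """Eye-colour distribution among accounts that already existed on `as_of`.

    Each record is assumed to hold 'created' (day of the account's
    createTimestamp) and an optional 'eye_colour' answer.
    """
    return Counter(
        r["eye_colour"]
        for r in records
        if r["created"] <= as_of and r.get("eye_colour") is not None
    )


# Made-up sample records, for illustration only.
records = [
    {"created": date(2010, 11, 5), "eye_colour": "brown"},
    {"created": date(2011, 3, 1), "eye_colour": "green"},
    {"created": date(2011, 9, 20), "eye_colour": "brown"},
]
```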
Romain d'Alverny 2011-07-29 17:51:46 CEST

Blocks: (none) => 1308

Comment 7 Buchan Milne 2011-10-05 14:17:13 CEST
As commented in bug #998, 'l' (or 'localityName') is available, multi-valued, and sufficient for storing city name(s).

I think before discussing further attributes that we want, maybe we need to discuss more specifically what we want to use them for, possibly have some (basic) mockups or diagrams.

Also, if we are going to do some work on this, we should consider what the requirements/plans are for other uses of the same data.

For example, maintainer groups have come up in the past.

IMHO, these are the kinds of things I might want to be able to do:
-Find out more details about *packages*
-See who maintains a specific package
-See what other packages they maintain
-See what their status is (e.g. 'on holiday').

As a maintainer, I would like to be able to:
-See what commits I have made in the past x days
-See what uploads I have done to which distro in the past x days
-See what bugs I have closed in the past x days
-See recent commits on packages I maintain or for which I am a member of a maintainers group

A QA team or technical lead or group admin may want to see:
-What percentage of a contributor's uploads have been successful
-Metrics for contributor workload


> But I'd like views too on:
>  b) the "one-way-only" identification of data records ownership (and how we
> compute this hash key); hash(uid + createTimestamp + salt) ? or other method?

Hash the entryUUID
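A minimal sketch of this suggestion. entryUUID (RFC 4530) is assigned by the server when the entry is added and is immutable, so unlike a uid-based key such as hash(uid + createTimestamp + salt) it survives renames. The sketch parses the UUID string before hashing so that formatting differences (e.g. upper-case hex) cannot yield different keys, and assumes a keyed hash (HMAC-SHA256) with a secret kept outside the stats db; the key and the UUID values are placeholders.

```python
import hashlib
import hmac
import uuid

# Assumption: secret kept outside the stats db; placeholder value.
SECRET_KEY = b"replace-with-a-long-random-secret"


def key_from_entry_uuid(entry_uuid: str) -> str:
    """Pseudonymous stats key derived from an LDAP entryUUID value.

    uuid.UUID() normalises the textual form, so the HMAC is always
    computed over the same 16 raw bytes for a given entry.
    """
    raw = uuid.UUID(entry_uuid).bytes
    return hmac.new(SECRET_KEY, raw, hashlib.sha256).hexdigest()
```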


>  c) and how we dispatch data, between LDAP and this separate db:
>    - countryName, localityName (LDAP would be best, used for the users map)
>    - alt languages, bio and links (LDAP too?)
>    - birth year & month

There is no standardised schema for this, but the evolutionperson, openxchange, and gosa schemas have attributes for this in different formats (usually single attributes). It may be best to have separate attributes (e.g. birthYear, birthMonth), as in some cases you may want to use them separately and not tie them too closely together (e.g. 'Average age' vs 'contributors who celebrate (or not) their birthday this month').
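A minimal sketch of the two uses mentioned, with birthYear and birthMonth as separate, optional values (names as suggested above; they are not from a standard schema). Ages are approximate because the day is not stored: when the birth month is the current month, the birthday is counted as already passed.

```python
from datetime import date


def average_age(people: list[dict], today: date):
    """Mean approximate age over people with a birthYear; None if nobody has one."""
    ages = []
    for p in people:
        year, month = p.get("birthYear"), p.get("birthMonth")
        if year is None:
            continue  # no year: cannot contribute to an age
        age = today.year - year
        # Birthday still ahead this year (a month-only comparison).
        # Without a month, the year difference is used as-is.
        if month is not None and month > today.month:
            age -= 1
        ages.append(age)
    return sum(ages) / len(ages) if ages else None


def birthdays_this_month(people: list[dict], today: date) -> int:
    """Count people whose birthMonth is the current month (year not needed)."""
    return sum(1 for p in people if p.get("birthMonth") == today.month)
```

Keeping the two values apart is what lets `birthdays_this_month` work for someone who shared only a month, and `average_age` for someone who shared only a year.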

>    - male/female/not sure (separate db?)

person has 'title' attribute, but nothing specifically for gender. Is it important? What would we do with this data? (About the only useful thing I see is trying to avoid too much gender bias, but without a plan for addressing it, keeping the data may be counter-productive.)

>    - size, shoe size, eye colour, etc. (separate db)

What is 'size' for? Height? But, I don't really see what value this *really* holds besides random demographics.

>    - day(createTimestamp) (separate db)

CC: (none) => bgmilne

Comment 8 Romain d'Alverny 2011-10-05 17:50:58 CEST
(In reply to comment #7)
> I think before discussing further attributes that we want, maybe we need to
> discuss more specifically what we want to use them for, possibly have some
> (basic) mockups or diagrams.

To add to your scenarios: maintdb is one possible use case; bug 1045 and bug 998 are two others, as is a (better) buildsystem dashboard than http://dashboard.mageia.org/. Something along the lines of http://www.artlebedev.com/studio/stat/names/ as well.

> >  c) and how we dispatch data, between LDAP and this separate db:
> >    - countryName, localityName (LDAP would be best, used for the users map)
> >    - alt languages, bio and links (LDAP too?)
> >    - birth year & month
> 
> There is no standardised schema for this, but evolutionperson, openxchange, and
> gosa schemas have attributes for this in different formats (but usually single
> attributes). It may be best to have separate attributes (e.g. birthYear,
> birthMonth), as in some cases you may want to use them separately and not tie
> them too closely together (e.g. 'Average age' vs 'contributors who celebrate
> (or not) their birthday this month').

Sure.

> >    - male/female/not sure (separate db?)
> 
> person has 'title' attribute, but nothing specifically for gender. Is it
> important? What would we do with this data? (About the only useful thing I see
> is trying to avoid too much gender bias, but without a plan for addressing it,
> keeping the data may be counter-productive.)

At least the data would be there, and we could track whether there is a change, whether because of plans or not. My personal metric of success for Mageia is reaching gender parity among contributors. :-p

> >    - size, shoe size, eye colour, etc. (separate db)
> 
> What is 'size' for? Height? But, I don't really see what value this *really*
> holds besides random demographics.

Indeed. Random, fun demographics. Nothing more (I know the list of things to track could be infinite, starting with musical and colour tastes but I didn't list these here ;-) )
Romain d'Alverny 2011-10-05 17:51:16 CEST

CC: (none) => rdalverny

Comment 9 Marja Van Waes 2012-01-08 18:56:15 CET
(In reply to comment #8)
> (In reply to comment #7)

> 
> > >    - size, shoe size, eye colour, etc. (separate db)
> > 
> > What is 'size' for? Height? But, I don't really see what value this *really*
> > holds besides random demographics.
> 
> Indeed. Random, fun demographics. Nothing more (I know the list of things to
> track could be infinite, starting with musical and colour tastes but I didn't
> list these here ;-) )


I can make it extra funny, pretend to have baby feet and be 3 m tall, or pretend to have feet that exceed my length :þ

@ Romain

Are you more interested in the answers, true or not, or in what the values really are?
If it is the latter: I'll give true values for the questions that I think are useful, when given the possibility to skip the other ones :)

CC: (none) => marja11

Comment 10 Romain d'Alverny 2012-01-08 19:46:24 CET
(In reply to comment #9)
> Are you more interested in the answers, true or not, or in what the values
> really are?
> If it is the latter: I'll give true values for the questions that I think are
> useful, when given the possibility to skip the other ones :)

All values are meant to be optional; but indeed, the goal is to build stats from real values, not fake ones (though I guess we won't avoid those entirely).

I guess it will make more sense to people once the visualisation of the data, and the insight into trends it provides, is available. I have a prototype at home; I guess I should feed it with fake data first to show the thing.
Comment 11 Marja Van Waes 2012-01-08 21:46:20 CET
(In reply to comment #10)

> 
> I guess it will make more sense for people when the visualisation of the data,
> and what trends insight they provide, will be available - I have a home
> prototype, I guess I should feed it with fake data to first show the thing.

I can't wait to see it :)
Nicolas Vigier 2012-01-13 23:15:28 CET

Status: NEW => ASSIGNED

Comment 12 Romain d'Alverny 2012-08-21 16:48:55 CEST
Giving up on this idea.

Status: ASSIGNED => RESOLVED
Resolution: (none) => OLD

Nicolas Vigier 2014-05-08 18:05:40 CEST

CC: boklm => (none)

