Posted to infrastructure-dev@apache.org by Alan Cabrera <ad...@toolazydogs.com> on 2013/05/27 19:15:43 UTC

Canonical sources for information

I need to know what are the official sources for the following:

full name from username
whether a person is a corporate member from username
What PMC/PPMCs the person belongs to from username

I need to get access to this data from the Python utilities that I am writing.


Regards,
Alan

Re: Canonical sources for information

Posted by sebb <se...@gmail.com>.

On 30 May 2013 23:33, Tony Stevenson <to...@pc-tony.com> wrote:
>
>
>
> Cheers,
> Tony
>
> Sent from my iPhone - Please excuse any brevity or typos.
>
>
>
> On 30 May 2013, at 17:49, Alan Cabrera <ad...@toolazydogs.com> wrote:
>
>>
>> On May 28, 2013, at 10:44 AM, Sam Ruby <ru...@intertwingly.net> wrote:
>>
>>> Try the following (passing in --user adc:password):
>>>
>>> curl -H "Accept:application/json" https://whimsy.apache.org/roster/committer/adc
>>>
>>> Let me know if there are ways that I can make this more convenient.
>>
>> I'm seeing an inconsistency for the records for Sean Kelly, kelly.  He seems to be in the Incubator group but not the Incubator committee.
>>
>
> Why is that an inconsistency? He might need access based on the group. IIRC the group is not equivalent to IPMC membership.

There are (usually) two LDAP groups for each TLP.

The unix group, which is for TLP committers, and gives access to files
on people as well as SVN trees.
The committee group, which should be the same as the PMC, and gives
access to SVN trees.

Usually the unix group is a superset of the committee group.

Not all TLPs have both LDAP groups.
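
As a rough illustration of the two-group layout described above, here is a minimal Python sketch that compares the two membership lists for one TLP. It is not taken from any ASF tool: the function name and the sample usernames are hypothetical, and in practice the lists would have to come from LDAP or from the whimsy roster. A TLP that lacks one of the groups would need separate handling.

```python
def compare_groups(unix_group, committee_group):
    """Compare a TLP's unix (committer) group with its committee (PMC) group."""
    committers = set(unix_group)
    pmc = set(committee_group)
    return {
        # In the unix group but not on the committee: committers who are not on the PMC.
        "committers_only": sorted(committers - pmc),
        # On the committee but missing from the unix group: unusual, worth checking.
        "pmc_only": sorted(pmc - committers),
        # True when the unix group is a superset of the committee group (the usual case).
        "unix_is_superset": pmc <= committers,
    }


# Hypothetical membership lists, for illustration only.
report = compare_groups(["alice", "bob", "carol"], ["alice", "carol"])
print(report)
```

Under this reading, someone who appears in "committers_only" (as reported for the Incubator group earlier in the thread) is not necessarily an inconsistency.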

>> Regards,
>> Alan
>>

Re: Canonical sources for information

Posted by Tony Stevenson <to...@pc-tony.com>.

Cheers,
Tony

Sent from my iPhone - Please excuse any brevity or typos. 

On 30 May 2013, at 17:49, Alan Cabrera <ad...@toolazydogs.com> wrote:

> 
> On May 28, 2013, at 10:44 AM, Sam Ruby <ru...@intertwingly.net> wrote:
> 
>> Try the following (passing in --user adc:password):
>> 
>> curl -H "Accept:application/json" https://whimsy.apache.org/roster/committer/adc
>> 
>> Let me know if there are ways that I can make this more convenient.
> 
> I'm seeing an inconsistency for the records for Sean Kelly, kelly.  He seems to be in the Incubator group but not the Incubator committee.
> 

Why is that an inconsistency? He might need access based on the group. IIRC the group is not equivalent to IPMC membership.

> Regards,
> Alan
>

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

On May 28, 2013, at 10:44 AM, Sam Ruby <ru...@intertwingly.net> wrote:

> Try the following (passing in --user adc:password):
> 
> curl -H "Accept:application/json" https://whimsy.apache.org/roster/committer/adc
> 
> Let me know if there are ways that I can make this more convenient.

I'm seeing an inconsistency for the records for Sean Kelly, kelly.  He seems to be in the Incubator group but not the Incubator committee.

Regards,
Alan

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

On May 28, 2013, at 10:44 AM, Sam Ruby <ru...@intertwingly.net> wrote:

> On Tue, May 28, 2013 at 1:31 PM, Alan Cabrera <ad...@toolazydogs.com> wrote:
>> 
>> On May 28, 2013, at 10:24 AM, sebb <se...@gmail.com> wrote:
>> 
>>> On 27 May 2013 22:18, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>> 
>>>> On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>> 
>>>>>> I need to know what are the official sources for the following:
>>>>>> 
>>>>>> full name from username
>>>>> 
>>>>> ldap
>>>>> 
>>>>>> whether a person is a corporate member from username
>>>>> 
>>>>>> What PMC/PPMCs the person belongs to from username
>>>>> 
>>>>> ldap, well this will show group membership.
>>>>> 
>>>>>> I need to get access to this data from the Python utilities that I am writing.
>>>>> 
>>>>> ldap is only available from within the ASF network, so you would have to run this from there.
>>>> 
>>>> Inconvenient but not a "deal breaker"
>>> 
>>> There is already a cron job that extracts a fair amount of information
>>> from LDAP (and elsewhere) to produce the people.a.o pages.
>>> This information is currently published as HTML, for example:
>>> 
>>> http://people.apache.org/committer-index.html
>>> which contains the following information:
>>> SVN id        Name    SVN Projects
>>> Member (bold)
>>> URL (link)
>>> 
>>> Maybe it would make more sense to modify that to additionally generate
>>> the output in CSV/JSON so it can be used by anyone?
>>> 
>>> Scripts would then just need HTTP access to get the data, and they
>>> would only have access to the data that had been cleared for
>>> publication.
>> 
>> That's one way to implement it.  I'm glad we agree that providing HTTP access to vetted data has value.
> 
> Try the following (passing in --user adc:password):
> 
> curl -H "Accept:application/json" https://whimsy.apache.org/roster/committer/adc
> 
> Let me know if there are ways that I can make this more convenient.

Awesome!!!!!!!!!!!!!!



Regards,
Alan

Re: Canonical sources for information

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Alan Cabrera wrote on Tue, May 28, 2013 at 22:39:02 -0700:
> So,  the tooling for now will always provide the username/password.

Make sure you store your password securely --- where no one except you
can reach it.  (That includes root@ folks, btw, so don't run this from
your minotaur crontab.)

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

On May 29, 2013, at 6:36 AM, Sam Ruby <ru...@intertwingly.net> wrote:

> On Wed, May 29, 2013 at 9:10 AM, Alan Cabrera <ad...@toolazydogs.com> wrote:
>> 
>> Having whimsy change would be a bad thing.  I need a simple, canonical, REST API to access data and was hoping whimsy would be that API.  It makes no sense having a myriad of scripts and libraries all with the same cookie-cutter code to access lower levels of information.  All of that stuff should be behind an API like whimsy.  Then you'll have a single choke point to control access.
> 
> A few things...
> 
> 1) I wrote that JSON code on a whim, and it can be changed very
> quickly.  As in an svn commit on my machine and an svn up on whimsy.
> And that code was written long before your request.  And as far as I
> know, you are the first user of it (IIRC, danielsh opted for the
> text/plain interface).  Net: it is software.  It can be changed
> quickly.

No worries.  I assume that you will make a reasonable effort to keep things stable and backward compatible.

What kind of SLA are you providing with your service?  ;)

> 2) looking at that code[1], I immediately spot a bug.  While the HTML
> interface will filter out data that a non-member couldn't see, the
> JSON interface doesn't.  So I already need to open up that code.
> While I am in there, if you have any requests, now would be a good
> time.

Ok, atm I don't think I rely on that data.  What data would that be?

> 3) that code is just a view.  The underlying model is shared between
> several services.  Adding a new view is very trivial.

Yeah, it's the coalescing of data from disparate sources that makes your service so attractive to me.


Regards,
Alan

Re: Canonical sources for information

Posted by Sam Ruby <ru...@intertwingly.net>.

On Wed, May 29, 2013 at 9:10 AM, Alan Cabrera <ad...@toolazydogs.com> wrote:
>
> Having whimsy change would be a bad thing.  I need a simple, canonical, REST API to access data and was hoping whimsy would be that API.  It makes no sense having a myriad of scripts and libraries all with the same cookie-cutter code to access lower levels of information.  All of that stuff should be behind an API like whimsy.  Then you'll have a single choke point to control access.

A few things...

1) I wrote that JSON code on a whim, and it can be changed very
quickly.  As in an svn commit on my machine and an svn up on whimsy.
And that code was written long before your request.  And as far as I
know, you are the first user of it (IIRC, danielsh opted for the
text/plain interface).  Net: it is software.  It can be changed
quickly.

2) looking at that code[1], I immediately spot a bug.  While the HTML
interface will filter out data that a non-member couldn't see, the
JSON interface doesn't.  So I already need to open up that code.
While I am in there, if you have any requests, now would be a good
time.

3) that code is just a view.  The underlying model is shared between
several services.  Adding a new view is very trivial.
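
Points 2 and 3 can be pictured with a small Python sketch: one shared record, one visibility filter, and several thin views that all pass through the filter, so a JSON view cannot leak what the HTML view hides. This is only an illustration of the idea, not the whimsy code; every field name here (including "memberNotes") is hypothetical, and the real model is not shown in this thread.

```python
import json

# Hypothetical set of fields considered safe for non-members.
PUBLIC_FIELDS = frozenset({"username", "fullName", "committees"})


def visible(record, viewer_is_member):
    """Return the part of the record that this viewer is allowed to see."""
    if viewer_is_member:
        return dict(record)
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}


def as_json(record, viewer_is_member):
    """JSON view: serializes only the filtered record."""
    return json.dumps(visible(record, viewer_is_member), sort_keys=True)


def as_text(record, viewer_is_member):
    """text/plain view: one 'key: value' line per visible field."""
    shown = visible(record, viewer_is_member)
    return "\n".join(f"{key}: {shown[key]}" for key in sorted(shown))
```

Because every view calls visible() first, adding another output format would not require repeating the access rules.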

> Regards,
> Alan

- Sam Ruby

[1] https://svn.apache.org/repos/infra/infrastructure/trunk/projects/whimsy/www/roster/committer.cgi

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

Sent from my iPad

On May 29, 2013, at 5:51 AM, Tony Stevenson <pc...@apache.org> wrote:

> Alan D. Cabrera wrote on Wed, May 29, 2013 at 05:36:19AM -0700:
>> If you review earlier emails in this thread you'll see that Sam's REST API requires a username/password.  My tools will prompt for the user's username and password and pass that along.
> 
> Alan, I will reiterate my concerns about the storing of personal data -
> please don't cache our data, and more importantly never cache the
> credentials you prompt for.
> 
> Please also ensure that you only do this via SSL to prevent simple
> eavesdropping etc.

Yes. I will use ssl.  No data will be cached.

>> Previously there was talk about Sam also providing a REST API that didn't require a username/password, and this API would return a smaller subset of publicly safe information.  While I could use such a thing I'm not going to wait for it to take shape as the content of this API still seems to be in flux.
> 
> That is the whole point of whimsy, it is a staging post for
> tools, scripts etc before they make it into prime time. Things will
> likely work just fine, but they may also change.  

> 

Having whimsy change would be a bad thing.  I need a simple, canonical, REST API to access data and was hoping whimsy would be that API.  It makes no sense having a myriad of scripts and libraries all with the same cookie-cutter code to access lower levels of information.  All of that stuff should be behind an API like whimsy.  Then you'll have a single choke point to control access.

Regards,
Alan

Re: Canonical sources for information

Posted by Tony Stevenson <pc...@apache.org>.

Alan D. Cabrera wrote on Wed, May 29, 2013 at 05:36:19AM -0700:
> If you review earlier emails in this thread you'll see that Sam's REST API requires a username/password.  My tools will prompt for the user's username and password and pass that along.

Alan, I will reiterate my concerns about the storing of personal data -
please don't cache our data, and more importantly never cache the
credentials you prompt for.

Please also ensure that you only do this via SSL to prevent simple
eavesdropping etc.
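
For tool authors, the two requests above (no cached credentials, SSL only) might look like the following minimal Python sketch. The function names are hypothetical and this is one possible approach, not an ASF requirement: credentials are prompted for on each run and only returned to the caller, and any URL that is not https is rejected before anything is sent.

```python
import getpass
from urllib.parse import urlsplit


def require_https(url):
    """Return url unchanged if it uses https, otherwise raise ValueError."""
    if urlsplit(url).scheme.lower() != "https":
        raise ValueError("refusing to send credentials over a non-SSL URL: " + url)
    return url


def prompt_credentials(prompt_user=input, prompt_password=getpass.getpass):
    """Ask for credentials on every run; nothing is written to disk or stored globally."""
    username = prompt_user("ASF username: ")
    password = prompt_password("Password: ")
    return username, password
```

The prompt functions are parameters only so that the sketch can be exercised without a terminal; a real tool would simply call prompt_credentials() with no arguments.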

> 
> Previously there was talk about Sam also providing a REST API that didn't require a username/password, and this API would return a smaller subset of publicly safe information.  While I could use such a thing I'm not going to wait for it to take shape as the content of this API still seems to be in flux.
> 

That is the whole point of whimsy, it is a staging post for
tools, scripts etc before they make it into prime time. Things will
likely work just fine, but they may also change.  

> 
> Regards,
> Alan
> 
> On May 28, 2013, at 11:28 PM, Dennis E. Hamilton <de...@acm.org> wrote:
> 
> > What?  What password?
> > 
> > -----Original Message-----
> > From: Alan Cabrera [mailto:adc@toolazydogs.com] 
> > Sent: Tuesday, May 28, 2013 10:39 PM
> > To: infrastructure-dev@apache.org
> > Subject: Re: Canonical sources for information
> > 
> > [ ... ]
> > 
> > So,  the tooling for now will always provide the username/password.  I will code it against 
> > 
> > https://whimsy.apache.org/roster/committer/<username>
> > 
> > 
> > Regards,
> > Alan
> > 
> > 
> 

-- 
Cheers,
Tony

----------------------------------
Tony Stevenson

tony@pc-tony.com
pctony@apache.org

http://www.pc-tony.com

GPG - 1024D/51047D66
----------------------------------

Re: Canonical sources for information

Posted by Tony Stevenson <to...@pc-tony.com>.



Cheers,
Tony

Sent from my iPhone - Please excuse any brevity or typos. 



On 28 May 2013, at 23:11, Sam Ruby <ru...@intertwingly.net> wrote:

> On Tue, May 28, 2013 at 5:55 PM, Tony Stevenson <to...@pc-tony.com> wrote:
>> 
>> On 28 May 2013, at 21:50, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>> 
>>>> This requires an LDAP login, which means the code probably cannot be
>>>> safely automated to run on a shared host, as the password would need
>>>> to be stored somewhere.
>>>> 
>>>> Would it be possible to provide access to the public information
>>>> without requiring a login?
>>> 
>>> How about returning just public information when credentials are not supplied and returning full information if credentials are supplied?
>> 
>> The issue here is that your credentials are used to bind to LDAP to collect this data, AIUI.
> 
> Actually, I don't believe so.  Anybody with shell access to certain
> machines can obtain this data (read only).  This includes role
> accounts, including the one used by the web server.
> 

That's because these machines bind as a pre-configured user (nss_ldap), IIRC.

> Given the way HTTP authentication works, it probably would be best if
> we want to provide an unauthenticated service that we do so with a
> separate URI.
> 
> - Sam Ruby

Re: Canonical sources for information

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Wed, May 29, 2013 at 3:36 PM, Alan D. Cabrera <ad...@toolazydogs.com> wrote:
> Previously there was talk about Sam also providing a REST API that didn't require a
> username/password, and this API would return a smaller subset of publicly safe
> information.

FWIW, I spent a few minutes hacking together a script that scrapes
data from http://people.apache.org/committer-index.html and turns it
into the kind of JSON you outlined earlier. A static snapshot is
available under http://zitting.name/2013/05/api/v1/:

    $ curl http://zitting.name/2013/05/api/v1/committers
    {"committers":["a_horuzhenko","aadamchik","aadomowski", ... ]}

    $ curl http://zitting.name/2013/05/api/v1/committers/adc
    {"fullName":"Alan Cabrera","member":true,"projects":...}

The quick and dirty Perl script I used for this is included below. It
would obviously be cleaner if done as a part of the p.a.o generation
instead of as a web scraper.

BR,

Jukka Zitting

----
#!/usr/bin/perl

use strict;
use warnings;

use JSON;

my @committers = ();
my $html = join '', <>;
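# No /x flag: whitespace inside the pattern is literal, so each <td> row
# must stay on a single line (mail line wrapping would break the match).
while ($html =~
      m{<tr>
        <td bgcolor=".*?">(<b class="member">)?<a id='.+?'></a>(.+?)(<a id='.'></a>)?(</b>)?</td>
        <td bgcolor=".*?">(<b class="member">)?(<a href="(.*?)">)?(.+?)(</a>)?(</b>)?</td>
        <td bgcolor=".*?">(.*?)</td>
        </tr>}sg) {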
  my $member = $1 ? JSON::true : JSON::false;
  my $username = $2;
  my $url = $7 || "";
  my $name = $8;
  my $projects = $11;
  my @projects = ();
  my @pmcs = ();
  while ($projects =~ m{<a href='committers-by-project\.html#.*?'>(.*?)(-pmc)?</a>}g) {
    if ($2) {
      push @pmcs, $1;
    } elsif ($1 ne "member"
         and $1 ne "apsite"
         and $1 ne "pmc-chairs"
         and $1 ne "infrastructure") {
      push @projects, $1;
    }
  }
  open FILE, ">", "committers/$username.json" or die "Cannot write committers/$username.json: $!";
  print FILE to_json({
    username => $username,
    fullName => $name,
    projects => \@projects,
    pmcs     => \@pmcs,
    member   => $member
  });
  close FILE;
  push @committers, $username;
}

open FILE, ">", "committers.json" or die "Cannot write committers.json: $!";
print FILE to_json({ committers => \@committers });
close FILE;
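
On the consuming side, a Python utility could read one of the per-committer files written by the script above roughly as follows. This is a sketch under assumptions: the keys (username, fullName, projects, pmcs, member) are the ones the Perl script emits, the sample record is made up, and the scraped data does not distinguish PMCs from PPMCs.

```python
import json


def summarize_committer(json_text):
    """Extract full name, membership flag and PMC list from one committer record."""
    record = json.loads(json_text)
    return {
        "full_name": record["fullName"],
        "is_member": bool(record["member"]),
        # "pmcs" may be absent in other snapshots, so fall back to an empty list.
        "pmcs": list(record.get("pmcs", [])),
    }


# Made-up record in the shape produced by the script; not real roster data.
SAMPLE = (
    '{"username": "jdoe", "fullName": "Jane Doe", '
    '"projects": ["foo"], "pmcs": ["foo"], "member": false}'
)
print(summarize_committer(SAMPLE))
```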

Re: Canonical sources for information

Posted by "Alan D. Cabrera" <ad...@toolazydogs.com>.

If you review earlier emails in this thread you'll see that Sam's REST API requires a username/password.  My tools will prompt for the user's username and password and pass that along.

Previously there was talk about Sam also providing a REST API that didn't require a username/password, and this API would return a smaller subset of publicly safe information.  While I could use such a thing I'm not going to wait for it to take shape as the content of this API still seems to be in flux.

Regards,
Alan

On May 28, 2013, at 11:28 PM, Dennis E. Hamilton <de...@acm.org> wrote:

> What?  What password?
> 
> -----Original Message-----
> From: Alan Cabrera [mailto:adc@toolazydogs.com] 
> Sent: Tuesday, May 28, 2013 10:39 PM
> To: infrastructure-dev@apache.org
> Subject: Re: Canonical sources for information
> 
> [ ... ]
> 
> So,  the tooling for now will always provide the username/password.  I will code it against 
> 
> https://whimsy.apache.org/roster/committer/<username>
> 
> 
> Regards,
> Alan
> 
>

RE: Canonical sources for information

Posted by "Dennis E. Hamilton" <de...@acm.org>.

What?  What password?

-----Original Message-----
From: Alan Cabrera [mailto:adc@toolazydogs.com] 
Sent: Tuesday, May 28, 2013 10:39 PM
To: infrastructure-dev@apache.org
Subject: Re: Canonical sources for information

[ ... ]

So,  the tooling for now will always provide the username/password.  I will code it against 

https://whimsy.apache.org/roster/committer/<username>


Regards,
Alan

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

On May 28, 2013, at 3:48 PM, Sam Ruby <ru...@intertwingly.net> wrote:

> On Tue, May 28, 2013 at 6:19 PM, sebb <se...@gmail.com> wrote:
>> On 28 May 2013 23:11, Sam Ruby <ru...@intertwingly.net> wrote:
>>> On Tue, May 28, 2013 at 5:55 PM, Tony Stevenson <to...@pc-tony.com> wrote:
>>>> 
>>>> On 28 May 2013, at 21:50, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>>> 
>>>>>> This requires an LDAP login, which means the code probably cannot be
>>>>>> safely automated to run on a shared host, as the password would need
>>>>>> to be stored somewhere.
>>>>>> 
>>>>>> Would it be possible to provide access to the public information
>>>>>> without requiring a login?
>>>>> 
>>>>> How about returning just public information when credentials are not supplied and returning full information if credentials are supplied?
>>>> 
>>>> The issue here is that your credentials are used to bind to LDAP to collect this data, AIUI.
>>> 
>>> Actually, I don't believe so.
>> 
>> Aren't the credentials used to restrict which data is returned?
>> i.e. members get more info than committers?
> 
> Yes, the authenticated user name is used in determining what filters
> to apply to the results; but the point still stands that the
> credentials aren't used to bind to LDAP.
> 
>>> Anybody with shell access to certain
>>> machines can obtain this data (read only).  This includes role
>>> accounts, including the one used by the web server.
>> 
>> I think that's a separate feature.
> 
> Indeed.
> 
>> AFAIK it's how the people.a.o cron job works.
> 
> Almost certainly.
> 
>>> Given the way HTTP authentication works, it probably would be best if
>>> we want to provide an unauthenticated service that we do so with a
>>> separate URI.
>> 
>> Or (as I already wrote) update the people cron job to generate
>> additional output file formats.
> 
> That works too.
> 
> I'll note that the json output option on whimsy was created long
> before this request was made.
> 
> And I will also note that python has excellent libraries for dealing
> directly with LDAP, should Alan make the choice to run his scripts on
> ASF Infrastructure.
> 
> I'll finally note that LDAP isn't the only authoritative source,
> though in many cases it is the only one that matters.  Many of our
> other authoritative sources contain contradictory information, as you
> can see here:
> 
>  https://whimsy.apache.org/roster/committee/


So,  the tooling for now will always provide the username/password.  I will code it against 

https://whimsy.apache.org/roster/committer/<username>


Regards,
Alan

Re: Canonical sources for information

Posted by Sam Ruby <ru...@intertwingly.net>.

On Tue, May 28, 2013 at 6:19 PM, sebb <se...@gmail.com> wrote:
> On 28 May 2013 23:11, Sam Ruby <ru...@intertwingly.net> wrote:
>> On Tue, May 28, 2013 at 5:55 PM, Tony Stevenson <to...@pc-tony.com> wrote:
>>>
>>> On 28 May 2013, at 21:50, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>>
>>>>> This requires an LDAP login, which means the code probably cannot be
>>>>> safely automated to run on a shared host, as the password would need
>>>>> to be stored somewhere.
>>>>>
>>>>> Would it be possible to provide access to the public information
>>>>> without requiring a login?
>>>>
>>>> How about returning just public information when credentials are not supplied and returning full information if credentials are supplied?
>>>
>>> The issue here is that your credentials are used to bind to LDAP to collect this data, AIUI.
>>
>> Actually, I don't believe so.
>
> Aren't the credentials used to restrict which data is returned?
> i.e. members get more info than committers?

Yes, the authenticated user name is used in determining what filters
to apply to the results; but the point still stands that the
credentials aren't used to bind to LDAP.

>> Anybody with shell access to certain
>> machines can obtain this data (read only).  This includes role
>> accounts, including the one used by the web server.
>
> I think that's a separate feature.

Indeed.

> AFAIK it's how the people.a.o cron job works.

Almost certainly.

>> Given the way HTTP authentication works, it probably would be best if
>> we want to provide an unauthenticated service that we do so with a
>> separate URI.
>
> Or (as I already wrote) update the people cron job to generate
> additional output file formats.

That works too.

I'll note that the json output option on whimsy was created long
before this request was made.

And I will also note that python has excellent libraries for dealing
directly with LDAP, should Alan make the choice to run his scripts on
ASF Infrastructure.

I'll finally note that LDAP isn't the only authoritative source,
though in many cases it is the only one that matters.  Many of our
other authoritative sources contain contradictory information, as you
can see here:

  https://whimsy.apache.org/roster/committee/

- Sam Ruby

Re: Canonical sources for information

Posted by sebb <se...@gmail.com>.

On 28 May 2013 23:11, Sam Ruby <ru...@intertwingly.net> wrote:
> On Tue, May 28, 2013 at 5:55 PM, Tony Stevenson <to...@pc-tony.com> wrote:
>>
>> On 28 May 2013, at 21:50, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>
>>>> This requires an LDAP login, which means the code probably cannot be
>>>> safely automated to run on a shared host, as the password would need
>>>> to be stored somewhere.
>>>>
>>>> Would it be possible to provide access to the public information
>>>> without requiring a login?
>>>
>>> How about returning just public information when credentials are not supplied and returning full information if credentials are supplied?
>>
>> The issue here is that your credentials are used to bind to LDAP to collect this data, AIUI.
>
> Actually, I don't believe so.

Aren't the credentials used to restrict which data is returned?
i.e. members get more info than committers?

> Anybody with shell access to certain
> machines can obtain this data (read only).  This includes role
> accounts, including the one used by the web server.

I think that's a separate feature.

AFAIK it's how the people.a.o cron job works.

> Given the way HTTP authentication works, it probably would be best if
> we want to provide an unauthenticated service that we do so with a
> separate URI.

Or (as I already wrote) update the people cron job to generate
additional output file formats.

> - Sam Ruby

Re: Canonical sources for information

Posted by Sam Ruby <ru...@intertwingly.net>.

On Tue, May 28, 2013 at 5:55 PM, Tony Stevenson <to...@pc-tony.com> wrote:
>
> On 28 May 2013, at 21:50, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>
>>> This requires an LDAP login, which means the code probably cannot be
>>> safely automated to run on a shared host, as the password would need
>>> to be stored somewhere.
>>>
>>> Would it be possible to provide access to the public information
>>> without requiring a login?
>>
>> How about returning just public information when credentials are not supplied and returning full information if credentials are supplied?
>
> The issue here is that your credentials are used to bind to LDAP to collect this data, AIUI.

Actually, I don't believe so.  Anybody with shell access to certain
machines can obtain this data (read only).  This includes role
accounts, including the one used by the web server.

Given the way HTTP authentication works, it probably would be best if
we want to provide an unauthenticated service that we do so with a
separate URI.

- Sam Ruby

Re: Canonical sources for information

Posted by Tony Stevenson <to...@pc-tony.com>.

On 28 May 2013, at 21:50, Alan Cabrera <ad...@toolazydogs.com> wrote:
>> 
>> This requires an LDAP login, which means the code probably cannot be
>> safely automated to run on a shared host, as the password would need
>> to be stored somewhere.
>> 
>> Would it be possible to provide access to the public information
>> without requiring a login?
> 
> How about returning just public information when credentials are not supplied and returning full information if credentials are supplied?

The issue here is that your credentials are used to bind to LDAP to collect this data, AIUI.  Implementing your change would mean changing the access control we have in LDAP (slapd). In fact the current ACL prevents *any* anonymous access to our LDAP data.  This was deliberate: it prevents any data from being accidentally leaked.

Technically our LDAP tree does not contain anything that we might consider 'public' - which is why we require auth for all access.  That does not mean the data is not public, but we require access to be able to publish it. 

Cheers,
Tony

----------------------------------
Tony Stevenson

tony@pc-tony.com
pctony@apache.org

http://www.pc-tony.com

GPG - 1024D/51047D66
----------------------------------

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

On May 28, 2013, at 12:10 PM, sebb <se...@gmail.com> wrote:

> On 28 May 2013 18:44, Sam Ruby <ru...@intertwingly.net> wrote:
>> On Tue, May 28, 2013 at 1:31 PM, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>> 
>>> On May 28, 2013, at 10:24 AM, sebb <se...@gmail.com> wrote:
>>> 
>>>> On 27 May 2013 22:18, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>> 
>>>>> On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:
>>>>> 
>>>>>> 
>>>>>> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>>> 
>>>>>>> I need to know what are the official sources for the following:
>>>>>>> 
>>>>>>> full name from username
>>>>>> 
>>>>>> ldap
>>>>>> 
>>>>>>> whether a person is a corporate member from username
>>>>>> 
>>>>>>> What PMC/PPMCs the person belongs to from username
>>>>>> 
>>>>>> ldap, well this will show group membership.
>>>>>> 
>>>>>>> I need to get access to this data from the Python utilities that I am writing.
>>>>>> 
>>>>>> ldap is only available from within the ASF network, so you would have to run this from there.
>>>>> 
>>>>> Inconvenient but not a "deal breaker"
>>>> 
>>>> There is already a cron job that extracts a fair amount of information
>>>> from LDAP (and elsewhere) to produce the people.a.o pages.
>>>> This information is currently published as HTML, for example:
>>>> 
>>>> http://people.apache.org/committer-index.html
>>>> which contains the following information:
>>>> SVN id        Name    SVN Projects
>>>> Member (bold)
>>>> URL (link)
>>>> 
>>>> Maybe it would make more sense to modify that to additionally generate
>>>> the output in CSV/JSON so it can be used by anyone?
>>>> 
>>>> Scripts would then just need HTTP access to get the data, and they
>>>> would only have access to the data that had been cleared for
>>>> publication.
>>> 
>>> That's one way to implement it.  I'm glad we agree that providing HTTP access to vetted data has value.
>> 
>> Try the following (passing in --user adc:password):
>> 
>> curl -H "Accept:application/json" https://whimsy.apache.org/roster/committer/adc
>> 
>> Let me know if there are ways that I can make this more convenient.
> 
> This requires an LDAP login, which means the code probably cannot be
> safely automated to run on a shared host, as the password would need
> to be stored somewhere.
> 
> Would it be possible to provide access to the public information
> without requiring a login?

How about returning just public information when credentials are not supplied and returning full information if credentials are supplied?


Regards,
Alan

Re: Canonical sources for information

Posted by sebb <se...@gmail.com>.

On 28 May 2013 18:44, Sam Ruby <ru...@intertwingly.net> wrote:
> On Tue, May 28, 2013 at 1:31 PM, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>
>> On May 28, 2013, at 10:24 AM, sebb <se...@gmail.com> wrote:
>>
>>> On 27 May 2013 22:18, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>
>>>> On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:
>>>>
>>>>>
>>>>> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>>
>>>>>> I need to know what are the official sources for the following:
>>>>>>
>>>>>> full name from username
>>>>>
>>>>> ldap
>>>>>
>>>>>> whether a person is a corporate member from username
>>>>>
>>>>>> What PMC/PPMCs the person belongs to from username
>>>>>
>>>>> ldap, well this will show group membership.
>>>>>
>>>>>> I need to get access to this data from the Python utilities that I am writing.
>>>>>
>>>>> ldap is only available from within the ASF network, so you would have to run this from there.
>>>>
>>>> Inconvenient but not a "deal breaker"
>>>
>>> There is already a cron job that extracts a fair amount of information
>>> from LDAP (and elsewhere) to produce the people.a.o pages.
>>> This information is currently published as HTML, for example:
>>>
>>> http://people.apache.org/committer-index.html
>>> which contains the following information:
>>> SVN id        Name    SVN Projects
>>> Member (bold)
>>> URL (link)
>>>
>>> Maybe it would make more sense to modify that to additionally generate
>>> the output in CSV/JSON so it can be used by anyone?
>>>
>>> Scripts would then just need HTTP access to get the data, and they
>>> would only have access to the data that had been cleared for
>>> publication.
>>
>> That's one way to implement it.  I'm glad we agree that providing HTTP access to vetted data has value.
>
> Try the following (passing in --user adc:password):
>
> curl -H "Accept:application/json" https://whimsy.apache.org/roster/committer/adc
>
> Let me know if there are ways that I can make this more convenient.

This requires an LDAP login, which means the code probably cannot be
safely automated to run on a shared host, as the password would need
to be stored somewhere.

Would it be possible to provide access to the public information
without requiring a login?

>> Regards,
>> Alan
>
> - Sam Ruby

Re: Canonical sources for information

Posted by Sam Ruby <ru...@intertwingly.net>.

On Tue, May 28, 2013 at 1:31 PM, Alan Cabrera <ad...@toolazydogs.com> wrote:
>
> On May 28, 2013, at 10:24 AM, sebb <se...@gmail.com> wrote:
>
>> On 27 May 2013 22:18, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>
>>> On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:
>>>
>>>>
>>>> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>
>>>>> I need to know what are the official sources for the following:
>>>>>
>>>>> full name from username
>>>>
>>>> ldap
>>>>
>>>>> whether a person is a corporate member from username
>>>>
>>>>> What PMC/PPMCs the person belongs to from username
>>>>
>>>> ldap, well this will show group membership.
>>>>
>>>>> I need to get access to this data from the Python utilities that I am writing.
>>>>
>>>> ldap is only available from within the ASF network, so you would have to run this from there.
>>>
>>> Inconvenient but not a "deal breaker"
>>
>> There is already a cron job that extracts a fair amount of information
>> from LDAP (and elsewhere) to produce the people.a.o pages.
>> This information is currently published as HTML, for example:
>>
>> http://people.apache.org/committer-index.html
>> which contains the following information:
>> SVN id        Name    SVN Projects
>> Member (bold)
>> URL (link)
>>
>> Maybe it would make more sense to modify that to additionally generate
>> the output in CSV/JSON so it can be used by anyone?
>>
>> Scripts would then just need HTTP access to get the data, and they
>> would only have access to the data that had been cleared for
>> publication.
>
> That's one way to implement it.  I'm glad we agree that providing HTTP access to vetted data has value.

Try the following (passing in --user adc:password):

curl -H "Accept:application/json" https://whimsy.apache.org/roster/committer/adc

Let me know if there are ways that I can make this more convenient.

> Regards,
> Alan

- Sam Ruby
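
For Python tooling, the curl call above can be approximated with the standard library alone. This is a minimal sketch, not a documented Whimsy client: the function names build_roster_request and fetch_committer are made up for illustration, HTTP Basic authentication is inferred from curl's --user option, and the shape of the returned JSON is not specified in this thread.

```python
import base64
import json
import urllib.parse
import urllib.request

# URL prefix taken from the curl example in this message.
ROSTER_URL = "https://whimsy.apache.org/roster/committer/"


def build_roster_request(username, password, committer_id):
    """Build (but do not send) the request that the curl command would make."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = ROSTER_URL + urllib.parse.quote(committer_id, safe="")
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",        # same header as curl -H
            "Authorization": "Basic " + token,   # assumed equivalent of --user
        },
    )


def fetch_committer(username, password, committer_id):
    """Send the request and decode the JSON body (needs network access)."""
    request = build_roster_request(username, password, committer_id)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (not executed here):
#   record = fetch_committer("adc", getpass.getpass(), "adc")
```

Prompting for the password at run time, for example with getpass.getpass(), keeps it out of the script, but it does not help unattended jobs on a shared host.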

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

On May 28, 2013, at 10:34 AM, sebb <se...@gmail.com> wrote:

> On 28 May 2013 18:31, Alan Cabrera <ad...@toolazydogs.com> wrote:
>> 
>> On May 28, 2013, at 10:24 AM, sebb <se...@gmail.com> wrote:
>> 
>>> On 27 May 2013 22:18, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>> 
>>>> On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>> 
>>>>>> I need to know what are the official sources for the following:
>>>>>> 
>>>>>> full name from username
>>>>> 
>>>>> ldap
>>>>> 
>>>>>> whether a person is a corporate member from username
>>>>> 
>>>>>> What PMC/PPMCs the person belongs to from username
>>>>> 
>>>>> ldap, well this will show group membership.
>>>>> 
>>>>>> I need to get access to this data from the Python utilities that I am writing.
>>>>> 
>>>>> ldap is only available from within the ASF network, so you would have to run this from there.
>>>> 
>>>> Inconvenient but not a "deal breaker"
>>> 
>>> There is already a cron job that extracts a fair amount of information
>>> from LDAP (and elsewhere) to produce the people.a.o pages.
>>> This information is currently published as HTML, for example:
>>> 
>>> http://people.apache.org/committer-index.html
>>> which contains the following information:
>>> SVN id        Name    SVN Projects
>>> Member (bold)
>>> URL (link)
>>> 
>>> Maybe it would make more sense to modify that to additionally generate
>>> the output in CSV/JSON so it can be used by anyone?
>>> 
>>> Scripts would then just need HTTP access to get the data, and they
>>> would only have access to the data that had been cleared for
>>> publication.
>> 
>> That's one way to implement it.  I'm glad we agree that providing HTTP access to vetted data has value.
> 
> I'm suggesting further that the data should only need to be vetted once.

What do you mean by "data should only need to be vetted once"?


Regards,
Alan

Re: Canonical sources for information

Posted by sebb <se...@gmail.com>.

On 28 May 2013 18:31, Alan Cabrera <ad...@toolazydogs.com> wrote:
>
> On May 28, 2013, at 10:24 AM, sebb <se...@gmail.com> wrote:
>
>> On 27 May 2013 22:18, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>
>>> On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:
>>>
>>>>
>>>> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>>>
>>>>> I need to know what are the official sources for the following:
>>>>>
>>>>> full name from username
>>>>
>>>> ldap
>>>>
>>>>> whether a person is a corporate member from username
>>>>
>>>>> What PMC/PPMCs the person belongs to from username
>>>>
>>>> ldap, well this will show group membership.
>>>>
>>>>> I need to get access to this data from the Python utilities that I am writing.
>>>>
>>>> ldap is only available from within the ASF network, so you would have to run this from there.
>>>
>>> Inconvenient but not a "deal breaker"
>>
>> There is already a cron job that extracts a fair amount of information
>> from LDAP (and elsewhere) to produce the people.a.o pages.
>> This information is currently published as HTML, for example:
>>
>> http://people.apache.org/committer-index.html
>> which contains the following information:
>> SVN id        Name    SVN Projects
>> Member (bold)
>> URL (link)
>>
>> Maybe it would make more sense to modify that to additionally generate
>> the output in CSV/JSON so it can be used by anyone?
>>
>> Scripts would then just need HTTP access to get the data, and they
>> would only have access to the data that had been cleared for
>> publication.
>
> That's one way to implement it.  I'm glad we agree that providing HTTP access to vetted data has value.

I'm suggesting further that the data should only need to be vetted once.

>
> Regards,
> Alan
>
>>
>>
>>> With that said, I think that it would be a good thing to have a simple R/O rest API that allows tooling to get information on a user.  For example:
>>>
>>> GET /api/v1/committers/acabrera
>>>
>>> would return
>>>
>>> {
>>>  "username": "acabrera",
>>>  "fullName": "Alan Cabrera",
>>>  "projects": ["geronimo", "tomee", "aries", "incubator"],
>>>  "pmcs": ["geronimo", "tomee", "incubator"],
>>>  "member": true
>>> }
>>>
>>> GET /api/v1/committers
>>> {
>>>  "committers" : ["a_horuzhenko", "aadamchik", … ]
>>> }
>>>
>

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

On May 28, 2013, at 10:24 AM, sebb <se...@gmail.com> wrote:

> On 27 May 2013 22:18, Alan Cabrera <ad...@toolazydogs.com> wrote:
>> 
>> On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:
>> 
>>> 
>>> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>> 
>>>> I need to know what are the official sources for the following:
>>>> 
>>>> full name from username
>>> 
>>> ldap
>>> 
>>>> whether a person is a corporate member from username
>>> 
>>>> What PMC/PPMCs the person belongs to from username
>>> 
>>> ldap, well this will show group membership.
>>> 
>>>> I need to get access to this data from the Python utilities that I am writing.
>>> 
>>> ldap is only available from within the ASF network, so you would have to run this from there.
>> 
>> Inconvenient but not a "deal breaker"
> 
> There is already a cron job that extracts a fair amount of information
> from LDAP (and elsewhere) to produce the people.a.o pages.
> This information is currently published as HTML, for example:
> 
> http://people.apache.org/committer-index.html
> which contains the following information:
> SVN id	Name	SVN Projects
> Member (bold)
> URL (link)
> 
> Maybe it would make more sense to modify that to additionally generate
> the output in CSV/JSON so it can be used by anyone?
> 
> Scripts would then just need HTTP access to get the data, and they
> would only have access to the data that had been cleared for
> publication.

That's one way to implement it.  I'm glad we agree that providing HTTP access to vetted data has value.


Regards,
Alan
 
> 
> 
>> With that said, I think that it would be a good thing to have a simple R/O rest API that allows tooling to get information on a user.  For example:
>> 
>> GET /api/v1/committers/acabrera
>> 
>> would return
>> 
>> {
>>  "username": "acabrera",
>>  "fullName": "Alan Cabrera",
>>  "projects": ["geronimo", "tomee", "aries", "incubator"],
>>  "pmcs": ["geronimo", "tomee", "incubator"],
>>  "member": true
>> }
>> 
>> GET /api/v1/committers
>> {
>>  "committers" : ["a_horuzhenko", "aadamchik", … ]
>> }
>>

Re: Canonical sources for information

Posted by sebb <se...@gmail.com>.

On 27 May 2013 22:18, Alan Cabrera <ad...@toolazydogs.com> wrote:
>
> On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:
>
>>
>> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
>>
>>> I need to know what are the official sources for the following:
>>>
>>> full name from username
>>
>> ldap
>>
>>> whether a person is a corporate member from username
>>
>>> What PMC/PPMCs the person belongs to from username
>>
>> ldap, well this will show group membership.
>>
>>> I need to get access to this data from the Python utilities that I am writing.
>>
>> ldap is only available from within the ASF network, so you would have to run this from there.
>
> Inconvenient but not a "deal breaker"

There is already a cron job that extracts a fair amount of information
from LDAP (and elsewhere) to produce the people.a.o pages.
This information is currently published as HTML, for example:

http://people.apache.org/committer-index.html
which contains the following information:
SVN id	Name	SVN Projects
Member (bold)
URL (link)

Maybe it would make more sense to modify that to additionally generate
the output in CSV/JSON so it can be used by anyone?

Scripts would then just need HTTP access to get the data, and they
would only have access to the data that had been cleared for
publication.
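
A rough sketch of what the export step might look like in Python follows. Everything in it is an assumption made for illustration: the field names (taken loosely from the columns of committer-index.html), the PUBLIC_FIELDS whitelist, and the function names. It does not describe the existing cron job. The point it shows is that the whitelist is applied once, at export time, so consumers never see fields that were not cleared.

```python
import csv
import io
import json

# Hypothetical whitelist: only these fields are treated as cleared for publication.
PUBLIC_FIELDS = ("id", "name", "projects", "member", "url")


def public_view(record):
    """Copy only the whitelisted fields; anything else in the record is dropped."""
    return {field: record.get(field) for field in PUBLIC_FIELDS}


def to_json(records):
    """Serialize the public view of every record as a JSON array."""
    return json.dumps([public_view(r) for r in records], indent=2, sort_keys=True)


def to_csv(records):
    """Serialize the public view of every record as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(PUBLIC_FIELDS)
    for record in records:
        row = public_view(record)
        writer.writerow([
            row["id"],
            row["name"],
            " ".join(row["projects"] or []),   # projects as a space-separated list
            "yes" if row["member"] else "no",
            row["url"] or "",
        ])
    return buffer.getvalue()
```

A script consuming the result would then need nothing more than an HTTP GET and json.loads or csv.reader.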

>> Can I please confirm to what end you want to use this data? Where you will store it, and how you intend to store it securely (privacy etc)?
>
> Sure, I'm writing tools that will help with things such as mailing list monitoring, mentor podling tracking, etc.  The tooling will be tracking users by their apache usernames.  Sometimes I need to get their names from those usernames.
>
> The data will never be stored other than for webpages that get generated.
>
> For example, the podlings.xml file that the Incubator uses contains full names for the mentors and champions. The file really should have apache usernames since these are really foreign indexes into committers/members.  If a full name gets spelled differently, e.g. adding a middle initial, the referential integrity gets broken.
>
> With that said, I think that it would be a good thing to have a simple R/O rest API that allows tooling to get information on a user.  For example:
>
> GET /api/v1/committers/acabrera
>
> would return
>
> {
>   "username": "acabrera",
>   "fullName": "Alan Cabrera",
>   "projects": ["geronimo", "tomee", "aries", "incubator"],
>   "pmcs": ["geronimo", "tomee", "incubator"],
>   "member": true
> }
>
> GET /api/v1/committers
> {
>   "committers" : ["a_horuzhenko", "aadamchik", … ]
> }
>
>
>
> Regards,
> Alan
>

Re: Canonical sources for information

Posted by Alan Cabrera <ad...@toolazydogs.com>.

On May 27, 2013, at 1:11 PM, Tony Stevenson <pc...@apache.org> wrote:

> 
> On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:
> 
>> I need to know what are the official sources for the following:
>> 
>> full name from username
> 
> ldap
> 
>> whether a person is a corporate member from username
> 
>> What PMC/PPMCs the person belongs to from username
> 
> ldap, well this will show group membership. 
> 
>> I need to get access to this data from the Python utilities that I am writing.
> 
> ldap is only available from within the ASF network, so you would have to run this from there.  

Inconvenient but not a "deal breaker"

> Can I please confirm to what end you want to use this data? Where you will store it, and how you intend to store it securely (privacy etc)? 

Sure, I'm writing tools that will help with things such as mailing list monitoring, mentor podling tracking, etc.  The tooling will be tracking users by their apache usernames.  Sometimes I need to get their names from those usernames.

The data will never be stored, other than in the web pages that get generated.

For example, the podlings.xml file that the Incubator uses contains full names for the mentors and champions. The file really should have Apache usernames, since these are effectively foreign keys into committers/members.  If a full name gets spelled differently, e.g. by adding a middle initial, the referential integrity is broken.
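
To make the referential integrity point concrete, here is a small Python sketch. The directory contents, the person "Jane Doe", and both function names are invented for illustration and do not come from podlings.xml.

```python
# Hypothetical committer directory keyed by Apache username.
COMMITTERS = {"jdoe": "Jane Doe"}


def resolve_by_name(full_name):
    """Look up a username from a full name; this only works on an exact match."""
    for username, name in COMMITTERS.items():
        if name == full_name:
            return username
    return None


def resolve_by_username(username):
    """Look up a full name from a username; the key is stable by construction."""
    return COMMITTERS.get(username)


# A mentor recorded with a middle initial no longer matches anyone...
print(resolve_by_name("Jane Q. Doe"))
# ...while a reference stored as a username still resolves.
print(resolve_by_username("jdoe"))
```

The first lookup fails because the spelling drifted, and the second succeeds because the username did not change.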

With that said, I think it would be a good thing to have a simple read-only REST API that allows tooling to get information on a user.  For example:

GET /api/v1/committers/acabrera

would return

{
  "username": "acabrera",
  "fullName": "Alan Cabrera",
  "projects": ["geronimo", "tomee", "aries", "incubator"],
  "pmcs": ["geronimo", "tomee", "incubator"],
  "member": true
}

GET /api/v1/committers

would return

{
  "committers" : ["a_horuzhenko", "aadamchik", … ]
}
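
A minimal Python sketch of how such a read-only lookup could be routed follows. It is only an illustration of the proposal: the in-memory COMMITTERS table stands in for whatever vetted source a real service would read, and the handle_get function and the 404 error body are assumptions that were not part of the proposal.

```python
import json

# Sample data copied from the example above; a real service would not hard-code it.
COMMITTERS = {
    "acabrera": {
        "username": "acabrera",
        "fullName": "Alan Cabrera",
        "projects": ["geronimo", "tomee", "aries", "incubator"],
        "pmcs": ["geronimo", "tomee", "incubator"],
        "member": True,
    },
}

PREFIX = "/api/v1/committers"


def handle_get(path, committers=COMMITTERS):
    """Return (status, json_body) for a GET request; there are no write paths."""
    if path.rstrip("/") == PREFIX:
        # Collection resource: list the known usernames.
        return 200, json.dumps({"committers": sorted(committers)})
    if path.startswith(PREFIX + "/"):
        # Item resource: everything after the prefix is the username.
        record = committers.get(path[len(PREFIX) + 1:])
        if record is not None:
            return 200, json.dumps(record)
    # Hypothetical error shape for unknown paths and usernames.
    return 404, json.dumps({"error": "not found"})
```

The function could be wrapped in any HTTP framework; keeping the routing as a pure function makes it easy to test without a server.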

Regards,
Alan

Re: Canonical sources for information

Posted by Tony Stevenson <pc...@apache.org>.

On 27 May 2013, at 18:15, Alan Cabrera <ad...@toolazydogs.com> wrote:

> I need to know what are the official sources for the following:
> 
> full name from username

ldap

> whether a person is a corporate member from username

> What PMC/PPMCs the person belongs to from username

ldap, well this will show group membership. 

> I need to get access to this data from the Python utilities that I am writing.

ldap is only available from within the ASF network, so you would have to run this from there.  

Can I please confirm to what end you want to use this data, where you will store it, and how you intend to store it securely (privacy etc.)?

> 
> Regards,
> Alan
> 

Cheers,
Tony

----------------------------------
Tony Stevenson

tony@pc-tony.com
pctony@apache.org

http://www.pc-tony.com

GPG - 1024D/51047D66
----------------------------------