Posted to dev@community.apache.org by Rich Bowen <rb...@rcbowen.com> on 2020/03/27 13:43:52 UTC

Data inconsistency in projects.apache.org

I'm trying to understand the twisty maze of data sources that fuel 
projects.apache.org and either I'm confused, or there's some 
inconsistency in how this all fits together.

I'll start with just one data source for now, so that I don't muddle 
multiple things together.

https://svn.apache.org/repos/asf/comdev/projects.apache.org/trunk/data/committees.xml 


This file has a list of rdf files which are supposed to be in the 
committees/ subdirectory. The file itself says:

    This list should agree with the files in the directory committees/

However, in addition to the entries that look like:

   <location>committees/any23.rdf</location>

there are also lines that look like:

   <location>http://flex.apache.org/pmc_Flex.rdf</location>

(4 of them, for whatever that's worth - flex, ofbiz, plc4x, and tez)

Is that correct? Or is that not how the data is supposed to be stored?

Meanwhile, committees.xml contains 209 projects:

grep location committees.xml| grep -vc Retired
209

while the committees/ directory contains just 206 rdf files:

ls committees/*.rdf| wc -l
206
(Note, one of those files is _template.rdf, so it's really 205, and 205 
+ 4 = 209, so at least everything else matches up.)
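
For anyone who wants to repeat the comparison without grep and ls, here is
a rough Python sketch. It is not the actual projects.apache.org tooling.
It assumes committees.xml is a flat list of <location> elements and that
retired entries are commented out or otherwise absent; the sample data and
the function names split_locations and compare are made up for
illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical sample; the real committees.xml may be laid out differently.
SAMPLE = """<list>
  <location>committees/any23.rdf</location>
  <location>committees/httpd.rdf</location>
  <location>http://flex.apache.org/pmc_Flex.rdf</location>
</list>"""


def split_locations(xml_text):
    """Return (local, remote) lists of <location> values."""
    root = ET.fromstring(xml_text)
    local, remote = [], []
    for element in root.iter("location"):
        value = (element.text or "").strip()
        if not value:
            continue
        # Entries with an http(s) scheme are hosted by the project itself.
        if urlparse(value).scheme in ("http", "https"):
            remote.append(value)
        else:
            local.append(value)
    return local, remote


def compare(local, files_on_disk, ignore=("committees/_template.rdf",)):
    """Return (listed but missing on disk, on disk but not listed)."""
    listed = set(local)
    on_disk = set(files_on_disk) - set(ignore)
    return sorted(listed - on_disk), sorted(on_disk - listed)


if __name__ == "__main__":
    local, remote = split_locations(SAMPLE)
    missing, unlisted = compare(
        local, ["committees/any23.rdf", "committees/_template.rdf"]
    )
    print(len(local), "local,", len(remote), "remote")
    print("missing:", missing, "unlisted:", unlisted)
```

With the real data, the same bookkeeping is 206 files, minus 1 template,
plus 4 remote entries, which gives the 209 locations counted above.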



-- 
Rich Bowen - rbowen@rcbowen.com
http://rcbowen.com/
@rbowen

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@community.apache.org
For additional commands, e-mail: dev-help@community.apache.org


Re: Data inconsistency in projects.apache.org

Posted by Rich Bowen <rb...@rcbowen.com>.

On 3/27/20 2:45 PM, Hervé BOUTEMY wrote:
> please start by reading the human-oriented explanation:
> https://projects.apache.org/about.html
> 
> this should ease the deep dive into data behind the recurring "Committees vs
> Projects" discussion

Thanks. That is indeed where I started. I think where I get hung up (and 
where I got hung up last time) was on the way that the doap files are 
scattered hither and yon across the universe. :)

--Rich




-- 
Rich Bowen - rbowen@rcbowen.com
http://rcbowen.com/
@rbowen



Re: Data inconsistency in projects.apache.org

Posted by Hervé BOUTEMY <he...@free.fr>.
please start by reading the human-oriented explanation:
https://projects.apache.org/about.html

this should ease the deep dive into data behind the recurring "Committees vs 
Projects" discussion

Regards,

Hervé



Re: Data inconsistency in projects.apache.org

Posted by Rich Bowen <rb...@rcbowen.com>.
On Fri, Mar 27, 2020, 16:16 sebb <se...@gmail.com> wrote:

>
> > So, while to me that seems like an obvious and enormous improvement, my
> > understanding is that this was proposed before and someone (I understood
> > it was you?) vetoed the change. So I'm a teensy bit confused.
>
> Not me.
> I have always been in favour of centralising the files.
>

Awesome. I'm glad I misunderstood :)


Re: Data inconsistency in projects.apache.org

Posted by sebb <se...@gmail.com>.
On Fri, 27 Mar 2020 at 18:04, Rich Bowen <rb...@rcbowen.com> wrote:
>
>
>
> On 3/27/20 1:13 PM, sebb wrote:
> > On Fri, 27 Mar 2020 at 13:44, Rich Bowen <rb...@rcbowen.com> wrote:
> >> there are also lines that look like:
> >>
> >>     <location>http://flex.apache.org/pmc_Flex.rdf</location>
> >>
> >> (4 of them, for whatever that's worth - flex, ofbiz, plc4x, and tez)
> >>
> >> Is that correct? Or is that not how the data is supposed to be stored?
> >
> > Most PMC RDF files are stored locally, but the app does allow for
> > projects to store the files elsewhere.
>
> Awesome. So it's just an "error" in the comment in the file, not in the
> way things are done. Thanks. That helps.
>
> > If any changes are made, I strongly recommend centralising the data files.
> > DOAP files maintained in project data areas often get moved, and the
> > project forgets to update the entry in projects.xml
> > Also, sometimes edits to DOAP files have syntax errors.
> > My experience is that it can be very hard work getting projects to fix
> > errors, whereas if DOAPs were centrally located, anyone could fix
> > errors.
>
> So, while to me that seems like an obvious and enormous improvement, my
> understanding is that this was proposed before and someone (I understood
> it was you?) vetoed the change. So I'm a teensy bit confused.

Not me.
I have always been in favour of centralising the files.



Re: Data inconsistency in projects.apache.org

Posted by Hervé BOUTEMY <he...@free.fr>.
On Friday, 27 March 2020, 21:19:56 CET, sebb wrote:
> > > That way, over time, we'd eventually have all of those files in one
> > > place, making them easier to find and update.
> > 
> > find = 2 files (1 for committees, 1 for projects)
> 
> May be more than one for projects.
> e.g. Commons.

I was not clear, here are the 2 files:
https://svn.apache.org/repos/asf/comdev/projects.apache.org/trunk/data/committees.xml
https://svn.apache.org/repos/asf/comdev/projects.apache.org/trunk/data/projects.xml

> > On letting PMC RDF files go outside the centralised approach, I'd be
> > curious to check if the 4 PMCs that chose to host outside of projects.a.o
> > did that to fill more data, or if they just felt that they'd host this
> > file the same way they did with project DOAP file.
> I suggest dropping them entirely.
yes, I suppose the PMC RDF content could be fully extracted from other sources: the only hard part is the charter, which you seem to have already extracted automatically

dropping PMC RDF files would simplify the discussion, since only project DOAP files would remain

big +1: a good step in the right direction of simplification to enable us to focus on the hard part = project DOAP files

Regards,

Hervé





Re: Data inconsistency in projects.apache.org

Posted by Hervé BOUTEMY <he...@free.fr>.
yes, I'm convinced some data can be extracted automatically

but I also know that some data can't
for example:
- for committees with multiple projects, like https://projects.apache.org/projects.html?committee#commons
- for projects still using svn
- the definition of languages and categories and the association of projects to these:
  https://projects.apache.org/projects.html?category
  https://projects.apache.org/projects.html?language

in the past (the previous projects.apache.org), some tooling was provided to PMCs to easily generate a first project description without reading too much.

I now see Sebb (I suppose) did some hard work to integrate this tool:
https://projects.apache.org/create.html

It could probably be improved to provide a pre-filled form or some form of autocomplete
and of course, letting committees know there is some tooling to help them maintain their data could help

Regards,

Hervé



Re: Data inconsistency in projects.apache.org

Posted by Dave Fisher <wa...@comcast.net>.
See http://incubator.apache.org/clutch/tuweni

The repositories are actual and are updated from gitbox.apache.org/repositories.json

The releases are from dist.apache.org/repos/dist/release and are exactly what is available.

Other items on the page are either from a podling status file or other bits of information including the podlings.xml.

This information can be a service to the projects and the foundation.

Sent from my iPhone


Re: Data inconsistency in projects.apache.org

Posted by Hervé BOUTEMY <he...@free.fr>.
there are many more parts, see some examples of human-readable output:
https://projects.apache.org/project.html?accumulo
https://projects.apache.org/project.html?calcite



Re: Data inconsistency in projects.apache.org

Posted by Dave Fisher <wa...@comcast.net>.
metadata for project releases is discoverable from the dist in svn. It is already done for podlings in the Incubator in the clutch analysis.

It is python. I can provide some help late next week.

Sent from my iPhone



Re: Data inconsistency in projects.apache.org

Posted by sebb <se...@gmail.com>.
On Fri, 27 Mar 2020 at 20:01, Hervé BOUTEMY <he...@free.fr> wrote:
>
> On Friday, 27 March 2020, 20:29:14 CET, you wrote:
> > On 3/27/20 3:07 PM, Hervé BOUTEMY wrote:
> > > It's good to see some interest back on DOAP files content and organisation,
> > > now that the projects.apache.org rendering makes them really useful: a
> > > few years ago, trying to open any discussion on that was doomed to
> > > failure. But any change is hard, since every PMC will have to be
> > > involved.
> >
> > What if we - and I'm perfectly prepared to be told "You can't do that
> > because ..." - fetched remote (ie, project-hosted) doap files, a few at
> > a time, and move them to the central repo, and as we do that, we go talk
> > to projects individually, telling them that we're doing it, and why, and
> > what the new process is for updating. Yes, I'm volunteering to do that
> > outreach.
> you can, but I don't see the benefit of this hard work
>
> >
> > That way, over time, we'd eventually have all of those files in one
> > place, making them easier to find and update.
> find = 2 files (1 for committees, 1 for projects)

May be more than one for projects.
e.g. Commons.

> >
> > I'm leaving the file format question for someone else entirely. I am far
> > less concerned about that, than about ensuring that the files are easily
> > found and updated.
> my point about "PMC RDF files" vs "projects DOAP files" is not a question of format, but a question of amount of data and who would have real knowledge to update content:
> - PMC RDF files are very light, rarely updated, and contain data that are really foundation-centric
> - projects DOAP files contain a lot more data, can/should be often updated, with data that are really to be delegated to PMCs given they are more technical details on code
>
> That's why I really think keeping centralised PMC RDF files and decentralised projects DOAP files is a good idea.
>
> IMHO, centralising project DOAP files would be a hard task with low benefit, and even counter productive effect on having every PMC responsible for the content, that is technical.
>
> On letting PMC RDF files go outside the centralised approach, I'd be curious to check if the 4 PMCs that chose to host outside of projects.a.o did that to fill more data, or if they just felt that they'd host this file the same way they did with project DOAP file.

I suggest dropping them entirely.

> Regards,
>
> Hervé
>
> >
> > --Rich


Re: Data inconsistency in projects.apache.org

Posted by Rich Bowen <rb...@rcbowen.com>.

On 3/27/20 4:01 PM, Hervé BOUTEMY wrote:
> my point about "PMC RDF files" vs "projects DOAP files" is not a question of format, but a question of amount of data and who would have real knowledge to update content:
> - PMC RDF files are very light, rarely updated, and contain data that are really foundation-centric
> - projects DOAP files contain a lot more data, can/should be often updated, with data that are really to be delegated to PMCs given they are more technical details on code
> 
> That's why I really think keeping centralised PMC RDF files and decentralised projects DOAP files is a good idea.

Aha, I see. Thank you for clarifying that for me.


-- 
Rich Bowen - rbowen@rcbowen.com
http://rcbowen.com/
@rbowen



Re: Data inconsistency in projects.apache.org

Posted by Hervé BOUTEMY <he...@free.fr>.
On Friday, 27 March 2020, 20:29:14 CET, you wrote:
> On 3/27/20 3:07 PM, Hervé BOUTEMY wrote:
> > It's good to see some interest back on DOAP files content and organisation,
> > now that the projects.apache.org rendering makes them really useful: a
> > few years ago, trying to open any discussion on that was doomed to
> > failure. But any change is hard, since every PMC will have to be
> > involved.
> 
> What if we - and I'm perfectly prepared to be told "You can't do that
> because ..." - fetched remote (ie, project-hosted) doap files, a few at
> a time, and move them to the central repo, and as we do that, we go talk
> to projects individually, telling them that we're doing it, and why, and
> what the new process is for updating. Yes, I'm volunteering to do that
> outreach.
you can, but I don't see the benefit of this hard work

> 
> That way, over time, we'd eventually have all of those files in one
> place, making them easier to find and update.
find = 2 files (1 for committees, 1 for projects)

> 
> I'm leaving the file format question for someone else entirely. I am far
> less concerned about that, than about ensuring that the files are easily
> found and updated.
my point about "PMC RDF files" vs "projects DOAP files" is not a question of format, but a question of amount of data and who would have real knowledge to update content:
- PMC RDF files are very light, rarely updated, and contain data that are really foundation-centric
- projects DOAP files contain a lot more data, can/should be often updated, with data that are really to be delegated to PMCs given they are more technical details on code

That's why I really think keeping centralised PMC RDF files and decentralised projects DOAP files is a good idea.

IMHO, centralising project DOAP files would be a hard task with low benefit, and could even be counterproductive to keeping every PMC responsible for its content, which is technical.

On letting PMC RDF files go outside the centralised approach, I'd be curious to check if the 4 PMCs that chose to host outside of projects.a.o did that to fill more data, or if they just felt that they'd host this file the same way they did with project DOAP file.

Regards,

Hervé

> 
> --Rich







Re: Data inconsistency in projects.apache.org

Posted by Hervé BOUTEMY <he...@free.fr>.
On Friday, 27 March 2020, 20:11:33 CET, Rich Bowen wrote:
> For context, I'm trying to address Sally's complaint that the data on
> projects.a.o is inconsistent, out of date, and wonky.
yes, I like this objective

> I am very willing
> to reach out to various projects about data updates (and am doing that
> already for other things - namely, the stuff on
> https://whimsy.apache.org/site/) but the "where is our data" question
> not having one consistent answer is a little frustrating.
this is where mixing committee-oriented data with "technical" project
data starts to hurt: project DOAP files hold technical details, which are
to be delegated

you can see the difference by comparing the Committees page,
https://projects.apache.org/committees.html, with the Projects page,
https://projects.apache.org/projects.html, which can be sorted by Category
and Programming Language

> 
> I think, though, I now know where to go to look it up.







Re: Data inconsistency in projects.apache.org

Posted by Rich Bowen <rb...@rcbowen.com>.

On 3/27/20 3:07 PM, Hervé BOUTEMY wrote:
> It's good to see some interest back on DOAP files content and organisation, now
> that the projects.apache.org rendering makes them really useful: a few years
> ago, trying to open any discussion on that was doomed to failure. But any
> change is hard, since every PMC will have to be involved.

What if we - and I'm perfectly prepared to be told "You can't do that
because ..." - fetched remote (i.e., project-hosted) DOAP files, a few at
a time, and moved them to the central repo, and as we do that, we go talk
to projects individually, telling them that we're doing it, and why, and
what the new process is for updating. Yes, I'm volunteering to do that
outreach.

That way, over time, we'd eventually have all of those files in one 
place, making them easier to find and update.

I'm leaving the file format question for someone else entirely. I am far 
less concerned about that, than about ensuring that the files are easily 
found and updated.

--Rich

-- 
Rich Bowen - rbowen@rcbowen.com
http://rcbowen.com/
@rbowen



Re: Data inconsistency in projects.apache.org

Posted by Rich Bowen <rb...@rcbowen.com>.

On 3/27/20 3:07 PM, Hervé BOUTEMY wrote:
> It's good to see some interest back on DOAP files content and organisation, now
> that the projects.apache.org rendering makes them really useful: a few years
> ago, trying to open any discussion on that was doomed to failure. But any
> change is hard, since every PMC will have to be involved.

Yeah, I can totally appreciate that, as there is no convenient 
consistent way to get in touch with every project, and believe, with any 
degree of certainty, that they'll all actually see it.

I'll try to find the last discussion in the archives and see what the 
issues were.

For context, I'm trying to address Sally's complaint that the data on 
projects.a.o is inconsistent, out of date, and wonky. I am very willing 
to reach out to various projects about data updates (and am doing that 
already for other things - namely, the stuff on 
https://whimsy.apache.org/site/) but the "where is our data" question 
not having one consistent answer is a little frustrating.

I think, though, I now know where to go to look it up.

-- 
Rich Bowen - rbowen@rcbowen.com
http://rcbowen.com/
@rbowen



Re: Data inconsistency in projects.apache.org

Posted by Hervé BOUTEMY <he...@free.fr>.
On Friday, 27 March 2020, 19:04:28 CET, Rich Bowen wrote:
> > If any changes are made, I strongly recommend centralising the data files.
> > DOAP files maintained in project data areas often get moved, and the
> > project forgets to update the entry in projects.xml
> > Also, sometimes edits to DOAP files have syntax errors.
> > My experience is that it can be very hard work getting projects to fix
> > errors, whereas if DOAPs were centrally located, anyone could fix
> > errors.
> 
> So, while to me that seems like an obvious and enormous improvement, my
> understanding is that this was proposed before and someone (I understood
> it was you?) vetoed the change. So I'm a teensy bit confused.

PMC RDF files were fully centralised, and I see now that some PMCs chose
to host theirs externally
project DOAP files have been decentralised for a long time
(notice: "PMC RDF files" vs "projects DOAP files")

I don't remember the last discussions precisely, but I am one of those who
want to keep DOAP files decentralised: my rationale is that project DOAP
files contain a lot of data that can be updated often (like releases)

PMC RDF files require a lot less maintenance: keeping them centralised seemed
sufficient for a long time

It's good to see some interest back on DOAP files content and organisation, now
that the projects.apache.org rendering makes them really useful: a few years
ago, trying to open any discussion on that was doomed to failure. But any
change is hard, since every PMC will have to be involved.

Regards,

Hervé





Re: Data inconsistency in projects.apache.org

Posted by Rich Bowen <rb...@rcbowen.com>.

On 3/27/20 1:13 PM, sebb wrote:
> On Fri, 27 Mar 2020 at 13:44, Rich Bowen <rb...@rcbowen.com> wrote:
>> there are also lines that look like:
>>
>>     <location>http://flex.apache.org/pmc_Flex.rdf</location>
>>
>> (4 of them, for whatever that's worth - flex, ofbiz, plc4x, and tez)
>>
>> Is that correct? Or is that not how the data is supposed to be stored?
> 
> Most PMC RDF files are stored locally, but the app does allow for
> projects to store the files elsewhere.

Awesome. So it's just an "error" in the comment in the file, not in the 
way things are done. Thanks. That helps.

> If any changes are made, I strongly recommend centralising the data files.
> DOAP files maintained in project data areas often get moved, and the
> project forgets to update the entry in projects.xml
> Also, sometimes edits to DOAP files have syntax errors.
> My experience is that it can be very hard work getting projects to fix
> errors, whereas if DOAPs were centrally located, anyone could fix
> errors.

So, while to me that seems like an obvious and enormous improvement, my 
understanding is that this was proposed before and someone (I understood 
it was you?) vetoed the change. So I'm a teensy bit confused.

-- 
Rich Bowen - rbowen@rcbowen.com
http://rcbowen.com/
@rbowen



Re: Data inconsistency in projects.apache.org

Posted by sebb <se...@gmail.com>.
On Fri, 27 Mar 2020 at 13:44, Rich Bowen <rb...@rcbowen.com> wrote:
>
> I'm trying to understand the twisty maze of data sources that fuel
> projects.apache.org and either I'm confused, or there's some
> inconsistency in how this all fits together.
>
> I'll start with just one data source for now, so that I don't muddle
> multiple things together.
>
> https://svn.apache.org/repos/asf/comdev/projects.apache.org/trunk/data/committees.xml
>
>
> This file has a list of rdf files which are supposed to be in the
> committees/ subdirectory. The file itself says:
>
>     This list should agree with the files in the directory committees/
>
> However, in addition to the entries that look like:
>
>    <location>committees/any23.rdf</location>
>
> there are also lines that look like:
>
>    <location>http://flex.apache.org/pmc_Flex.rdf</location>
>
> (4 of them, for whatever that's worth - flex, ofbiz, plc4x, and tez)
>
> Is that correct? Or is that not how the data is supposed to be stored?

Most PMC RDF files are stored locally, but the app does allow for
projects to store the files elsewhere.

> Meanwhile, committees.xml contains 209 projects:
>
> grep location committees.xml| grep -vc Retired
> 209
>
> while the committees/ directory contains just 206 rdf files:
>
> ls committees/*.rdf| wc -l
> 206
> (Note, one of those files is _template.rdf, so it's really 205, and 205
> + 4 = 209, so at least everything else matches up.)

Some PMCs have multiple projects.

There is a cron job that reports an error if there is a new PMC
without a corresponding RDF file.
I tend to create the PMC RDF to silence the error.
However, it is up to the PMC to create the DOAP(s), so it's possible
that a PMC has no DOAPs.

I suspect that the PMC RDF files could be eliminated - I think the
only useful info they contain is the charter.
The charter is now also maintained in:
https://svn.apache.org/repos/private/committers/board/committee-info.yaml
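
To illustrate how little would be lost, pulling the charter out of a PMC
RDF file might look roughly like the sketch below. The asfext namespace
URI, the element names, and the sample document are assumptions made for
illustration and should be checked against a real file in committees/;
extract_charter is a hypothetical helper, not part of the site code.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the ASF extension vocabulary; verify against real files.
ASFEXT = "http://projects.apache.org/ns/asfext#"

# Hypothetical PMC RDF document, made up for this example.
SAMPLE_PMC_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:asfext="http://projects.apache.org/ns/asfext#">
  <asfext:pmc rdf:about="example">
    <asfext:name>Apache Example</asfext:name>
    <asfext:charter>
      Hypothetical charter text
      for illustration.
    </asfext:charter>
  </asfext:pmc>
</rdf:RDF>"""


def extract_charter(rdf_text):
    """Return the charter text with whitespace collapsed, or None if absent."""
    root = ET.fromstring(rdf_text)
    node = root.find(f".//{{{ASFEXT}}}charter")
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


if __name__ == "__main__":
    print(extract_charter(SAMPLE_PMC_RDF))
```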

If any changes are made, I strongly recommend centralising the data files.
DOAP files maintained in project data areas often get moved, and the
project forgets to update the entry in projects.xml.
Also, sometimes edits to DOAP files have syntax errors.
My experience is that it can be very hard work getting projects to fix
errors, whereas if DOAPs were centrally located, anyone could fix
errors.
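
As a minimal sketch of the kind of check that would catch those syntax
errors early, the snippet below tests only whether a file is well-formed
XML. It does not validate the RDF or DOAP vocabulary, and xml_syntax_error
is a hypothetical helper name, not something the site currently provides.

```python
import xml.etree.ElementTree as ET


def xml_syntax_error(text):
    """Return the parser's error message, or None if the text is well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    good = "<Project><name>Apache Example</name></Project>"
    bad = "<Project><name>Apache Example</Project>"  # <name> is never closed
    for label, doc in (("good", good), ("bad", bad)):
        error = xml_syntax_error(doc)
        print(label, "OK" if error is None else "ERROR: " + error)
```

Run centrally over every DOAP listed in projects.xml, a check like this
would at least flag which files need attention before anyone has to chase
the project.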
