Posted to dev@community.apache.org by Shane Curcuru <as...@shanecurcuru.org> on 2016/02/11 13:03:28 UTC

Adding asfext:registered to projects.a.o?

I need to annotate our structured data set of Apache projects to track
which project names are registered trademarks.  This is needed to be
able to properly generate a.o/foundation/marks/list (which is currently
sadly outdated since it's manually built now).  This is a serious need
for Brand Management, since we regularly have third parties say "but you
didn't SAY it was your trademark, so I can do it anyway..."

My thought is to annotate the PMC DOAP files with a registered marker,
then use the existing projects.a.o building of the organized data.  Then
use either JS or some cron static generation to display the actual
marks/list page.

Is annotating the project data sources the best idea, or should I simply
create a new stable URL data source that's just a list of registered
names, and join the tables?
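For the second option, the "join" could be as simple as a keyed lookup. A minimal Python sketch, assuming hypothetical project records with `id` and `name` fields and a separate registry that maps project ids to a status string; neither shape is taken from the existing projects.a.o data:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Project:
    id: str    # hypothetical short id, e.g. "alpha"
    name: str  # display name without the "Apache " prefix

def join_marks(projects, registry):
    """Left-join project records against a {project_id: status} registry.

    Projects absent from the registry default to "unregistered", so a
    missing registry row still yields a (TM) claim instead of a dropped project.
    """
    return [(p, registry.get(p.id, "unregistered")) for p in projects]

# Hypothetical sample data, for illustration only.
projects = [Project("alpha", "Alpha"), Project("beta", "Beta")]
registry = {"alpha": "registered"}
joined = join_marks(projects, registry)
```

Defaulting unknown entries to "unregistered" keeps every project on the page, so the registry list only has to name the exceptions.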

The end result needs to be web content listing projects like this:

<h2>The ASF claims these trademarks</h2>
...list all active TLPs
<a href="{$homepage}">Apache <b>{$projectname}</b></a>
{$if registered then "&reg;" else "&trade;"}

<br/>
  {$shortdesc}
...
<h2>The following projects are retired</h2>
...list all Attic projects

<h2>The following projects are in incubation; all trademarks here may be
property of their respective owners</h2>
...list all Incubation projects
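A rough Python sketch of generating that outline from joined data, assuming each project is a dict with hypothetical `name`, `homepage`, `shortdesc`, `registered`, and `lifecycle` keys. The outline only specifies &reg;/&trade; for the active TLP list, so this sketch leaves Attic and Incubator entries without a symbol:

```python
import html

# (lifecycle key, heading) pairs taken from the page outline above;
# the lifecycle keys themselves are hypothetical field values.
SECTIONS = [
    ("active", "The ASF claims these trademarks"),
    ("attic", "The following projects are retired"),
    ("incubator", "The following projects are in incubation; all trademarks "
                  "here may be property of their respective owners"),
]

def render_entry(p, with_symbol):
    """Render one entry: linked name, optional (R)/(TM), short description."""
    symbol = ""
    if with_symbol:
        symbol = "&reg;" if p["registered"] else "&trade;"
    return (f'<a href="{html.escape(p["homepage"])}">Apache '
            f'<b>{html.escape(p["name"])}</b></a>{symbol}\n'
            f'<br/>\n  {html.escape(p["shortdesc"])}')

def render_marks_list(projects):
    """Group projects by lifecycle and render the sections in a fixed order."""
    parts = []
    for key, heading in SECTIONS:
        members = sorted((p for p in projects if p["lifecycle"] == key),
                         key=lambda p: p["name"].lower())
        parts.append(f"<h2>{heading}</h2>")
        # Only the active-TLP section carries trademark symbols in the outline.
        parts.extend(render_entry(p, key == "active") for p in members)
    return "\n".join(parts)
```

The same function could run from cron to write a static page, or equivalent logic could run in the browser against a JSON feed.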

Separately, we should list the names of notable software *products* here,
since anything we offer under a clear name as an independently
downloadable software product can be our trademark.  So I'd like to
list "Apache Directory Studio", since that's a notable name and a major
product.  But I don't want to list "Apache Commons Foo Bar Baz and
Kitchensink", since those are effectively just minor components that
aren't really worth claiming.
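One way to capture the product/component distinction is an explicit per-product flag that brand management sets by hand. A minimal Python sketch; the `ProjectMarks` and `Product` types and the `notable` flag are hypothetical, and the names are the examples given above:

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    name: str
    notable: bool  # hypothetical flag: a named, independently downloadable product

@dataclass
class ProjectMarks:
    project: str
    products: list = field(default_factory=list)  # list of Product

def claimed_names(entry):
    """Return the project name plus the names of products flagged as notable."""
    return [entry.project] + [p.name for p in entry.products if p.notable]

# Example entries based on the names mentioned above.
directory = ProjectMarks("Apache Directory",
                         [Product("Apache Directory Studio", notable=True)])
commons = ProjectMarks("Apache Commons",
                       [Product("Apache Commons Foo Bar Baz and Kitchensink",
                                notable=False)])
```

Keeping minor components in the data with `notable=False`, rather than omitting them, records that the decision not to list them was deliberate.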

Comments/suggestions please?  I'm including the Whimsical project since
they are also major consumers of this data.

- Shane

Re: Adding asfext:registered to projects.a.o?

Posted by Shane Curcuru <as...@shanecurcuru.org>.
Sam Ruby wrote on 2/11/16 12:28 PM:
> On Thu, Feb 11, 2016 at 11:35 AM, sebb <se...@gmail.com> wrote:
>> On 11 February 2016 at 12:03, Shane Curcuru <as...@shanecurcuru.org> wrote:
>>> I need to annotate our structured data set of Apache projects to track
>>> which project names are registered trademarks.  This is needed to be
>>> able to properly generate a.o/foundation/marks/list ...

...

>> Therefore I think a separate file is needed.
>> That would also allow write access to be limited if necessary.
> 
> There are indeed multiple ways to solve this, and each way involves a tradeoff.
> 
> I would suggest separating this question into three parts.
> 
> - - -
> 
> First, where is the ultimate source for the data?  And the best way to
> address that question is to first decide who will be updating that
> data.  Will it be each project, or those on the branding mailing list,
> or only VP brand?  Knowing the answer to that question will make a big
> difference.
> 
> My suggestion would be to start simple with a single file, in the same
> directory as committee-info.txt.  I'd suggest YAML as a format as it
> is a good tradeoff between human edit-ability and programmatic
> parse-ability.

The raw data of which TLP names are registered can be public; it's
already findable in various national registries.  I may want to add an
additional enum value, "application-submitted", but even that can be public.

Theoretically only the brand committee should update the file, but in
practice we can restrict write access to members; I don't think they'll
mess anything up.

The file won't change often, but changes will be manual (e.g. when
we hear from counsel about applications).
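A hand-edited file benefits from a validation step that catches typos in the status field. A minimal Python sketch, assuming the YAML has already been parsed (for example with PyYAML's `yaml.safe_load`) into a mapping keyed by mark name. The `pmc` and `status` field names and the `unregistered` value are hypothetical; keying by mark rather than by PMC lets one PMC hold several marks and lets a project change PMC by editing a single field:

```python
from enum import Enum

class MarkStatus(Enum):
    REGISTERED = "registered"
    APPLICATION_SUBMITTED = "application-submitted"
    UNREGISTERED = "unregistered"  # hypothetical: claimed as (TM) only

def validate_marks(data):
    """Check a {mark_name: {"pmc": ..., "status": ...}} mapping.

    Returns a list of human-readable problems; an empty list means the
    hand-edited file looks structurally sound.
    """
    problems = []
    allowed = {s.value for s in MarkStatus}
    for mark, entry in data.items():
        if not isinstance(entry, dict):
            problems.append(f"{mark}: entry must be a mapping")
            continue
        if not entry.get("pmc"):
            problems.append(f"{mark}: missing pmc")
        if entry.get("status") not in allowed:
            problems.append(f"{mark}: unknown status {entry.get('status')!r}")
    return problems

# What yaml.safe_load might return for a two-entry file (hypothetical data).
sample = {
    "Apache Alpha": {"pmc": "alpha", "status": "registered"},
    "Apache Beta": {"pmc": "beta", "status": "pending"},
}
```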

> 
> - - -
> 
> Next is access.  What you need is something that takes the data from
> the private repository, sanitizes it, and publishes the result for
> public consumption.  Whimsy has a bunch of cron jobs that place
> similar data here: https://whimsy.apache.org/public/.  A script that
> parses a YAML file out of SVN, selects and filters out various parts,
> and publishes the results in JSON format is very doable.

It can go in a public repository if that makes it easier.  Of course,
this data isn't technically owned by any one project, so we need to find
a home for it, unless I should just dump it in the a.o site.

Is there currently any central place for structured data about corporate
operations?
> 
> ---
> 
> Finally, there is publishing.  While that could be a cron job that
> produces static HTML, web browsers have the ability to consume JSON
> and format the results.  That's probably the best solution to this.

Thinking it through, we should fold this data into a number of places:

- The marks/list page, which needs to be regenerated each month after
the board meeting formally graduates projects or moves them to the Attic.
The page itself probably gets little traffic, but it needs to be accurate,
because lawyers are the kind of people who will read it.

- projects.a.o, where it would be really nice to annotate project names
with the appropriate &reg; and &trade; symbols.  As this service becomes
more popular, clear trademark indicators for our projects will help
ensure that third parties know (and can verify) that the ASF takes
its trademarks seriously.

- The www.a.o homepage, so that any generated parts of the main site
include the appropriate &reg; and &trade; symbols.

I figure the first step is to settle on a schema and a location for the
source YAML/JSON file, then build the display into marks/list or the
main projects.a.o pages.  Then see where to go from there.

> 
> ---
> 
> The Apache Phone book is an example of an application that uses the
> above design:
> 
> https://home.apache.org/phonebook.html
> 
> In fact, if the data is made available in this manner, the trademark
> information could be included directly in the results of the page it
> produces.  That's one of the nice things about having a public JSON
> version of the data published - multiple tools can consume that data.

Yeah, as we build more of these useful sites, it would be nice to fold
this in so it gets included automatically.  It's especially important
for registered marks, because some countries require use of the (R)
symbol.

- Shane


Re: Adding asfext:registered to projects.a.o?

Posted by Sam Ruby <ru...@intertwingly.net>.
On Thu, Feb 11, 2016 at 11:35 AM, sebb <se...@gmail.com> wrote:
> On 11 February 2016 at 12:03, Shane Curcuru <as...@shanecurcuru.org> wrote:
>> I need to annotate our structured data set of Apache projects to track
>> which project names are registered trademarks.  This is needed to be
>> able to properly generate a.o/foundation/marks/list (which is currently
>> sadly outdated since it's manually built now).  This is a serious need
>> for Brand Management, since we regularly have third parties say "but you
>> didn't SAY it was your trademark, so I can do it anyway..."
>>
>> My thought is to annotate the PMC DOAP files with a registered marker,
>> then use the existing projects.a.o building of the organized data.  Then
>> use either JS or some cron static generation to display the actual
>> marks/list page.
>
> There are two kinds of RDF files:
> - the PMC RDF files [1] which are mainly stored in the comdev area
> [2], though they can also be stored elsewhere.
> The locations of the files are held in committees.xml [3]
> [These are not actually DOAP files, though the format looks similar.]
>
> - the project DOAP files which are stored by individual projects; they
> are listed in projects.xml [4]
>
> A single PMC RDF file can be associated with multiple DOAP files, e.g.
> Commons, Creadur, Tomcat all have multiple independent project
> releases.
>
>> Is annotating the project data sources the best idea, or should I simply
>> create a new stable URL data source that's just a list of registered
>> names, and join the tables?
>
> I doubt if either of the above file types are suitable.
> The location of the index XML files [3], [4] has already been changed
> once (when projects-new was established).
>
> DOAP files are located all over the place and are often moved within
> the SCM without updating the index file.
> If they are located in the source tree there are often multiple copies
> in different branches.
>
> PMC RDF files may not be updateable except by the project (if located
> in their SCM), and again may move without warning if they are not in
> [2].
>
> It would potentially be possible to recover the PMC RDF files from
> their external locations and insist that they only be stored in the
> comdev area.
> But a single PMC may have multiple marks. Potentially also a project
> may move from a PMC to become its own PMC.
>
> Therefore I think a separate file is needed.
> That would also allow write access to be limited if necessary.

There are indeed multiple ways to solve this, and each way involves a tradeoff.

I would suggest separating this question into three parts.

- - -

First, where is the ultimate source for the data?  And the best way to
address that question is to first decide who will be updating that
data.  Will it be each project, or those on the branding mailing list,
or only VP brand?  Knowing the answer to that question will make a big
difference.

My suggestion would be to start simple with a single file, in the same
directory as committee-info.txt.  I'd suggest YAML as a format as it
is a good tradeoff between human edit-ability and programmatic
parse-ability.

- - -

Next is access.  What you need is something that takes the data from
the private repository, sanitizes it, and publishes the result for
public consumption.  Whimsy has a bunch of cron jobs that place
similar data here: https://whimsy.apache.org/public/.  A script that
parses a YAML file out of SVN, selects and filters out various parts,
and publishes the results in JSON format is very doable.
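A sketch of that sanitize-and-publish step in Python. It is not Whimsy's actual code; it assumes the YAML has already been checked out of SVN and parsed into a dict, and the `PUBLIC_FIELDS` allow-list and the output path are hypothetical:

```python
import json
import os
import tempfile

# Hypothetical allow-list: any field not named here (e.g. internal notes)
# is dropped before publishing.
PUBLIC_FIELDS = ("pmc", "status")

def sanitize(data):
    """Keep only allow-listed fields for each mark, in sorted mark order."""
    return {mark: {k: entry[k] for k in PUBLIC_FIELDS if k in entry}
            for mark, entry in sorted(data.items())}

def publish(data, path):
    """Write sanitized JSON via a temp file plus rename, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        json.dump(sanitize(data), out, indent=2, sort_keys=True)
    os.replace(tmp, path)
```

An allow-list rather than a deny-list means a new private field added to the source file stays unpublished by default.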

---

Finally, there is publishing.  While that could be a cron job that
produces static HTML, web browsers have the ability to consume JSON
and format the results.  That's probably the best solution to this.

---

The Apache Phone book is an example of an application that uses the
above design:

https://home.apache.org/phonebook.html

In fact, if the data is made available in this manner, the trademark
information could be included directly in the results of the page it
produces.  That's one of the nice things about having a public JSON
version of the data published - multiple tools can consume that data.

- Sam Ruby

>> The end result needs to be webcontent listing projects like:
>>
>> <h2>The ASF claims these trademarks</h2>
>> ...list all active TLPs
>> <a href="{$homepage}">Apache <b>{$projectname}</b></a>
>> {$if registered then "&reg;" else "&trade;"}
>>
>> <br/>
>>   {$shortdesc}
>> ...
>> <h2>The following projects are retired</h2>
>> ...list all Attic projects
>>
>> <h2>The following projects are in incubation; all trademarks here may be
>> property of respective owners</h2>
>> ...list all Incubation projects
>>
>> Separately, we should list the name of each software *product* here,
>> since if we offer something with a clear name as an independently
>> downloadable software product, it can be our trademark.  So I'd like to
>> list "Apache Directory Studio", since that's a notable name and a major
>> product.  But I don't want to list "Apache Commons Foo Bar Baz and
>> Kitchensink", since those are effectively just minor components that
>> aren't really worth claiming.
>>
>> Comments/suggestions please?  I'm including the Whimsical project since
>> they are also major consumers of this data.
>>
>> - Shane
>
> [1] https://projects.apache.org/pmc_rdf.html
>
> [2] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees/
> [3] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees.xml
> [4] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/projects.xml

Re: Adding asfext:registered to projects.a.o?

Posted by Sam Ruby <ru...@intertwingly.net>.
On Thu, Feb 11, 2016 at 2:38 PM, Stian Soiland-Reyes <st...@apache.org> wrote:
> How about something very modern - moving to JSON-LD schema.org annotations
> in the root index of the project homepage and just fetching all of those..?
>
> Seriously; keeping them under a single comdev control sounds most sensible
> as I doubt the distributed DOAP files are well maintained.  Projects can
> raise pull requests to update and then see their changes live on the new
> projects.apache.org pages

I agree: centralize first, and decentralize when the need shows itself.

As for format: let's prototype.  Seriously.

If Shane can provide some initial test data in any format (e.g. CSV) I
can convert that to YAML and you can convert it to JSON-LD, and Shane
can determine which would be easier for him to maintain.  I'll also go
the extra step and write a small script that converts it to JSON
(note: POJO, not LD), and write an ugly page that fetches and displays
that data.  Others can do likewise.

Shane should be able to use these programs as examples and extend them
as he sees fit.
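As an example of the kind of conversion script described here, a minimal Python sketch that turns CSV test data into plain JSON objects (not JSON-LD). The `name`, `pmc`, and `status` columns are hypothetical:

```python
import csv
import io
import json

def csv_to_json(text):
    """Convert CSV test data (header row first) into a JSON array of plain objects."""
    rows = csv.DictReader(io.StringIO(text))
    # Trim stray whitespace from hand-edited CSV; ignore overflow columns
    # (DictReader files them under a None key) and treat missing cells as "".
    cleaned = [{k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
               for row in rows]
    return json.dumps(cleaned, indent=2)

sample = """name,pmc,status
Apache Alpha,alpha,registered
Apache Beta,beta,application-submitted
"""
```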

- Sam Ruby

> On 11 Feb 2016 17:35, "sebb" <se...@gmail.com> wrote:
>
>> On 11 February 2016 at 12:03, Shane Curcuru <as...@shanecurcuru.org> wrote:
>> > I need to annotate our structured data set of Apache projects to track
>> > which project names are registered trademarks.  This is needed to be
>> > able to properly generate a.o/foundation/marks/list (which is currently
>> > sadly outdated since it's manually built now).  This is a serious need
>> > for Brand Management, since we regularly have third parties say "but you
>> > didn't SAY it was your trademark, so I can do it anyway..."
>> >
>> > My thought is to annotate the PMC DOAP files with a registered marker,
>> > then use the existing projects.a.o building of the organized data.  Then
>> > use either JS or some cron static generation to display the actual
>> > marks/list page.
>>
>> There are two kinds of RDF files:
>> - the PMC RDF files [1] which are mainly stored in the comdev area
>> [2], though they can also be stored elsewhere.
>> The locations of the files are held in committees.xml [3]
>> [These are not actually DOAP files, though the format looks similar.]
>>
>> - the project DOAP files which are stored by individual projects; they
>> are listed in projects.xml [4]
>>
>> A single PMC RDF file can be associated with multiple DOAP files, e.g.
>> Commons, Creadur, Tomcat all have multiple independent project
>> releases.
>>
>> > Is annotating the project data sources the best idea, or should I simply
>> > create a new stable URL data source that's just a list of registered
>> > names, and join the tables?
>>
>> I doubt if either of the above file types are suitable.
>> The location of the index XML files [3], [4] has already been changed
>> once (when projects-new was established).
>>
>> DOAP files are located all over the place and are often moved within
>> the SCM without updating the index file.
>> If they are located in the source tree there are often multiple copies
>> in different branches.
>>
>> PMC RDF files may not be updateable except by the project (if located
>> in their SCM), and again may move without warning if they are not in
>> [2].
>>
>> It would potentially be possible to recover the PMC RDF files from
>> their external locations and insist that they only be stored in the
>> comdev area.
>> But a single PMC may have multiple marks. Potentially also a project
>> may move from a PMC to become its own PMC.
>>
>> Therefore I think a separate file is needed.
>> That would also allow write access to be limited if necessary.
>>
>> > The end result needs to be webcontent listing projects like:
>> >
>> > <h2>The ASF claims these trademarks</h2>
>> > ...list all active TLPs
>> > <a href="{$homepage}">Apache <b>{$projectname}</b></a>
>> > {$if registered then "&reg;" else "&trade;"}
>> >
>> > <br/>
>> >   {$shortdesc}
>> > ...
>> > <h2>The following projects are retired</h2>
>> > ...list all Attic projects
>> >
>> > <h2>The following projects are in incubation; all trademarks here may be
>> > property of respective owners</h2>
>> > ...list all Incubation projects
>> >
>> > Separately, we should list the name of each software *product* here,
>> > since if we offer something with a clear name as an independently
>> > downloadable software product, it can be our trademark.  So I'd like to
>> > list "Apache Directory Studio", since that's a notable name and a major
>> > product.  But I don't want to list "Apache Commons Foo Bar Baz and
>> > Kitchensink", since those are effectively just minor components that
>> > aren't really worth claiming.
>> >
>> > Comments/suggestions please?  I'm including the Whimsical project since
>> > they are also major consumers of this data.
>> >
>> > - Shane
>>
>> [1] https://projects.apache.org/pmc_rdf.html
>>
>> [2] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees/
>> [3] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees.xml
>> [4] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/projects.xml
>>

Re: Adding asfext:registered to projects.a.o?

Posted by Stian Soiland-Reyes <st...@apache.org>.
How about something very modern - moving to JSON-LD schema.org annotations
in the root index of the project homepage and just fetching all of those..?

Seriously, though: keeping them under single comdev control sounds most
sensible, as I doubt the distributed DOAP files are well maintained.
Projects can raise pull requests to update the data and then see their
changes live on the new projects.apache.org pages.
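A minimal Python sketch of the consuming side of that idea: pulling every `<script type="application/ld+json">` block out of a project homepage using only the standard library. Fetching the pages is not shown, and the sample markup and its schema.org `SoftwareApplication` type are illustrative assumptions only:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # non-None while inside a JSON-LD script element

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._buffer = None

def extract_json_ld(page_html):
    parser = JsonLdExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.blocks

# Hypothetical homepage markup, for illustration only.
page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "SoftwareApplication",'
    ' "name": "Apache Alpha"}'
    '</script></head><body></body></html>'
)
```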
On 11 Feb 2016 17:35, "sebb" <se...@gmail.com> wrote:

> On 11 February 2016 at 12:03, Shane Curcuru <as...@shanecurcuru.org> wrote:
> > I need to annotate our structured data set of Apache projects to track
> > which project names are registered trademarks.  This is needed to be
> > able to properly generate a.o/foundation/marks/list (which is currently
> > sadly outdated since it's manually built now).  This is a serious need
> > for Brand Management, since we regularly have third parties say "but you
> > didn't SAY it was your trademark, so I can do it anyway..."
> >
> > My thought is to annotate the PMC DOAP files with a registered marker,
> > then use the existing projects.a.o building of the organized data.  Then
> > use either JS or some cron static generation to display the actual
> > marks/list page.
>
> There are two kinds of RDF files:
> - the PMC RDF files [1] which are mainly stored in the comdev area
> [2], though they can also be stored elsewhere.
> The locations of the files are held in committees.xml [3]
> [These are not actually DOAP files, though the format looks similar.]
>
> - the project DOAP files which are stored by individual projects; they
> are listed in projects.xml [4]
>
> A single PMC RDF file can be associated with multiple DOAP files, e.g.
> Commons, Creadur, Tomcat all have multiple independent project
> releases.
>
> > Is annotating the project data sources the best idea, or should I simply
> > create a new stable URL data source that's just a list of registered
> > names, and join the tables?
>
> I doubt if either of the above file types are suitable.
> The location of the index XML files [3], [4] has already been changed
> once (when projects-new was established).
>
> DOAP files are located all over the place and are often moved within
> the SCM without updating the index file.
> If they are located in the source tree there are often multiple copies
> in different branches.
>
> PMC RDF files may not be updateable except by the project (if located
> in their SCM), and again may move without warning if they are not in
> [2].
>
> It would potentially be possible to recover the PMC RDF files from
> their external locations and insist that they only be stored in the
> comdev area.
> But a single PMC may have multiple marks. Potentially also a project
> may move from a PMC to become its own PMC.
>
> Therefore I think a separate file is needed.
> That would also allow write access to be limited if necessary.
>
> > The end result needs to be webcontent listing projects like:
> >
> > <h2>The ASF claims these trademarks</h2>
> > ...list all active TLPs
> > <a href="{$homepage}">Apache <b>{$projectname}</b></a>
> > {$if registered then "&reg;" else "&trade;"}
> >
> > <br/>
> >   {$shortdesc}
> > ...
> > <h2>The following projects are retired</h2>
> > ...list all Attic projects
> >
> > <h2>The following projects are in incubation; all trademarks here may be
> > property of respective owners</h2>
> > ...list all Incubation projects
> >
> > Separately, we should list the name of each software *product* here,
> > since if we offer something with a clear name as an independently
> > downloadable software product, it can be our trademark.  So I'd like to
> > list "Apache Directory Studio", since that's a notable name and a major
> > product.  But I don't want to list "Apache Commons Foo Bar Baz and
> > Kitchensink", since those are effectively just minor components that
> > aren't really worth claiming.
> >
> > Comments/suggestions please?  I'm including the Whimsical project since
> > they are also major consumers of this data.
> >
> > - Shane
>
> [1] https://projects.apache.org/pmc_rdf.html
>
> [2] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees/
> [3] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees.xml
> [4] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/projects.xml
>

Re: Adding asfext:registered to projects.a.o?

Posted by Sam Ruby <ru...@intertwingly.net>.
On Thu, Feb 11, 2016 at 11:35 AM, sebb <se...@gmail.com> wrote:
> On 11 February 2016 at 12:03, Shane Curcuru <as...@shanecurcuru.org> wrote:
>> I need to annotate our structured data set of Apache projects to track
>> which project names are registered trademarks.  This is needed to be
>> able to properly generate a.o/foundation/marks/list (which is currently
>> sadly outdated since it's manually built now).  This is a serious need
>> for Brand Management, since we regularly have third parties say "but you
>> didn't SAY it was your trademark, so I can do it anyway..."
>>
>> My thought is to annotate the PMC DOAP files with a registered marker,
>> then use the existing projects.a.o building of the organized data.  Then
>> use either JS or some cron static generation to display the actual
>> marks/list page.
>
> There are two kinds of RDF files:
> - the PMC RDF files [1] which are mainly stored in the comdev area
> [2], though they can also be stored elsewhere.
> The locations of the files are held in committees.xml [3]
> [These are not actually DOAP files, though the format looks similar.]
>
> - the project DOAP files which are stored by individual projects; they
> are listed in projects.xml [4]
>
> A single PMC RDF file can be associated with multiple DOAP files, e.g.
> Commons, Creadur, Tomcat all have multiple independent project
> releases.
>
>> Is annotating the project data sources the best idea, or should I simply
>> create a new stable URL data source that's just a list of registered
>> names, and join the tables?
>
> I doubt that either of the above file types is suitable.
> The location of the index XML files [3], [4] has already been changed
> once (when projects-new was established).
>
> DOAP files are located all over the place and are often moved within
> the SCM without updating the index file.
> If they are located in the source tree there are often multiple copies
> in different branches.
>
> PMC RDF files may not be updateable except by the project (if located
> in their SCM), and again may move without warning if they are not in
> [2].
>
> It would potentially be possible to recover the PMC RDF files from
> their external locations and insist that they only be stored in the
> comdev area.
> But a single PMC may have multiple marks. Potentially also a project
> may move from a PMC to become its own PMC.
>
> Therefore I think a separate file is needed.
> That would also allow write access to be limited if necessary.

There are indeed multiple ways to solve this, and each way involves a tradeoff.

I would suggest separating this question into three parts.

- - -

First, where is the ultimate source of the data?  The best way to
address that question is to first decide who will be updating that
data.  Will it be each project, those on the branding mailing list,
or only VP Brand?  Knowing the answer to that question will make a big
difference.

My suggestion would be to start simple with a single file, in the same
directory as committee-info.txt.  I'd suggest YAML as a format as it
is a good tradeoff between human edit-ability and programmatic
parse-ability.

- - -

Next is access.  What you need is something that takes the data from
the private repository, sanitizes it, and publishes the result for
public consumption.  Whimsy has a number of cron jobs that place
similar data here: https://whimsy.apache.org/public/.  A script that
parses a YAML file out of SVN, selects and filters out various parts,
and publishes the results in JSON format is very doable.
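A minimal sketch of that sanitize-and-publish step, assuming the YAML has already been read from SVN and loaded into a list of dicts. The `notes` field stands in for any private data and, like the list of public fields, is a hypothetical example. Using an allowlist means a newly added private field is dropped by default rather than leaked:

```python
import json
import os
import tempfile

# Fields considered safe to publish (illustrative assumption).
# Anything not listed here, such as private notes, is dropped.
PUBLIC_FIELDS = ("name", "pmc", "registered")


def sanitize(entries):
    """Keep only allowlisted fields, sorted by mark name for stable output."""
    cleaned = [{k: e[k] for k in PUBLIC_FIELDS if k in e} for e in entries]
    return sorted(cleaned, key=lambda e: e.get("name", ""))


def publish(entries, path):
    """Write the sanitized list as JSON, replacing the target file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(sanitize(entries), f, indent=2)
            f.write("\n")
        # mkstemp creates the file as 0600; the output is public, so open it up.
        os.chmod(tmp_path, 0o644)
        # Rename over the old file so readers never see a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


# Sample input; "notes" stands in for private data (hypothetical).
private = [
    {"name": "Zeta", "pmc": "zeta", "registered": False, "notes": "internal"},
    {"name": "Alpha", "pmc": "alpha", "registered": True, "notes": "internal"},
]
print(json.dumps(sanitize(private)))
```

Sorting the output keeps the published JSON stable between runs, so a diff of successive files only shows real changes.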

---

Finally, there is publishing.  While that could be a cron job that
produces static HTML, web browsers can also fetch the JSON and format
the results themselves.  That's probably the best solution here.
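To make the formatting step concrete, here is a sketch of the logic the original post outlines: a linked "Apache <b>name</b>", &reg; for registered marks and &trade; otherwise, grouped into active, retired, and incubating projects. It is written in Python, which would suit the static cron option; the same few lines translate directly to browser JavaScript. The `homepage`, `shortdesc`, and `status` fields and their values are assumptions about what the published JSON would carry:

```python
from html import escape

# (status, heading) pairs following the outline in the original post.
# The status values themselves are assumptions about the published JSON.
SECTIONS = [
    ("active", "The ASF claims these trademarks"),
    ("attic", "The following projects are retired"),
    ("incubator", "The following projects are in incubation; "
                  "all trademarks here may be property of respective owners"),
]


def render_entry(p):
    """One entry: linked name, then the mark symbol and short description."""
    # &reg; for registered marks, &trade; otherwise, as in the outline.
    symbol = "&reg;" if p.get("registered") else "&trade;"
    # The <p> wrapper is an assumption; the outline elides per-entry markup.
    return (f'<p><a href="{escape(p["homepage"])}">'
            f'Apache <b>{escape(p["name"])}</b></a>{symbol}<br/>\n'
            f'  {escape(p.get("shortdesc", ""))}</p>')


def render_page(projects):
    """Group entries by status, sort alphabetically, skip empty sections."""
    parts = []
    for status, heading in SECTIONS:
        entries = sorted((p for p in projects if p.get("status") == status),
                         key=lambda p: p["name"].lower())
        if entries:
            parts.append(f"<h2>{escape(heading)}</h2>")
            parts.extend(render_entry(p) for p in entries)
    return "\n".join(parts)
```

Escaping every field matters once the data comes from a file several people can edit: a stray `&` or `<` in a short description should not break the page.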

---

The Apache Phone Book is an example of an application that uses the
above design:

https://home.apache.org/phonebook.html

In fact, if the data is made available in this manner, the trademark
information could be included directly in the results of the page it
produces.  That's one of the nice things about having a public JSON
version of the data published - multiple tools can consume that data.

- Sam Ruby

>> The end result needs to be webcontent listing projects like:
>>
>> <h2>The ASF claims these trademarks</h2>
>> ...list all active TLPs
>> <a href="{$homepage}">Apache <b>{$projectname}</b></a>
>> {$if registered then "&reg;" else "&trade;"}
>>
>> <br/>
>>   {$shortdesc}
>> ...
>> <h2>The following projects are retired</h2>
>> ...list all Attic projects
>>
>> <h2>The following projects are in incubation; all trademarks here may be
>> property of respective owners</h2>
>> ...list all Incubation projects
>>
>> Separately, we should list the name of each software *product* here,
>> since if we offer something with a clear name as an independently
>> downloadable software product, it can be our trademark.  So I'd like to
>> list "Apache Directory Studio", since that's a notable name and a major
>> product.  But I don't want to list "Apache Commons Foo Bar Baz and
>> Kitchensink", since those are effectively just minor components that
>> aren't really worth claiming.
>>
>> Comments/suggestions please?  I'm including the Whimsical project since
>> they are also major consumers of this data.
>>
>> - Shane
>
> [1] https://projects.apache.org/pmc_rdf.html
>
> [2] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees/
> [3] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees.xml
> [4] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/projects.xml

Re: Adding asfext:registered to projects.a.o?

Posted by sebb <se...@gmail.com>.
On 11 February 2016 at 12:03, Shane Curcuru <as...@shanecurcuru.org> wrote:
> I need to annotate our structured data set of Apache projects to track
> which project names are registered trademarks.  This is needed to be
> able to properly generate a.o/foundation/marks/list (which is currently
> sadly outdated since it's manually built now).  This is a serious need
> for Brand Management, since we regularly have third parties say "but you
> didn't SAY it was your trademark, so I can do it anyway..."
>
> My thought is to annotate the PMC DOAP files with a registered marker,
> then use the existing projects.a.o building of the organized data.  Then
> use either JS or some cron static generation to display the actual
> marks/list page.

There are two kinds of RDF files:
- the PMC RDF files [1] which are mainly stored in the comdev area
[2], though they can also be stored elsewhere.
The locations of the files are held in committees.xml [3]
[These are not actually DOAP files, though the format looks similar.]

- the project DOAP files which are stored by individual projects; they
are listed in projects.xml [4]

A single PMC RDF file can be associated with multiple DOAP files, e.g.
Commons, Creadur, Tomcat all have multiple independent project
releases.

> Is annotating the project data sources the best idea, or should I simply
> create a new stable URL data source that's just a list of registered
> names, and join the tables?

I doubt that either of the above file types is suitable.
The location of the index XML files [3], [4] has already been changed
once (when projects-new was established).

DOAP files are located all over the place and are often moved within
the SCM without updating the index file.
If they are located in the source tree there are often multiple copies
in different branches.

PMC RDF files may not be updateable except by the project (if located
in their SCM), and again may move without warning if they are not in
[2].

It would potentially be possible to recover the PMC RDF files from
their external locations and insist that they only be stored in the
comdev area.
But a single PMC may have multiple marks. Potentially also a project
may move from a PMC to become its own PMC.

Therefore I think a separate file is needed.
That would also allow write access to be limited if necessary.
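With a separate file, the join Shane asks about stays simple. A minimal sketch, assuming the marks file holds one entry per mark with the owning PMC id, and that per-PMC details (homepage, status) come from the existing projects.a.o data; both record shapes are hypothetical. Keying the file by mark rather than by PMC covers the case where one PMC owns several marks:

```python
from collections import defaultdict


def marks_per_pmc(marks):
    """Group mark names by owning PMC; one PMC may own several marks."""
    grouped = defaultdict(list)
    for mark in marks:
        grouped[mark["pmc"]].append(mark["name"])
    return dict(grouped)


def join_marks(marks, committees):
    """Attach PMC details (homepage, status, ...) to each mark.

    marks: list of {"name", "pmc", "registered"} dicts (hypothetical shape).
    committees: dict mapping PMC id to a dict of details (hypothetical shape).
    Returns (joined records, names of marks whose PMC id was not found).
    """
    joined, orphans = [], []
    for mark in marks:
        details = committees.get(mark["pmc"])
        if details is None:
            # e.g. the project moved to a new PMC and the file was not updated
            orphans.append(mark["name"])
            continue
        # Mark fields win over PMC fields if a key appears in both.
        joined.append({**details, **mark})
    return joined, orphans
```

Reporting orphans instead of silently dropping them gives whoever maintains the file a prompt to update it when a project becomes its own PMC.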

> The end result needs to be webcontent listing projects like:
>
> <h2>The ASF claims these trademarks</h2>
> ...list all active TLPs
> <a href="{$homepage}">Apache <b>{$projectname}</b></a>
> {$if registered then "&reg;" else "&trade;"}
>
> <br/>
>   {$shortdesc}
> ...
> <h2>The following projects are retired</h2>
> ...list all Attic projects
>
> <h2>The following projects are in incubation; all trademarks here may be
> property of respective owners</h2>
> ...list all Incubation projects
>
> Separately, we should list the name of each software *product* here,
> since if we offer something with a clear name as an independently
> downloadable software product, it can be our trademark.  So I'd like to
> list "Apache Directory Studio", since that's a notable name and a major
> product.  But I don't want to list "Apache Commons Foo Bar Baz and
> Kitchensink", since those are effectively just minor components that
> aren't really worth claiming.
>
> Comments/suggestions please?  I'm including the Whimsical project since
> they are also major consumers of this data.
>
> - Shane

[1] https://projects.apache.org/pmc_rdf.html

[2] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees/
[3] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees.xml
[4] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/projects.xml
