Posted to dev@commons.apache.org by Benedikt Ritter <br...@apache.org> on 2015/01/15 09:40:50 UTC

[ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Hi all,

I just want to let you know that I've joined the discussion the GitHub
Commons RDF community is currently having on GitHub [3]. I think it is time
for the PMC to take action here, since it feels like a conflict is
beginning.

Hello Commons RDF community,

first of all, I'm speaking for myself and not on behalf of the Apache
Commons PMC.

I'm really confused by this whole situation. This is what happened from my
POV: the Commons RDF community at one point had the idea of moving to
Apache Commons (which in my eyes made sense, given the fact that Apache
Commons is a place to share code between Apache projects). You really began
pushing things; you even requested a git mirror on behalf of the Apache
Commons project from Infra, which is now unused [1]!
Then Commons RDF decided that it didn't want to join Apache Commons
anymore, which was okay (at least for me).

Later Reto showed up and wanted to try things out in the Apache Commons
Sandbox. This is perfectly okay for us. Every Apache committer may start
new ideas in our sandbox (in fact, we recently granted commit access to all
of our repositories to all ASF committers [2]). However, to actually grow
out of the sandbox and become a proper component, there has to be a
community around said component. At the moment, I don't see such a
community around the Apache Commons Sandbox RDF component. But who knows,
maybe there will be such a community one day? Maybe not. We do not force
things. We just let people work with the code (inside the sandbox) the way
they like. There is no threat to this component at all. We don't have an
evil plan to destroy Commons RDF.
The differences regarding how to implement the RDF spec are none of our
business. None of the current Apache Commons team members know RDF. Who
are we to judge which approach is the right one?

I'm copying this message to the Apache Commons mailing list, so that
everybody is up to date. If you want to respond, please also copy your
response to the dev ML. If the Commons ML is too noisy: we're using subject
prefixes on the ML. You just have to define a filter that deletes all mail
whose subject does not contain [RDF].

I hope we can settle this issue once and for all. Right now it feels like
"Apache Commons are the bad guys" and I don't think we deserve this.

Regards,
Benedikt

[1] https://issues.apache.org/jira/browse/INFRA-8068
[2] http://markmail.org/message/q5slpso253joca7n


[3] https://github.com/commons-rdf/commons-rdf/issues/43

-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Posted by James Carman <ja...@carmanconsulting.com>.
On Saturday, January 17, 2015, Stian Soiland-Reyes <st...@apache.org> wrote:
>
>
> Just rdf@commons should do? It's both a topic and a component.
>

What about floor wax?  Dessert topping?

Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Posted by Stian Soiland-Reyes <st...@apache.org>.
On 15 Jan 2015 11:06, "Sergio Fernández" <wi...@apache.org> wrote:

> Therefore my proposal for Commons RDF is the following:
>
> * Commons RDF proposes an API that addresses portability issues. I'd
> recommend starting from what we currently have at GitHub, which was
> actually designed by committee and which both Jena and Sesame have
> already started to implement.
> * We evolve the current design in the context of Apache Commons Sandbox
> * We keep the API separate from the implementations:
> * We make clear that the major established RDF toolkits (Apache Jena and
> OpenRDF Sesame) are the recommended implementations
> * We make an open call for contributing basic implementations to the
> project. We can adopt the one provided by Stian, and also work with Reto
> to move the Clerezza-based implementation (aka Apache Commons Sandbox
> RDF) to that API (which seems to be what he is willing to do anyway). The
> feedback from those implementations would be considered for evolving the
> API.

+1 to all the above.

>
> We can easily organize this into separate Maven artifacts if we all agree
> on this setup.

+1 - I can have a first go at this if you want, including Reto's module.

> I just want to ask about the option of having a dedicated dev mailing
> list, keeping the general list for announcements and things relevant to
> the whole project

Just rdf@commons should do? It's both a topic and a component.

Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Posted by Sergio Fernández <wi...@apache.org>.
Hi Benedikt,

On 15/01/15 09:40, Benedikt Ritter wrote:
> I just want to let you know that I've joined the discussion the GitHub
> Commons RDF community is currently having on GitHub [3]. I think it is time
> for the PMC to take action here, since it feels like a conflict is
> beginning.

OK, although it is a bit too early, I'm fine jumping into dev@commons.a.o to
discuss this the Apache way.

First I'd like to apologize to the Apache Commons community, because I
wanted to keep this conflict off the list until we had a solution, which
honestly we do not yet have beyond a proposal under discussion:

https://github.com/commons-rdf/commons-rdf/issues/43#issuecomment-69916423

> I'm really confused by this whole situation. This is what happened from my
> POV: the Commons RDF community at one point had the idea of moving to
> Apache Commons (which in my eyes made sense, given the fact that Apache
> Commons is a place to share code between Apache projects). You really began
> pushing things; you even requested a git mirror on behalf of the Apache
> Commons project from Infra, which is now unused [1]!
> Then Commons RDF decided that it didn't want to join Apache Commons
> anymore, which was okay (at least for me).

Since I was the person who pushed for that step, I feel I need to properly
explain it.

I think at that stage we had three issues. The first one was about git,
and how the tool was used to reach agreement on the design. Second, the
single mailing list was seen as a barrier to communication. And third,
Commons RDF was not yet providing an implementation.

OK, the first one was easy to solve: even if we may lose the nice GitHub
interface, we can keep the workflow based on pull requests; that's fine.
The single mailing list was in fact seen as a kind of problem: on the one
hand we would get so much noise, and on the other hand we would also
generate noise irrelevant to other projects. As for the third one, once
the API is established we are in a much better position to provide basic
implementations. And that's why we temporarily decided to go back to
GitHub.

> Later Reto showed up and wanted to try things out in the Apache Commons
> Sandbox. This is perfectly okay for us. Every Apache committer may start
> new ideas in our sandbox (in fact, we recently granted commit access to all
> of our repositories to all ASF committers [2]). However, to actually grow
> out of the sandbox and become a proper component, there has to be a
> community around said component. At the moment, I don't see such a
> community around the Apache Commons Sandbox RDF component. But who knows,
> maybe there will be such a community one day? Maybe not. We do not force
> things. We just let people work with the code (inside the sandbox) the way
> they like. There is no threat to this component at all. We don't have an
> evil plan to destroy Commons RDF.
> The differences regarding how to implement the RDF spec are none of our
> business. None of the current Apache Commons team members know RDF. Who
> are we to judge which approach is the right one?

We started Commons RDF with the vision of aligning, and allowing
portability across, the two major and already established RDF libraries
(Apache Jena and OpenRDF Sesame). I have nothing to say about how each
library interpreted and implemented the RDF specification. But I know
quite well the trouble that this duality causes, even for basic things.
Therefore we set out on a journey together with those two projects (Andy
and Peter are travelling with us) to design a basic API that can be
considered "common". And that's what we now have on GitHub.

I'm not against other implementations, whether more basic or bound to
concrete use cases; that's good. But I think yet another API would not
help. And here is where we come closer to the point of conflict: the
current code in Apache Commons Sandbox RDF proposes a new API as a Commons
component, bound to an existing implementation (Clerezza) with very low
adoption in the developer community. It forgets the background, and
ignoring that background makes the incompatibility issue even bigger.

Therefore my proposal for Commons RDF is the following:

* Commons RDF proposes an API that addresses portability issues. I'd
recommend starting from what we currently have at GitHub, which was
actually designed by committee and which both Jena and Sesame have already
started to implement.
* We evolve the current design in the context of Apache Commons Sandbox
* We keep the API separate from the implementations:
* We make clear that the major established RDF toolkits (Apache Jena and
OpenRDF Sesame) are the recommended implementations
* We make an open call for contributing basic implementations to the
project. We can adopt the one provided by Stian, and also work with Reto
to move the Clerezza-based implementation (aka Apache Commons Sandbox RDF)
to that API (which seems to be what he is willing to do anyway). The
feedback from those implementations would be considered for evolving the
API.

We can easily organize this into separate Maven artifacts if we all agree
on this setup.
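To illustrate what keeping the API separate from the implementations could mean for users, here is a minimal Java sketch. All type names (Triple, Graph, RDFFactory, SimpleFactory, PortableClient) are hypothetical and do not reflect the actual Commons RDF interfaces, and terms are simplified to plain strings. The assumption is that client code compiles against the interfaces only (the API artifact), and the implementation artifact is chosen at the single place where the factory is created.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// --- Hypothetical API artifact: interfaces only, no implementation code. ---
interface Triple {
    String subject();
    String predicate();
    String object();
}

interface Graph {
    void add(Triple t);
    Iterable<Triple> triples();
}

interface RDFFactory {
    Graph createGraph();
    Triple createTriple(String s, String p, String o);
}

// --- Hypothetical "basic implementation" artifact (one of several). ---
final class SimpleFactory implements RDFFactory {
    private static final class SimpleTriple implements Triple {
        private final String s, p, o;
        SimpleTriple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        public String subject() { return s; }
        public String predicate() { return p; }
        public String object() { return o; }
    }

    // Naive list-backed graph; a real implementation would index its triples.
    private static final class ListGraph implements Graph {
        private final List<Triple> triples = new ArrayList<>();
        public void add(Triple t) { triples.add(t); }
        public Iterable<Triple> triples() { return triples; }
    }

    public Graph createGraph() { return new ListGraph(); }
    public Triple createTriple(String s, String p, String o) { return new SimpleTriple(s, p, o); }
}

// --- Client code: depends on the API interfaces only. ---
final class PortableClient {
    static int countDistinctSubjects(Graph g) {
        Set<String> subjects = new HashSet<>();
        for (Triple t : g.triples()) {
            subjects.add(t.subject());
        }
        return subjects.size();
    }
}

class PortabilityDemo {
    public static void main(String[] args) {
        RDFFactory factory = new SimpleFactory(); // the only implementation-specific line
        Graph g = factory.createGraph();
        g.add(factory.createTriple("ex:alice", "foaf:knows", "ex:bob"));
        g.add(factory.createTriple("ex:alice", "foaf:name", "\"Alice\""));
        g.add(factory.createTriple("ex:bob", "foaf:name", "\"Bob\""));
        System.out.println(PortableClient.countDistinctSubjects(g)); // prints 2
    }
}
```

Under this assumption, swapping SimpleFactory for a factory backed by another toolkit would leave PortableClient unchanged, which is the portability property the proposal is after.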

> I hope we can settle this issue once and for all. Right now it feels like
> "Apache Commons are the bad guys" and I don't think we deserve this.

I don't think we ever said that, and I personally do not have that
feeling. We are people with experience in Apache, and we do respect each
project, especially one as good as Apache Commons.

I just want to ask about the option of having a dedicated dev mailing
list, keeping the general list for announcements and things relevant to
the whole project.

I really believe we can arrive somewhere.

Thanks for bringing up this discussion, Benedikt.

Cheers,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: sergio.fernandez@redlink.co
w: http://redlink.co

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Posted by Andy Seaborne <an...@apache.org>.
On 17/01/15 12:00, Bruno P. Kinoshita wrote:
> Hi Andy!
>
>> Jena can (and does) support multiple APIs over a common core.
>>
>> A commons-rdf API can be added alongside the existing APIs; that means
>> it is not a "big bang" to have commons-rdf interfaces supported.
>
> That's great! Would the commons-rdf dependency go in jena-core/pom.xml? Is it going to be necessary to change some classes in the core? I think it will be transparent for other modules like ARQ, Fuseki, Text. Is that right?

I don't think so - Jena's core is "generalized" RDF, and this is important.

Just adding new interfaces to the core Node (etc.) objects isn't
ideal: you get multiple method names for the same thing. And making the
hashCode/equality contract work across implementations (hashCode() of
implementation A must be the same as hashCode() of implementation B
whenever the objects are equal) is really quite tricky.
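A minimal sketch of why the cross-implementation contract is tricky, assuming a hypothetical IRI interface (not the actual Commons RDF or Jena types). Java interfaces cannot supply equals() or hashCode() as default methods, so the contract can only be written down in documentation, and every implementation must follow it exactly. Here the assumed rule is: two IRIs are equal when their IRI strings are equal, and hashCode() is the hash of that string.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical shared interface. The contract below can only be documented:
// Java does not allow default methods that override Object.equals/hashCode.
//   a.equals(b)  iff  b is an IRI and a.getIRIString().equals(b.getIRIString())
//   a.hashCode() ==   a.getIRIString().hashCode()
interface IRI {
    String getIRIString();
}

// "Implementation A": stores the string directly.
final class SimpleIRI implements IRI {
    private final String iri;
    SimpleIRI(String iri) { this.iri = Objects.requireNonNull(iri); }
    @Override public String getIRIString() { return iri; }
    @Override public boolean equals(Object o) {
        return o instanceof IRI && iri.equals(((IRI) o).getIRIString());
    }
    @Override public int hashCode() { return iri.hashCode(); }
}

// "Implementation B": a different internal representation, same contract.
final class OtherIRI implements IRI {
    private final char[] chars;
    OtherIRI(String iri) { this.chars = iri.toCharArray(); }
    @Override public String getIRIString() { return new String(chars); }
    @Override public boolean equals(Object o) {
        return o instanceof IRI && getIRIString().equals(((IRI) o).getIRIString());
    }
    // Must hash the IRI string, not the char[], or lookups across implementations break.
    @Override public int hashCode() { return getIRIString().hashCode(); }
}

class CrossImplEqualityDemo {
    public static void main(String[] args) {
        IRI a = new SimpleIRI("http://example.com/alice");
        IRI b = new OtherIRI("http://example.com/alice");
        Set<IRI> set = new HashSet<>();
        set.add(a);
        System.out.println(a.equals(b) + " " + b.equals(a) + " " + set.contains(b)); // true true true
    }
}
```

If OtherIRI instead hashed its char array (for example with java.util.Arrays.hashCode), a.equals(b) would still be true but set.contains(b) could silently return false, which is exactly the kind of subtle breakage that makes the shared contract hard to get right.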

See also my comments about using classes not interfaces.

I personally do not share the worry about wrappers. For me, what matters
is the architectural difference between a presentation API, designed for
applications to write code against, and a systems API, designed to support
the machinery. Java is really rather good at optimizing away the cost of
wrappers, including with multisite method dispatch optimizations and
coping with dynamically loaded code that changes assumptions at a later
time.

So a new module, "jena-commons-rdf", that provides an application
presentation API would be the obvious route to me. Fuseki etc.
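A minimal sketch of a presentation API layered over a systems API, assuming a hypothetical internal SysNode class that stands in for a toolkit's generalized node (it is not Jena's Node API) and hypothetical RDFTerm/IRI presentation interfaces. The wrapper holds a reference to the internal node rather than copying it, and the machinery can unwrap it to keep working at the systems level.

```java
// Hypothetical "systems API" node: a compact, generalized form used by the machinery.
final class SysNode {
    static final byte URI = 0;
    static final byte LITERAL = 1;
    final byte kind;
    final String lexical;

    SysNode(byte kind, String lexical) {
        this.kind = kind;
        this.lexical = lexical;
    }
}

// Hypothetical "presentation API" that application code is written against.
interface RDFTerm {
    String ntriplesString();
}

interface IRI extends RDFTerm {
    String getIRIString();
}

// Thin wrapper: adapts one SysNode to the presentation interface without copying it.
final class WrappedIRI implements IRI {
    private final SysNode node;

    WrappedIRI(SysNode node) {
        if (node.kind != SysNode.URI) {
            throw new IllegalArgumentException("not a URI node");
        }
        this.node = node;
    }

    @Override public String getIRIString() { return node.lexical; }
    @Override public String ntriplesString() { return "<" + node.lexical + ">"; }

    // Lets the systems layer get back to the internal node without conversion.
    SysNode unwrap() { return node; }

    // Equality follows a string-based contract so it can agree with other implementations.
    @Override public boolean equals(Object o) {
        return o instanceof IRI && getIRIString().equals(((IRI) o).getIRIString());
    }
    @Override public int hashCode() { return getIRIString().hashCode(); }
}

class WrapperDemo {
    public static void main(String[] args) {
        SysNode internal = new SysNode(SysNode.URI, "http://example.com/alice");
        IRI iri = new WrappedIRI(internal);
        System.out.println(iri.ntriplesString()); // <http://example.com/alice>
    }
}
```

Because the wrapper is a single final class with one field and one-line methods, it is the kind of indirection the JIT can usually inline, in line with the argument made above about wrapper cost; actual overhead in a real toolkit would need measuring.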

And this is only RDF, not Datasets or SPARQL. We discussed that and
fairly easily came to the conclusion that getting something common sooner
was better than a complete set of APIs. Some of the other natural ones are
a lot more complicated; they would build on the terms provided by
commons-rdf.

>> There is a lot more to working with RDF than the RDF API part - SPARQL
>> engines don't use that API if they want performance and/or scale. (1)
>> SPARQL queries collections of graphs and (2) for scale+persistence, you
>> need to work, in parts, at a level somewhat lower than Java objects, and
>> closer to the binary of persistence structures.
>
> Good point. I'm enjoying learning about the Jena code for JENA-632, even though datasets, streaming queries, collections and all that part about journaling and graph persistence can be a bit scary.

:-)

Luckily, journalling and persistence are orthogonal to implementing
JENA-632, though as an application feature mapped over the whole system,
it's a good way of seeing across several components.

> Probably that won't be covered by commons-rdf, but I think that's correct.

I agree - there is a new world out there: a world of large-memory
machines and, quite likely, large-scale persistent RAM in the not too
distant future. Given the longevity of shared APIs, it's very hard to
find a balance across requirements and expectations. The graph level is
naturally driven by the specs, but as soon as systems issues get thrown
into the mix, the choice space is much larger.

	Andy





Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Posted by "Bruno P. Kinoshita" <br...@yahoo.com.br>.
Hi Andy!

> Jena can (and does) support multiple APIs over a common core.
> 
> A commons-rdf API can be added alongside the existing APIs; that means
> it is not a "big bang" to have commons-rdf interfaces supported.

That's great! Would the commons-rdf dependency go in jena-core/pom.xml? Is it going to be necessary to change some classes in the core? I think it will be transparent for other modules like ARQ, Fuseki, Text. Is that right?

> There is a lot more to working with RDF than the RDF API part - SPARQL
> engines don't use that API if they want performance and/or scale. (1)
> SPARQL queries collections of graphs and (2) for scale+persistence, you
> need to work, in parts, at a level somewhat lower than Java objects, and
> closer to the binary of persistence structures.

Good point. I'm enjoying learning about the Jena code for JENA-632, even though datasets, streaming queries, collections and all that part about journaling and graph persistence can be a bit scary. Probably that won't be covered by commons-rdf, but I think that's correct.

Thanks!
Bruno


----- Original Message -----
> From: Andy Seaborne <an...@apache.org>
> To: dev@commons.apache.org
> Cc: 
> Sent: Saturday, January 17, 2015 7:40 AM
> Subject: Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF
> 
> On 15/01/15 11:52, Bruno P. Kinoshita wrote:
> 
>>  Hello!
>> 
>> 
>>  I feel like I can't help much in the current discussion, but I just
>>  wanted to chime in and say that I'm +1 for an [rdf] component in Apache
>>  Commons. As a Commons committer I'd like to help.
>>
>>  I started watching the GitHub repository and have subscribed to the
>>  ongoing discussion. I'll try to contribute in some way, maybe testing
>>  and with small patches.
>>
>>  My go-to Maven dependency for RDF, Turtle, N3, working with ontologies,
>>  reasoners, etc. is Apache Jena. I think it would be very positive to
>>  have a common interface that I could use in my code (mainly crawlers and
>>  data munging for Hadoop jobs) and that would work with different
>>  implementations.
>> 
>> 
>>  Thanks!
>> 
>>  Bruno
> 
> Since you mention Jena ... :-)
> 
> Jena can (and does) support multiple APIs over a common core.
> 
> A commons-rdf API can be added alongside the existing APIs; that means
> it is not a "big bang" to have commons-rdf interfaces supported.
>
> There is a lot more to working with RDF than the RDF API part - SPARQL
> engines don't use that API if they want performance and/or scale. (1)
> SPARQL queries collections of graphs and (2) for scale+persistence, you
> need to work, in parts, at a level somewhat lower than Java objects, and
> closer to the binary of persistence structures.
> 
>     Andy



Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Posted by Andy Seaborne <an...@apache.org>.
On 15/01/15 11:52, Bruno P. Kinoshita wrote:
> Hello!
>
>
> I feel like I can't help much in the current discussion, but I just wanted to chime in
> and say that I'm +1 for an [rdf] component in Apache Commons. As a Commons committer I'd
> like to help.
>
> I started watching the GitHub repository and have subscribed to the ongoing discussion. I'll
> try to contribute in some way, maybe testing and with small patches.
>
>
> My go-to Maven dependency for RDF, Turtle, N3, working with ontologies, reasoners, etc.
> is Apache Jena. I think it would be very positive to have a common interface that I could
> use in my code (mainly crawlers and data munging for Hadoop jobs) and that would work
> with different implementations.
>
>
> Thanks!
>
> Bruno

Since you mention Jena ... :-)

Jena can (and does) support multiple APIs over a common core.

A commons-rdf API can be added alongside the existing APIs; that means
it is not a "big bang" to have commons-rdf interfaces supported.

There is a lot more to working with RDF than the RDF API part - SPARQL
engines don't use that API if they want performance and/or scale. (1)
SPARQL queries collections of graphs and (2) for scale+persistence, you
need to work, in parts, at a level somewhat lower than Java objects, and
closer to the binary of persistence structures.
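To make "a level lower than Java objects" concrete, here is a minimal sketch of dictionary encoding, a common technique in RDF stores, assuming terms arrive as N-Triples-style strings. NodeDictionary and EncodedTripleTable are hypothetical names, and this is not a description of how any particular engine (Jena included) is implemented: each distinct term is stored once and mapped to a numeric id, pattern matching compares longs without creating term objects, and only final results are decoded back into terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Maps each distinct term to a stable numeric id, and back.
final class NodeDictionary {
    private final Map<String, Long> ids = new HashMap<>();
    private final List<String> terms = new ArrayList<>();

    long encode(String term) {
        Long id = ids.get(term);
        if (id == null) {
            id = (long) terms.size(); // next free id
            terms.add(term);
            ids.put(term, id);
        }
        return id;
    }

    String decode(long id) {
        return terms.get((int) id);
    }
}

// Triples stored as rows of three ids; no per-term objects are involved in matching.
final class EncodedTripleTable {
    static final long ANY = -1; // wildcard in a pattern
    private final List<long[]> rows = new ArrayList<>();

    void add(long s, long p, long o) {
        rows.add(new long[] {s, p, o});
    }

    List<long[]> find(long s, long p, long o) {
        List<long[]> out = new ArrayList<>();
        for (long[] r : rows) {
            if ((s == ANY || r[0] == s) && (p == ANY || r[1] == p) && (o == ANY || r[2] == o)) {
                out.add(r);
            }
        }
        return out;
    }
}

class EncodingDemo {
    public static void main(String[] args) {
        NodeDictionary dict = new NodeDictionary();
        EncodedTripleTable table = new EncodedTripleTable();
        long alice = dict.encode("<http://example.com/alice>");
        long knows = dict.encode("<http://xmlns.com/foaf/0.1/knows>");
        long bob = dict.encode("<http://example.com/bob>");
        table.add(alice, knows, bob);
        table.add(bob, knows, alice);
        // Whom does alice know? Match on ids, decode only the answers.
        for (long[] r : table.find(alice, knows, EncodedTripleTable.ANY)) {
            System.out.println(dict.decode(r[2])); // <http://example.com/bob>
        }
    }
}
```

A real store would typically keep several sorted indexes over the id rows in persistent structures rather than scanning a list; the linear scan here only keeps the sketch short.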

	Andy




Re: [ALL][RDF] github Commons RDF vs. Apache Commons Sandbox RDF

Posted by "Bruno P. Kinoshita" <br...@yahoo.com.br>.
Hello!


I feel like I can't help much in the current discussion, but I just wanted to chime in
and say that I'm +1 for an [rdf] component in Apache Commons. As a Commons committer I'd
like to help.

I started watching the GitHub repository and have subscribed to the ongoing discussion. I'll
try to contribute in some way, maybe testing and with small patches.


My go-to Maven dependency for RDF, Turtle, N3, working with ontologies, reasoners, etc.
is Apache Jena. I think it would be very positive to have a common interface that I could
use in my code (mainly crawlers and data munging for Hadoop jobs) and that would work
with different implementations.


Thanks!

Bruno

