Posted to dev@commons.apache.org by Minto van der Sluis <mi...@xup.nl> on 2015/01/15 10:36:11 UTC

Re: your opinion on commons-rdf proposal

Sure!

Reto Gmür wrote on 15-1-2015 at 10:06:
> Hi Minto,
>
> Thanks a lot for your valuable comments. Would you mind reposting to
> the mailing list so as to have this discussion public?
>
> Cheers,
> Reto
>
> On Thu, Jan 15, 2015 at 9:01 AM, Minto van der Sluis <minto@xup.nl
> <ma...@xup.nl>> wrote:
>
>     Hi Reto,
>
>     Thanks for showing interest in my opinion.
>
>     First of all, the whole discussion around commons-rdf involves way too
>     much religion. Religion as in: my implementation should be the reference
>     implementation. IMO commons-rdf should be about designing the best RDF
>     API and not about making some implementation fit that API.
>
>     On the API itself:
>     1) I am glad you chose to derive from Collections. This opens up the
>     possibility to use Java 8 streams to improve performance, especially in
>     the filter() method.
>     2) Hmm, is filter() still required if we can use Java 8 streams
>     (collection.stream().filter())?
>     3) I dislike the BlankNodeOrIri interface name. Judging from the
>     github:commons-rdf comments the name should be Subject. Taking your
>     comments into account, Resource might be a better name. BTW, the comments
>     for this interface differ between your sandbox and the github commons-rdf.
>     4) Why does GraphEvent only have one triple? What if you remove/add a
>     large number of triples?
>     5) Events are not ready for extension. AddEvent actually is something
>     like AddedTriple(s)Event. Same for remove. The (s) depends on the
>     outcome of the previous point. See the next point for additional events.
>     6) The API misses facilities to access/create/query graphs. If this gets
>     included you probably also end up with events like AddedGraphEvent, ditto
>     for remove. For this I envision something along the lines of JDBC and
>     DataSources.
>     7) Also the whole event mechanism might be extremely difficult to
>     realise. Of course from within the implementation it is easy, but think
>     distributed here. Take for instance a SPARQL endpoint. It is relatively
>     straightforward to create an implementation for this except for the
>     eventing part. I wouldn't know how to implement eventing without polling
>     the SPARQL endpoint every so often. Shouldn't events be something
>     additional/optional?
>
>     So much for a quick scan of things.
>
>     Personally I'd also like to see a pure in-memory implementation; it
>     not only makes testing things easier for the API users, but also helps
>     focus on what is best for a clean/clear API. Like I mentioned before,
>     the API should be leading, NOT the implementation. Also a test
>     compatibility kit (TCK) might come in handy to ensure other
>     implementations work as expected.
>
>     And if we get this far we might as well try to make it a standard by
>     submitting a JSR ;-)
>
>     Regards,
>
>     Minto
>
>
>     Reto Gmür wrote on 14-1-2015 at 15:15:
>     > Hi Minto
>     >
>     > I would be very interested to learn about your opinion on the
>     > commons-rdf proposal I recently committed.
>     >
>     > Cheers,
>     > Reto
>

Re: your opinion on commons-rdf proposal

Posted by Andy Seaborne <an...@apache.org>.
>> On Thu, Jan 15, 2015 at 9:01 AM, Minto van der Sluis <minto@xup.nl
>> <ma...@xup.nl>> wrote:
>>
>>      Hi Reto,
>>
>>      Thanks for showing interest in my opinion.
>>
>>      First of all, the whole discussion around commons-rdf involves way too
>>      much religion. Religion as in: my implementation should be the reference
>>      implementation. IMO commons-rdf should be about designing the best RDF
>>      API and not about making some implementation fit that API.
>>
>>      On the API itself:
>>      1) I am glad you chose to derive from Collections. This opens up the
>>      possibility to use Java 8 streams to improve performance especially in
>>      the filter() method.
>>      2) Hmm, is filter() still required if we can use java 8 streams
>>      (collection.stream().filter())?
>>      3) I dislike the BlankNodeOrIri interface name. Judging from the
>>      github:commons-rdf comments the name should be Subject. Taking your
>>      comments into account, Resource might be a better name. BTW, the comments
>>      for this interface differ between your sandbox and the github commons-rdf.

"Subject" might be backing into a corner.

A: IRI < BlankNodeOrIri

and it can occur in the object or predicate positions.

B: Literals as subjects are only blocked in RDF because of the
peculiarities of the RDF/XML syntax.  They arise naturally in rules
(and to some extent in queries).

None of the APIs are generalised APIs (see the github discussion on that at
https://github.com/commons-rdf/commons-rdf/issues/1), which is fine.
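
For readers skimming the thread, the hierarchy under discussion looks roughly
like this (a sketch only; the names RdfTerm, BlankNodeOrIri, Iri, BlankNode and
Literal are taken from mails in this thread and may differ from the actual
proposal):

    // Sketch only - names follow this thread, not necessarily the proposal.
    interface RdfTerm {}
    interface BlankNodeOrIri extends RdfTerm {}
    interface Iri extends BlankNodeOrIri {}        // also legal in predicate/object position
    interface BlankNode extends BlankNodeOrIri {}
    interface Literal extends RdfTerm {}           // object position only in plain RDF

    interface Triple {
        BlankNodeOrIri getSubject();
        Iri getPredicate();
        RdfTerm getObject();
    }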

>>      4) Why does GraphEvent only have one triple? What if you remove/add a
>>      large number of triples?

Yes - the experience from Jena and Jena users supports this. The graph 
level/triple action event mechanism is one point on a continuum.

Example 1:
Adding a foaf:Person (say, 5-10 triples) to a graph.

- Want to know when that starts and finishes, regardless of triple count
and order.  Just triples does not indicate when it's finished without a
convention on a "last" triple.

- Less interesting to see partial data: indeed it is positively bad.

Example 2:
Bulk addition: reading a new file into a graph has natural
start/finish boundaries that can be of interest. An event on every 
triple of X million is not.


The event parts can be separate from the Graph interface - advantage:
other object types can share the event mechanism (caution point:
references to objects in events can lead to GC fun and games).

Triple events can be added as aspects on an interface - then the
application does not incur the event system costs if it does not use it.
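
To make that concrete, a listener kept outside Graph and reporting batch
boundaries might look like this (a hypothetical sketch under those
assumptions, not part of the proposal; Triple stands for the proposed triple
type):

    import java.util.Collection;

    // Hypothetical sketch, not part of the proposal: the event mechanism is
    // kept outside Graph and reports batch boundaries rather than one event
    // per triple.
    interface GraphListener {
        void batchStarted();                                     // e.g. start of a bulk load
        void triplesAdded(Collection<? extends Triple> added);
        void triplesRemoved(Collection<? extends Triple> removed);
        void batchFinished();                                    // end marker, independent of triple count
    }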

>>      5) Events are not ready for extension. AddEvent actually is something
>>      like AddedTriple(s)Event. Same for remove. The (s) depends on the
>>      outcome of the previous point. See next point for additional events.
>>      6) The API misses facilities to access/create/query graphs. If
>>      this gets
>>      included you probably also end up with events like AddedGraphEvent
>>      ditto
>>      for remove. For this I envision something along the lines of JDBC and
>>      DataSources.
>>      7) Also the whole event mechanism might be extremely difficult to
>>      realise. Of course from within the implementation it is easy, but
>>      think
>>      distributed here.

+1

Use cases I'm interested in include graphs across multiple machines and
persistent data.


>> Take for instance a SPARQL endpoint. It is relatively
>>      straightforward to create an implementation for this except for the
>>      eventing part. I wouldn't know how to implement eventing without polling
>>      the SPARQL endpoint every so often. Shouldn't events be something
>>      additional/optional?
>>
>>      So far for quickly scanning things.
>>
>>      Personally I'd also like to see a pure in-memory implementation; it
>>      not only makes testing things easier for the API users, but also helps
>>      focus on what is best for a clean/clear API. Like I mentioned before,
>>      the API should be leading, NOT the implementation. Also a test
>>      compatibility kit (TCK) might come in handy to ensure other
>>      implementations work as expected.
>>
>>      And if we get this far we might as well try to make it a standard by
>>      submitting a JSR ;-)
>>
>>      Regards,
>>
>>      Minto
>>
>>
>>      Reto Gmür wrote on 14-1-2015 at 15:15:
>>      > Hi Minto
>>      >
>>      > I would be very interested to learn about your opinion on the
>>      > commons-rdf proposal I recently committed.
>>      >
>>      > Cheers,
>>      > Reto
>>
>


Re: your opinion on commons-rdf proposal

Posted by Peter Ansell <an...@gmail.com>.
On 24 January 2015 at 01:12, Reto Gmür <re...@apache.org> wrote:
> Hi Minto
>
> Thanks for your comments.
>
>>     1) I am glad you chose to derive from Collections. This opens up the
>> >     possibility to use Java 8 streams to improve performance especially
>> in
>> >     the filter() method.
>> >     2) Hmm, is filter() still required if we can use java 8 streams
>> >     (collection.stream().filter())?
>>
>
> I think only a dedicated filter method can be implemented performantly (i.e.
> using indexes). Correct me if I'm wrong, but I think with stream().filter()
> an implementation would have to apply the function to every triple.

No, that isn't how Stream works. The evaluation is designed to be
completely lazy and only perform actions when a terminal operation is
executed, at which point the entire sequence of any
filters/maps/limits/orders/distinct/etc. is known and it can be
optimised completely then.

In addition, Commons RDF specifically does not extend the
Collection interface, but it has methods that provide Streams.
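
For example, in a sketch where graph.stream() stands in for whatever
Stream-providing method the final API exposes (Graph, Triple and Iri are
likewise placeholders):

    import java.util.stream.Stream;

    // Sketch only: Graph, Triple, Iri and graph.stream() stand in for
    // whatever the final commons-rdf API provides.
    class StreamLazinessExample {
        static long countMatches(Graph graph, Iri predicate) {
            // Nothing is evaluated here: filter() merely builds up the pipeline.
            Stream<? extends Triple> pipeline =
                    graph.stream().filter(t -> predicate.equals(t.getPredicate()));
            // Only this terminal operation triggers evaluation; at that point the
            // implementation sees the whole pipeline and can apply its indexes.
            return pipeline.count();
        }
    }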

Cheers,

Peter



Re: your opinion on commons-rdf proposal

Posted by Peter Ansell <an...@gmail.com>.
On 16 February 2015 at 07:53, Reto Gmür <re...@apache.org> wrote:
> My SMTP server was having some problems so the mail below was not posted.
>
> I've made some more commits today with some initial code of a SPARQL-backed
> implementation, but the interesting bits are still missing....
>
>
> Cheers,
> Reto
>
> On Sun, Feb 8, 2015 at 6:37 PM, Reto Gmür <re...@apache.org> wrote:
>
>> Hi Minto, all,
>>
>> As you suggested I removed the event listener support from the Graph
>> interface; it is now part of the WatchableGraph extending interface.
>>
>> Also I've created a new impl.utils project providing mainly abstract
>> classes to facilitate implementations. With this, implementors don't have to
>> care about locking and about graph isomorphism (for .equals in
>> ImmutableGraph).
>>
>> As the final commons-rdf will take a while to emerge, I think we should
>> already integrate intermediate steps into Clerezza for a smoother transition
>> and especially to avoid working on different incompatible branches.

Hi Reto,

In order to avoid being seen to be biased toward any of the
implementations, we have decided to only include a single actual
implementation, the "simple" implementation that Stian wrote
specifically to test the API. Although it was written purely as
a test driver, it does provide an un-optimised in-memory model for
very light-weight use if people require that for very small sets.

Perhaps it would be more appropriate to include that implementation in Stanbol?

Cheers,

Peter



Re: your opinion on commons-rdf proposal

Posted by Reto Gmür <re...@apache.org>.
My SMTP server was having some problems so the mail below was not posted.

I've made some more commits today with some initial code of a SPARQL-backed
implementation, but the interesting bits are still missing....


Cheers,
Reto

On Sun, Feb 8, 2015 at 6:37 PM, Reto Gmür <re...@apache.org> wrote:

> Hi Minto, all,
>
> As you suggested I removed the event listener support from the Graph
> interface; it is now part of the WatchableGraph extending interface.
>
> Also I've created a new impl.utils project providing mainly abstract
> classes to facilitate implementations. With this, implementors don't have to
> care about locking and about graph isomorphism (for .equals in
> ImmutableGraph).
>
> As the final commons-rdf will take a while to emerge, I think we should
> already integrate intermediate steps into Clerezza for a smoother transition
> and especially to avoid working on different incompatible branches.
>
> Cheers,
> Reto
>
> On Fri, Jan 23, 2015 at 2:12 PM, Reto Gmür <re...@apache.org> wrote:
>
>> Hi Minto
>>
>> Thanks for your comments.
>>
>> >     1) I am glad you chose to derive from Collections. This opens up the
>>> >     possibility to use Java 8 streams to improve performance
>>> especially in
>>> >     the filter() method.
>>> >     2) Hmm, is filter() still required if we can use java 8 streams
>>> >     (collection.stream().filter())?
>>>
>>
>> I think only a dedicated filter method can be implemented performantly
>> (i.e. using indexes). Correct me if I'm wrong, but I think with
>> stream().filter() an implementation would have to apply the function to
>> every triple.
>>
>>
>>> >     3) I dislike BlankNodeOrIri interface name. Judging from the
>>> >     github:commons-rdf comments the name should be Subject. Taking your
>>> >     comments Resource might be a better name. BTW, the comments for
>>> this
>>> >     interface differ between your sandbox and the github commons-rdf.
>>>
>>
>> BlankNodeOrIri used to be called NonLiteral. The term "resource" is used
>> in RDFS and also includes literals. So the old Resource interface is
>> equivalent to the new RdfTerm interface. The API documentation needs to be
>> improved, as it still uses old terms.
>>
>>
>>
>>
>>> >     4) Why does GraphEvent only has one triple? What if you remove/add
>>> a
>>> >     large number triples?
>>>
>>
>> If one requests synchronous notification one always gets one event at a
>> time (except maybe for addAll, removeAll, retainAll and clear). With
>> asynchronous notification one will get a bigger list of events. I think it
>> is better to get add-events and remove-events together, rather than getting
>> a single add-event with all the added triples and a single remove event
>> with all the removed triples.
>>
>>
>>> >     5) Events are not ready for extension. AddEvent actually is
>>> something
>>> >     like AddedTriple(s)Event. Same for remove. The (s) depends on the
>>> >     outcome of the previous point. See next point for additional
>>> events.
>>>
>> >     6) The API misses facilities to access/create/query graphs. If
>>> >     this gets
>>> >     included you probably also end up with events like AddedGraphEvent
>>> >     ditto
>>> >     for remove. For this I envision something along the lines of JDBC
>>> and
>>> >     DataSources.
>>>
>> You're right. For now there is no DataSet (aka TcProvider) in the API.
>> The main reason for this was to keep the scope close to the github proposal. If
>> we add DataSet we should add respective events (DataSetEvent).
>>
>>
>>
>>> >     7) Also the whole event mechanism might be extremely difficult to
>>> >     realise. Of course from within the implementation it is easy, but
>>> >     think
>>> >     distributed here. Take for instance a SPARQL endpoint. It is
>>> >     relatively
>>> >     straightforward to create an implementation for this except for the
>>> >     eventing part. I wouldn't know how to implement eventing without
>>> >     polling
>>> >     the SPARQL endpoint every so often. Shouldn't events be something
>>> >     additional/optional?
>>>
>> Having a graph implementation backed by a SPARQL endpoint is not trivial
>> (unless you don't care about blank nodes).
>>
>> The question is if the API must guarantee that all changes to the graph
>> fire respective events or if it is acceptable for an implementation to only
>> notify about changes made via the instance. I think that even the latter is
>> useful. And given some abstract implementation classes, support doesn't cost
>> the implementors a lot of effort.
>>
>> If on the other hand we remove that from the core API and provide a
>> WatchableGraph API as an extension, we could provide a wrapper for
>> non-watchable graphs. I think both approaches would work.
>>
>>
>>
>>
>>> >
>>> >     So far for quickly scanning things.
>>> >
>>> >     Personally I'd also like to see a pure in memory based
>>> >     implementation it
>>> >     not only makes testing things easier for the API users, but also
>>> >     helps focus
>>> >     on what is best for a clean/clear API.
>>>
>> I suggest we use what is now the IndexedMGraph in Apache Stanbol for
>> this. This provides more acceptable performance than the SimpleMGraph from
>> Clerezza.
>>
>>
>>> Like I mentioned before,
>>> >     the API
>>> >     should be leading NOT the implementation. Also a test
>>> >     compatibility kit
>>> >     (TCK) might come in handy to ensure other implementations work as
>>> >     expected.
>>>
>>
>> Currently in Clerezza we have rdf.core.tests; part of it could become a
>> part of commons-test.
>>
>> I agree this is very important to ensure interoperability.
>>
>>
>> Cheers,
>>
>> Reto
>>
>>
>>> >
>>> >     And if we get this far we might as well try to make it a standard
>>> by
>>> >     submitting a JSR ;-)
>>>
>>
>>
>>
>>> >
>>> >     Regards,
>>> >
>>> >     Minto
>>> >
>>> >
>>> >     Reto Gmür wrote on 14-1-2015 at 15:15:
>>> >     > Hi Minto
>>> >     >
>>> >     > I would be very interested to learn about your opinion on the
>>> >     > commons-rdf proposal I recently committed.
>>> >     >
>>> >     > Cheers,
>>> >     > Reto
>>> >
>>>
>>
>>
>

Re: your opinion on commons-rdf proposal

Posted by Reto Gmür <re...@apache.org>.
Hi Minto

Thanks for your comments.

>     1) I am glad you chose to derive from Collections. This opens up the
> >     possibility to use Java 8 streams to improve performance especially
> in
> >     the filter() method.
> >     2) Hmm, is filter() still required if we can use java 8 streams
> >     (collection.stream().filter())?
>

I think only a dedicated filter method can be implemented performantly (i.e.
using indexes). Correct me if I'm wrong, but I think with stream().filter()
an implementation would have to apply the function to every triple.
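
For illustration, such a dedicated filter method might look roughly like this
(a hypothetical sketch, not the proposed API; null acts as a wildcard and the
term types are placeholders taken from this thread):

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch only: a dedicated filter method where null acts as
    // a wildcard, so the implementation can go straight to an index instead
    // of testing a predicate against every triple.
    class IndexedGraphSketch {
        private final Map<BlankNodeOrIri, Set<Triple>> bySubject = new HashMap<>();

        Iterator<Triple> filter(BlankNodeOrIri subject, Iri predicate, RdfTerm object) {
            // Pick candidates from the subject index when possible; otherwise scan.
            Collection<Triple> candidates = (subject == null)
                    ? allTriples()
                    : bySubject.getOrDefault(subject, Collections.emptySet());
            return candidates.stream()
                    .filter(t -> predicate == null || predicate.equals(t.getPredicate()))
                    .filter(t -> object == null || object.equals(t.getObject()))
                    .iterator();
        }

        private Collection<Triple> allTriples() {
            List<Triple> all = new ArrayList<>();
            bySubject.values().forEach(all::addAll);
            return all;
        }
    }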


> >     3) I dislike the BlankNodeOrIri interface name. Judging from the
> >     github:commons-rdf comments the name should be Subject. Taking your
> >     comments into account, Resource might be a better name. BTW, the comments
> >     for this interface differ between your sandbox and the github commons-rdf.
>

BlankNodeOrIri used to be called NonLiteral. The term "resource" is used in
RDFS and also includes literals. So the old Resource interface is
equivalent to the new RdfTerm interface. The API documentation needs to be
improved, as it still uses old terms.




> >     4) Why does GraphEvent only have one triple? What if you remove/add a
> >     large number of triples?
>

If one requests synchronous notification one always gets one event at a
time (except maybe for addAll, removeAll, retainAll and clear). With
asynchronous notification one will get a bigger list of events. I think it
is better to get add-events and remove-events together, rather than getting
a single add-event with all the added triples and a single remove event
with all the removed triples.


> >     5) Events are not ready for extension. AddEvent actually is something
> >     like AddedTriple(s)Event. Same for remove. The (s) depends on the
> >     outcome of the previous point. See next point for additional events.
>
>     6) The API misses facilities to access/create/query graphs. If
> >     this gets
> >     included you probably also end up with events like AddedGraphEvent
> >     ditto
> >     for remove. For this I envision something along the lines of JDBC and
> >     DataSources.
>
You're right. For now there is no DataSet (aka TcProvider) in the API. The
main reason for this was to keep the scope close to the github proposal. If we
add DataSet we should add respective events (DataSetEvent).
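
If we do add it, a rough shape along the JDBC/DataSource lines Minto mentions
could be (purely hypothetical names; none of this is in the proposal):

    // Purely hypothetical sketch of a DataSet facility, loosely analogous to
    // obtaining Connections from a JDBC DataSource; none of these names are
    // part of the current proposal.
    interface DataSet {
        Graph getGraph(Iri name);          // access an existing named graph
        Graph createGraph(Iri name);       // create a new named graph
        void deleteGraph(Iri name);
        Iterable<Iri> listGraphNames();
    }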



> >     7) Also the whole event mechanism might be extremely difficult to
> >     realise. Of course from within the implementation it is easy, but
> >     think
> >     distributed here. Take for instance a SPARQL endpoint. It is
> >     relatively
> >     straightforward to create an implementation for this except for the
> >     eventing part. I wouldn't know how to implement eventing without
> >     polling
> >     the SPARQL endpoint every so often. Shouldn't events be something
> >     additional/optional?
>
Having a graph implementation backed by a SPARQL endpoint is not trivial
(unless you don't care about blank nodes).

The question is if the API must guarantee that all changes to the graph
fire respective events or if it is acceptable for an implementation to only
notify about changes made via the instance. I think that even the latter is
useful. And given some abstract implementation classes, support doesn't cost
the implementors a lot of effort.

If on the other hand we remove that from the core API and provide a
WatchableGraph API as an extension, we could provide a wrapper for
non-watchable graphs. I think both approaches would work.
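
A rough sketch of that second option (assumed shapes only; the listener and
the add(Triple) signature are guesses, not the actual interfaces):

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Rough sketch of the WatchableGraph-as-extension idea; assumed shapes only.
    // GraphListener is assumed to have a triplesAdded(Collection) callback, as
    // sketched earlier in this thread. The wrapper fires events solely for
    // changes made through this instance.
    interface WatchableGraph extends Graph {
        void addGraphListener(GraphListener listener);
        void removeGraphListener(GraphListener listener);
    }

    abstract class WatchableGraphWrapper implements WatchableGraph {
        private final Graph delegate;
        private final List<GraphListener> listeners = new CopyOnWriteArrayList<>();

        WatchableGraphWrapper(Graph delegate) { this.delegate = delegate; }

        public void addGraphListener(GraphListener listener) { listeners.add(listener); }
        public void removeGraphListener(GraphListener listener) { listeners.remove(listener); }

        // Assumes Graph offers a Collection-like add(Triple); adapt as needed.
        public boolean add(Triple triple) {
            boolean changed = delegate.add(triple);
            if (changed) {
                listeners.forEach(l -> l.triplesAdded(Collections.singleton(triple)));
            }
            return changed;
        }
        // ... the remaining Graph methods delegate in the same way ...
    }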




> >
> >     So far for quickly scanning things.
> >
> >     Personally I'd also like to see a pure in-memory implementation; it
> >     not only makes testing things easier for the API users, but also
> >     helps focus
> >     on what is best for a clean/clear API.
>
I suggest we use what is now the IndexedMGraph in Apache Stanbol for this.
This provides more acceptable performance than the SimpleMGraph from
Clerezza.


> Like I mentioned before,
> >     the API
> >     should be leading NOT the implementation. Also a test
> >     compatibility kit
> >     (TCK) might come in handy to ensure other implementations work as
> >     expected.
>

Currently in Clerezza we have rdf.core.tests; part of it could become a part
of commons-test.

I agree this is very important to ensure interoperability.
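
Such a kit could boil down to abstract contract tests that every
implementation subclasses, for example (a hypothetical sketch assuming a
Collection-like Graph with add/contains):

    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    // Hypothetical sketch of a TCK-style contract test: each implementation
    // subclasses this and supplies its own Graph and example Triple.
    public abstract class GraphContractTest {

        protected abstract Graph createEmptyGraph();

        protected abstract Triple exampleTriple();

        @Test
        public void addedTripleIsContained() {
            Graph graph = createEmptyGraph();
            Triple triple = exampleTriple();
            graph.add(triple);
            assertTrue(graph.contains(triple));
        }
    }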


Cheers,

Reto


> >
> >     And if we get this far we might as well try to make it a standard by
> >     submitting a JSR ;-)
>



> >
> >     Regards,
> >
> >     Minto
> >
> >
> >     Reto Gmür wrote on 14-1-2015 at 15:15:
> >     > Hi Minto
> >     >
> >     > I would be very interested to learn about your opinion on the
> >     > commons-rdf proposal I recently committed.
> >     >
> >     > Cheers,
> >     > Reto
> >
>
