Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2012/11/22 16:04:17 UTC

[Discuss] Apache Portable Uniform RDF Runtime (PURR)

Rob's comments on inverting the reader process [2] suggest to me pulling 
out an API and I wonder if we can identify a portability layer that 
enables some (not all) interoperability and mix-n-match.

The term "API" is creating some confusion in the discussions triggered 
by the Clerezza incubator project being noted [1][3] as "low activity".

To some, it's what the application sees -- a presentation API.  To 
others it is some kind of abstraction between machinery like storage, 
inference, parsing and writing.  They don't have to be the same.

Even if the only outcome is parser and stream processing modularity, I 
think it is worth doing. Just being able to add an external "parser" to 
Jena in a cleaner way than is currently possible is useful.

** Apache Portable Uniform RDF Runtime (PURR) **

(OK - the "U" is a bit forced :-)

To me, what we need is an abstraction that allows multiple 
implementations by swapping the jars (or OSGi bundles).  So PURR is a 
set of interfaces.  No state.  c.f. SLF4J.

There would be many presentation APIs: Model-like, RDF-ORM, Ontology, 
and also for natural use in other JVM-based languages - Scala, Clojure, 
whatever is the next JVM language du jour.

It's not a full application library. It's rather low level.  Writing 
much code directly at the interface may not be pretty.

This is not the Jena Graph SPI, although that was trying to perform that 
purpose but has wider coverage of functionality.  I think we can go more 
minimal yet.

The Jena Graph SPI has a number of handlers - events, stats, 
transactions - which seem to make the problem too large.  These would be 
part of another subsystem ("extends PURR") and be different in different 
providers.  One of those would be Jena Graph.

PURR would provide the basic concepts from RDF:

Terms: IRIs, Literals, bNodes
Triples and Quads
Graph, Dataset
Factories for each.

and for each be quite vanilla.

e.g. a literal is a lexical form, a datatype and an optional language 
tag.  Immutable with getters.  Structural equality.  No value, XSD or 
otherwise.
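
For illustration only, a minimal Java sketch of such a "vanilla" literal. 
The class and method names are assumptions made up for this example, not 
an agreed PURR or Jena API. Equality is purely structural, so "1" and 
"01" typed as xsd:integer are different terms even though they denote 
the same value.

```java
import java.util.Objects;

/** Hypothetical sketch of a "vanilla" RDF literal; not a PURR or Jena API. */
public final class LiteralSketch {

    /** Immutable literal: lexical form, datatype IRI, optional language tag. */
    public static final class Literal {
        private final String lexicalForm;
        private final String datatypeIri;
        private final String languageTag; // null when there is no language tag

        public Literal(String lexicalForm, String datatypeIri, String languageTag) {
            this.lexicalForm = Objects.requireNonNull(lexicalForm);
            this.datatypeIri = Objects.requireNonNull(datatypeIri);
            this.languageTag = languageTag;
        }

        public String getLexicalForm() { return lexicalForm; }
        public String getDatatypeIri() { return datatypeIri; }
        public String getLanguageTag() { return languageTag; }

        /** Structural equality: compares the three parts, never the value. */
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Literal)) return false;
            Literal other = (Literal) o;
            return lexicalForm.equals(other.lexicalForm)
                    && datatypeIri.equals(other.datatypeIri)
                    && Objects.equals(languageTag, other.languageTag);
        }

        @Override
        public int hashCode() {
            return Objects.hash(lexicalForm, datatypeIri, languageTag);
        }
    }

    public static void main(String[] args) {
        String xsdInteger = "http://www.w3.org/2001/XMLSchema#integer";
        Literal one = new Literal("1", xsdInteger, null);
        // Same parts: equal. Same value but different lexical form: not equal.
        System.out.println(one.equals(new Literal("1", xsdInteger, null)));
        System.out.println(one.equals(new Literal("01", xsdInteger, null)));
    }
}
```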

(I'll do a quick sketch in another message - but don't read it as fixed, 
just a concrete discussion point)

Parsers:

Parsers, in the general sense of anything that produces RDF from 
whatever input, be it an RDF syntax or a mapping from another data format 
(a conversion process), need an input stream and a factory, and emit 
Triples or Quads composed of terms.  They don't need a full "graph" - 
they need a destination to send Triples/Quads to (or be pull parsers).
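
A rough Java sketch of that shape, under stated assumptions: the 
TermFactory and TripleSink interfaces and the toy line format (three 
<iri> tokens per line, not real N-Triples) are invented for this 
example. The point is only that the parser sees an input stream, a 
factory and a destination, and never a graph.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a push parser that needs only a factory and a sink. */
public final class ParserSketch {

    /** Creates terms; each provider would supply its own implementation. */
    public interface TermFactory<T> {
        T createIri(String iri);
    }

    /** Destination for parser output; a graph is one possible sink, not a requirement. */
    public interface TripleSink<T> {
        void triple(T subject, T predicate, T object);
    }

    /** Toy format (not N-Triples): three whitespace-separated <iri> tokens per non-blank line. */
    public static <T> void parse(InputStream in, TermFactory<T> factory, TripleSink<T> sink) {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] tokens = line.split("\\s+");
                if (tokens.length != 3) {
                    throw new IllegalArgumentException("Expected 3 terms: " + line);
                }
                sink.triple(factory.createIri(unwrap(tokens[0])),
                            factory.createIri(unwrap(tokens[1])),
                            factory.createIri(unwrap(tokens[2])));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Strips the surrounding angle brackets from an <iri> token. */
    private static String unwrap(String token) {
        if (token.length() < 2 || !token.startsWith("<") || !token.endsWith(">")) {
            throw new IllegalArgumentException("Not an <iri> token: " + token);
        }
        return token.substring(1, token.length() - 1);
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "<urn:a> <urn:p> <urn:b>\n".getBytes(StandardCharsets.UTF_8));
        List<String> seen = new ArrayList<>();
        // Terms are plain strings here; the sink just records what it is sent.
        ParserSketch.<String>parse(in, iri -> iri, (s, p, o) -> seen.add(s + " " + p + " " + o));
        System.out.println(seen);
    }
}
```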

Writers:

Writing is not the reverse of parsing - parsers produce a stream, 
writers for Turtle etc need to poke around the graph to decide what will 
"look nice".  Even N-triples written clustered by subject can be useful.

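As a small illustration of the "clustered by subject" point, here is a 
Java sketch that groups triples by subject before writing them one per 
line. The Triple class is a stand-in invented for this example, and the 
terms are assumed to be already in N-Triples term syntax. Unlike a 
parser, the writer has to see all the triples before it can emit any.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: N-Triples-style output with triples clustered by subject. */
public final class ClusteredWriterSketch {

    /** Stand-in triple; each term is assumed to be already serialized. */
    public static final class Triple {
        final String s;
        final String p;
        final String o;

        public Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /** Writes one triple per line, keeping all triples of a subject adjacent. */
    public static String writeClustered(List<Triple> triples) {
        // LinkedHashMap keeps subjects in first-seen order.
        Map<String, List<Triple>> bySubject = new LinkedHashMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.s, k -> new ArrayList<>()).add(t);
        }
        StringBuilder out = new StringBuilder();
        for (List<Triple> cluster : bySubject.values()) {
            for (Triple t : cluster) {
                out.append(t.s).append(' ').append(t.p).append(' ').append(t.o).append(" .\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Triple> triples = Arrays.asList(
                new Triple("<urn:a>", "<urn:p>", "<urn:x>"),
                new Triple("<urn:b>", "<urn:p>", "<urn:y>"),
                new Triple("<urn:a>", "<urn:q>", "<urn:z>"));
        System.out.print(writeClustered(triples));
    }
}
```
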
Negatives:

1/ It's wildly ambitious and impractical to even consider portability 
and abstraction.  Too much time has passed.  Waste of effort.

2/ The portability layer is so narrow that it is not helpful.

3/ No SPARQL.
(counter: (1) SPARQL is a remote protocol - this is same-JVM).
(counter: (2) develop a SPARQL API using PURR basic terms)


Opinions?

	Andy


[1] http://wiki.apache.org/incubator/November2012

[2] http://s.apache.org/KCv
-->
http://mail-archives.apache.org/mod_mbox/jena-dev/201210.mbox/%3CC0B6979A3CA668458B697E4EA907CA940A01AF5C%40CFWEX01.americas.cray.com%3E

[3] http://s.apache.org/lK
-->
http://mail-archives.apache.org/mod_mbox/incubator-clerezza-dev/201211.mbox/%3CCAEWfVJ%3DcKATgo32u-AZDQKq%2BmsaVM_CWRnLo_OLdTYP1jFVzAw%40mail.gmail.com%3E

Re: Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Mon, Nov 26, 2012 at 5:21 PM, Chris Dollin
<ch...@epimorphics.com> wrote:

> On Monday, November 26, 2012 04:46:32 PM Reto Bachmann-Gmür wrote:
> > On Sun, Nov 25, 2012 at 3:19 PM, Andy Seaborne <an...@apache.org> wrote:
> >
> > > Slight confusion between identity and equivalence.  Two graphs are
> > > equivalent by bNode-isomorphism; they are still different.
> > >
> >
> > Object.equals should implement an equivalence relation. As two graphs are
> > equivalent by RDF semantics iff they are isomorphic, equals should return
> > true for isomorphic graphs.
>
> .equals() is tossed around fairly casually in Java programs (perhaps that's
> just me); it's not clear to me that having a modelorgraph's .equals() being
> graph isomorphism -- potentially muchly expensive if bnodes are in play --
> is the most effective choice.
>

For Jena 1, Model.equals used to be isomorphism. This was clearly bad, as
Models are mutable objects and so you could end up with broken HashSets or
HashMaps if they have Models as keys. In Clerezza the equivalent of a Jena
Model or Graph is an MGraph (where the M stands for mutable); for this there
is no isomorphism-based equality. But if you take a time-slice of it, then
equals is based on isomorphism. This is because Graph represents a medium
level of abstraction between the serialized graph and the expressed
content. It might not be cheap to compute, yet the problem is easy compared
to comparing the expressed semantic content. This level of abstraction is
handy when doing things like synchronization and versioning based on
decomposed graphs, or in situations where you don't want to assume logical
omniscience, e.g. we might have a:

HashMap<Graph, Set<Citizen>> petitionSigner which maps from a petition to
the citizens that signed it; if a citizen signs a petition we look up the
key and add them to the value. If two petitions were equal just because
they logically entail the same thing, people would be surprised to see what
they subscribed to. On the other hand, if two petitions were not considered
equal because they came on two different sheets of paper (or memory segments
or data connections), it would be very hard to find allies.
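
The broken-map problem with mutable keys can be reproduced in a few 
lines of Java. In this sketch a HashSet<String> stands in for a mutable 
graph with structural equals/hashCode; that stand-in and the method 
names are assumptions for this example, not Clerezza or Jena classes. 
Mutating the key after insertion changes its hash code, so the entry can 
no longer be found, whereas an immutable snapshot used as the key stays 
findable by any structurally equal graph.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: why mutable objects with structural equality make bad hash keys. */
public final class MutableKeySketch {

    /** Uses a mutable set as key, mutates it, then looks it up again. */
    public static boolean lookupAfterMutation() {
        Set<String> petition = new HashSet<>();          // stand-in for a mutable graph
        petition.add("<urn:petition> <urn:asks> <urn:x>");
        Map<Set<String>, String> signers = new HashMap<>();
        signers.put(petition, "alice");
        petition.add("<urn:petition> <urn:asks> <urn:y>"); // hash code changes here
        return signers.containsKey(petition);              // the entry is stranded
    }

    /** Uses an immutable snapshot as key and looks it up with an equal copy. */
    public static boolean lookupWithSnapshot() {
        Set<String> petition = new HashSet<>();
        petition.add("<urn:petition> <urn:asks> <urn:x>");
        Set<String> snapshot = Collections.unmodifiableSet(new HashSet<>(petition));
        Map<Set<String>, String> signers = new HashMap<>();
        signers.put(snapshot, "alice");
        petition.add("<urn:petition> <urn:asks> <urn:y>"); // does not affect the snapshot
        Set<String> sameContent = new HashSet<>();
        sameContent.add("<urn:petition> <urn:asks> <urn:x>");
        return signers.containsKey(sameContent);
    }

    public static void main(String[] args) {
        System.out.println("mutable key still found: " + lookupAfterMutation());
        System.out.println("snapshot key found:      " + lookupWithSnapshot());
    }
}
```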

>
> ["equals should implement equivalence" doesn't entail "equals is
>  isomorphism". Identity is an equivalence relation too.]
>

Leibniz introduced the principle that "Two entities that do not have any
properties allowing to distinguish them should be seen as a single entity".
Now isomorphic immutable graphs have no properties distinguishing them, so
they should be considered identical.

Btw, for bnodes the Identity of Indiscernibles doesn't hold (similar to
[2]), as the graph

 ____                           ____
|    | ----- foaf:knows -----> |    |
|    |                         |    |
|____| <---- foaf:knows ------ |____|

contains two bnodes which, even though they are not distinguishable, are
not the same, as the graph is true for more possible worlds than

 ____
|    | ----- foaf:knows ----.
|    |                      |
|____| <--------------------'

the graph with only one bnode.

(I hope my poor ASCII art survives transport)

Cheers,
Reto

1. http://plato.stanford.edu/entries/identity/
2. http://plato.stanford.edu/entries/identity-indiscernible/


> --
> "The wizard seemed quite willing when I talked to him."  /Howl's Moving
> Castle/
>
> Epimorphics Ltd, http://www.epimorphics.com
> Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20
> 6PT
> Epimorphics Ltd. is a limited company registered in England (number
> 7016688)
>
>

Re: Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Chris Dollin <ch...@epimorphics.com>.
On Monday, November 26, 2012 04:46:32 PM Reto Bachmann-Gmür wrote:
> On Sun, Nov 25, 2012 at 3:19 PM, Andy Seaborne <an...@apache.org> wrote:
> 
> > Slight confusion between identity and equivalence.  Two graphs are
> > equivalent by bNode-isomorphism; they are still different.
> >
> 
> Object.equals should implement an equivalence relation. As two graphs are
> equivalent by RDF semantics iff they are isomorphic, equals should return
> true for isomorphic graphs.

.equals() is tossed around fairly casually in Java programs (perhaps that's
just me); it's not clear to me that having a modelorgraph's .equals() being
graph isomorphism -- potentially muchly expensive if bnodes are in play --
is the most effective choice.

Chris

["equals should implement equivalence" doesn't entail "equals is
 isomorphism". Identity is an equivalence relation too.]

-- 
"The wizard seemed quite willing when I talked to him."  /Howl's Moving Castle/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)


Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Damian Steer <D....@bristol.ac.uk>.
On 28/11/12 14:16, Andy Seaborne wrote:
> On 28/11/12 10:59, Damian Steer wrote:

> See also JEP-169 "Value Objects". [1]
> 
> So presumably terms would be value-ish. Ditto triple and quad.
> 
>> Yes
> 
>> I didn't quite understand the implications of [1] but the
>> object-as-value and structural equality makes sense.

<https://blogs.oracle.com/jrose/entry/value_types_in_the_vm> is a dense
brain dump :-)

"As a compiler (JIT) geek, I have seen many hopefully “automagic”
scalarization or unboxing optimizations strain and fail to achieve their
promise.  I have concluded that user advice is necessary."

>> .filter.groupBy is more like a presentation API issue to be built on
>> top.  But if the basics needed can be added to PURR interfaces without
>> too much impact then fine.

The idea is you implement Iterable<Triple> and get the filter / group by
etc 'for free' ('free' does not entail performant, other conditions may
apply...).

(And if you're wondering how they'll add methods to Iterable without
breaking code see 'default methods')
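
A sketch of how that could look with default methods, assuming a 
hypothetical TripleSource interface and a stand-in Triple class (neither 
is part of Jena, Clerezza or the JDK). An implementation supplies only 
iterator(); filter and groupBy come along as defaults, with no claim 
that they are fast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch: bulk operations inherited via default methods on an Iterable. */
public final class IterableGraphSketch {

    /** Stand-in triple with string terms. */
    public static final class Triple {
        public final String s;
        public final String p;
        public final String o;

        public Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /** Only iterator() has to be implemented; the rest is inherited. */
    public interface TripleSource extends Iterable<Triple> {

        /** Collects the triples accepted by the predicate. */
        default List<Triple> filter(Predicate<Triple> keep) {
            List<Triple> out = new ArrayList<>();
            for (Triple t : this) {
                if (keep.test(t)) {
                    out.add(t);
                }
            }
            return out;
        }

        /** Groups the triples by a key, keeping first-seen key order. */
        default <K> Map<K, List<Triple>> groupBy(Function<Triple, K> key) {
            Map<K, List<Triple>> out = new LinkedHashMap<>();
            for (Triple t : this) {
                out.computeIfAbsent(key.apply(t), k -> new ArrayList<>()).add(t);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<Triple> data = Arrays.asList(
                new Triple("urn:a", "urn:p", "urn:x"),
                new Triple("urn:b", "urn:p", "urn:y"),
                new Triple("urn:a", "urn:q", "urn:z"));
        // A list's iterator is enough to get the default methods.
        TripleSource graph = data::iterator;
        System.out.println(graph.filter(t -> t.p.equals("urn:p")).size());
        System.out.println(graph.groupBy(t -> t.s).keySet());
    }
}
```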

Damian

Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Andy Seaborne <an...@apache.org>.
On 28/11/12 10:59, Damian Steer wrote:
> On 25/11/12 14:19, Andy Seaborne wrote:
>
>> The obvious follow-through to me is that there should be classes
>> (final even), not interfaces, because they have strong equality
>> rules.
>
> See also JEP-169 "Value Objects". [1]
>
> So presumably terms would be value-ish. Ditto triple and quad.

Yes

I didn't quite understand the implications of [1] but the 
object-as-value and structural equality makes sense.  The JIT does do 
good stuff with final classes but I recall mention that a single 
implementation of an interface is treated as final and all the 
optimizations get done.  If another class implementing the same 
interface is loaded, the JITting is undone and redone for virtual method 
calls.

As the basic RDF terms are simple structs, inlining does have a 
significant part to play for tight loops.

That leaves the need for interfaces as being about the ability to 
implement without a common required jar.

Not quite as SLF4J does it as LoggerFactory is a class (which goes 
looking for ILoggerFactory) but there again, no code changes for 
swapping implementations.

Add factory-as-interface and you could compile against one jar and use a 
completely different implementation at runtime.
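
One way the factory-as-interface idea could be wired up is with 
java.util.ServiceLoader, sketched below. This is an assumption about a 
possible mechanism, not how SLF4J binds and not a PURR design; the 
TermFactory interface and the built-in fallback are invented for the 
example. Application code compiles against the interface only, and 
whichever provider jar is on the classpath at runtime gets picked up.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

/** Hypothetical sketch: discover a factory implementation at runtime. */
public final class FactoryLookupSketch {

    /** The only type application code compiles against. */
    public interface TermFactory {
        /** Identifies the provider; a real factory would create terms instead. */
        String providerName();
    }

    /** Fallback used when no provider jar is registered. */
    static final class BuiltInTermFactory implements TermFactory {
        @Override
        public String providerName() {
            return "built-in";
        }
    }

    /**
     * Returns the first provider registered via META-INF/services,
     * or the built-in fallback if none is found.
     */
    public static TermFactory findFactory() {
        Iterator<TermFactory> providers = ServiceLoader.load(TermFactory.class).iterator();
        return providers.hasNext() ? providers.next() : new BuiltInTermFactory();
    }

    public static void main(String[] args) {
        System.out.println("Using provider: " + findFactory().providerName());
    }
}
```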

Or runtime injection of dependencies (Guice) ...

> Graph and dataset collection-ish or even just iterable? I keep an eye
> on the java lambda discussion, and it would be nice to fit with that. [2]
> (although I'm sure I've read a much updated version of this) Stuff like:
>
>      graph.filter(...).groupBy(...).into(anotherGraph);

:-)

I think the main objective is to simply capture the RDF abstract syntax 
and maybe useful things like prefixes.

.filter.groupBy is more like a presentation API issue to be built on 
top.  But if the basics needed can be added to PURR interfaces without 
too much impact then fine.

(and the scala interface is nearly done!)

> As for equals(), java collections seem to use a form of structural
> equivalence and some even work as hash keys (nice, but ewwwww).

Yes - for nodes and triples and quads that should work.

Java collections get "interesting" as you have to promise not to change 
them if used as hash keys.  Add unmodifiableGraph (and still promise not 
to mess with it directly).  Parsers do like to add triples to graphs, so 
those are mutable.  Immutability and equivalence relationships should not 
be built into the lowest level IMHO.  Graphs can be big.
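
The "still promise" caveat is the same one that applies to the JDK's 
unmodifiable collection wrappers, which a hypothetical unmodifiableGraph 
would presumably resemble. In this sketch a Set<String> stands in for a 
graph: the read-only view rejects writes, but changes made to the 
backing object remain visible through it.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: an unmodifiable view is read-only, not immutable. */
public final class UnmodifiableViewSketch {

    /** Returns true if a change to the backing set shows up in the view. */
    public static boolean viewSeesBackingChange() {
        Set<String> backing = new HashSet<>();   // stand-in for a mutable graph
        Set<String> view = Collections.unmodifiableSet(backing);
        backing.add("<urn:s> <urn:p> <urn:o> .");
        return view.contains("<urn:s> <urn:p> <urn:o> .");
    }

    /** Returns true if writing through the view is rejected. */
    public static boolean viewRejectsWrites() {
        Set<String> view = Collections.unmodifiableSet(new HashSet<String>());
        try {
            view.add("<urn:s> <urn:p> <urn:o> .");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("view sees backing change: " + viewSeesBackingChange());
        System.out.println("view rejects writes:      " + viewRejectsWrites());
    }
}
```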

For me, the principle of "start simple and iterate" applies; just having 
the interfaces to see how useful they are would be a step forward.

	Andy

>
> Damian
>
> [1] <http://openjdk.java.net/jeps/169>
> [2]
> <http://cr.openjdk.java.net/~briangoetz/lambda/collections-overview.html>


Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Damian Steer <d....@bristol.ac.uk>.
On 25/11/12 14:19, Andy Seaborne wrote:

> The obvious follow-through to me is that there should be classes
> (final even), not interfaces, because they have strong equality
> rules.

See also JEP-169 "Value Objects". [1]

So presumably terms would be value-ish. Ditto triple and quad.

Graph and dataset collection-ish or even just iterable? I keep an eye
on the java lambda discussion, and it would be nice to fit with that. [2]
(although I'm sure I've read a much updated version of this) Stuff like:

    graph.filter(...).groupBy(...).into(anotherGraph);

As for equals(), java collections seem to use a form of structural
equivalence and some even work as hash keys (nice, but ewwwww).

Damian

[1] <http://openjdk.java.net/jeps/169>
[2]
<http://cr.openjdk.java.net/~briangoetz/lambda/collections-overview.html>

Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Reto Bachmann-Gmür <re...@wymiwyg.com>.
On Sun, Nov 25, 2012 at 3:19 PM, Andy Seaborne <an...@apache.org> wrote:

> Slight confusion between identity and equivalence.  Two graphs are
> equivalent by bNode-isomorphism; they are still different.
>

Object.equals should implement an equivalence relation. As two graphs are
equivalent by RDF semantics iff they are isomorphic, equals should return
true for isomorphic graphs. A bnode by contrast should only be equal to
itself (this holds even if the bnode is tied to a graph, as two bnodes
might be indistinguishable but not equal even within the same graph).
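
In Java terms, the bnode rule is simply the default Object.equals, 
while IRIs compare structurally. A tiny sketch with made-up class names 
(not the Clerezza or Jena types):

```java
/** Hypothetical sketch: identity equality for bnodes, structural equality for IRIs. */
public final class TermEqualitySketch {

    /** A bnode carries no label here; it inherits identity-based equals/hashCode. */
    public static final class BNode {
    }

    /** An IRI is equal to any other IRI with the same string. */
    public static final class Iri {
        private final String value;

        public Iri(String value) {
            if (value == null) {
                throw new NullPointerException("value");
            }
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Iri && value.equals(((Iri) o).value);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    public static void main(String[] args) {
        BNode b1 = new BNode();
        BNode b2 = new BNode();
        System.out.println("b1 equals b1: " + b1.equals(b1));
        System.out.println("b1 equals b2: " + b1.equals(b2));
        System.out.println("IRIs equal:   " + new Iri("urn:a").equals(new Iri("urn:a")));
    }
}
```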

Cheers,
Reto

Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Andy Seaborne <an...@apache.org>.
On 23/11/12 10:49, Reto Bachmann-Gmür wrote:
> In Clerezza we put a strong emphasis on identity criteria. This is also a
> reason why no factories are part of the API. Two nodes are identical and
> can be used interchangeably iff they are equal according to the relevant
> specs, so it shall not matter if you got your instance from a factory or
> implemented the interface yourself, the two instances behave the same in
> all contexts.

Same debate for Jena Graph SPI a long time ago.

The obvious follow-through to me is that there should be classes (final 
even), not interfaces, because they have strong equality rules.

Only if the interfaces are to be implemented directly, and not via a 
copy transformation, do interfaces seem to have a real reason for being 
there.

> Identity was also the reason for having the distinction
> between immutable and mutable graphs. The RDF specification defines when two
> graphs are equal (they are if they are isomorphic) but this criterion can
> only be matched to Object.equals if the graphs aren't mutable (as you
> otherwise run into big problems).

Slight confusion between identity and equivalence.  Two graphs are 
equivalent by bNode-isomorphism; they are still different.

Equivalence is context sensitive.  There may be other information, now 
or later, that breaks that equivalence.  But if two things have the same 
identity, that can't happen.

RDF 1.1 is (probably) going to spell this out more clearly.

Work-in-progress:
http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes

	Andy


Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Andy,

The idea in Clerezza is to have a minimal core API which is close to what
the SPI is in Jena. There we stick to the spec on issues like literals as
subjects, but I think there are good reasons for generalizing. This API is
designed to be very easy to implement. Applications would typically use
rdf.utils or rdf.utils.scala, which offer a richer (and resource-oriented)
API. So I think the goals of PURR and rdf.core are very similar.

The security stuff is just standard Java permissions. Just as the java.io
package provides FilePermission, the "java.rdf" package should provide
GraphPermission. This has no other consequences for the interfaces, and it's
up to the implementations whether they check permissions or not.
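
To make the analogy concrete, a permission of this kind could be as 
small as the sketch below. The GraphPermission class shown here is 
written for this example on top of the JDK's java.security.BasicPermission 
and is not Clerezza's actual class; the graph names are placeholders. 
The interfaces are untouched, and an implementation may or may not 
consult the permission.

```java
import java.security.BasicPermission;

/** Hypothetical sketch: a graph permission modelled on standard Java permissions. */
public final class GraphPermissionSketch {

    /** Named permission; BasicPermission supplies implies(), equals() and hashCode(). */
    public static final class GraphPermission extends BasicPermission {
        private static final long serialVersionUID = 1L;

        /** The name is a graph name, or "*" for all graphs. */
        public GraphPermission(String graphName) {
            super(graphName);
        }
    }

    public static void main(String[] args) {
        GraphPermission all = new GraphPermission("*");
        GraphPermission one = new GraphPermission("urn:example:graph1");
        System.out.println("all implies one: " + all.implies(one));
        System.out.println("one implies all: " + one.implies(all));
    }
}
```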

In Clerezza we put a strong emphasis on identity criteria. This is also a
reason why no factories are part of the API. Two nodes are identical and
can be used interchangeably iff they are equal according to the relevant
specs, so it shall not matter if you got your instance from a factory or
implemented the interface yourself, the two instances behave the same in
all contexts. Identity was also the reason for having the distinction
between immutable and mutable graphs. The RDF specification defines when two
graphs are equal (they are if they are isomorphic) but this criterion can
only be matched to Object.equals if the graphs aren't mutable (as you
otherwise run into big problems).

Reto


Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Andy Seaborne <an...@apache.org>.
Reto,

There's lots to be learnt from Clerezza.  Clerezza is a "presentation 
API" - it is aimed at giving applications a programming model.  That it 
also claims to encapsulate other systems is, to the application code, 
secondary - users adopt the Clerezza API in their applications and all 
its decisions, e.g. two forms of graph, mutable and immutable, as part of 
the type system. Clerezza is stateful, has its own access permission 
management model, provides OSGi, and tries to map both ways - it uses 
Jena as an implementation of Clerezza and also can expose Clerezza as a 
Jena facade.

I think having multiple presentation APIs is healthy.

PURR is lower level. PURR is trying to be simple.  Too complicated (= 
large) and it will not make progress as too many decisions need to be 
made.  Instead, focus on the one task of being able to have a narrow 
interface for systems like parsers and be a target for presentation 
APIs.  PURR can be as simple as a name mapping layer, if implemented 
natively; if done non-natively, it is a single-copy, no-state layer.

I'd expect that (in theory) Clerezza could be written over PURR and not 
need to manage the multiple backends itself.  PURR does not provide the 
variation of graphs that Clerezza does and, as I hope is clear from the 
sketch, instead of deciding whether, say, literals-as-subjects are in 
or out, it takes a neutral/general approach.

(I say "in theory" because (1) there would have to be real value to 
switching and it's not clear to me there is and (2) PURR is small so 
there might be other things not covered that Clerezza would want to 
expose.)

	Andy

On 22/11/12 15:20, Reto Bachmann-Gmür wrote:
> Very glad you've started such a unification discussion on the jena
> mailing list. I think it would be good to have such an API adopted by Jena
> as well. I think it would be important for such an API to be based on
> standards and not specifically on triple-store design so as to allow exposing
> other object structures as well as RDF through this API.
>
> A couple of thoughts:
>
> - To keep the API simple I think it should either be quads or datasets. I'd
> go for DataSets as this is part of Standards (SPARQL) and see quads as a
> way this can be implemented.
> - Given a DataSet, why not allow SPARQL queries against it? (With an
> abstract implementation that locates a query engine and if no such engine
> is found throws a NoQueryEngineFoundException)
>
> Apart from the above differences, is there any part of the Clerezza RDF API
> you would implement fundamentally differently (I agree the naming should be
> revisited) than in the Clerezza API [1]?
>
> Cheers,
> Reto
>
> http://incubator.apache.org/clerezza/mvn-site/org.apache.clerezza.rdf.core/apidocs/index.html
>
>
> On Thu, Nov 22, 2012 at 4:04 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> Rob's comments on inverting the reader process [2] suggest to me pulling
>> out an API and I wonder if we can identify a portability layer that enables
>> some (not all) interoperability and mix-n-match.
>>
>> The term "API" is creating some confusion in the discussions triggered by
>> the Clerezza incubator project being noted [1][3] as "low activity"
>>
>> To some, it's what the application sees -- a presentation API.  To others
>> it is some kind of abstraction between machinery like storage, inference,
>> parsing and writing.  They don't have to be the same.
>>
>> Even if the only outcome if parser and stream processing mouldarity, I
>> think it is worth doing. Just being able to add an external "parser" to
>> Jena in a cleaner way that is currently possible is useful.
>>
>> ** Apache Portable Uniform RDF Runtime (PURR) **
>>
>> (OK - the "U" is a bit forced :-)
>>
>> To me, what we need is an abstraction that allows multiple implementations
>> by swapping the jars (or OGSi bundles).  So PURR is a set of interfaces.
>>   No state.  c.f. SLJ4J.
>>
>> There would be many presentation APIs: Model-like, RDF-ORM, Ontology, and
>> also for natural use in other JVM-based languages - Scala, Clojure,
>> whatever is the next JVM language de jour.
>>
>> It's not a full application library. It's rather low level.  Writing much
>> code directly at the interface may not be pretty.
>>
>> This is not the Jena graph SPI although that was trying to preform that
>> purpose but has wider coverage of functionality.  I think we can go more
>> minimal yet.
>>
>> The Jena Graph SPI has a number of handlers - events, stats, transactions
>> - which seem to make the problem too large.  These would be part of another
>> subsystem ("extends PURR") and be different in different providers.  One of
>> those would be Jena Graph.
>>
>> PURR would provide the basic concepts from RDF:
>>
>> Terms: IRIs, Literals, bNodes
>> Triples and Quads
>> Graph, Dataset
>> Factories for each.
>>
>> and for each be quite vanilla.
>>
>> e.g. a literal is a lexical form, a datatype and an optional language tag.
>>   Immutable with getters.  Structural equality.  No value, XSD or otherwise.
>>
>> (I'll do a quick sketch in another message - but don't read it as fixed,
>> just a concrete discussion point)
>>
>> Parsers:
>>
>> Parsers, and in the general sense of anything that produced RDF from
>> whatever input, be it an RDF syntax or mapping another data format (a
>> conversion process), need and input stream and a factory, and emits
>> Triples, Quads comprising of terms.  That don't need a full "graph" - they
>> need a destination to send Triples/Quads (or be pull parsers).
>>
>> Writers:
>>
>> Writing is not the reverse of parsing - parsers produce a stream, writers
>> for Turtle etc need to poke around the graph to decide what will "look
>> nice".  Even N-triples written clustered by subject can be useful.
>>
>> Negatives:
>>
>> 1/ It's wildly ambitious and impractical to even consider portability and
>> abstraction.  Too much time has passed.  Waste of effort.
>>
>> 2/ The portability layer is so narrow that it is not helpful.
>>
>> 3/ No SPARQL.
>> (counter: (1) SPARQL is a remote protocol - this is same-JVM).
>> (counter: (2) develop a SPARQL API using PURR basic terms)
>>
>>
>> Opinions?
>>
>>          Andy
>>
>>
>> [1] http://wiki.apache.org/incubator/November2012
>>
>> [2] http://s.apache.org/KCv
>> -->
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201210.mbox/%3CC0B6979A3CA668458B697E4EA907CA940A01AF5C%40CFWEX01.americas.cray.com%3E
>>
>> [3] http://s.apache.org/lK
>> -->
>> http://mail-archives.apache.org/mod_mbox/incubator-clerezza-dev/201211.mbox/%3CCAEWfVJ%3DcKATgo32u-AZDQKq%2BmsaVM_CWRnLo_OLdTYP1jFVzAw%40mail.gmail.com%3E
>>
>


Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Very glad you've started such a unification discussion on the jena
mailing list. I think it would be good to have such an API adopted by Jena
as well. I think it would be important for such an API to be based on
standards and not specifically on triple-store design, so as to allow
exposing other object structures as well as RDF through this API.

A couple of thoughts:

- To keep the API simple I think it should be either quads or datasets. I'd
go for Datasets as they are part of the standards (SPARQL), and see quads as
a way this can be implemented.
- Given a Dataset, why not allow SPARQL queries against it? (With an
abstract implementation that locates a query engine and, if no such engine
is found, throws a NoQueryEngineFoundException)
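
The second point could be sketched roughly as follows. This is only an
illustration, not Clerezza or Jena code: the names QueryEngine, Dataset,
AbstractDataset and locateEngine are hypothetical, and using
java.util.ServiceLoader to find an engine on the classpath is an assumption
about how the lookup might work. Only the exception name comes from the
suggestion above.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Sketch only: all type names here are hypothetical, not an existing API.
class QueryEngineLookup {

    /** Hypothetical minimal dataset abstraction. */
    interface Dataset { }

    /** Hypothetical SPI that a provider jar would implement and register. */
    interface QueryEngine {
        String execute(Dataset dataset, String sparql);
    }

    /** Thrown when no provider jar contributes a query engine. */
    static class NoQueryEngineFoundException extends RuntimeException {
        NoQueryEngineFoundException(String message) {
            super(message);
        }
    }

    /** Abstract base: query support is optional and discovered at runtime. */
    abstract static class AbstractDataset implements Dataset {

        public String query(String sparql) {
            return locateEngine().execute(this, sparql);
        }

        // Assumption: engines are registered via META-INF/services entries.
        protected QueryEngine locateEngine() {
            Iterator<QueryEngine> engines =
                    ServiceLoader.load(QueryEngine.class).iterator();
            if (!engines.hasNext()) {
                throw new NoQueryEngineFoundException(
                        "no QueryEngine provider found on the classpath");
            }
            return engines.next();
        }
    }

    public static void main(String[] args) {
        AbstractDataset dataset = new AbstractDataset() { };
        try {
            dataset.query("SELECT * WHERE { ?s ?p ?o }");
        } catch (NoQueryEngineFoundException e) {
            System.out.println("No engine: " + e.getMessage());
        }
    }
}
```

With no provider registered, the call fails with the exception rather than
at link time, so the interfaces stay usable without any query engine jar.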

Apart from the above differences, is there any part of the RDF API that
you would implement fundamentally differently (I agree the naming should be
revisited) than in the Clerezza API [1]?

Cheers,
Reto

[1] http://incubator.apache.org/clerezza/mvn-site/org.apache.clerezza.rdf.core/apidocs/index.html


On Thu, Nov 22, 2012 at 4:04 PM, Andy Seaborne <an...@apache.org> wrote:

> Rob's comments on inverting the reader process [2] suggest to me pulling
> out an API and I wonder if we can identify a portability layer that enables
> some (not all) interoperability and mix-n-match.
>
> The term "API" is creating some confusion in the discussions triggered by
> the Clerezza incubator project being noted [1][3] as "low activity"
>
> To some, it's what the application sees -- a presentation API.  To others
> it is some kind of abstraction between machinery like storage, inference,
> parsing and writing.  They don't have to be the same.
>
> Even if the only outcome is parser and stream processing modularity, I
> think it is worth doing. Just being able to add an external "parser" to
> Jena in a cleaner way than is currently possible is useful.
>
> ** Apache Portable Uniform RDF Runtime (PURR) **
>
> (OK - the "U" is a bit forced :-)
>
> To me, what we need is an abstraction that allows multiple implementations
> by swapping the jars (or OSGi bundles).  So PURR is a set of interfaces.
> No state.  c.f. SLF4J.
>
> There would be many presentation APIs: Model-like, RDF-ORM, Ontology, and
> also for natural use in other JVM-based languages - Scala, Clojure,
> whatever is the next JVM language du jour.
>
> It's not a full application library. It's rather low level.  Writing much
> code directly at the interface may not be pretty.
>
> This is not the Jena graph SPI, although that was trying to perform that
> purpose but has wider coverage of functionality.  I think we can go more
> minimal yet.
>
> The Jena Graph SPI has a number of handlers - events, stats, transactions
> - which seem to make the problem too large.  These would be part of another
> subsystem ("extends PURR") and be different in different providers.  One of
> those would be Jena Graph.
>
> PURR would provide the basic concepts from RDF:
>
> Terms: IRIs, Literals, bNodes
> Triples and Quads
> Graph, Dataset
> Factories for each.
>
> and for each be quite vanilla.
>
> e.g. a literal is a lexical form, a datatype and an optional language tag.
>  Immutable with getters.  Structural equality.  No value, XSD or otherwise.
>
> (I'll do a quick sketch in another message - but don't read it as fixed,
> just a concrete discussion point)
>
> Parsers:
>
> Parsers, in the general sense of anything that produces RDF from
> whatever input, be it an RDF syntax or a mapping from another data format
> (a conversion process), need an input stream and a factory, and emit
> Triples and Quads composed of terms.  They don't need a full "graph" - they
> need a destination to send Triples/Quads to (or they can be pull parsers).
>
> Writers:
>
> Writing is not the reverse of parsing - parsers produce a stream, writers
> for Turtle etc need to poke around the graph to decide what will "look
> nice".  Even N-triples written clustered by subject can be useful.
>
> Negatives:
>
> 1/ It's wildly ambitious and impractical to even consider portability and
> abstraction.  Too much time has passed.  Waste of effort.
>
> 2/ The portability layer is so narrow that it is not helpful.
>
> 3/ No SPARQL.
> (counter: (1) SPARQL is a remote protocol - this is same-JVM).
> (counter: (2) develop a SPARQL API using PURR basic terms)
>
>
> Opinions?
>
>         Andy
>
>
> [1] http://wiki.apache.org/incubator/November2012
>
> [2] http://s.apache.org/KCv
> -->
> http://mail-archives.apache.org/mod_mbox/jena-dev/201210.mbox/%3CC0B6979A3CA668458B697E4EA907CA940A01AF5C%40CFWEX01.americas.cray.com%3E
>
> [3] http://s.apache.org/lK
> -->
> http://mail-archives.apache.org/mod_mbox/incubator-clerezza-dev/201211.mbox/%3CCAEWfVJ%3DcKATgo32u-AZDQKq%2BmsaVM_CWRnLo_OLdTYP1jFVzAw%40mail.gmail.com%3E
>

Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Mon, Nov 26, 2012 at 8:48 PM, Claude Warren <cl...@xenei.com> wrote:

> my 2 cents:
>
> First:
> I prefer interface based approaches as I find that over time I have
> had fewer problems with them.  For things that should be calculated in
> a specific way a static Utility class within the interface can provide
> implementations of many things.
>
> Second:
> if equals() is being defined then hashCode() must be too, so that
> x.equals(y) => x.hashCode() == y.hashCode().


Yes, that's why an interface defining equals so that instances of different
implementations can be equal must also define hashCode(), as is done
by:

http://incubator.apache.org/clerezza/mvn-site/org.apache.clerezza.rdf.core/apidocs/org/apache/clerezza/rdf/core/Graph.html#hashCode%28%29


> The converse is not necessarily true.
>
It cannot be, as there are more than 2^32 graphs but only 2^32 possible
hashCode() values.

Reto

On Mon, Nov 26, 2012 at 4:30 PM, Sergio Fernández
<se...@salzburgresearch.at> wrote:
> Hi,
>
>
> On 26/11/12 04:13, Paolo Castagna wrote:
>>
>> I suppose this will be a module within Jena. Would Any23 or Marmotta use
>> it
>> or contribute to it?
>
>
> For sure. In fact, we are asserting that in the proposal:
>
> "Apache Jena could become the RDF API used throughout Marmotta; an
> architectural decision is yet to be taken."
>
> Right now we are using Sesame for a simple technical reason: FMPOV an
> interfaces-based approach is much simpler and cleaner. But the sooner
> we can reach an agreement on that (and of course from Marmotta we'll
> support and contribute to it), the easier our life will be.
>
> Cheers,
>
> --
> Sergio Fernández
> Salzburg Research
> +43 662 2288 318
> Jakob-Haringer Strasse 5/II
> A-5020 Salzburg (Austria)
> http://www.salzburgresearch.at




Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Claude Warren <cl...@xenei.com>.
my 2 cents:

First:
I prefer interface based approaches as I find that over time I have
had fewer problems with them.  For things that should be calculated in
a specific way a static Utility class within the interface can provide
implementations of many things.

Second:
if equals() is being defined then hashCode() must be too, so that
x.equals(y) => x.hashCode() == y.hashCode().  The converse is not
necessarily true.
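
A minimal sketch of both points, assuming a hypothetical Iri interface (not
taken from Jena, Clerezza or Sesame): the interface fixes the equals() and
hashCode() contract, and a static utility class nested in the interface
supplies the shared calculation. ProviderAIri and ProviderBIri are made-up
stand-ins for two independent implementations.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: every type below is hypothetical.
class CrossImplEquality {

    /**
     * Contract: two Iri instances are equal iff their IRI strings are equal,
     * and hashCode() is the hash code of the IRI string.
     */
    interface Iri {
        String getIriString();

        /** Static utility inside the interface, shared by all providers. */
        final class Util {
            private Util() { }

            static boolean equals(Iri self, Object other) {
                return other instanceof Iri
                        && self.getIriString().equals(((Iri) other).getIriString());
            }

            static int hashCode(Iri self) {
                return self.getIriString().hashCode();
            }
        }
    }

    /** First provider: stores the IRI as a String. */
    static final class ProviderAIri implements Iri {
        private final String iri;
        ProviderAIri(String iri) { this.iri = iri; }
        @Override public String getIriString() { return iri; }
        @Override public boolean equals(Object o) { return Iri.Util.equals(this, o); }
        @Override public int hashCode() { return Iri.Util.hashCode(this); }
    }

    /** Second provider: different internal storage, same contract. */
    static final class ProviderBIri implements Iri {
        private final char[] chars;
        ProviderBIri(String iri) { this.chars = iri.toCharArray(); }
        @Override public String getIriString() { return new String(chars); }
        @Override public boolean equals(Object o) { return Iri.Util.equals(this, o); }
        @Override public int hashCode() { return Iri.Util.hashCode(this); }
    }

    public static void main(String[] args) {
        Iri a = new ProviderAIri("http://example.org/s");
        Iri b = new ProviderBIri("http://example.org/s");
        Set<Iri> seen = new HashSet<>();
        seen.add(a);
        // Equal across providers, so hash-based collections treat them as one.
        System.out.println(a.equals(b) && seen.contains(b));
    }
}
```

Because both providers delegate to the same utility, terms from different
jars can be mixed in one HashSet or HashMap without surprises.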

-- Claude


On Mon, Nov 26, 2012 at 4:30 PM, Sergio Fernández
<se...@salzburgresearch.at> wrote:
> Hi,
>
>
> On 26/11/12 04:13, Paolo Castagna wrote:
>>
>> I suppose this will be a module within Jena. Would Any23 or Marmotta use
>> it
>> or contribute to it?
>
>
> For sure. In fact, we are asserting that in the proposal:
>
> "Apache Jena could become the RDF API used throughout Marmotta; an
> architectural decision is yet to be taken."
>
> Right now we are using Sesame for a simple technical reason: FMPOV an
> interfaces-based approach is much simpler and cleaner. But the sooner
> we can reach an agreement on that (and of course from Marmotta we'll
> support and contribute to it), the easier our life will be.
>
> Cheers,
>
> --
> Sergio Fernández
> Salzburg Research
> +43 662 2288 318
> Jakob-Haringer Strasse 5/II
> A-5020 Salzburg (Austria)
> http://www.salzburgresearch.at



-- 
I like: Like Like - The likeliest place on the web
Identity: https://www.identify.nu/user.php?claude@xenei.com
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Andy Seaborne <an...@apache.org>.
On 26/11/12 16:30, Sergio Fernández wrote:
> Hi,
>
> On 26/11/12 04:13, Paolo Castagna wrote:
>> I suppose this will be a module within Jena. Would Any23 or Marmotta
>> use it
>> or contribute to it?
>
> For sure. In fact, we are asserting that in the proposal:
>
> "Apache Jena could become the RDF API used throughout Marmotta; an
> architectural decision is yet to be taken."
>
> Right now we are using Sesame for a simple technical reason: FMPOV an
> interfaces-based approach is much simpler and cleaner. But the sooner
> we can reach an agreement on that (and of course from Marmotta we'll
> support and contribute to it), the easier our life will be.
>
> Cheers,

As Marmotta champion:

It's nice to add in possible connections but they are just that:
"possible".  Apache does not require reuse, or even have a technical
opinion.  This sometimes confuses people outside who come from "big
architecture" backgrounds.

Although of course it's easier to fix/improve/influence another Apache 
project than an outside one.

</champion>

When Marmotta has booted up on Apache, let's talk about what
requirements you have of an API and what facilities Marmotta uses.

I've had an enquiry about implementing LDP on Fuseki but it seemed to me
that the first step is to work out what LDP is (plus a lot of extra stuff
you'll need to make it usable - it's not a complete system spec).  I thought
there was a bit of a gap between an LDP system and raw dataset storage.

	Andy



Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Sergio Fernández <se...@salzburgresearch.at>.
Hi,

On 26/11/12 04:13, Paolo Castagna wrote:
> I suppose this will be a module within Jena. Would Any23 or Marmotta use it
> or contribute to it?

For sure. In fact, we are asserting that in the proposal:

"Apache Jena could become the RDF API used throughout Marmotta; an 
architectural decision is yet to be taken."

Right now we are using Sesame for a simple technical reason: FMPOV an
interfaces-based approach is much simpler and cleaner. But the sooner
we can reach an agreement on that (and of course from Marmotta we'll
support and contribute to it), the easier our life will be.

Cheers,

-- 
Sergio Fernández
Salzburg Research
+43 662 2288 318
Jakob-Haringer Strasse 5/II
A-5020 Salzburg (Austria)
http://www.salzburgresearch.at

Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Andy Seaborne <an...@apache.org>.
On 26/11/12 03:13, Paolo Castagna wrote:
>> >Opinions?
> Considering projects such as Any23 (currently not using Jena) and Marmotta
> (about to enter incubation and not using Jena), it's a good thing to try
> doing.
>
> I suppose this will be a module within Jena. Would Any23 or Marmotta use it
> or contribute to it?

Ideally, its long-term future would be separate from Jena; otherwise
it's harder to use.

In the immediate term, it has to be proven to be useful at least in Jena.

	Andy

>
> Paolo
>
>> >
>> >         Andy


Re: [Discuss] Apache Portable Uniform RDF Runtime (PURR)

Posted by Paolo Castagna <ca...@gmail.com>.
On 22 Nov 2012 15:04, "Andy Seaborne" <an...@apache.org> wrote:
>
> Rob's comments on inverting the reader process [2] suggest to me pulling
> out an API and I wonder if we can identify a portability layer that enables
> some (not all) interoperability and mix-n-match.
>
> The term "API" is creating some confusion in the discussions triggered by
> the Clerezza incubator project being noted [1][3] as "low activity"
>
> To some, it's what the application sees -- a presentation API.  To others
> it is some kind of abstraction between machinery like storage, inference,
> parsing and writing.  They don't have to be the same.
>
> Even if the only outcome is parser and stream processing modularity, I
> think it is worth doing. Just being able to add an external "parser" to
> Jena in a cleaner way than is currently possible is useful.
>
> ** Apache Portable Uniform RDF Runtime (PURR) **
>
> (OK - the "U" is a bit forced :-)
>
> To me, what we need is an abstraction that allows multiple
> implementations by swapping the jars (or OSGi bundles).  So PURR is a set
> of interfaces.  No state.  c.f. SLF4J.
>
> There would be many presentation APIs: Model-like, RDF-ORM, Ontology, and
> also for natural use in other JVM-based languages - Scala, Clojure,
> whatever is the next JVM language du jour.
>
> It's not a full application library. It's rather low level.  Writing much
> code directly at the interface may not be pretty.
>
> This is not the Jena graph SPI, although that was trying to perform that
> purpose but has wider coverage of functionality.  I think we can go more
> minimal yet.
>
> The Jena Graph SPI has a number of handlers - events, stats, transactions
> - which seem to make the problem too large.  These would be part of another
> subsystem ("extends PURR") and be different in different providers.  One of
> those would be Jena Graph.
>
> PURR would provide the basic concepts from RDF:
>
> Terms: IRIs, Literals, bNodes
> Triples and Quads
> Graph, Dataset
> Factories for each.
>
> and for each be quite vanilla.
>
> e.g. a literal is a lexical form, a datatype and an optional language
> tag.  Immutable with getters.  Structural equality.  No value, XSD or
> otherwise.
>
> (I'll do a quick sketch in another message - but don't read it as fixed,
> just a concrete discussion point)
>
> Parsers:
>
> Parsers, in the general sense of anything that produces RDF from
> whatever input, be it an RDF syntax or a mapping from another data format
> (a conversion process), need an input stream and a factory, and emit
> Triples and Quads composed of terms.  They don't need a full "graph" - they
> need a destination to send Triples/Quads to (or they can be pull parsers).
>
> Writers:
>
> Writing is not the reverse of parsing - parsers produce a stream, writers
> for Turtle etc need to poke around the graph to decide what will "look
> nice".  Even N-triples written clustered by subject can be useful.
>
> Negatives:
>
> 1/ It's wildly ambitious and impractical to even consider portability and
> abstraction.  Too much time has passed.  Waste of effort.
>
> 2/ The portability layer is so narrow that it is not helpful.
>
> 3/ No SPARQL.
> (counter: (1) SPARQL is a remote protocol - this is same-JVM).
> (counter: (2) develop a SPARQL API using PURR basic terms)
>
>
> Opinions?

Considering projects such as Any23 (currently not using Jena) and Marmotta
(about to enter incubation and not using Jena), it's a good thing to try
doing.

I suppose this will be a module within Jena. Would Any23 or Marmotta use it
or contribute to it?

Paolo

>
>         Andy
>
>
> [1] http://wiki.apache.org/incubator/November2012
>
> [2] http://s.apache.org/KCv
> -->
> http://mail-archives.apache.org/mod_mbox/jena-dev/201210.mbox/%3CC0B6979A3CA668458B697E4EA907CA940A01AF5C%40CFWEX01.americas.cray.com%3E
>
> [3] http://s.apache.org/lK
> -->
> http://mail-archives.apache.org/mod_mbox/incubator-clerezza-dev/201211.mbox/%3CCAEWfVJ%3DcKATgo32u-AZDQKq%2BmsaVM_CWRnLo_OLdTYP1jFVzAw%40mail.gmail.com%3E