Posted to dev@clerezza.apache.org by Reto Bachmann-Gmür <re...@apache.org> on 2012/11/08 13:00:27 UTC

Future of Clerezza and Stanbol

OK, sorry for jumping into this discussion so late. I've been having
quite some discussion on the matter here at ApacheCon EU. I also had
positive feedback on my presentation of Clerezza yesterday.

I think two things:
- For high-level platform components it is often not clear whether they fit
better into Stanbol or into Clerezza
- The RDF API should actually be independent both of the triple store
provider and of the consumer

So I think a good solution would be to have the RDF libraries comprise:
- A modular and very spec-oriented API for RDF and related standards
- A set of serializing and parsing providers
- Adapters to triple stores (where the API isn't provided by the triple
store)
Basically that's what is in the org.apache.clerezza.rdf.* packages.

That's the stuff that would fit well into Stanbol. Provided that Stanbol
drops its interpretation of "REST" as "not for humans" and wants to
allow integrating (wherever possible as modular and optional components)
media types designed for human consumption, and to support REST approaches
there as well (thinking of the current back-button-unfriendly UI), the
following components would fit there too:
- Scala Server Pages
- TypeRendering (selection of templates based on the rdf type of the
returned response)
- Security (already integrated to some degree, code based security to run
bundles in a sandboxed manner is not)
- Shell (already ships in the stanbol launcher, so here it's about
'adopting' the sources)
- Dev tools: rapid development support (create sample projects, have source
files as bundles)

To the attic:
- Triaxrs: The Clerezza JAX-RS implementation is no longer needed, as the
same support (JAX-RS components as OSGi services) is now provided by Apache
Wink
- JSR 223 support

In my opinion there is no urgent need for action. It is true that there
hasn't been a lot of activity in Clerezza, but IMHO the project is going on,
even if at a slow pace (like other projects, e.g. the recently graduated
Wink).

Cheers,
Reto

On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <bdelacretaz@apache.org
> wrote:

> On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org> wrote:
> > ...It's good to have the existing released artifacts remain - what about
> after
> > the donation?
> >
> > Presumably the moved modules will be released by the new host - will they
> > use group id org.apache.clerezza? or move to the new host project group
> id?
> > I'd suggest renaming the group to the new project but realise it is a bit
> > more disruptive...
>
> I think that's really up to whatever project adopts that code. In
> theory package names should change but that's probably not convenient.
>
> Or maybe it's time to create a semantic module or two at
> http://commons.apache.org/ ? If existing committers are willing to
> support that with their work it should be easy to make it happen.
>
> -Bertrand
>

Re: Future of Clerezza and Stanbol

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
> ...Not sure about no urgent need for action. Maybe we should list the
> requirements to fulfil in order to be able to graduate. Wonder if we are able to meet
> them...

As a mentor, I think Clerezza is technically ready to graduate.

My worry is graduating with something that doesn't look sustainable,
as it looks to me like the people working on the CMS parts of Clerezza
are not active anymore, and I'm not sure if the other parts are large
enough to warrant a top-level project.

At this point, best might be for someone who's more familiar than me
with the Clerezza code to create a (wiki?) list of the existing module
categories (reusable RDF libs, triple store adapters, templating,
security, CMS, etc.), indicate who/which projects are using those
modules, and who from the Clerezza committers intends to continue
working on them. We could then better see the options.

-Bertrand

Re: Future of Clerezza and Stanbol

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
> Comments inline...

Let's continue the discussion on clerezza-dev@incubator.apache.org for
those who are interested.

-Bertrand

Re: Features

Posted by Reto Bachmann-Gmür <re...@apache.org>.
In Clerezza the graph checks for read permission on read and for
readwrite permission on add and delete. As the name suggests, readwrite
permission implies read permission.

The reason for that is that the add method returns false if a triple was
already in the graph, so even a pure write permission would leak information
about what's in the graph.
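
A minimal sketch of that leak, assuming a hypothetical TinyGraph whose add
follows the java.util.Collection contract (false when the element was already
present). TinyGraph and probeContains are illustration names only, not
Clerezza classes:

```java
import java.util.HashSet;
import java.util.Set;

class WriteOnlyLeakSketch {
    /** Hypothetical stand-in for a graph; triples are plain strings here. */
    static final class TinyGraph {
        private final Set<String> triples = new HashSet<>();

        /** Like Collection.add: returns false if the triple was already present. */
        boolean add(String triple) { return triples.add(triple); }

        boolean remove(String triple) { return triples.remove(triple); }
    }

    /** Answers "is this triple in the graph?" using only write operations. */
    static boolean probeContains(TinyGraph graph, String triple) {
        boolean wasNew = graph.add(triple);
        if (wasNew) {
            graph.remove(triple); // undo the probe so the graph is left unchanged
        }
        return !wasNew;
    }

    public static void main(String[] args) {
        TinyGraph graph = new TinyGraph();
        graph.add("<a> <knows> <b>");
        System.out.println(probeContains(graph, "<a> <knows> <b>"));
        System.out.println(probeContains(graph, "<a> <knows> <c>"));
    }
}
```

A caller holding only add and remove rights can thus reconstruct a membership
test, which is why a write permission without read permission protects little.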

I think security should be done with standard JAAS and be independent of
particular interfaces in the API.

Cheers,
Reto
On Nov 16, 2012 2:45 PM, "Claude Warren" <cl...@xenei.com> wrote:

> >> Immutable Graphs: I had really problems to get this right and the
> >> current Clerezza API does not help with that task (resulting in things
> >> like read-only mutable graphs that are no Graphs as they only provide
> >> a read-only view on a Graph that might still be changed by other
> >> means). I think read-only Graphs (like
> >> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
> >> use case to protect a returned graph from modifications by the caller
> >> of the method is much more prominent as truly immutable graphs.
>
> I am currently working on a set of dynamic proxies in an attempt to
> add security at all layers of the Jena stack.  I currently have the
> graph layer complete and the model layer 50% done.
>
> My thought is that in addition to having read only you might want to
> have write only (I know that sounds strange but I've seen such in
> DBs).  The upshot is that I would put full CRUD restriction
> capabilities within the system.
>
> I'm not sure that it will work but I thought I would give it a try.  I
> think that something needs to be done in this arena to go along with
> the Fuseki security discussion I saw awhile back.
>
> Claude
>

Re: Features

Posted by Andy Seaborne <an...@apache.org>.
On 16/11/12 13:44, Claude Warren wrote:
>>> Immutable Graphs: I had really problems to get this right and the
>>> current Clerezza API does not help with that task (resulting in things
>>> like read-only mutable graphs that are no Graphs as they only provide
>>> a read-only view on a Graph that might still be changed by other
>>> means). I think read-only Graphs (like
>>> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
>>> use case to protect a returned graph from modifications by the caller
>>> of the method is much more prominent as truly immutable graphs.
>
> I am currently working on a set of dynamic proxies in an attempt to
> add security at all layers of the Jena stack.  I currently have the
> graph layer complete and the model layer 50% done.
>
> My thought is that in addition to having read only you might want to
> have write only (I know that sounds strange but I've seen such in
> DBs).  The upshot is that I would put full CRUD restriction
> capabilities within the system.
>
> I'm not sure that it will work but I thought I would give it a try.  I
> think that something needs to be done in this arena to go along with
> the Fuseki security discussion I saw awhile back.
>
> Claude
>

Excellent.

In case it helps:

com.hp.hpl.jena.sparql.core.DatasetGraphReadOnly

and friend:

com.hp.hpl.jena.sparql.graph.GraphReadOnly

Not perfect.


TDB, for read transactions, enforces read-only at a lower level:
BlockMgrBuilderReadonly and NodeTableBuilderReadonly.

As BlockMgrs and NodeTables are all the data structures for TDB, if those
two are read-only, the DB is immutable (for a view onto it).

	Andy


Re: Features

Posted by Claude Warren <cl...@xenei.com>.
>> Immutable Graphs: I had really problems to get this right and the
>> current Clerezza API does not help with that task (resulting in things
>> like read-only mutable graphs that are no Graphs as they only provide
>> a read-only view on a Graph that might still be changed by other
>> means). I think read-only Graphs (like
>> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
>> use case to protect a returned graph from modifications by the caller
>> of the method is much more prominent as truly immutable graphs.

I am currently working on a set of dynamic proxies in an attempt to
add security at all layers of the Jena stack.  I currently have the
graph layer complete and the model layer 50% done.

My thought is that in addition to having read only you might want to
have write only (I know that sounds strange but I've seen such in
DBs).  The upshot is that I would put full CRUD restriction
capabilities within the system.

I'm not sure that it will work but I thought I would give it a try.  I
think that something needs to be done in this arena to go along with
the Fuseki security discussion I saw awhile back.
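
A rough sketch of the dynamic-proxy idea, assuming a hypothetical three-method
SimpleGraph interface and an Action enum for the grantable operations. Neither
is part of Jena; a real version would wrap the Jena Graph and Model interfaces
and take its permissions from a security policy:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

class SecuredGraphSketch {
    /** Hypothetical minimal graph interface; this is not the Jena Graph API. */
    interface SimpleGraph {
        boolean add(String triple);
        boolean delete(String triple);
        boolean contains(String triple);
    }

    /** The operations a caller may be granted. */
    enum Action { CREATE, READ, DELETE }

    static final class InMemoryGraph implements SimpleGraph {
        private final Set<String> triples = new HashSet<>();
        public boolean add(String triple) { return triples.add(triple); }
        public boolean delete(String triple) { return triples.remove(triple); }
        public boolean contains(String triple) { return triples.contains(triple); }
    }

    /** Wraps target in a dynamic proxy that rejects calls the caller may not make. */
    static SimpleGraph secure(final SimpleGraph target, final Set<Action> allowed) {
        InvocationHandler handler = (proxy, method, args) -> {
            Action needed;
            switch (method.getName()) {
                case "add":    needed = Action.CREATE; break;
                case "delete": needed = Action.DELETE; break;
                default:       needed = Action.READ;   break; // everything else reads
            }
            if (!allowed.contains(needed)) {
                throw new SecurityException(needed + " not permitted");
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return (SimpleGraph) Proxy.newProxyInstance(
                SimpleGraph.class.getClassLoader(),
                new Class<?>[] { SimpleGraph.class },
                handler);
    }

    public static void main(String[] args) {
        SimpleGraph writeOnly = secure(new InMemoryGraph(), EnumSet.of(Action.CREATE));
        writeOnly.add("<a> <knows> <b>");
        try {
            writeOnly.contains("<a> <knows> <b>");
        } catch (SecurityException e) {
            System.out.println("read rejected: " + e.getMessage());
        }
    }
}
```

The same handler can be reused for any interface by swapping the mapping from
method names to actions, which is what makes proxies attractive for covering
several layers of a stack.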

Claude

Features

Posted by Andy Seaborne <an...@apache.org>.
(NB multiple dev@ mailing lists)

On 13/11/12 12:13, Rupert Westenthaler wrote:
> Hi all,
>
> I would like to share some thoughts/comments and suggestions from my side:

Thanks - these are interesting to hear.

>
> ResourceFactory: Clerezza is missing a Factory for RDF resources. I
> would like to have such a Factory. The Factory should be obtainable
> via the Graph - the Collection of Triples. IMO such a Factory is
> required if all resource types (IRI, Bnode, Literal) are represented
> by interfaces.

Yes - a factory is needed if interfaces.

Whether they should be interfaces or fixed classes is an
interesting design point.  I can see arguments both ways.

The argument for interfaces is presumably different implementations for 
different storage layers (e.g. with hidden internal pointers related to 
the storage).  It is also a case of "it's the Java way".

But two RDFTerms (resources) are equal by value - if they have the same 
IRI they are equal and equality is tied to putting in Java collections.
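
To illustrate, here is a small sketch assuming a hypothetical Iri interface
with two implementations, one of them carrying a hidden storage pointer. None
of these names come from Jena or Clerezza. Equality and hashCode are defined
on the IRI string alone, so terms from different subsystems mix in one Java
collection:

```java
import java.util.HashSet;
import java.util.Set;

class IriEqualitySketch {
    /** Hypothetical term interface. */
    interface Iri {
        String value();
    }

    /** Shared contract: two IRIs are equal iff their strings are equal. */
    static abstract class AbstractIri implements Iri {
        @Override public final boolean equals(Object other) {
            return other instanceof Iri && value().equals(((Iri) other).value());
        }
        @Override public final int hashCode() {
            return value().hashCode();
        }
    }

    /** Plain in-memory implementation. */
    static final class PlainIri extends AbstractIri {
        private final String value;
        PlainIri(String value) { this.value = value; }
        public String value() { return value; }
    }

    /** Implementation with a hidden internal pointer into some storage layer. */
    static final class StoreBackedIri extends AbstractIri {
        private final String value;
        private final long nodeId; // storage detail, deliberately ignored by equals
        StoreBackedIri(String value, long nodeId) { this.value = value; this.nodeId = nodeId; }
        public String value() { return value; }
        long nodeId() { return nodeId; }
    }

    public static void main(String[] args) {
        Set<Iri> terms = new HashSet<>();
        terms.add(new PlainIri("http://example.org/a"));
        terms.add(new StoreBackedIri("http://example.org/a", 42L));
        System.out.println(terms.size());
    }
}
```

The contract only holds if every implementation follows it, which is one
argument for fixed classes over interfaces.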

I think the consequence is that a specific subsystem can't assume that 
RDF terms passed to it automatically must have come from that component. 
  There's not

And some RDF terms are

[[RDF Term is the term invented in SPARQL to cover 
resources/bnodes/literals because there wasn't one in RDF : resource is 
used for "web resource" so either the thing being described, not its 
name, and/or as a general concept, not specific to RDF ]]


An interesting design point for literals is value vs lexical form/datatype.
It is the value that matters (OK - should matter), whether it's written
"+1"^^xsd:integer or "01"^^xsd:byte.  Does anyone have a use case
example where the derived datatype matters semantically?
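
The two notions of equality can be sketched as follows, assuming a
hypothetical IntLiteral class restricted to integer-valued datatypes.
sameTerm compares lexical form and datatype; sameValue compares the parsed
number:

```java
import java.math.BigInteger;

class LiteralValueSketch {
    /** Hypothetical literal limited to integer-derived XSD datatypes. */
    static final class IntLiteral {
        final String lexicalForm;
        final String datatype;

        IntLiteral(String lexicalForm, String datatype) {
            this.lexicalForm = lexicalForm;
            this.datatype = datatype;
        }

        /** The value denoted by the lexical form. */
        BigInteger value() {
            return new BigInteger(lexicalForm);
        }

        /** Term equality: same characters and same datatype. */
        boolean sameTerm(IntLiteral other) {
            return lexicalForm.equals(other.lexicalForm) && datatype.equals(other.datatype);
        }

        /** Value equality: same number, however it is written. */
        boolean sameValue(IntLiteral other) {
            return value().equals(other.value());
        }
    }

    public static void main(String[] args) {
        IntLiteral a = new IntLiteral("+1", "xsd:integer");
        IntLiteral b = new IntLiteral("01", "xsd:byte");
        System.out.println("sameTerm:  " + a.sameTerm(b));
        System.out.println("sameValue: " + a.sameValue(b));
    }
}
```

An API has to pick which of the two backs equals and hashCode, since that
decides how literals behave in Java collections.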

> BNodes: If Bnode is an interface than any implementation is free to
> internally use a "bnode-id". One argument pro such ids (that was not
> yet mentioned) is that such id's allow you to avoid in-memory mappings
> for bnodes when wrapping an native implementation. In Clerezza you
> currently need to have this Bidi maps.
>
> Triple, Quads: While for some use cases the Triple-in-Graph based API
> (Quad := Triple t =
> TripleStore#getGraph(context).filter(subject,predicate,object)) is
> sufficient this is no longer the case as soon as Applications want to
> work with an Graph that contains Quads with several contexts. So I
> would vote for having support for Quads.

That is what an RDF dataset is supposed to be, but it's not completely 
transparent - just working with the default graph is very much like 
working with one graph.

The full-blown quads-in-graph would be N3-style formulae, where a graph
node can itself be a graph.  Also called "graph literals".

At this point, they are not going to happen for RDF but if building an 
API or component, I would at least put the hooks in for it to prepare 
for a possible future.

> Dataset,Graph: Out of an User perspective Dataset (how the TripleStore
> looks at the Triples) and Graph (how RDF looks at the Triples) are not
> so different. Because of that I would like to have a single domain
> object fitting for both. The API should focus on the Graph aspects (as
> Clerezza does) while still allowing efficient implementations that do
> not load all triples into memory (e.g. use closeable iterators)
>
> Immutable Graphs: I had really problems to get this right and the
> current Clerezza API does not help with that task (resulting in things
> like read-only mutable graphs that are no Graphs as they only provide
> a read-only view on a Graph that might still be changed by other
> means). I think read-only Graphs (like
> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
> use case to protect a returned graph from modifications by the caller
> of the method is much more prominent as truly immutable graphs.
>
> SPARQL: I would not deal with parsing SPARQL queries but rather
> forward them as is to the underlaying implementation. If doing so the
> API would only need to border with result sets. This would also avoid
> the need to deal with "Datasets". This is not arguing against a
> fallback (e.g. the trick Clerezza does by using the Jena SPARQL
> implementation) but in practice efficient SPARQL executions can only
> happen natively within the TripleStore. Trying to do otherwise will
> only trick users into use cases that will not scale.

Agreed - and memory is a precious resource at scale.  It's usually 
better to give it to the data storage to avoid I/O.  Too much overhead 
in higher level APIs keeping state competes with the I/O caching.

	Andy

>
> best
> Rupert


Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Wed, Nov 14, 2012 at 2:13 PM, Andy Seaborne <an...@apache.org> wrote:

> On 13/11/12 13:21, Reto Bachmann-Gmür wrote:
>
>> SPARQL: I would not deal with parsing SPARQL queries but rather
>>> >forward them as is to the underlaying implementation. If doing so the
>>> >API would only need to border with result sets. This would also avoid
>>> >the need to deal with "Datasets". This is not arguing against a
>>> >fallback (e.g. the trick Clerezza does by using the Jena SPARQL
>>> >implementation) but in practice efficient SPARQL executions can only
>>> >happen natively within the TripleStore. Trying to do otherwise will
>>> >only trick users into use cases that will not scale.
>>> >
>>>
>> +1 for sparql fastlane, some parsing is still needed to see if the query
>> is
>> against graphs that all come from one and the same backend and which one
>> this is.
>>
>
> I don't understand that (a lack of knowledge of zz, probably) - what is the
> parser looking for?  The target for a query execution can be defined
> separately from the query, and so the query string does not need analysis.
>  Sort of local version of protocol: endpoint + query.
>

You need to know to which backend to forward the query and if the query is
against graphs from multiple backends it cannot be fastlaned.
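
As a sketch of that routing decision, assume the graph names a query refers
to have already been extracted (that extraction is the parsing step meant
above) and that a hypothetical graphToBackend map records where each graph
lives:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class FastlaneRoutingSketch {
    /**
     * Returns the one backend holding every queried graph, or empty if the
     * graphs span several backends, or one is unknown, so no fastlane applies.
     */
    static Optional<String> singleBackend(Collection<String> queriedGraphs,
                                          Map<String, String> graphToBackend) {
        String found = null;
        for (String graph : queriedGraphs) {
            String backend = graphToBackend.get(graph);
            if (backend == null) {
                return Optional.empty(); // unknown graph: cannot forward natively
            }
            if (found == null) {
                found = backend;
            } else if (!found.equals(backend)) {
                return Optional.empty(); // graphs live in different backends
            }
        }
        return Optional.ofNullable(found);
    }

    public static void main(String[] args) {
        Map<String, String> graphToBackend = new HashMap<>();
        graphToBackend.put("urn:g1", "tdb");
        graphToBackend.put("urn:g2", "tdb");
        graphToBackend.put("urn:g3", "virtual");
        System.out.println(singleBackend(Arrays.asList("urn:g1", "urn:g2"), graphToBackend));
        System.out.println(singleBackend(Arrays.asList("urn:g1", "urn:g3"), graphToBackend));
    }
}
```

When the result is empty, the query would go to the generic fallback engine
instead of being forwarded as is.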

Reto

>
>         Andy
>

Re: Future of Clerezza and Stanbol

Posted by Andy Seaborne <an...@apache.org>.
On 13/11/12 13:21, Reto Bachmann-Gmür wrote:
>> SPARQL: I would not deal with parsing SPARQL queries but rather
>> >forward them as is to the underlaying implementation. If doing so the
>> >API would only need to border with result sets. This would also avoid
>> >the need to deal with "Datasets". This is not arguing against a
>> >fallback (e.g. the trick Clerezza does by using the Jena SPARQL
>> >implementation) but in practice efficient SPARQL executions can only
>> >happen natively within the TripleStore. Trying to do otherwise will
>> >only trick users into use cases that will not scale.
>> >
> +1 for sparql fastlane, some parsing is still needed to see if the query is
> against graphs that all come from one and the same backend and which one
> this is.

I don't understand that (a lack of knowledge of zz, probably) - what is
the parser looking for?  The target for a query execution can be defined
separately from the query, and so the query string does not need
analysis.  Sort of local version of protocol: endpoint + query.

	Andy

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Tue, Nov 13, 2012 at 2:54 PM, Minto van der Sluis <mi...@xup.nl> wrote:

>
> Is it possible to have multiple backends at runtime?

Yes. If you have multiple instances of WeightedTcProvider, the
graphs will all be available via the TcManager; new graphs will always be
created in the TcProvider with the highest weight. In fact, in a standard
Clerezza instance there are multiple backends: one in which new graphs are
created (the TDB-based one) and others providing read-only "virtual" graphs.
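
The selection rule can be sketched like this. Provider and Manager are
simplified stand-ins written for this example, not the real
WeightedTcProvider and TcManager, and triples are plain strings:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WeightedProviderSketch {
    /** Simplified stand-in for a weighted backend. */
    static final class Provider {
        final String name;
        final int weight;
        final Map<String, Set<String>> graphs = new HashMap<>();

        Provider(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /** Simplified stand-in for the manager that fronts all providers. */
    static final class Manager {
        private final List<Provider> providers = new ArrayList<>();

        void register(Provider provider) {
            providers.add(provider);
            // keep the heaviest provider first
            providers.sort(Comparator.comparingInt((Provider p) -> p.weight).reversed());
        }

        /** New graphs always go to the provider with the highest weight. */
        Provider createGraph(String graphName) {
            if (providers.isEmpty()) {
                throw new IllegalStateException("no provider registered");
            }
            Provider heaviest = providers.get(0);
            heaviest.graphs.put(graphName, new HashSet<String>());
            return heaviest;
        }

        /** Graphs of every provider are visible through the manager. */
        Set<String> getGraph(String graphName) {
            for (Provider provider : providers) {
                Set<String> graph = provider.graphs.get(graphName);
                if (graph != null) {
                    return graph;
                }
            }
            throw new IllegalArgumentException("no such graph: " + graphName);
        }
    }
}
```

With an in-memory provider of low weight and a persistent one of high weight,
creating a graph through the manager and then calling addAll with the
in-memory graph's triples gives a persistent copy under a new name.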


> How can this be achieved? Is it also possible to move graphs from one
> backend to another? This might be what I am looking for, moving a graph
> from an in-memory to a persistent backend.
>

Moving a graph to be available to a new Backend (TcProvider) but under the
same name would be trickier. Just copying the triples is trivial:

targetMGraph.addAll(sourceGraph).

Cheers,
Reto

Re: Future of Clerezza and Stanbol

Posted by Minto van der Sluis <mi...@xup.nl>.
> +1 for sparql fastlane, some parsing is still needed to see if the query is
> against graphs that all come from one and the same backend and which one
> this is.
>
> Cheers,
> Reto
>
Is it possible to have multiple backends at runtime? How can this be achieved? Is it also possible to move graphs from one backend to another? This might be what I am looking for, moving a graph from an in-memory to a persistent backend.

Regards,

Minto


Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Tue, Nov 13, 2012 at 1:13 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi all,
>
> I would like to share some thoughts/comments and suggestions from my side:
>
> ResourceFactory: Clerezza is missing a Factory for RDF resources. I
> would like to have such a Factory. The Factory should be obtainable
> via the Graph - the Collection of Triples. IMO such a Factory is
> required if all resource types (IRI, Bnode, Literal) are represented
> by interfaces.
>
Such a factory should not be tied to Graph, as any Resource object should
be usable with any Graph.


> BNodes: If Bnode is an interface than any implementation is free to
> internally use a "bnode-id". One argument pro such ids (that was not
> yet mentioned) is that such id's allow you to avoid in-memory mappings
> for bnodes when wrapping an native implementation. In Clerezza you
> currently need to have this Bidi maps.
>

If no mapping is needed for the mapped nodes, a map is needed only for the
bnodes originating from another source, and only as long as these objects are
referenced from elsewhere (so the map has to contain only weak references).
This is typically a small number of bnodes, and an id wouldn't help unless
you assume id-based cross-graph identity of bnodes (or you tie the bnodes
to a graph, but if you do this you need no id either).
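
A sketch of such a map, assuming a hypothetical id-less BNode class and a
backend that addresses nodes by long ids. java.util.WeakHashMap drops an
entry once its key is no longer strongly referenced elsewhere:

```java
import java.util.Map;
import java.util.WeakHashMap;

class ForeignBNodeMapSketch {
    /** Hypothetical bnode without an id; distinguishable by object identity only. */
    static final class BNode { }

    // Weak keys: an entry lives only as long as someone else holds the bnode.
    private final Map<BNode, Long> foreignToInternal = new WeakHashMap<>();
    private long nextInternalId = 1;

    /** Returns the backend id for a foreign bnode, allocating one on first sight. */
    synchronized long internalIdFor(BNode foreign) {
        Long id = foreignToInternal.get(foreign);
        if (id == null) {
            id = nextInternalId++;
            foreignToInternal.put(foreign, id);
        }
        return id;
    }

    public static void main(String[] args) {
        ForeignBNodeMapSketch map = new ForeignBNodeMapSketch();
        BNode a = new BNode();
        BNode b = new BNode();
        System.out.println(map.internalIdFor(a) == map.internalIdFor(a));
        System.out.println(map.internalIdFor(a) == map.internalIdFor(b));
    }
}
```

BNode does not override equals or hashCode, so the map keys on object
identity, matching the idea that a bnode has no id of its own.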


>
> Triple, Quads: While for some use cases the Triple-in-Graph based API
> (Quad := Triple t =
> TripleStore#getGraph(context).filter(subject,predicate,object)) is
> sufficient this is no longer the case as soon as Applications want to
> work with an Graph that contains Quads with several contexts. So I
> would vote for having support for Quads.


> Dataset,Graph: Out of an User perspective Dataset (how the TripleStore
> looks at the Triples) and Graph (how RDF looks at the Triples) are not
> so different. Because of that I would like to have a single domain
> object fitting for both. The API should focus on the Graph aspects (as
> Clerezza does) while still allowing efficient implementations that do
> not load all triples into memory (e.g. use closeable iterators)
>

I suggest you propose use cases, so that implementations with the different
APIs can be proposed and the pros and cons evaluated. I think a quad view
can easily be implemented on top of a DataSet, and of course the
implementation is free to use quads internally.
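
As a sketch of the quad view, assume a dataset modelled as a map from graph
name to a set of triples, with each triple a three-element list. This model is
an assumption for illustration, not an existing API, and the result is a
snapshot rather than a live view:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

class QuadViewSketch {
    /**
     * Flattens a dataset (graph name -> triples) into quads of the form
     * (graph, subject, predicate, object).
     */
    static List<List<String>> quads(Map<String, Set<List<String>>> dataset) {
        List<List<String>> result = new ArrayList<>();
        for (Map.Entry<String, Set<List<String>>> entry : dataset.entrySet()) {
            String graphName = entry.getKey();
            for (List<String> triple : entry.getValue()) {
                result.add(Arrays.asList(graphName, triple.get(0), triple.get(1), triple.get(2)));
            }
        }
        return result;
    }
}
```

A backend that stores quads natively could answer the same call directly; the
caller would not see the difference.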


>
> Immutable Graphs: I had really problems to get this right and the
> current Clerezza API does not help with that task (resulting in things
> like read-only mutable graphs that are no Graphs as they only provide
> a read-only view on a Graph that might still be changed by other
> means). I think read-only Graphs (like
> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
> use case to protect a returned graph from modifications by the caller
> of the method is much more prominent as truly immutable graphs.
>

It's about the different identity criterion. The identity of a graph is
clearly defined in the RDF specs but applies only to graphs that do not
change over time (or to time slices of graphs that do). As I
already wrote, a motivating use case here is to have an easy way to do
synchronization and diffs over decomposed graphs (i.e. the individual
immutable graphs are MSGs as they are used in RDFSync). Of course this is no
hindrance to, and is orthogonal to, having a
TripleCollection.getImmutableTripleCollection(...).
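
The difference between a read-only view and an immutable graph can be shown
with plain Java collections. This sketch treats triples as strings and ignores
blank-node isomorphism; readOnlyView and immutableSnapshot are names chosen
for the example:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class ImmutableGraphSketch {
    /** Callers cannot modify it, but changes to the backing set show through. */
    static Set<String> readOnlyView(Set<String> graph) {
        return Collections.unmodifiableSet(graph);
    }

    /** A private copy: its content, and therefore its identity, never changes. */
    static Set<String> immutableSnapshot(Set<String> graph) {
        return Collections.unmodifiableSet(new HashSet<>(graph));
    }

    public static void main(String[] args) {
        Set<String> mutable = new HashSet<>();
        mutable.add("<a> <knows> <b>");
        Set<String> view = readOnlyView(mutable);
        Set<String> snapshot = immutableSnapshot(mutable);

        mutable.add("<b> <knows> <a>"); // someone else changes the graph

        System.out.println("view size:     " + view.size());
        System.out.println("snapshot size: " + snapshot.size());
    }
}
```

Only the snapshot is safe to put into a set of graphs or to diff against,
because its equality with another graph cannot change afterwards.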


>
> SPARQL: I would not deal with parsing SPARQL queries but rather
> forward them as is to the underlaying implementation. If doing so the
> API would only need to border with result sets. This would also avoid
> the need to deal with "Datasets". This is not arguing against a
> fallback (e.g. the trick Clerezza does by using the Jena SPARQL
> implementation) but in practice efficient SPARQL executions can only
> happen natively within the TripleStore. Trying to do otherwise will
> only trick users into use cases that will not scale.
>
+1 for sparql fastlane, some parsing is still needed to see if the query is
against graphs that all come from one and the same backend and which one
this is.

Cheers,
Reto

>
> best
> Rupert
>
> On Tue, Nov 13, 2012 at 9:08 AM, Reto Bachmann-Gmür <re...@wymiwyg.com>
> wrote:
> > On Mon, Nov 12, 2012 at 10:40 PM, Andy Seaborne <an...@apache.org> wrote:
> >
> >> On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
> >>
> >>> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <an...@apache.org>
> wrote:
> >>>
> >>>  On 09/11/12 09:56, Rupert Westenthaler wrote:
> >>>>
> >>>>  RDF libs:
> >>>>> ====
> >>>>>
> >>>>> Out of the viewpoint of Apache Stanbol one needs to ask the Question
> >>>>> if it makes sense to manage an own RDF API. I expect the Semantic Web
> >>>>> Standards to evolve quite a bit in the coming years and I do have
> >>>>> concern that the Clerezza RDF modules will be updated/extended to
> >>>>> provide implementations of those. One example of such an situation is
> >>>>> SPARQL 1.1 that is around for quite some time and is still not
> >>>>> supported by Clerezza. While I do like the small API, the flexibility
> >>>>> to use different TripleStores and that Clerezza comes with OSGI
> >>>>> support I think given the current situation we would need to discuss
> >>>>> all options and those do also include a switch to Apache Jena or
> >>>>> Sesame. Especially Sesame would be an attractive option as their RDF
> >>>>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
> >>>>> counterparts (Model [2] and Graph [3]) are considerable different and
> >>>>> more complex interfaces. In addition Jena will only change to
> >>>>> org.apache packages with the next major release so a switch before
> >>>>> that release would mean two incompatible API changes.
> >>>>>
> >>>>>
> >>>> Jena isn't changing the packaging as such -- what we've discussed is
> >>>> providing a package for the current API and then a new, org.apache
> API.
> >>>>   The new API may be much the same as the existing one or it may be
> >>>> different - that depends on contributions made!
> >>>>
> >>>>
> >>> I didn't know about jena planning to introduce such a common API.
> >>>
> >>>
> >>>> I'd like to hear more about your experiences esp. with Graph API as
> that
> >>>> is supposed to be quite simple - it's targeted at storage extensions
> as
> >>>> well as supporting the richer Model API.  Personally, aside from the
> fact
> >>>> that Clerezza enforces slot constraints (no literals as subjects), the
> >>>> Jena
> >>>> Graph API and Clerezza RDF core API seem reasonably aligned.
> >>>>
> >>>>
> >>> Yes the slot constraints comes from the RDF abstract syntax. In my
> opinion
> >>> it's something one could decide to relax, by adding appropriate
> owl:sameAs
> >>> bnode any graph could be transformed to an rdf-abstract-syntax
> compliant
> >>> one. So maybe have a GnereicTripleCollection that can be converted to
> an
> >>> RDFTRipleCollection - not sure. Just sticking to the spec and wait till
> >>> this is allowed by the abstract syntax might be the easiest.
> >>>
> >>
> >> At the core, unconstrained slots has worked best for us.
> >>
> >
> > The question is whether this shall be part of a common API. For machinery doing
> > inference and dealing with the meaning of RDF graphs, resources should
> also
> > be associated with a set of IRIs (that serialize into owl:sameAs).
> >
> >
> >>
> >> Then either:
> >>
> >> 1/ have a test like:
> >>   Triple.isValidRDF
> >>
> >> 2/ Layer an app API to impose the constraints (but it's easy to run out
> of
> >> good names).
> >>
> >
> > The clerezza API would be such a layer.
> >
> >
> >>
> >>
> >> The Graph/Node/Triple level in Jena is an API but its primary role is
> the
> >> other side, to storage and inference, not apps.
> >>
> >> Generality gives
> >> A/ Future proofing (not perfect)
> >> B/ Arises in inference and query naturally.
> >> C/ using RDF structures for processing RDF
> >>
> >> Nodes in triples can be variables, and I would have found it useful to
> >> have marker nodes to be able to build structures e.g. "known to be
> bound at
> >> this point in a query".  As it was, I ended up creating parallel
> structures.
> >>
> >>
> >>  Where I see advantages of the clerezza API:
> >>> - Bases on collections framework so standard tools can be used for
> graphs
> >>>
> >>
> >> Given a core system API, a scala and clojure and even different Java
> APIs
> >> for different styles are all possible.
> >>
> >
> > Right. That's why I propose having a minimum API and decorators as to
> > provide scala interfacing or the resource api for java ( which
> corresponds
> > more or less to the W3C RDF API draft)
> >
> >
> >>
> >> A universal API across systems is about plugging in machinery (parser,
> >> query engines, storage, inference).  It's good to separate that from
> >> application APIs otherwise there is a design tension.
> >
> > I'm wondering if there need to be special hooks for inference or if this
> > cannot just as well be done by simply wrapping the graphs.
> >
> >
> >>
> >>
> >>  - Immutable graphs follow identity criterion of RDF semantics, this
> allows
> >>> graph component to be added to sets and more straight forwardly
> implement
> >>> diff and patch algorithms
> >>> - BNode have no ids: apart from promoting the usage of URIs where this
> is
> >>> appropriate it allows behind the scenes leanification and saves memory
> >>> where the backend doesn't have such ids.
> >>>
> >>
> >> We have argued about this before.
> >>
> >> + As you have objects, there is a concept of identity (you can tell two
> >> bNodes apart).
> >>
> > No, two bnodes might be indistinguishable, as in
> >
> > a :knows b
> > b :knows a
> >
> > You cannot tell them apart even though neither of them can be leanified away.
> >
> >
> >> + For persistence, an internal id is necessary to reconstruct consistently
> >> with caches.
> >>
> >
> > Here we are talking about implementation details that imho should be
> > separate from the API discussion. Do you accept my toy use case challenge [1]?
> > If we leave the classical dedicated triple store scenario, the id
> > quickly becomes something that makes things harder rather than easier.
> >
> >
> >> + Leaning isn't a core feature of RDF.  In fact, IIRC, mention of it is going to
> >> be removed.  It's information reduction, not data reduction.
> >>
> >
> > It simply arises from bnodes being existential variables. If they are
> > redefined to be something else, then I have difficulty seeing what
> > advantages they would still offer over named nodes (maybe in some skolem: URI
> > scheme).
> >
> >
> >> + There will be a skolemization Note from the RDF-WG to deal with the
> >> practical matters of dealing with bNodes.
> >>
> >> RDF as data model for linked data.
> >>
> >> It's a data structure with good properties for combining.  And it has links.
> >>
> >>
> >>
> >>>
> >>>
> >>>
> >>>> (for generalised systems such as rules engine - and for SPARQL - triples
> >>>> can arise with extras like literals as subjects; they get removed later)
> >>>>
> >>>
> >>>
> >>> If this shall be an API for interoperability based on RDF standards, I
> >>> wonder if it should be possible to expose such intermediate constructs.
> >>>
> >>
> >> My suggestion is that the API for interoperability is designed to support
> >> RDF standards.
> >>
> >> The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.
> >>
> >
> > Datasets are an element of the relevant sparql spec, I don't see Quads.
> >
> >
> >>
> >> But also storage, SPARQL (Query and Update), and web access (e.g. conneg).
> >>
> >
> > Clerezza is very strong on conneg, but I don't think this would be part of
> > the RDF core API, but rather of the parts that could be part of Stanbol and
> > provide a Linked Data Platform Container (LDPC).
> >
> >
> > Reto
> >
> > 1.
> >
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-%3DXkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ%40mail.gmail.com%3E
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Tue, Nov 13, 2012 at 1:13 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi all,
>
> I would like to share some thoughts/comments and suggestions from my side:
>
> ResourceFactory: Clerezza is missing a Factory for RDF resources. I
> would like to have such a Factory. The Factory should be obtainable
> via the Graph - the Collection of Triples. IMO such a Factory is
> required if all resource types (IRI, Bnode, Literal) are represented
> by interfaces.
>

Such a factory should not be tied to Graph, as any Resource object should
be usable with any Graph.


> BNodes: If Bnode is an interface then any implementation is free to
> internally use a "bnode-id". One argument for such ids (that was not
> yet mentioned) is that such ids allow you to avoid in-memory mappings
> for bnodes when wrapping a native implementation. In Clerezza you
> currently need to have these Bidi maps.
>

No mapping is needed for nodes that come from the wrapped store itself; a
map is needed only for the bnodes originating from another source, and only
as long as these objects are referenced from elsewhere (so the map only has
to contain weak references). This is typically a small number of bnodes, and
an id wouldn't help unless you assume id-based cross-graph identity of bnodes
(or you tie the bnodes to a graph, but if you do this you need no id either).
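
To make the weak-reference point concrete, here is a minimal Java sketch. It
is not Clerezza code: ApiBNode and ForeignBNodeMapper are names made up for
this example, and the backend-internal node is assumed to be a plain long.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical API-level blank node: it carries no id, only object identity. */
final class ApiBNode {}

/** Hypothetical mapper from foreign blank nodes to backend-internal ids. */
final class ForeignBNodeMapper {
    // Weak keys: an entry can be collected once the foreign bnode is no longer
    // referenced from anywhere else, so the map stays small.
    private final Map<ApiBNode, Long> nativeIds = new WeakHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Returns the backend id for a foreign bnode, allocating one on first sight. */
    synchronized long toNative(ApiBNode foreign) {
        return nativeIds.computeIfAbsent(foreign, k -> nextId.getAndIncrement());
    }

    public static void main(String[] args) {
        ForeignBNodeMapper mapper = new ForeignBNodeMapper();
        ApiBNode a = new ApiBNode();
        ApiBNode b = new ApiBNode();
        System.out.println(mapper.toNative(a) == mapper.toNative(a)); // same node, same id
        System.out.println(mapper.toNative(a) == mapper.toNative(b)); // distinct nodes
    }
}
```

The values stored in the map must not reference the keys, otherwise the
entries could never be collected; a boxed long satisfies that.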


>
> Triple, Quads: While for some use cases the Triple-in-Graph based API
> (Quad := Triple t =
> TripleStore#getGraph(context).filter(subject,predicate,object)) is
> sufficient, this is no longer the case as soon as applications want to
> work with a Graph that contains Quads with several contexts. So I
> would vote for having support for Quads.


> Dataset, Graph: From a user perspective, Dataset (how the TripleStore
> looks at the Triples) and Graph (how RDF looks at the Triples) are not
> so different. Because of that I would like to have a single domain
> object fitting for both. The API should focus on the Graph aspects (as
> Clerezza does) while still allowing efficient implementations that do
> not load all triples into memory (e.g. use closeable iterators).
>

I suggest you propose use cases for which implementations with different
APIs can be proposed and the pros and cons evaluated. I think a quad view
can easily be implemented on top of a Dataset, and of course the
implementation is free to use quads internally.
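
As a rough illustration of a quad view layered on a dataset, here is a small
Java sketch. It rests on simplifying assumptions that are not part of any
existing API: a dataset is modelled as a map from graph name to a set of
triples, triples are 3-element string lists, and QuadView is a made-up name.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper that derives quads from a dataset of named graphs. */
final class QuadView {
    /**
     * Flattens a dataset (graph name -> triples) into quads.
     * Each quad is a 4-element list: graph name, subject, predicate, object.
     */
    static Set<List<String>> quads(Map<String, Set<List<String>>> dataset) {
        return dataset.entrySet().stream()
            .flatMap(e -> e.getValue().stream()
                .map(t -> List.of(e.getKey(), t.get(0), t.get(1), t.get(2))))
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Map<String, Set<List<String>>> dataset = Map.of(
            "ex:g1", Set.of(List.of("ex:a", "ex:knows", "ex:b")),
            "ex:g2", Set.of(List.of("ex:a", "ex:knows", "ex:b")));
        // The same triple in two graphs yields two distinct quads.
        System.out.println(quads(dataset).size());
    }
}
```

A real implementation would presumably expose this lazily rather than
materializing the set, but the mapping itself stays the same.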


>
> Immutable Graphs: I really had problems getting this right, and the
> current Clerezza API does not help with that task (resulting in things
> like read-only mutable graphs that are not Graphs, as they only provide
> a read-only view on a Graph that might still be changed by other
> means). I think read-only Graphs (like
> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
> use case of protecting a returned graph from modifications by the caller
> of the method is much more prominent than truly immutable graphs.
>

It's about the different identity criterion. The identity of a graph is
clearly defined in the RDF specs but applies only to graphs that do not
change over time (or to time-slices of graphs that do). As I
already wrote, a motivating use case here is to have an easy way to do
synchronization and diffs over decomposed graphs (i.e. the individual
immutable graphs are MSGs as they are used in RDFSync). Of course this is no
hindrance and is orthogonal to having a
TripleCollection.getImmutableTripleCollection(...).
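
The difference between a read-only view and an immutable graph can be shown
with plain Java collections. This is only a sketch under simplifying
assumptions: triples are 3-element string lists, the graphs are ground (no
bnodes, so set equality stands in for graph identity), and GraphViews is a
made-up name rather than a Clerezza class.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper contrasting a read-only view with an immutable snapshot. */
final class GraphViews {
    /** A triple as a 3-element list; a stand-in for a real Triple type. */
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    /** Read-only view: callers cannot modify it, but it changes with the backing set. */
    static Set<List<String>> readOnlyView(Set<List<String>> mutable) {
        return Collections.unmodifiableSet(mutable);
    }

    /** Immutable snapshot: contents are fixed, so equals/hashCode stay stable. */
    static Set<List<String>> snapshot(Set<List<String>> mutable) {
        return Set.copyOf(mutable);
    }

    public static void main(String[] args) {
        Set<List<String>> graph = new HashSet<>();
        graph.add(triple("ex:a", "ex:knows", "ex:b"));
        Set<List<String>> view = readOnlyView(graph);
        Set<List<String>> snap = snapshot(graph);
        graph.add(triple("ex:b", "ex:knows", "ex:a"));
        // The view follows the change, the snapshot does not.
        System.out.println(view.size() + " " + snap.size());
    }
}
```

Only the snapshot is safe to put into a HashSet of graphs or to diff against
later, because its hash code cannot change behind the caller's back.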


>
> SPARQL: I would not deal with parsing SPARQL queries but rather
> forward them as is to the underlying implementation. If doing so, the
> API would only need to bother with result sets. This would also avoid
> the need to deal with "Datasets". This is not arguing against a
> fallback (e.g. the trick Clerezza does by using the Jena SPARQL
> implementation) but in practice efficient SPARQL executions can only
> happen natively within the TripleStore. Trying to do otherwise will
> only trick users into use cases that will not scale.
>
+1 for a SPARQL fast lane. Some parsing is still needed to see whether the
query is against graphs that all come from one and the same backend, and
which one this is.
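
The routing decision could look roughly like the following Java sketch. It
assumes the graph names referenced by the query have already been extracted
(the parsing itself is not shown), and SparqlRouter, singleBackend and the
backend names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical router deciding whether a query can take the native fast lane. */
final class SparqlRouter {
    /**
     * Returns the one backend that holds all queried graphs, or empty if the
     * graphs are unknown or spread over several backends (fallback needed).
     */
    static Optional<String> singleBackend(Set<String> queriedGraphs,
                                          Map<String, String> backendOfGraph) {
        if (queriedGraphs.isEmpty()
                || !backendOfGraph.keySet().containsAll(queriedGraphs)) {
            return Optional.empty();
        }
        Set<String> backends = queriedGraphs.stream()
            .map(backendOfGraph::get)
            .collect(Collectors.toSet());
        return backends.size() == 1
            ? Optional.of(backends.iterator().next())
            : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> backendOfGraph = Map.of(
            "ex:g1", "storeA", "ex:g2", "storeA", "ex:g3", "storeB");
        System.out.println(singleBackend(Set.of("ex:g1", "ex:g2"), backendOfGraph));
        System.out.println(singleBackend(Set.of("ex:g1", "ex:g3"), backendOfGraph));
    }
}
```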

Cheers,
Reto

>
> best
> Rupert
>
> On Tue, Nov 13, 2012 at 9:08 AM, Reto Bachmann-Gmür <re...@wymiwyg.com>
> wrote:
> > On Mon, Nov 12, 2012 at 10:40 PM, Andy Seaborne <an...@apache.org> wrote:
> >
> >> On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
> >>
> >>> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <an...@apache.org> wrote:
> >>>
> >>>  On 09/11/12 09:56, Rupert Westenthaler wrote:
> >>>>
> >>>>  RDF libs:
> >>>>> ====
> >>>>>
> >>>>> From the viewpoint of Apache Stanbol one needs to ask the question
> >>>>> whether it makes sense to maintain our own RDF API. I expect the Semantic
> >>>>> Web standards to evolve quite a bit in the coming years, and I am
> >>>>> concerned about whether the Clerezza RDF modules will be updated/extended
> >>>>> to provide implementations of those. One example of such a situation is
> >>>>> SPARQL 1.1, which has been around for quite some time and is still not
> >>>>> supported by Clerezza. While I do like the small API, the flexibility
> >>>>> to use different TripleStores and that Clerezza comes with OSGi
> >>>>> support, I think given the current situation we would need to discuss
> >>>>> all options, and those also include a switch to Apache Jena or
> >>>>> Sesame. Sesame would be an especially attractive option as its RDF
> >>>>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
> >>>>> counterparts (Model [2] and Graph [3]) are considerably different and
> >>>>> more complex interfaces. In addition, Jena will only change to
> >>>>> org.apache packages with the next major release, so a switch before
> >>>>> that release would mean two incompatible API changes.
> >>>>>
> >>>>>
> >>>> Jena isn't changing the packaging as such -- what we've discussed is
> >>>> providing a package for the current API and then a new, org.apache API.
> >>>>   The new API may be much the same as the existing one or it may be
> >>>> different - that depends on contributions made!
> >>>>
> >>>>
> >>> I didn't know about jena planning to introduce such a common API.
> >>>
> >>>
> >>>> I'd like to hear more about your experiences esp. with Graph API as that
> >>>> is supposed to be quite simple - it's targeted at storage extensions as
> >>>> well as supporting the richer Model API.  Personally, aside from the fact
> >>>> that Clerezza enforces slot constraints (no literals as subjects), the
> >>>> Jena Graph API and Clerezza RDF core API seem reasonably aligned.
> >>>>
> >>>>
> >>> Yes, the slot constraints come from the RDF abstract syntax. In my opinion
> >>> it's something one could decide to relax: by adding appropriate owl:sameAs
> >>> bnodes any graph could be transformed to an rdf-abstract-syntax compliant
> >>> one. So maybe have a GenericTripleCollection that can be converted to an
> >>> RDFTripleCollection - not sure. Just sticking to the spec and waiting till
> >>> this is allowed by the abstract syntax might be the easiest.
> >>>
> >>
> >> At the core, unconstrained slots have worked best for us.
> >>
> >
> > The question is whether this should be part of a common API. For machinery doing
> > inference and dealing with the meaning of RDF graphs, resources should also
> > be associated with a set of IRIs (that serialize into owl:sameAs).
> >
> >
> >>
> >> Then either:
> >>
> >> 1/ have a test like:
> >>   Triple.isValidRDF
> >>
> >> 2/ Layer an app API to impose the constraints (but it's easy to run out of
> >> good names).
> >>
> >
> > The clerezza API would be such a layer.
> >
> >
> >>
> >>
> >> The Graph/Node/Triple level in Jena is an API but its primary role is the
> >> other side, to storage and inference, not apps.
> >>
> >> Generality gives
> >> A/ Future proofing (not perfect)
> >> B/ Arises in inference and query naturally.
> >> C/ using RDF structures for processing RDF
> >>
> >> Nodes in triples can be variables, and I would have found it useful to
> >> have marker nodes to be able to build structures e.g. "known to be bound at
> >> this point in a query".  As it was, I ended up creating parallel structures.
> >>
> >>
> >>  Where I see advantages of the clerezza API:
> >>> - Based on the collections framework so standard tools can be used for graphs
> >>>
> >>
> >> Given a core system API, Scala and Clojure and even different Java APIs
> >> for different styles are all possible.
> >>
> >
> > Right. That's why I propose having a minimal API and decorators to
> > provide Scala interfacing or the resource API for Java (which corresponds
> > more or less to the W3C RDF API draft).
> >
> >
> >>
> >> A universal API across systems is about plugging in machinery (parser,
> >> query engines, storage, inference).  It's good to separate that from
> >> application APIs otherwise there is a design tension.
> >
> > I'm wondering if there need to be special hooks for inference or if this
> > cannot just as well be done by simply wrapping the graphs.
> >
> >
> >>
> >>
> >>  - Immutable graphs follow the identity criterion of RDF semantics; this allows
> >>> graph components to be added to sets and makes diff and patch algorithms
> >>> more straightforward to implement
> >>> - BNodes have no ids: apart from promoting the usage of URIs where this is
> >>> appropriate, it allows behind-the-scenes leanification and saves memory
> >>> where the backend doesn't have such ids.
> >>>
> >>
> >> We have argued about this before.
> >>
> >> + As you have objects, there is a concept of identity (you can tell two
> >> bNodes apart).
> >>
> > No, two bnodes might be indistinguishable, as in
> >
> > a :knows b
> > b :knows a
> >
> > You cannot tell them apart even though neither of them can be leanified away.
> >
> >
> >> + For persistence, an internal id is necessary to reconstruct consistently
> >> with caches.
> >>
> >
> > Here we are talking about implementation details that imho should be
> > separate from the API discussion. Do you accept my toy use case challenge [1]?
> > If we leave the classical dedicated triple store scenario, the id
> > quickly becomes something that makes things harder rather than easier.
> >
> >
> >> + Leaning isn't a core feature of RDF.  In fact, IIRC, mention of it is going to
> >> be removed.  It's information reduction, not data reduction.
> >>
> >
> > It simply arises from bnodes being existential variables. If they are
> > redefined to be something else, then I have difficulty seeing what
> > advantages they would still offer over named nodes (maybe in some skolem: URI
> > scheme).
> >
> >
> >> + There will be a skolemization Note from the RDF-WG to deal with the
> >> practical matters of dealing with bNodes.
> >>
> >> RDF as data model for linked data.
> >>
> >> It's a data structure with good properties for combining.  And it has links.
> >>
> >>
> >>
> >>>
> >>>
> >>>
> >>>> (for generalised systems such as rules engine - and for SPARQL - triples
> >>>> can arise with extras like literals as subjects; they get removed later)
> >>>>
> >>>
> >>>
> >>> If this shall be an API for interoperability based on RDF standards, I
> >>> wonder if it should be possible to expose such intermediate constructs.
> >>>
> >>
> >> My suggestion is that the API for interoperability is designed to support
> >> RDF standards.
> >>
> >> The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.
> >>
> >
> > Datasets are an element of the relevant sparql spec, I don't see Quads.
> >
> >
> >>
> >> But also storage, SPARQL (Query and Update), and web access (e.g. conneg).
> >>
> >
> > Clerezza is very strong on conneg, but I don't think this would be part of
> > the RDF core API, but rather of the parts that could be part of Stanbol and
> > provide a Linked Data Platform Container (LDPC).
> >
> >
> > Reto
> >
> > 1.
> >
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-%3DXkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ%40mail.gmail.com%3E
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Future of Clerezza and Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

I would like to share some thoughts/comments and suggestions from my side:

ResourceFactory: Clerezza is missing a Factory for RDF resources. I
would like to have such a Factory. The Factory should be obtainable
via the Graph - the Collection of Triples. IMO such a Factory is
required if all resource types (IRI, Bnode, Literal) are represented
by interfaces.

BNodes: If Bnode is an interface then any implementation is free to
internally use a "bnode-id". One argument for such ids (that was not
yet mentioned) is that such ids allow you to avoid in-memory mappings
for bnodes when wrapping a native implementation. In Clerezza you
currently need to have these Bidi maps.

Triple, Quads: While for some use cases the Triple-in-Graph based API
(Quad := Triple t =
TripleStore#getGraph(context).filter(subject,predicate,object)) is
sufficient, this is no longer the case as soon as applications want to
work with a Graph that contains Quads with several contexts. So I
would vote for having support for Quads.

Dataset, Graph: From a user perspective, Dataset (how the TripleStore
looks at the Triples) and Graph (how RDF looks at the Triples) are not
so different. Because of that I would like to have a single domain
object fitting for both. The API should focus on the Graph aspects (as
Clerezza does) while still allowing efficient implementations that do
not load all triples into memory (e.g. use closeable iterators).

Immutable Graphs: I really had problems getting this right, and the
current Clerezza API does not help with that task (resulting in things
like read-only mutable graphs that are not Graphs, as they only provide
a read-only view on a Graph that might still be changed by other
means). I think read-only Graphs (like
Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
use case of protecting a returned graph from modifications by the caller
of the method is much more prominent than truly immutable graphs.

SPARQL: I would not deal with parsing SPARQL queries but rather
forward them as is to the underlying implementation. If doing so, the
API would only need to bother with result sets. This would also avoid
the need to deal with "Datasets". This is not arguing against a
fallback (e.g. the trick Clerezza does by using the Jena SPARQL
implementation) but in practice efficient SPARQL executions can only
happen natively within the TripleStore. Trying to do otherwise will
only trick users into use cases that will not scale.

best
Rupert

On Tue, Nov 13, 2012 at 9:08 AM, Reto Bachmann-Gmür <re...@wymiwyg.com> wrote:
> On Mon, Nov 12, 2012 at 10:40 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
>>
>>> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>>  On 09/11/12 09:56, Rupert Westenthaler wrote:
>>>>
>>>>  RDF libs:
>>>>> ====
>>>>>
>>>>> From the viewpoint of Apache Stanbol one needs to ask the question
>>>>> whether it makes sense to maintain our own RDF API. I expect the Semantic
>>>>> Web standards to evolve quite a bit in the coming years, and I am
>>>>> concerned about whether the Clerezza RDF modules will be updated/extended
>>>>> to provide implementations of those. One example of such a situation is
>>>>> SPARQL 1.1, which has been around for quite some time and is still not
>>>>> supported by Clerezza. While I do like the small API, the flexibility
>>>>> to use different TripleStores and that Clerezza comes with OSGi
>>>>> support, I think given the current situation we would need to discuss
>>>>> all options, and those also include a switch to Apache Jena or
>>>>> Sesame. Sesame would be an especially attractive option as its RDF
>>>>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>>>>> counterparts (Model [2] and Graph [3]) are considerably different and
>>>>> more complex interfaces. In addition, Jena will only change to
>>>>> org.apache packages with the next major release, so a switch before
>>>>> that release would mean two incompatible API changes.
>>>>>
>>>>>
>>>> Jena isn't changing the packaging as such -- what we've discussed is
>>>> providing a package for the current API and then a new, org.apache API.
>>>>   The new API may be much the same as the existing one or it may be
>>>> different - that depends on contributions made!
>>>>
>>>>
>>> I didn't know about jena planning to introduce such a common API.
>>>
>>>
>>>> I'd like to hear more about your experiences esp. with Graph API as that
>>>> is supposed to be quite simple - it's targeted at storage extensions as
>>>> well as supporting the richer Model API.  Personally, aside from the fact
>>>> that Clerezza enforces slot constraints (no literals as subjects), the
>>>> Jena Graph API and Clerezza RDF core API seem reasonably aligned.
>>>>
>>>>
>>> Yes, the slot constraints come from the RDF abstract syntax. In my opinion
>>> it's something one could decide to relax: by adding appropriate owl:sameAs
>>> bnodes any graph could be transformed to an rdf-abstract-syntax compliant
>>> one. So maybe have a GenericTripleCollection that can be converted to an
>>> RDFTripleCollection - not sure. Just sticking to the spec and waiting till
>>> this is allowed by the abstract syntax might be the easiest.
>>>
>>
>> At the core, unconstrained slots have worked best for us.
>>
>
> The question is whether this should be part of a common API. For machinery doing
> inference and dealing with the meaning of RDF graphs, resources should also
> be associated with a set of IRIs (that serialize into owl:sameAs).
>
>
>>
>> Then either:
>>
>> 1/ have a test like:
>>   Triple.isValidRDF
>>
>> 2/ Layer an app API to impose the constraints (but it's easy to run out of
>> good names).
>>
>
> The clerezza API would be such a layer.
>
>
>>
>>
>> The Graph/Node/Triple level in Jena is an API but its primary role is the
>> other side, to storage and inference, not apps.
>>
>> Generality gives
>> A/ Future proofing (not perfect)
>> B/ Arises in inference and query naturally.
>> C/ using RDF structures for processing RDF
>>
>> Nodes in triples can be variables, and I would have found it useful to
>> have marker nodes to be able to build structures e.g. "known to be bound at
>> this point in a query".  As it was, I ended up creating parallel structures.
>>
>>
>>  Where I see advantages of the clerezza API:
>>> - Based on the collections framework so standard tools can be used for graphs
>>>
>>
>> Given a core system API, Scala and Clojure and even different Java APIs
>> for different styles are all possible.
>>
>
> Right. That's why I propose having a minimal API and decorators to
> provide Scala interfacing or the resource API for Java (which corresponds
> more or less to the W3C RDF API draft).
>
>
>>
>> A universal API across systems is about plugging in machinery (parser,
>> query engines, storage, inference).  It's good to separate that from
>> application APIs otherwise there is a design tension.
>
> I'm wondering if there need to be special hooks for inference or if this
> cannot just as well be done by simply wrapping the graphs.
>
>
>>
>>
>>  - Immutable graphs follow the identity criterion of RDF semantics; this allows
>>> graph components to be added to sets and makes diff and patch algorithms
>>> more straightforward to implement
>>> - BNodes have no ids: apart from promoting the usage of URIs where this is
>>> appropriate, it allows behind-the-scenes leanification and saves memory
>>> where the backend doesn't have such ids.
>>>
>>
>> We have argued about this before.
>>
>> + As you have objects, there is a concept of identity (you can tell two
>> bNodes apart).
>>
> No, two bnodes might be indistinguishable, as in
>
> a :knows b
> b :knows a
>
> You cannot tell them apart even though neither of them can be leanified away.
>
>
>> + For persistence, an internal id is necessary to reconstruct consistently
>> with caches.
>>
>
> Here we are talking about implementation details that imho should be
> separate from the API discussion. Do you accept my toy use case challenge [1]?
> If we leave the classical dedicated triple store scenario, the id
> quickly becomes something that makes things harder rather than easier.
>
>
>> + Leaning isn't a core feature of RDF.  In fact, IIRC, mention of it is going to
>> be removed.  It's information reduction, not data reduction.
>>
>
> It simply arises from bnodes being existential variables. If they are
> redefined to be something else, then I have difficulty seeing what
> advantages they would still offer over named nodes (maybe in some skolem: URI
> scheme).
>
>
>> + There will be a skolemization Note from the RDF-WG to deal with the
>> practical matters of dealing with bNodes.
>>
>> RDF as data model for linked data.
>>
>> It's a data structure with good properties for combining.  And it has links.
>>
>>
>>
>>>
>>>
>>>
>>>> (for generalised systems such as rules engine - and for SPARQL - triples
>>>> can arise with extras like literals as subjects; they get removed later)
>>>>
>>>
>>>
>>> If this shall be an API for interoperability based on RDF standards, I
>>> wonder if it should be possible to expose such intermediate constructs.
>>>
>>
>> My suggestion is that the API for interoperability is designed to support
>> RDF standards.
>>
>> The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.
>>
>
> Datasets are an element of the relevant sparql spec, I don't see Quads.
>
>
>>
>> But also storage, SPARQL (Query and Update), and web access (e.g. conneg).
>>
>
> Clerezza is very strong on conneg, but I don't think this would be part of
> the RDF core API, but rather of the parts that could be part of Stanbol and
> provide a Linked Data Platform Container (LDPC).
>
>
> Reto
>
> 1.
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-%3DXkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ%40mail.gmail.com%3E



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Future of Clerezza and Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

I would like to share some thoughts/comments and suggestions from my side:

ResourceFactory: Clerezza is missing a Factory for RDF resources. I
would like to have such a Factory. The Factory should be obtainable
via the Graph - the Collection of Triples. IMO such a Factory is
required if all resource types (IRI, Bnode, Literal) are represented
by interfaces.

BNodes: If Bnode is an interface then any implementation is free to
internally use a "bnode-id". One argument for such ids (that was not
yet mentioned) is that such ids allow you to avoid in-memory mappings
for bnodes when wrapping a native implementation. In Clerezza you
currently need to have these Bidi maps.

Triple, Quads: While for some use cases the Triple-in-Graph based API
(Quad := Triple t =
TripleStore#getGraph(context).filter(subject,predicate,object)) is
sufficient, this is no longer the case as soon as applications want to
work with a Graph that contains Quads with several contexts. So I
would vote for having support for Quads.

Dataset, Graph: From a user perspective, Dataset (how the TripleStore
looks at the Triples) and Graph (how RDF looks at the Triples) are not
so different. Because of that I would like to have a single domain
object fitting for both. The API should focus on the Graph aspects (as
Clerezza does) while still allowing efficient implementations that do
not load all triples into memory (e.g. use closeable iterators).

Immutable Graphs: I really had problems getting this right, and the
current Clerezza API does not help with that task (resulting in things
like read-only mutable graphs that are not Graphs, as they only provide
a read-only view on a Graph that might still be changed by other
means). I think read-only Graphs (like
Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
use case of protecting a returned graph from modifications by the caller
of the method is much more prominent than truly immutable graphs.

SPARQL: I would not deal with parsing SPARQL queries but rather
forward them as is to the underlying implementation. If doing so, the
API would only need to bother with result sets. This would also avoid
the need to deal with "Datasets". This is not arguing against a
fallback (e.g. the trick Clerezza does by using the Jena SPARQL
implementation) but in practice efficient SPARQL executions can only
happen natively within the TripleStore. Trying to do otherwise will
only trick users into use cases that will not scale.

best
Rupert

On Tue, Nov 13, 2012 at 9:08 AM, Reto Bachmann-Gmür <re...@wymiwyg.com> wrote:
> On Mon, Nov 12, 2012 at 10:40 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
>>
>>> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>>  On 09/11/12 09:56, Rupert Westenthaler wrote:
>>>>
>>>>  RDF libs:
>>>>> ====
>>>>>
>>>>> From the viewpoint of Apache Stanbol one needs to ask the question
>>>>> whether it makes sense to maintain our own RDF API. I expect the Semantic
>>>>> Web standards to evolve quite a bit in the coming years, and I am
>>>>> concerned about whether the Clerezza RDF modules will be updated/extended
>>>>> to provide implementations of those. One example of such a situation is
>>>>> SPARQL 1.1, which has been around for quite some time and is still not
>>>>> supported by Clerezza. While I do like the small API, the flexibility
>>>>> to use different TripleStores and that Clerezza comes with OSGi
>>>>> support, I think given the current situation we would need to discuss
>>>>> all options, and those also include a switch to Apache Jena or
>>>>> Sesame. Sesame would be an especially attractive option as its RDF
>>>>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>>>>> counterparts (Model [2] and Graph [3]) are considerably different and
>>>>> more complex interfaces. In addition, Jena will only change to
>>>>> org.apache packages with the next major release, so a switch before
>>>>> that release would mean two incompatible API changes.
>>>>>
>>>>>
>>>> Jena isn't changing the packaging as such -- what we've discussed is
>>>> providing a package for the current API and then a new, org.apache API.
>>>>   The new API may be much the same as the existing one or it may be
>>>> different - that depends on contributions made!
>>>>
>>>>
>>> I didn't know about jena planning to introduce such a common API.
>>>
>>>
>>>> I'd like to hear more about your experiences esp. with Graph API as that
>>>> is supposed to be quite simple - it's targeted at storage extensions as
>>>> well as supporting the richer Model API.  Personally, aside from the fact
>>>> that Clerezza enforces slot constraints (no literals as subjects), the
>>>> Jena Graph API and Clerezza RDF core API seem reasonably aligned.
>>>>
>>>>
>>> Yes, the slot constraints come from the RDF abstract syntax. In my opinion
>>> it's something one could decide to relax: by adding appropriate owl:sameAs
>>> bnodes any graph could be transformed to an rdf-abstract-syntax compliant
>>> one. So maybe have a GenericTripleCollection that can be converted to an
>>> RDFTripleCollection - not sure. Just sticking to the spec and waiting till
>>> this is allowed by the abstract syntax might be the easiest.
>>>
>>
>> At the core, unconstrained slots have worked best for us.
>>
>
> The question is whether this should be part of a common API. For machinery doing
> inference and dealing with the meaning of RDF graphs, resources should also
> be associated with a set of IRIs (that serialize into owl:sameAs).
>
>
>>
>> Then either:
>>
>> 1/ have a test like:
>>   Triple.isValidRDF
>>
>> 2/ Layer an app API to impose the constraints (but it's easy to run out of
>> good names).
>>
>
> The clerezza API would be such a layer.
>
>
>>
>>
>> The Graph/Node/Triple level in Jena is an API but its primary role is the
>> other side, to storage and inference, not apps.
>>
>> Generality gives
>> A/ Future proofing (not perfect)
>> B/ Arises in inference and query naturally.
>> C/ using RDF structures for processing RDF
>>
>> Nodes in triples can be variables, and I would have found it useful to
>> have marker nodes to be able to build structures e.g. "known to be bound at
>> this point in a query".  As it was, I ended up creating parallel structures.
>>
>>
>>  Where I see advantages of the clerezza API:
>>> - Based on the collections framework so standard tools can be used for graphs
>>>
>>
>> Given a core system API, Scala and Clojure and even different Java APIs
>> for different styles are all possible.
>>
>
> Right. That's why I propose having a minimal API and decorators to
> provide Scala interfacing or the resource API for Java (which corresponds
> more or less to the W3C RDF API draft).
>
>
>>
>> A universal API across systems is about plugging in machinery (parser,
>> query engines, storage, inference).  It's good to separate that from
>> application APIs otherwise there is a design tension.
>
> I'm wondering if there need to be specia hooks for inference or if this
> cannot just as well be done by simply wrapping the graphs.
>
>
>>
>>
>>  - Immutable graphs follow identity criterion of RDF semantics, this allows
>>> graph component to be added to sets and more straight forwardly implement
>>> diff and patch algorithms
>>> - BNode have no ids: apart from promoting the usage of URIs where this is
>>> appropriate it allows behind the scenes leanification and saves memory
>>> where the backend doesn't hast such ids.
>>>
>>
>> We have argued about this before.
>>
>> + As you have objects, there is a concept of identity (you can tell two
>> bNodes apart).
>>
> No, two bnodes might be indistinguisgibe as in
>
> a :knows b
> b : knows a
>
> You cannot tell them apart even though none of them can be leanified away
>
>
>> + For persistence, an internal id is necessary to reconstruct consistently
>> with caches.
>>
>
> Here we are talking about some implementation stuff that imho should be
> separate from API discussion. Do you accept my Toy-usecase challenge [1],
> if we leave the classical dedicate triple store usecase scenario the id
> quickly becomes something that makes things harder rather than easier.
>
>
>> + Leaning isn't a core feature of RDF.  In fact, IIRC, mention is going to
>> be removed.  It's information reduction, not data reduction.
>>
>
> It simply arises from bnodes being existential variables. If they are
> eredined to be something else then I have difficulties to see what
> advantages they wold still offer to named nodes (maybe in some slolem: uri
> scheme)
>
>
>> + There will be a have a skolemization Note from RDF-WG to deal with the
>> practical matters of dealing with bNodes.
>>
>> RDF as data model for linked data.
>>
>> Its a datastructure with good properties for combining.  And it has links.
>>
>>
>>
>>>
>>>
>>>
>>>> (for generalised systems such as rules engine - and for SPARQL - triples
>>>> can arise with extras like literals as subjects; they get removed later)
>>>>
>>>
>>>
>>> If this shall be an API for interoperability based on RDF standard I'm
>>> wonder if is shall be possible to expose such intermediate constructs.
>>>
>>
>> My suggestion is that the API for interoperability is designed to support
>> RDF standards.
>>
>> The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.
>>
>
> Datasets are an element of the relevant sparql spec, I don't see Quads.
>
>
>>
>> But also storage, SPARQL (Query and Update), and web access (e.g. conneg).
>>
>
> Clerezza is very stong on conneg but I don't think this would be part of
> the rdf core api, but rather of the parts that could be part of Stanbol and
> provide a Linked Data Platform Container (LDPC).
>
>
> Reto
>
> 1.
> http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-%3DXkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ%40mail.gmail.com%3E



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@wymiwyg.com>.
On Mon, Nov 12, 2012 at 10:40 PM, Andy Seaborne <an...@apache.org> wrote:

> On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
>
>> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <an...@apache.org> wrote:
>>
>>  On 09/11/12 09:56, Rupert Westenthaler wrote:
>>>
>>>  RDF libs:
>>>> ====
>>>>
>>>> Out of the viewpoint of Apache Stanbol one needs to ask the Question
>>>> if it makes sense to manage an own RDF API. I expect the Semantic Web
>>>> Standards to evolve quite a bit in the coming years and I do have
>>>> concern that the Clerezza RDF modules will be updated/extended to
>>>> provide implementations of those. One example of such an situation is
>>>> SPARQL 1.1 that is around for quite some time and is still not
>>>> supported by Clerezza. While I do like the small API, the flexibility
>>>> to use different TripleStores and that Clerezza comes with OSGI
>>>> support I think given the current situation we would need to discuss
>>>> all options and those do also include a switch to Apache Jena or
>>>> Sesame. Especially Sesame would be an attractive option as their RDF
>>>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>>>> counterparts (Model [2] and Graph [3]) are considerable different and
>>>> more complex interfaces. In addition Jena will only change to
>>>> org.apache packages with the next major release so a switch before
>>>> that release would mean two incompatible API changes.
>>>>
>>>>
>>> Jena isn't changing the packaging as such -- what we've discussed is
>>> providing a package for the current API and then a new, org.apache API.
>>>   The new API may be much the same as the existing one or it may be
>>> different - that depends on contributions made!
>>>
>>>
>> I didn't know about jena planning to introduce such a common API.
>>
>>
>>> I'd like to hear more about your experiences esp. with Graph API as that
>>> is supposed to be quite simple - it's targeted at storage extensions as
>>> well as supporting the richer Model API.  Personally, aside from the fact
>>> that Clerreza enforces slot constraints (no literals as subjects), the
>>> Jena
>>> Graph API and Clerezza RDF core API seem reasonably aligned.
>>>
>>>
>> Yes the slot constraints comes from the RDF abstract syntax. In my opinion
>> it's something one could decide to relax, by adding appropriate owl:sameAs
>> bnode any graph could be transformed to an rdf-abstract-syntax compliant
>> one. So maybe have a GnereicTripleCollection that can be converted to an
>> RDFTRipleCollection - not sure. Just sticking to the spec and wait till
>> this is allowed by the abstract syntax might be the easiest.
>>
>
> At the core, unconstrained slots has worked best for us.
>

The question is whether this should be part of a common API. For machinery
doing inference and dealing with the meaning of RDF graphs, resources should
also be associated with a set of IRIs (which serialize into owl:sameAs).


>
> Then either:
>
> 1/ have a test like:
>   Triple.isValidRDF
>
> 2/ Layer an app API to impose the constraints (but it's easy to run out of
> good names).
>

The Clerezza API would be such a layer.


>
>
> The Graph/Node/Triple level in Jena is an API but it's primary role is the
> other side, to storage and inference, not apps.
>
> Generality gives
> A/ Future proofing (not perfect)
> B/ Arises in inference and query naturally.
> C/ using RDF structures for processing RDF
>
> Nodes in triples can be variables, and I would have found it useful to
> have marker nodes to be able to build structures e.g. "known to be bound at
> this point in a query".  As it was, I ended up creating parallel structures.
>
>
>  Where I see advantages of the clerezza API:
>> - Bases on collections framework so standard tools can be used for graphs
>>
>
> Given a core system API, a scala and clojure and even different Java APIs
> for difefrent styles are all possible.
>

Right. That's why I propose having a minimal API plus decorators that
provide the Scala interface or the resource API for Java (which corresponds
more or less to the W3C RDF API draft).


>
> A universal API across systems is about plugging in machinery (parser,
> query engines, storage, inference).  It's good to separate that from
> application APIs otherwise there is a design tension.

I'm wondering whether there need to be special hooks for inference, or
whether this can be done just as well by simply wrapping the graphs.
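
A minimal Java sketch of what "wrapping the graph" could look like, assuming
a graph is simply a java.util.Set of triples. The names Triple and
SymmetricView are hypothetical and are not taken from Clerezza or Jena; the
only "inference" shown is closing one predicate under symmetry.

```java
import java.util.AbstractSet;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Objects;
import java.util.Set;

// Hypothetical ground triple; terms are plain strings to keep the sketch short.
final class Triple {
    final String s, p, o;
    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    @Override public boolean equals(Object x) {
        if (!(x instanceof Triple)) return false;
        Triple t = (Triple) x;
        return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
    }
    @Override public int hashCode() { return Objects.hash(s, p, o); }
}

// Read-only view over a base graph that also contains "o p s" for every
// "s p o" whose predicate is the given symmetric predicate.
final class SymmetricView extends AbstractSet<Triple> {
    private final Set<Triple> base;
    private final String symmetricPredicate;

    SymmetricView(Set<Triple> base, String symmetricPredicate) {
        this.base = base;
        this.symmetricPredicate = symmetricPredicate;
    }

    // Recomputed on every call; a real wrapper would cache or stream instead.
    private Set<Triple> closure() {
        Set<Triple> all = new HashSet<>(base);
        for (Triple t : base) {
            if (t.p.equals(symmetricPredicate)) {
                all.add(new Triple(t.o, t.p, t.s));
            }
        }
        return all;
    }

    @Override public Iterator<Triple> iterator() {
        return Collections.unmodifiableSet(closure()).iterator();
    }

    @Override public int size() { return closure().size(); }
}
```

The base graph is left untouched, and consumers that only know
java.util.Set see the inferred triples without any inference-specific hook.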


>
>
>  - Immutable graphs follow identity criterion of RDF semantics, this allows
>> graph component to be added to sets and more straight forwardly implement
>> diff and patch algorithms
>> - BNode have no ids: apart from promoting the usage of URIs where this is
>> appropriate it allows behind the scenes leanification and saves memory
>> where the backend doesn't hast such ids.
>>
>
> We have argued about this before.
>
> + As you have objects, there is a concept of identity (you can tell two
> bNodes apart).
>
No, two bnodes might be indistinguishable, as in

a :knows b
b :knows a

You cannot tell them apart, even though neither of them can be leanified away.


> + For persistence, an internal id is necessary to reconstruct consistently
> with caches.
>

Here we are talking about implementation details that imho should be kept
separate from the API discussion. Do you accept my toy use-case challenge [1]?
Once we leave the classical dedicated triple store scenario, the id
quickly becomes something that makes things harder rather than easier.


> + Leaning isn't a core feature of RDF.  In fact, IIRC, mention is going to
> be removed.  It's information reduction, not data reduction.
>

It simply arises from bnodes being existential variables. If they are
redefined to be something else, then I have difficulty seeing what
advantages they would still offer over named nodes (maybe in some skolem:
URI scheme).


> + There will be a have a skolemization Note from RDF-WG to deal with the
> practical matters of dealing with bNodes.
>
> RDF as data model for linked data.
>
> Its a datastructure with good properties for combining.  And it has links.
>
>
>
>>
>>
>>
>>> (for generalised systems such as rules engine - and for SPARQL - triples
>>> can arise with extras like literals as subjects; they get removed later)
>>>
>>
>>
>> If this shall be an API for interoperability based on RDF standard I'm
>> wonder if is shall be possible to expose such intermediate constructs.
>>
>
> My suggestion is that the API for interoperability is designed to support
> RDF standards.
>
> The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.
>

Datasets are an element of the relevant SPARQL spec; I don't see Quads.


>
> But also storage, SPARQL (Query and Update), and web access (e.g. conneg).
>

Clerezza is very strong on conneg, but I don't think this would be part of
the RDF core API; rather, it belongs to the parts that could become part of
Stanbol and provide a Linked Data Platform Container (LDPC).


Reto

1.
http://mail-archives.apache.org/mod_mbox/stanbol-dev/201211.mbox/%3CCALvhUEUfOd-mLBh-%3DXkwbLAJHBcboE963hDxv6g0jHNPj6cxPQ%40mail.gmail.com%3E

Re: Future of Clerezza and Stanbol

Posted by Andy Seaborne <an...@apache.org>.
On 12/11/12 19:42, Reto Bachmann-Gmür wrote:
> On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> On 09/11/12 09:56, Rupert Westenthaler wrote:
>>
>>> RDF libs:
>>> ====
>>>
>>> Out of the viewpoint of Apache Stanbol one needs to ask the Question
>>> if it makes sense to manage an own RDF API. I expect the Semantic Web
>>> Standards to evolve quite a bit in the coming years and I do have
>>> concern that the Clerezza RDF modules will be updated/extended to
>>> provide implementations of those. One example of such an situation is
>>> SPARQL 1.1 that is around for quite some time and is still not
>>> supported by Clerezza. While I do like the small API, the flexibility
>>> to use different TripleStores and that Clerezza comes with OSGI
>>> support I think given the current situation we would need to discuss
>>> all options and those do also include a switch to Apache Jena or
>>> Sesame. Especially Sesame would be an attractive option as their RDF
>>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>>> counterparts (Model [2] and Graph [3]) are considerable different and
>>> more complex interfaces. In addition Jena will only change to
>>> org.apache packages with the next major release so a switch before
>>> that release would mean two incompatible API changes.
>>>
>>
>> Jena isn't changing the packaging as such -- what we've discussed is
>> providing a package for the current API and then a new, org.apache API.
>>   The new API may be much the same as the existing one or it may be
>> different - that depends on contributions made!
>>
>
> I didn't know about jena planning to introduce such a common API.
>
>>
>> I'd like to hear more about your experiences esp. with Graph API as that
>> is supposed to be quite simple - it's targeted at storage extensions as
>> well as supporting the richer Model API.  Personally, aside from the fact
>> that Clerreza enforces slot constraints (no literals as subjects), the Jena
>> Graph API and Clerezza RDF core API seem reasonably aligned.
>>
>
> Yes the slot constraints comes from the RDF abstract syntax. In my opinion
> it's something one could decide to relax, by adding appropriate owl:sameAs
> bnode any graph could be transformed to an rdf-abstract-syntax compliant
> one. So maybe have a GnereicTripleCollection that can be converted to an
> RDFTRipleCollection - not sure. Just sticking to the spec and wait till
> this is allowed by the abstract syntax might be the easiest.

At the core, unconstrained slots have worked best for us.

Then either:

1/ have a test like:
   Triple.isValidRDF

2/ Layer an app API to impose the constraints (but it's easy to run out 
of good names).
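
The two options can be sketched in Java roughly as follows. This is only an
illustration: NodeKind, Node, GeneralTriple and StrictTriples are
hypothetical names, not the Jena or Clerezza API, and the validity rule is
the usual one from the RDF abstract syntax (subject is an IRI or blank node,
predicate is an IRI, no variables).

```java
import java.util.Objects;

// Hypothetical node kinds; VARIABLE stands for the query-time extras.
enum NodeKind { IRI, BNODE, LITERAL, VARIABLE }

final class Node {
    final NodeKind kind;
    final String label;
    Node(NodeKind kind, String label) {
        this.kind = Objects.requireNonNull(kind);
        this.label = Objects.requireNonNull(label);
    }
}

// Option 1: unconstrained slots plus a validity test.
final class GeneralTriple {
    final Node s, p, o;
    GeneralTriple(Node s, Node p, Node o) { this.s = s; this.p = p; this.o = o; }

    // True if the triple fits the RDF abstract syntax.
    boolean isValidRDF() {
        boolean subjectOk = s.kind == NodeKind.IRI || s.kind == NodeKind.BNODE;
        boolean predicateOk = p.kind == NodeKind.IRI;
        boolean objectOk = o.kind != NodeKind.VARIABLE;
        return subjectOk && predicateOk && objectOk;
    }
}

// Option 2: an application-level layer that refuses anything else.
final class StrictTriples {
    static GeneralTriple of(Node s, Node p, Node o) {
        GeneralTriple t = new GeneralTriple(s, p, o);
        if (!t.isValidRDF()) {
            throw new IllegalArgumentException("not a valid RDF triple");
        }
        return t;
    }
}
```

Storage and inference code would work with GeneralTriple directly, while
application code would go through the strict layer.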


The Graph/Node/Triple level in Jena is an API but its primary role is
the other side, to storage and inference, not apps.

Generality gives
A/ Future proofing (not perfect)
B/ Arises in inference and query naturally.
C/ using RDF structures for processing RDF

Nodes in triples can be variables, and I would have found it useful to 
have marker nodes to be able to build structures e.g. "known to be bound 
at this point in a query".  As it was, I ended up creating parallel 
structures.

> Where I see advantages of the clerezza API:
> - Bases on collections framework so standard tools can be used for graphs

Given a core system API, Scala, Clojure and even different Java
APIs for different styles are all possible.

A universal API across systems is about plugging in machinery (parser, 
query engines, storage, inference).  It's good to separate that from 
application APIs otherwise there is a design tension.

> - Immutable graphs follow identity criterion of RDF semantics, this allows
> graph component to be added to sets and more straight forwardly implement
> diff and patch algorithms
> - BNode have no ids: apart from promoting the usage of URIs where this is
> appropriate it allows behind the scenes leanification and saves memory
> where the backend doesn't hast such ids.

We have argued about this before.

+ As you have objects, there is a concept of identity (you can tell two 
bNodes apart).
+ For persistence, an internal id is necessary to reconstruct 
consistently with caches.
+ Leaning isn't a core feature of RDF.  In fact, IIRC, the mention of it is
going to be removed.  It's information reduction, not data reduction.
+ There will be a skolemization Note from the RDF-WG to deal with the
practical matters of dealing with bNodes.

RDF as data model for linked data.

It's a data structure with good properties for combining.  And it has links.

>
>
>
>>
>> (for generalised systems such as rules engine - and for SPARQL - triples
>> can arise with extras like literals as subjects; they get removed later)
>
>
> If this shall be an API for interoperability based on RDF standard I'm
> wonder if is shall be possible to expose such intermediate constructs.

My suggestion is that the API for interoperability is designed to 
support RDF standards.

The key elements are IRIs, literals, Triples, Quads, Graphs, Datasets.

But also storage, SPARQL (Query and Update), and web access (e.g. conneg).

(and inference, but it seems to me that inference has adopted more
"individual" (data object), not triple, styles)

>
> Reto
>

	Andy


Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Mon, Nov 12, 2012 at 5:46 PM, Andy Seaborne <an...@apache.org> wrote:

> On 09/11/12 09:56, Rupert Westenthaler wrote:
>
>> RDF libs:
>> ====
>>
>> Out of the viewpoint of Apache Stanbol one needs to ask the Question
>> if it makes sense to manage an own RDF API. I expect the Semantic Web
>> Standards to evolve quite a bit in the coming years and I do have
>> concern that the Clerezza RDF modules will be updated/extended to
>> provide implementations of those. One example of such an situation is
>> SPARQL 1.1 that is around for quite some time and is still not
>> supported by Clerezza. While I do like the small API, the flexibility
>> to use different TripleStores and that Clerezza comes with OSGI
>> support I think given the current situation we would need to discuss
>> all options and those do also include a switch to Apache Jena or
>> Sesame. Especially Sesame would be an attractive option as their RDF
>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>> counterparts (Model [2] and Graph [3]) are considerable different and
>> more complex interfaces. In addition Jena will only change to
>> org.apache packages with the next major release so a switch before
>> that release would mean two incompatible API changes.
>>
>
> Jena isn't changing the packaging as such -- what we've discussed is
> providing a package for the current API and then a new, org.apache API.
>  The new API may be much the same as the existing one or it may be
> different - that depends on contributions made!
>

I didn't know about Jena planning to introduce such a common API.

>
> I'd like to hear more about your experiences esp. with Graph API as that
> is supposed to be quite simple - it's targeted at storage extensions as
> well as supporting the richer Model API.  Personally, aside from the fact
> that Clerreza enforces slot constraints (no literals as subjects), the Jena
> Graph API and Clerezza RDF core API seem reasonably aligned.
>

Yes, the slot constraints come from the RDF abstract syntax. In my opinion
it's something one could decide to relax: by adding appropriate owl:sameAs
bnodes, any graph could be transformed into an rdf-abstract-syntax compliant
one. So maybe have a GenericTripleCollection that can be converted to an
RDFTripleCollection - not sure. Just sticking to the spec and waiting until
this is allowed by the abstract syntax might be the easiest.
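
A rough Java sketch of that transformation, limited to the literal-as-subject
case. Term, Statement and LiteralSubjectRewriter are hypothetical names
invented for this illustration; blank nodes are distinguished by object
identity only, so they carry no id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical term type; a BNODE has a null value and is told apart
// from other blank nodes only by object identity.
final class Term {
    enum Kind { IRI, BNODE, LITERAL }
    final Kind kind;
    final String value;
    Term(Kind kind, String value) { this.kind = kind; this.value = value; }
}

final class Statement {
    final Term s, p, o;
    Statement(Term s, Term p, Term o) { this.s = s; this.p = p; this.o = o; }
}

final class LiteralSubjectRewriter {
    static final Term SAME_AS =
        new Term(Term.Kind.IRI, "http://www.w3.org/2002/07/owl#sameAs");

    // Rewrites  "lit" p o  into  _:b p o .  _:b owl:sameAs "lit" .
    // Statements whose subject is not a literal are returned unchanged.
    static List<Statement> rewrite(Statement t) {
        List<Statement> out = new ArrayList<>();
        if (t.s.kind != Term.Kind.LITERAL) {
            out.add(t);
            return out;
        }
        Term b = new Term(Term.Kind.BNODE, null); // fresh blank node
        out.add(new Statement(b, t.p, t.o));
        out.add(new Statement(b, SAME_AS, t.s));
        return out;
    }
}
```

Each rewritten statement gets its own fresh blank node here; sharing one
blank node per distinct literal would be an equally plausible choice.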

Where I see advantages of the Clerezza API:
- Based on the collections framework, so standard tools can be used for graphs
- Immutable graphs follow the identity criterion of RDF semantics; this allows
graph components to be added to sets and makes it more straightforward to
implement diff and patch algorithms
- BNodes have no ids: apart from promoting the usage of URIs where this is
appropriate, it allows behind-the-scenes leanification and saves memory
where the backend doesn't have such ids.
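
To illustrate the first two points, here is a small Java sketch in which a
graph is just an immutable java.util.Set of triples, so diff and patch
reduce to ordinary set operations. GroundTriple and GraphDiff are
hypothetical names, not Clerezza classes, and the sketch assumes ground
triples only: with blank nodes, graph equality would need an isomorphism
check rather than plain Set.equals.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical ground triple (no blank nodes), compared by value.
final class GroundTriple {
    final String s, p, o;
    GroundTriple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    @Override public boolean equals(Object x) {
        if (!(x instanceof GroundTriple)) return false;
        GroundTriple t = (GroundTriple) x;
        return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
    }
    @Override public int hashCode() { return Objects.hash(s, p, o); }
}

final class GraphDiff {
    // Triples present in 'to' but not in 'from'.
    static Set<GroundTriple> added(Set<GroundTriple> from, Set<GroundTriple> to) {
        Set<GroundTriple> r = new HashSet<>(to);
        r.removeAll(from);
        return Collections.unmodifiableSet(r);
    }

    // Triples present in 'from' but not in 'to'.
    static Set<GroundTriple> removed(Set<GroundTriple> from, Set<GroundTriple> to) {
        return added(to, from);
    }

    // Applies a diff and returns a new immutable graph; 'g' is not modified.
    static Set<GroundTriple> patch(Set<GroundTriple> g,
                                   Set<GroundTriple> removed,
                                   Set<GroundTriple> added) {
        Set<GroundTriple> r = new HashSet<>(g);
        r.removeAll(removed);
        r.addAll(added);
        return Collections.unmodifiableSet(r);
    }
}
```

Because the graphs are immutable and compared by value, they can themselves
be stored in a Set or used as Map keys.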



>
> (for generalised systems such as rules engine - and for SPARQL - triples
> can arise with extras like literals as subjects; they get removed later)


If this shall be an API for interoperability based on the RDF standard, I
wonder whether it should be possible to expose such intermediate constructs.

Reto

Re: Future of Clerezza and Stanbol

Posted by Andy Seaborne <an...@apache.org>.
On 09/11/12 09:56, Rupert Westenthaler wrote:
> RDF libs:
> ====
>
> Out of the viewpoint of Apache Stanbol one needs to ask the Question
> if it makes sense to manage an own RDF API. I expect the Semantic Web
> Standards to evolve quite a bit in the coming years and I do have
> concern that the Clerezza RDF modules will be updated/extended to
> provide implementations of those. One example of such an situation is
> SPARQL 1.1 that is around for quite some time and is still not
> supported by Clerezza. While I do like the small API, the flexibility
> to use different TripleStores and that Clerezza comes with OSGI
> support I think given the current situation we would need to discuss
> all options and those do also include a switch to Apache Jena or
> Sesame. Especially Sesame would be an attractive option as their RDF
> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
> counterparts (Model [2] and Graph [3]) are considerable different and
> more complex interfaces. In addition Jena will only change to
> org.apache packages with the next major release so a switch before
> that release would mean two incompatible API changes.

Jena isn't changing the packaging as such -- what we've discussed is 
providing a package for the current API and then a new, org.apache API. 
  The new API may be much the same as the existing one or it may be 
different - that depends on contributions made!

I'd like to hear more about your experiences, esp. with the Graph API, as
that is supposed to be quite simple - it's targeted at storage extensions as
well as supporting the richer Model API.  Personally, aside from the
fact that Clerezza enforces slot constraints (no literals as subjects),
the Jena Graph API and Clerezza RDF core API seem reasonably aligned.

(For generalised systems such as rule engines - and for SPARQL - triples
can arise with extras like literals as subjects; they get removed later.)

	Andy




Re: Future of Clerezza and Stanbol

Posted by Andy Seaborne <an...@apache.org>.
On 11/11/12 23:22, Rupert Westenthaler wrote:
> Hi all ,
>
> On Sun, Nov 11, 2012 at 4:47 PM, Reto Bachmann-Gmür <re...@apache.org> wrote:
>> - clerezza.rdf graduates as commons.rdf: a modular java/scala
>> implementation of rdf related APIs, usable with and without OSGi
>
> For me this immediately raises the question: Why should the Clerezza
> API become commons.rdf if 90+% (just a guess) of the Java RDF stuff is
> based on Jena and Sesame? Creating an Apache commons project based on
> an RDF API that is only used by a very low percentage of all Java RDF
> applications is not feasible. Generally I see not much room for a
> commons RDF project as long as there is not a commonly agreed RDF API
> for Java.

Very good point.

There is a finite and bounded supply of energy of people to work on such 
a thing and to make it work for the communities that use it.   For all 
of us, work on A means less work on B.


An "RDF API" for applications needs to be more than RDF. A SPARQL engine 
is not simply abstracted from the storage by some "list(s,p,o)" API 
call.  It will die at scale, where scale here includes in-memory usage.

My personal opinion is that wrapper APIs are not the way to go - they
end up as a new API in themselves and the fact they are backed by
different systems is really an implementation detail.  They end up
having design opinions and gradually require more and more maintenance as
they add more and more.

API bridges are better (mapping one API to another - we are really 
talking about a small number of APIs, not 10s) as they expose the 
advantages of each system.
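
To illustrate what such a bridge could look like, here is a minimal sketch
in plain Java. Both interfaces are hypothetical stand-ins, not the Jena,
Sesame or Clerezza APIs: PatternGraph plays the role of one system's lookup
API, StatementSource the role of another system's data view, and
asPatternGraph maps one onto the other without introducing a third model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an API bridge between two made-up RDF APIs.
final class ApiBridgeSketch {
    /** Stand-in for "API A": pattern lookup where null acts as a wildcard. */
    interface PatternGraph {
        List<String[]> find(String s, String p, String o);
    }

    /** Stand-in for "API B": a plain iterable view of {s, p, o} statements. */
    interface StatementSource {
        Iterable<String[]> statements();
    }

    /** Bridge: exposes a StatementSource through the PatternGraph API. */
    static PatternGraph asPatternGraph(final StatementSource source) {
        return (s, p, o) -> {
            List<String[]> hits = new ArrayList<>();
            for (String[] st : source.statements()) {
                if (matches(s, st[0]) && matches(p, st[1]) && matches(o, st[2])) {
                    hits.add(st);
                }
            }
            return hits;
        };
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

Note that this naive bridge scans every statement on each call, which is
exactly the kind of "list(s,p,o)" access that will not scale; a real bridge
would presumably delegate to the indexes or query engine of the underlying
system.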

The ideal is a set of interfaces systems can agree on.  I'm going to be 
contributing to the interfacization of the Graph API in Jena - if you 
have thoughts, send email to a list.

	Andy

PS See the work being done by Stephen Allen on coarse grained APIs:

http://mail-archives.apache.org/mod_mbox/jena-dev/201206.mbox/%3CCAPTxtVOMMWxfk2%2B4ciCExUBZyxsDKvuO0QshXF8uKhaD8txXjA%40mail.gmail.com%3E



Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Mon, Nov 12, 2012 at 12:22 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi all ,
>
> On Sun, Nov 11, 2012 at 4:47 PM, Reto Bachmann-Gmür <re...@apache.org>
> wrote:
> > - clerezza.rdf graduates as commons.rdf: a modular java/scala
> > implementation of rdf related APIs, usable with and without OSGi
>
> For me this immediately raises the question: Why should the Clerezza
> API become commons.rdf if 90+% (just a guess) of the Java RDF stuff is
> based on Jena and Sesame? Creating an Apache commons project based on
> an RDF API that is only used by a very low percentage of all Java RDF
> applications is not feasible. Generally I see not much room for a
> commons RDF project as long as there is not a commonly agreed RDF API
> for Java.
>

The Sesame and Jena APIs are APIs for building applications on top of
triple stores. In Clerezza we wanted to be independent of the triple store
and also be able to expose other data sources as RDF (this can also be done
with the other APIs but requires more memory and typically more code). We
evaluated rdf2go but dismissed it for various reasons, so we decided to
introduce an API that's as close as possible to the relevant standards.
The standards evolve and new standards emerge (e.g. the W3C RDF API spec
draft or SPARQL 1.1); the libraries should evolve with them and with the
experience of the people using them. I think that an Apache Commons project
is a more flexible approach than the JCP.


Cheers,
Reto

>
> On Sun, Nov 11, 2012 at 5:40 PM, Fabian Christ
> <ch...@googlemail.com> wrote:
> >
> > Having the clerezza platform in Stanbol and thinking in the long term
> > about merging and using this stuff is a good choice. This can not be
> > done with some simple imports and we should carefully evaluate what will
> > be the right way to go in Stanbol.
>
> I would still suggest to do this within an own branch as this makes it
> easier to commit/review unfinished stuff. In addition we will need a
> branch for making a vote (I guess both for Clerezza and Stanbol) on
> the proposed changes.
>
> The following list tries to sum-up discussed points (please
> refine/complete)
>
> * apache.commons.web:
>     + Jersey -> Apache Wink
>     + replace Viewable with LDViewable
>     + Stanbol Web UI should become optional
>     * add type based Rendering (at a later time)
> * apache.commons.security:
>     + move security from Clerezza to Stanbol
>     + based on Servlet filter
> * Scala: no change needed
>     * TODO: observe the PermGen space issue
> * Shell: no change needed
> * Development Tools
>     * add Bundle-Dev-Tools to shell
>     * add Maven Archetype support to Stanbol
> * Clerezza RDF framework:
>     ? Is community strong enough to manage its own RDF framework
>     ? Where to manage the code
>     + SPARQL 1.1 via fast lane (direct access to the native SPARQL
> implementations)
>     + Update to the newest Jena versions
>     + Merge Indexed in-memory TripleCollections to clerezza
>     + finish and release the SingleTdbDatasetTcProvider
>     + add support for JSON-LD parsing/serializing
> ? Clerezza Platform: Can someone make a list what else is present in
> Clerezza
>
> best
> Rupert
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Future of Clerezza and Stanbol

Posted by Peter Ansell <an...@gmail.com>.
On 12 November 2012 09:22, Rupert Westenthaler
<ru...@gmail.com> wrote:
> Hi all ,
>
> On Sun, Nov 11, 2012 at 4:47 PM, Reto Bachmann-Gmür <re...@apache.org> wrote:
>> - clerezza.rdf graduates as commons.rdf: a modular java/scala
>> implementation of rdf related APIs, usable with and without OSGi
>
> For me this immediately raises the question: Why should the Clerezza
> API become commons.rdf if 90+% (just a guess) of the Java RDF stuff is
> based on Jena and Sesame? Creating an Apache commons project based on
> an RDF API that is only used by a very low percentage of all Java RDF
> applications is not feasible. Generally I see not much room for a
> commons RDF project as long as there is not a commonly agreed RDF API
> for Java.

Given that the only widely used Jena/Sesame abstraction layer, the
RDF2GO project, is no longer actively maintained, there is room in the
market for a Jena/Sesame wrapper API, if it is kept up to date with
recent versions of both libraries.

Given that the underlying OpenRDF API is made up of interfaces, as
opposed to Jena, which uses classes for important parts of its API, it
may be viable to switch to using the OpenRDF API with a non-Sesame
implementation for the Repository and/or Sail APIs and work on a
conversion layer between Jena and Sesame where necessary, possibly
based on the library that Andy Seaborne and I have worked on at GitHub
in the past [1].

Cheers,

Peter

[1] https://github.com/ansell/JenaSesame

Re: Future of Clerezza and Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all ,

On Sun, Nov 11, 2012 at 4:47 PM, Reto Bachmann-Gmür <re...@apache.org> wrote:
> - clerezza.rdf graduates as commons.rdf: a modular java/scala
> implementation of rdf related APIs, usable with and without OSGi

For me this immediately raises the question: Why should the Clerezza
API become commons.rdf if 90+% (just a guess) of the Java RDF stuff is
based on Jena and Sesame? Creating an Apache commons project based on
an RDF API that is only used by a very low percentage of all Java RDF
applications is not feasible. Generally I see not much room for a
commons RDF project as long as there is not a commonly agreed RDF API
for Java.

On Sun, Nov 11, 2012 at 5:40 PM, Fabian Christ
<ch...@googlemail.com> wrote:
>
> Having the clerezza platform in Stanbol and thinking in the long term about
> merging and using this stuff is a good choice. This can not be done with
> some simple imports and we should carefully evaluate what will be the right
> way to go in Stanbol.

I would still suggest doing this within a separate branch as this makes it
easier to commit/review unfinished stuff. In addition we will need a
branch for making a vote (I guess both for Clerezza and Stanbol) on
the proposed changes.

The following list tries to sum up the discussed points (please refine/complete)

* apache.commons.web:
    + Jersey -> Apache Wink
    + replace Viewable with LDViewable
    + Stanbol Web UI should become optional
    * add type based Rendering (at a later time)
* apache.commons.security:
    + move security from Clerezza to Stanbol
    + based on Servlet filter
* Scala: no change needed
    * TODO: observe the PermGen space issue
* Shell: no change needed
* Development Tools
    * add Bundle-Dev-Tools to shell
    * add Maven Archetype support to Stanbol
* Clerezza RDF framework:
    ? Is the community strong enough to manage its own RDF framework?
    ? Where to manage the code?
    + SPARQL 1.1 via fast lane (direct access to the native SPARQL
      implementations)
    + Update to the newest Jena versions
    + Merge indexed in-memory TripleCollections into Clerezza
    + Finish and release the SingleTdbDatasetTcProvider
    + Add support for JSON-LD parsing/serializing
? Clerezza Platform: Can someone make a list of what else is present in Clerezza?
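
The "fast lane" item above could be sketched roughly as follows. This is
plain Java with hypothetical names (TripleStore, NativeSparql, route); it is
not the Clerezza provider API and it does not execute any query. It only
shows the dispatch idea: a store that offers a native SPARQL implementation
gets the query string directly, every other store goes through the generic
triple-pattern API.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the "fast lane" dispatch; no real query is executed.
final class SparqlFastLaneSketch {
    /** Generic access that every store provides. */
    interface TripleStore {
        List<String[]> match(String s, String p, String o);
    }

    /** Optional capability: the backend ships its own SPARQL 1.1 engine. */
    interface NativeSparql {
        String executeNative(String query);
    }

    /** Describes which route a query would take for the given store. */
    static String route(TripleStore store, String query) {
        if (store instanceof NativeSparql) {
            // Fast lane: hand the query string to the native engine untouched.
            return ((NativeSparql) store).executeNative(query);
        }
        // Slow lane: a generic engine would evaluate the query via match(...).
        return "generic engine over match(s, p, o)";
    }

    /** Example store with a native engine (stubbed out for this sketch). */
    static final class NativeStore implements TripleStore, NativeSparql {
        @Override public List<String[]> match(String s, String p, String o) {
            return Collections.emptyList();
        }
        @Override public String executeNative(String query) {
            return "native engine: " + query;
        }
    }
}
```

Keeping the native capability in a separate, optional interface means that
stores without their own SPARQL engine do not have to change at all.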

best
Rupert

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Sun, Nov 11, 2012 at 5:40 PM, Fabian Christ
<christ.fabian@googlemail.com> wrote:

> Sounds like a reasonable way to go. Having the RDF stuff in commons is
> hopefully useful for many other projects. Would this also include the
> JSONLD serializer and parser that is more and more needed by Stanbol and
> others?
>

Yes, I think it would be reasonable if besides the APIs this project would
contain adapters, serializers, parsers and in-memory storage objects. Of
course these should ship as separate jars so that one can also just have
the bare-bones APIs.

Cheers,
Reto

Re: Future of Clerezza and Stanbol

Posted by Fabian Christ <ch...@googlemail.com>.
2012/11/11 Reto Bachmann-Gmür <re...@apache.org>

> - clerezza.rdf graduates as commons.rdf: a modular java/scala
> implementation of rdf related APIs, usable with and without OSGi
> - the clerezza platform becomes a subproject of stanbol but not part of the
> stanbol build, active clerezza committers would join stanbol
>
> This solution would end the now very long incubation period while taking
> into account that because of some architectural choices clerezza and stanbol
> cannot immediately be fully merged to one platform, at least not keeping
> the modularity requirements of stanbol. The Stanbol platform would not
> depend on clerezza bundles (but on commons.rdf) and the parts of clerezza
> that fit into the stanbol architecture would gradually be moved to stanbol.
> At the end clerezza might disappear completely or remain as a tinier
> application on top of Stanbol.
>

Sounds like a reasonable way to go. Having the RDF stuff in commons is
hopefully useful for many other projects. Would this also include the
JSON-LD serializer and parser that is more and more needed by Stanbol and
others?

Having the Clerezza platform in Stanbol and thinking in the long term about
merging and using this stuff is a good choice. This cannot be done with
some simple imports and we should carefully evaluate what will be the right
way to go in Stanbol.

Best,
 - Fabian

-- 
Fabian
http://twitter.com/fctwitt

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Hi,

Templating: LDViewable is already there in Stanbol, and actually using it
brings us a big step forward (and drops the Jersey dependency). I don't
think a branch is needed for this. One could enhance LDViewable to be a
generic replacement for the Jersey Viewable, supporting POJO rendering
alongside RDF rendering; this would make the transition easier.
TypeBasedRendering would be an additional step forward, allowing the JAX-RS
resource to be completely rendering-agnostic and just return GraphNodes. I
suggest introducing this at a later point and for now having LDViewable as
best practice.
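
To illustrate the type-based selection, here is a minimal sketch in plain
Java. It is not the Clerezza TypeRendering code: the class, its methods and
the template names are hypothetical. The resource only reports its rdf:type
IRIs, and a separate registry decides which template renders it, so the
resource itself never names a template path.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: picks a template from the rdf:type of the returned node.
final class TypeRenderingSketch {
    private final Map<String, String> templatesByType = new LinkedHashMap<>();

    /** Registers a template for resources of the given rdf:type IRI. */
    void register(String rdfTypeIri, String templatePath) {
        templatesByType.put(rdfTypeIri, templatePath);
    }

    /**
     * Returns the template registered for the first of the resource's types
     * that has one, or the fallback if none of the types is registered.
     */
    String selectTemplate(List<String> rdfTypes, String fallback) {
        for (String type : rdfTypes) {
            String template = templatesByType.get(type);
            if (template != null) {
                return template;
            }
        }
        return fallback;
    }
}
```

A real implementation would presumably also take the requested media type
into account; the sketch leaves that out to keep the selection step visible.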

Scala: Running four Stanbol instances on one server is probably a corner
use case. The big PermGen space requirement is an issue that hopefully gets
fixed in a future version. I don't think it is inherent in the design
choices of the language, so I'm confident there will be some improvement in
the future (see also: http://www.scala-lang.org/node/8229).

Security: Some ports were needed from Clerezza to Stanbol because the
Clerezza implementation is based on wrhapi rather than servlet filters.
With the switch from Triaxrs to Wink in Clerezza, the servlet filter
versions will be needed there as well. I just didn't port over to Stanbol
the classes that need no modification. With the switch to Wink in Clerezza
the Stanbol packages will be of use there. This example shows how close the
two projects are and that it's often hard to say what best fits where.

So what would be a possible solution:

- clerezza.rdf graduates as commons.rdf: a modular Java/Scala
implementation of RDF-related APIs, usable with and without OSGi
- the Clerezza platform becomes a subproject of Stanbol but not part of the
Stanbol build; active Clerezza committers would join Stanbol

This solution would end the now very long incubation period while taking
into account that, because of some architectural choices, Clerezza and
Stanbol cannot immediately be fully merged into one platform, at least not
while keeping the modularity requirements of Stanbol. The Stanbol platform
would not depend on Clerezza bundles (but on commons.rdf) and the parts of
Clerezza that fit into the Stanbol architecture would gradually be moved to
Stanbol. In the end Clerezza might disappear completely or remain as a
tinier application on top of Stanbol.

Just an idea, and not meant to imply that Clerezza couldn't graduate by
itself. But given the interest of both communities I think such a merger
could be beneficial.

WDYT?

Reto





On Sun, Nov 11, 2012 at 9:36 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Reto, all
>
> ad (2) "Type-Based rendering":
>
> You are right MessageBodyWriters might not easily work as responses for
> different types will use the same Java class and Content-Type.
>
> If it is not much work I suggest to activate both variants (LDViewable and
> the template based mechanism) in the suggested branch to refactor the
> Stanbol RESTful/Web stuff. Maybe a clever use of the OSGi Whiteboard
> Pattern could allow to include the hooks required for the template based
> framework without creating a strong dependency on any implementation.
>
>
> ad Scala:
>
> > Do you know
> > about user having a concrete issue with the additional ram requirement or
> > is it more the fact that's not nice having memory used without clear
> reason
> > that's bothering you?
>
> It is mainly the 2nd. Stanbol runs on about 70 MByte PermGen and as soon as
> Scala starts this rises to ~200 MByte.
>
> However in case of the dev.iks-project.eu server where we had to run four
> Stanbol instances on 9 GByte RAM it was also a real concern related to
> memory. In the meantime we were able to increase the RAM of the virtual
> host to 15 GByte.
>
>
> ad Security:
>
> I was not aware that the user/pwd was loaded from an RDF graph. I
> was thinking that just the system properties used to init the system
> were loaded via Clerezza specific keys. My point was that the dependencies
> from the Stanbol Commons Security module(s) to Clerezza seem to be very
> weak (in the sense that they are only using very few things of the
> referenced Clerezza modules). So my question was if we can/should remove
> them or if there are good reasons to keep them.
>
>
> ad "Stanbol is no semantic CMS": you are right
>
> > [..] the vision of an ecosystem of modular semantic and
> > restful osgi components [..]
>
> describes it much better. The important thing is that Stanbol can be used
> with any CMS and does not require users to replace an existing CMS stack.
>
> best
> Rupert
>
> On Sun, Nov 11, 2012 at 2:09 AM, Reto Bachmann-Gmür <re...@apache.org>
> wrote:
>
> > (2) Type-Based rendering is [...]not something that can be implemented
> just by
> > adding MessageBodyWriters as different RDF resources do not result in
> > different java classes. For a framework providing resources as RDF typed
> > based rendering seems the straight forward approach to allow these
> > resources to be rendered in non rdf formats as well. For this we can
> still
> > use Freemarker (with LDPath templates) but our legacy template that are
> > require the class with the application logic to provide special hooks to
> > the templates goes against the concept of having a plugable UI that can
> be
> > left away for instances only to be used by machines. Keep in mind that an
> > infrastructure for providing templates in a better way is already there
> > since the introduction of LDVieable. Type Based rendering goes one step
> > further as the jax-rs root resource would no longer have to provide the
> > abstract template-path.
> >
> > (3)
> > JSR-223 support: I suggested to drop this.
> >
> > Scala support: I'm wondering myself why there is such a big PermGenSpace
> > need. I've just update clerezza trunk to use scala 2.9.2 this might have
> > improved things a bit. As the compiler classloading mechanism is changed
> in
> > 2.10 I guess a bigger improvement might come with that version. Do you
> know
> > about user having a concrete issue with the additional ram requirement or
> > is it more the fact that's not nice having memory used without clear
> reason
> > that's bothering you?
> >
> > Shell: The felix webconsole is there to install bunde, configure services
> > and so on. What you can do with the shell is actually invoking these
> > service's methods and explore exported package structures. Especially
> when
> > exploring API's I'm not yet familiar with the shell has been of great
> > benefit to me. Of course it's a module one can turn off.
> >
> > Bundle-Dev-Tools: (These aren't yet in Stanbol.) Basically maven
> skeletons
> > can also be used as prototypes for the bundle-dev-tool (just some maven
> > magic needed). Of course it's question of style and size of the module if
> > one want the dynamic update and things working independently of the pom
> > dependencies or prefers to compile and redeploy. In the trunk version of
> > dev-tools there's also instant update for static files which makes it
> > particularly convenient when editing css and javascript. As long as no
> > duplication of archetype/skeleton is needed I don't see why not offer
> both
> > maven archetypes and skeletons.
> >
> > Security:
> > You're suggesting one should configure the user, their password and
> > permission in some config files rather than storing them in RDF and
> having
> > a UI to edit them (Ok, I'm embarrassed that UI isn't there yet)? I think
> > when we're talking about some launchers being stateless we mean that
> usage
> > of  the (main) functionality it offers doesn't alter the state of the
> > system. If you intepret "stateless" very strictly then you would have to
> > drop most parts of the felix webconsole as http requests to install
> bundle
> > or configure services aren't stateless. For the user-configuration a
> simple
> > file-based TcProvider would of course be enough so no TDB is needed for
> > that.
> >
> > I think we should see where we want to go as a community. For me the
> > important thing is that Stanbol remains very modular. I think statements
> > like "Stanbol is no semantic CMS" do not bring us further. It's important
> > that the stanbol services can be used as services and that many services
> > are stateless. But the contenthub is a component to manage content (the
> > entityhub to some degree as well), do we want to mandate a horrible user
> > interface just to comply with some catchphrase about what Stanbol is not?
> > Or do we want to reduce Stanbol to the be just the Enhancer and let the
> > other stuff to other projects?
> >
> > I'd rather go for the vision of an ecosystem of modular semantic and
> > restful osgi components, but if the community wants to focus on the
> > enhancer I think a clear statement should be made to avoid unnecessary
> > arguments about memory consumption.
> >
> > Cheers,
> > Reto
> >
> >
> > On Fri, Nov 9, 2012 at 10:56 AM, Rupert Westenthaler <
> > rupert.westenthaler@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> let me share my throughs. Because this mail is rather long I tried to
> >> split it up in three separate section (1) RDF (2) RESTful/ Web
> >> Interface and (3) other related topics
> >>
> >>
> >> RDF libs:
> >> ====
> >>
> >> Out of the viewpoint of Apache Stanbol one needs to ask the Question
> >> if it makes sense to manage an own RDF API. I expect the Semantic Web
> >> Standards to evolve quite a bit in the coming years and I do have
> >> concern that the Clerezza RDF modules will be updated/extended to
> >> provide implementations of those. One example of such an situation is
> >> SPARQL 1.1 that is around for quite some time and is still not
> >> supported by Clerezza. While I do like the small API, the flexibility
> >> to use different TripleStores and that Clerezza comes with OSGI
> >> support I think given the current situation we would need to discuss
> >> all options and those do also include a switch to Apache Jena or
> >> Sesame. Especially Sesame would be an attractive option as their RDF
> >> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
> >> counterparts (Model [2] and Graph [3]) are considerable different and
> >> more complex interfaces. In addition Jena will only change to
> >> org.apache packages with the next major release so a switch before
> >> that release would mean two incompatible API changes.
> >>
> >> My personal opinion is that we should keep using Clerezza for now.
> >> Invest some effort to improve the Clerezza RDF modules and than see
> >> how it further develops. Such an Effort should include
> >>
> >> *  to implement SPARQL fast lane (as already discussed with Reto
> >> during ApacheCon). Fast lane would allow Clerezza to use the native
> >> SPARQL engine of the used triple store. This means that Clerezza only
> >> parses those parts of the SPARQL query needed to determine the RDF
> >> graph to execute the query on. This information is then used to pass
> >> the query to the native SPARQL engine via an extended interface of the
> >> TcProvider. The Clerezza SPARQL implementation would only be used in
> >> case the TcProvider does not provide a native SPARQL implementation or
> >> if the query spans RDF graphs managed by different TcProvider
> >> instances. That way Clerezza users would be able to use any SPARQL
> >> feature provided by the used triple store.
> >> * update to the newest Jena versions (see also STANBOL-621; Peter
> >> Ansell's Clerezza fork on github [4] as well as Sebastian Schaffert's
> >> Jena bundle used for the Stanbol/LMF integration [5])
> >> * finish and release the SingleTdbDatasetTcProvider.java
> >> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
> >> component
> >> * move the Indexed in-memory graph (CLEREZZA-683) from the Stanbol
> >> code base to Clerezza and release it so that we can use it from their
> >> in Stanbol
> >> * provide an Clerezza JsonLD parser/serializer. This is critical for
> >> Stanbol as several CMS use this as preferred RDF serialization.
> >>
> >> [1]
> >>
> http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
> >> [2]
> >>
> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
> >> [3]
> >>
> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
> >> [4]
> >>
> https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
> >> [5]
> >>
> https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml
> >>
> >>
> >> RESTful API / Web Interface:
> >> =====================
> >>
> >> There are several shortcomings of the current implementation of the
> >> Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
> >> o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)
> >>
> >> * Jersey's use of java.util.ServiceLoader forces the use manual
> >> configuration of the JAX-RS components. A switch to an OSGI compatible
> >> implementation such as Apache Wink would be very welcome
> >> * The RESTful API documentation is currently written as HTML into
> >> Freemarker templates. This makes it really hard to maintain this
> >> documentation. I would really appreciate the possibility to use
> >> markdown (as used on the Webpage) for that
> >> * For Stanbol deployments of Stanbol it should be possible to exclude
> >> the WebUI so that only the RESTful services are available
> >>
> >> regarding :
> >>
> >> > Stanbol drops it's interretation of "REST" as "not for humans" and
> want
> >> to go to
> >> > allow integrating (wherever possible as modular and optional
> components)
> >> > media types designed for human consumptions and support REST
> approaches
> >> > there as well (thinking of the current back-button unfriendly UI).
> >>
> >> Adding support for a simple Table based representation of RDF data
> >> would indeed be an important feature. However having Resource (Entity)
> >> type specific rendering is out of the scope of Apache Stanbol (at
> >> least in my opinion). However AFAIK as soon as we switch to an OSGI
> >> compatible JAX-RS implementation users could add those easily by
> >> providing the according JAX-RS MessageBodyWriter.
> >>
> >> If there are people who would like to work it would be really great.
> >> If we could (re)use some stuff from Clerezza - even better. But things
> >> would need to keep simple as Stanbol is no semantic CMS.
> >>
> >> I would suggest to start development in an own branch and than have a
> >> discussion/vote based on an early prototype/demonstration.
> >>
> >>
> >> Other Topics
> >> =========
> >>
> >> ### Scala and jsr 223 (scripting in the JVM)
> >>
> >> I do have an issue with Scala as it adds >150MByte to the PermGen as
> >> soon as it is loaded. But as long as it is an optional dependency and
> >> users are aware of that when adding the dependency I am fine with it.
> >>
> >> ###  Shell
> >>
> >> Personally I do not find the shell very useful. For installing
> >> Bundles/Service configurations I prefer to use the Apache Sling
> >> FileInstaller. For deployment during development I like to use the
> >> Sling Maven Installer plugin. For creating new Stanbol Modules I
> >> rather suggest to create an extensive list of Maven Archetype (e.g.
> >> for Stanbol EnhancementEngines).
> >>
> >> As the Shell also depends on Scala the "+150MByte to the PermGen"
> >> issue also applies to the Shell.
> >>
> >> ### Security
> >>
> >> Having a security model in Apache Stanbol might be important for some
> >> use cases. Because of this I consider this an important topic. However
> >> one I have very little experience with.
> >>
> >> I would like to get rid of the dependencies to
> >> org.apache.clerezza:patform (AFAIK this is only needed for the
> >> configuration and this could be easily provided by the
> >> sling.properties file at runtime. Defaults can be provided in the
> >> commons.properties file already included in all Stanbol Launchers. I
> >> would also suggest to move the PermissionParser utility over to the
> >> Apache Stanbol Security modules.
> >> This two changes would allow to activate the security module also for
> >> the Stable (Stateless) launcher.
> >>
> >>
> >> best
> >> Rupert
> >>
> >>
> >> On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
> >> > Comments inline...
> >> >
> >> > On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org>
> >> wrote:
> >> >
> >> >> Ok, sorry for jumping into this discussion so lately. I've been
> having
> >> >> quite some discussion on the matter here at apacheconeu. Also I had
> >> >> prositive feedback from my resentation of Clerezza yesterday.
> >> >>
> >> >> I think two things:
> >> >> - For high level platform component it is often not clear if the fit
> >> better
> >> >> into Stanbol or into Clerezza
> >> >> - The RDF Api shoud actually be independen both from triple store
> >> provider
> >> >> as well as from consumer
> >> >>
> >> >> So I think a good solution would be to have the RDF liraries
> comprising:
> >> >> - A modular and very spec oriented API for RDF and related standards
> >> >> - A set of serializing and parsing providers
> >> >> - Adapters to triple stores (where the api isn't provided by the
> triple
> >> >> store)
> >> >> basically that's what in the org.apache.clerezza.rdf.* packages
> >> >>
> >> >> That's the stuff that would fit well into Stanbol. Provided that
> stanbol
> >> >> drops it's interretation of "REST" as "not for humans" and want to
> go to
> >> >> allow integrating (wherever possible as modular and optional
> components)
> >> >> media types designed for human consumptions and support REST
> approaches
> >> >> there as well (thinking of the current back-button unfriendly UI).
> >> >>
> >> >
> >> > IMO, Clerezza is just too big for existing committers. If we could
> reduce
> >> > it to the
> >> > essential components dealing with rdf and leaving out templating and
> >> > rendering,
> >> > it may be easier to graduate.
> >> >
> >> > - Scala Server Pages
> >> >> - TypeRendering (selection of templates based on the rdf type of the
> >> >> returned response)
> >> >> - Security (already integrated to some degree, code based security to
> >> run
> >> >> bundles in a sandboxed manner is not)
> >> >> - Shell (already ships in the stanbol launcher, so here it's about
> >> >> 'adopting' the sources)
> >> >> - Dev tools: rapid development support (create sample projects, have
> >> source
> >> >> files as bundles)
> >> >>
> >> >> To the attic:
> >> >> - Triaxrs: The Clerezza jax-rs implementation is no longer needed as
> the
> >> >> same support (jax-rs components asosgi services) is now provided by
> >> apache
> >> >> wink
> >> >> -  jssr 223 support
> >> >>
> >> >> In my opinion there is no urgent need for action, it is true that
> there
> >> >> hasn't been a lot of action in clerezza but imho the project os
> going on
> >> >> even at a low pace  (as other projects like e.g. the recently
> graduated
> >> >> wink).
> >> >>
> >> >
> >> > Not sure about no urgent need for action. Maybe we should list the
> >> > requirements
> >> > to fulfil in order to be able to graduate. Wonder if we are able to
> meet
> >> > them.
> >> >
> >> > Cheers
> >> > Hasan
> >> >
> >> >
> >> >>
> >> >> Cheers,
> >> >> Reto
> >> >>
> >> >> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <
> >> >> bdelacretaz@apache.org
> >> >> > wrote:
> >> >>
> >> >> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org>
> >> wrote:
> >> >> > > ...It's good to have the existing released artifacts remain -
> what
> >> >> about
> >> >> > after
> >> >> > > the donation?
> >> >> > >
> >> >> > > Presumably the moved modules will be released by the new host -
> will
> >> >> they
> >> >> > > use group id org.apache.clerezza? or move to the new host project
> >> group
> >> >> > id?
> >> >> > > I'd suggest renaming the group to the new project but realise it
> is
> >> a
> >> >> bit
> >> >> > > more disruptive...
> >> >> >
> >> >> > I think that's really up to whatever project adopts that code. In
> >> >> > theory package names should change but that's probably not
> convenient.
> >> >> >
> >> >> > Or maybe it's time to create a semantic module or two at
> >> >> > http://commons.apache.org/ ? If existing committers are willing to
> >> >> > support that with their work it should be easy to make it happen.
> >> >> >
> >> >> > -Bertrand
> >> >> >
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Hi,

Templating: LDViewable is already in Stanbol, and actually using it brings
us a big step forward (and drops the Jersey dependency). I don't think a
branch is needed for this. One could enhance LDViewable to be a generic
replacement for Jersey's Viewable that supports POJO rendering alongside
RDF rendering; this would make the transition easier. TypeBasedRendering
would be an additional step forward, allowing the JAX-RS resource to be
completely rendering-agnostic and just return GraphNodes. I suggest
introducing this at a later point and for now having LDViewable as best
practice.
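
To make the idea of type-based rendering concrete, here is a minimal,
language-neutral sketch in Python. It is not the Clerezza or Stanbol API:
the names GraphNode, TypeRenderer, register and render are hypothetical
stand-ins, and the real components would be Java/OSGi services. The point
is only that the resource returns a node plus its graph, and the renderer
picks a template from the node's rdf:type and the requested media type.

```python
# Hypothetical sketch of type-based rendering; not the Clerezza/Stanbol API.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class GraphNode:
    """A resource together with the graph (a set of triples) describing it."""

    def __init__(self, resource, triples):
        self.resource = resource
        self.triples = triples

    def types(self):
        # All objects of rdf:type statements about this resource.
        return {o for (s, p, o) in self.triples
                if s == self.resource and p == RDF_TYPE}


class TypeRenderer:
    """Maps (rdf type, media type) pairs to template functions."""

    def __init__(self):
        self.templates = {}

    def register(self, rdf_type, media_type, template):
        self.templates[(rdf_type, media_type)] = template

    def render(self, node, media_type):
        # Try the node's types in a stable order; the first match wins.
        for rdf_type in sorted(node.types()):
            template = self.templates.get((rdf_type, media_type))
            if template is not None:
                return template(node)
        raise LookupError(
            "no template for %s as %s" % (node.resource, media_type))


def person_html(node):
    # A trivial "template": the resource class itself knows nothing about it.
    return "<h1>Person: %s</h1>" % node.resource


renderer = TypeRenderer()
renderer.register("http://xmlns.com/foaf/0.1/Person", "text/html", person_html)

node = GraphNode(
    "http://example.org/reto",
    {("http://example.org/reto", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person")},
)
html = renderer.render(node, "text/html")
```

In this sketch the code producing the node never names a template path,
which is the property that distinguishes it from the LDViewable approach.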

Scala: Running four Stanbol instances on one server is probably a corner
case. The big PermGen space requirement is an issue that hopefully gets
fixed in a future version. I don't think it is inherent in the design
choices of the language, so I'm confident there will be some improvement
(see also: http://www.scala-lang.org/node/8229).

Security: Some ports were needed from Clerezza to Stanbol because the
Clerezza implementation is based on WRHAPI rather than servlet filters.
With the switch from Triaxrs to Wink in Clerezza, the servlet filter
versions will be needed there as well. I just didn't port over to Stanbol
the classes that need no modification. With the switch to Wink in Clerezza,
the Stanbol packages will be of use there. This example shows how close the
two projects are and that it's often hard to say what best fits where.

So what would be a possible solution:

- clerezza.rdf graduates as commons.rdf: a modular Java/Scala
implementation of RDF-related APIs, usable with and without OSGi
- the Clerezza platform becomes a subproject of Stanbol but not part of the
Stanbol build; active Clerezza committers would join Stanbol

This solution would end the now very long incubation period while taking
into account that, because of some architectural choices, Clerezza and
Stanbol cannot immediately be fully merged into one platform, at least not
while keeping the modularity requirements of Stanbol. The Stanbol platform
would not depend on Clerezza bundles (but on commons.rdf), and the parts of
Clerezza that fit into the Stanbol architecture would gradually be moved to
Stanbol. In the end Clerezza might disappear completely or remain as a
tinier application on top of Stanbol.

Just an idea, and not meant to imply that Clerezza couldn't graduate by
itself. But given the interest of both communities I think such a merger
could be beneficial.

WDYT?

Reto





On Sun, Nov 11, 2012 at 9:36 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Reto, all
>
> ad (2) "Type-Based rendering":
>
> You are right, MessageBodyWriters might not easily work, as responses for
> different types will use the same Java class and Content-Type.
>
> If it is not much work, I suggest activating both variants (LDViewable
> and the template-based mechanism) in the suggested branch to refactor the
> Stanbol RESTful/Web stuff. Maybe a clever use of the OSGi Whiteboard
> Pattern could allow including the hooks required for the template-based
> framework without creating a strong dependency on any implementation.
>
>
> ad Scala:
>
> > Do you know
> > about user having a concrete issue with the additional ram requirement or
> > is it more the fact that's not nice having memory used without clear
> reason
> > that's bothering you?
>
> It is mainly the 2nd. Stanbol runs on about 70MByte PermGen and as soon as
> Scala starts this rises to ~200MByte.
>
> However, in the case of the dev.iks-project.eu server, where we had to run
> four Stanbol instances on 9GByte RAM, it was also a real concern related
> to memory. In the meantime we were able to increase the RAM of the virtual
> host to 15GByte.
>
>
> ad Security:
>
> I was not aware that the user/pwd was loaded from an RDF graph. I
> was thinking that just the system properties used to init the system
> were loaded via Clerezza-specific keys. My point was that the
> dependencies from the Stanbol Commons Security module(s) to Clerezza
> seem to be very weak (in the sense that they are only using very few
> things of the referenced Clerezza modules). So my question was if
> we can/should remove them or if there are good reasons to keep them.
>
>
> ad "Stanbol is no semantic CMS": you are right
>
> > [..] the vision of an ecosystem of modular semantic and
> > restful osgi components [..]
>
> describes it much better. The important thing is that Stanbol can be
> used with any CMS and does not require users to replace an existing
> CMS stack.
>
> best
> Rupert
>
> On Sun, Nov 11, 2012 at 2:09 AM, Reto Bachmann-Gmür <re...@apache.org>
> wrote:
>
> > (2) Type-based rendering is [...] not something that can be implemented
> > just by adding MessageBodyWriters, as different RDF resources do not
> > result in different Java classes. For a framework providing resources as
> > RDF, type-based rendering seems the straightforward approach to allow
> > these resources to be rendered in non-RDF formats as well. For this we
> > can still use Freemarker (with LDPath templates), but our legacy
> > templates, which require the class with the application logic to provide
> > special hooks to the templates, go against the concept of having a
> > pluggable UI that can be left out for instances only to be used by
> > machines. Keep in mind that an infrastructure for providing templates in
> > a better way is already there since the introduction of LDViewable.
> > Type-based rendering goes one step further, as the JAX-RS root resource
> > would no longer have to provide the abstract template path.
> >
> > (3)
> > JSR-223 support: I suggested to drop this.
> >
> > Scala support: I'm wondering myself why there is such a big PermGenSpace
> > need. I've just update clerezza trunk to use scala 2.9.2 this might have
> > improved things a bit. As the compiler classloading mechanism is changed
> in
> > 2.10 I guess a bigger improvement might come with that version. Do you
> know
> > about user having a concrete issue with the additional ram requirement or
> > is it more the fact that's not nice having memory used without clear
> reason
> > that's bothering you?
> >
> > Shell: The felix webconsole is there to install bunde, configure services
> > and so on. What you can do with the shell is actually invoking these
> > service's methods and explore exported package structures. Especially
> when
> > exploring API's I'm not yet familiar with the shell has been of great
> > benefit to me. Of course it's a module one can turn off.
> >
> > Bundle-Dev-Tools: (These aren't yet in Stanbol.) Basically maven
> skeletons
> > can also be used as prototypes for the bundle-dev-tool (just some maven
> > magic needed). Of course it's question of style and size of the module if
> > one want the dynamic update and things working independently of the pom
> > dependencies or prefers to compile and redeploy. In the trunk version of
> > dev-tools there's also instant update for static files which makes it
> > particularly convenient when editing css and javascript. As long as no
> > duplication of archetype/skeleton is needed I don't see why not offer
> both
> > maven archetypes and skeletons.
> >
> > Security:
> > You're suggesting one should configure the user, their password and
> > permission in some config files rather than storing them in RDF and
> having
> > a UI to edit them (Ok, I'm embarrassed that UI isn't there yet)? I think
> > when we're talking about some launchers being stateless we mean that
> usage
> > of  the (main) functionality it offers doesn't alter the state of the
> > system. If you intepret "stateless" very strictly then you would have to
> > drop most parts of the felix webconsole as http requests to install
> bundle
> > or configure services aren't stateless. For the user-configuration a
> simple
> > file-based TcProvider would of course be enough so no TDB is needed for
> > that.
> >
> > I think we should see where we want to go as a community. For me the
> > important thing is that Stanbol remains very modular. I think statements
> > like "Stanbol is no semantic CMS" do not bring us further. It's important
> > that the stanbol services can be used as services and that many services
> > are stateless. But the contenthub is a component to manage content (the
> > entityhub to some degree as well), do we want to mandate a horrible user
> > interface just to comply with some catchphrase about what Stanbol is not?
> > Or do we want to reduce Stanbol to the be just the Enhancer and let the
> > other stuff to other projects?
> >
> > I'd rather go for the vision of an ecosystem of modular semantic and
> > restful osgi components, but if the community wants to focus on the
> > enhancer I think a clear statement should be made to avoid unnecessary
> > arguments about memory consumption.
> >
> > Cheers,
> > Reto
> >
> >
> > On Fri, Nov 9, 2012 at 10:56 AM, Rupert Westenthaler <
> > rupert.westenthaler@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> let me share my throughs. Because this mail is rather long I tried to
> >> split it up in three separate section (1) RDF (2) RESTful/ Web
> >> Interface and (3) other related topics
> >>
> >>
> >> RDF libs:
> >> ====
> >>
> >> Out of the viewpoint of Apache Stanbol one needs to ask the Question
> >> if it makes sense to manage an own RDF API. I expect the Semantic Web
> >> Standards to evolve quite a bit in the coming years and I do have
> >> concern that the Clerezza RDF modules will be updated/extended to
> >> provide implementations of those. One example of such an situation is
> >> SPARQL 1.1 that is around for quite some time and is still not
> >> supported by Clerezza. While I do like the small API, the flexibility
> >> to use different TripleStores and that Clerezza comes with OSGI
> >> support I think given the current situation we would need to discuss
> >> all options and those do also include a switch to Apache Jena or
> >> Sesame. Especially Sesame would be an attractive option as their RDF
> >> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
> >> counterparts (Model [2] and Graph [3]) are considerable different and
> >> more complex interfaces. In addition Jena will only change to
> >> org.apache packages with the next major release so a switch before
> >> that release would mean two incompatible API changes.
> >>
> >> My personal opinion is that we should keep using Clerezza for now.
> >> Invest some effort to improve the Clerezza RDF modules and than see
> >> how it further develops. Such an Effort should include
> >>
> >> *  to implement SPARQL fast lane (as already discussed with Reto
> >> during ApacheCon). Fast lane would allow Clerezza to use the native
> >> SPARQL engine of the used triple store. This means that Clerezza only
> >> parses those parts of the SPARQL query needed to determine the RDF
> >> graph to execute the query on. This information is then used to pass
> >> the query to the native SPARQL engine via an extended interface of the
> >> TcProvider. The Clerezza SPARQL implementation would only be used in
> >> case the TcProvider does not provide a native SPARQL implementation or
> >> if the query spans RDF graphs managed by different TcProvider
> >> instances. That way Clerezza users would be able to use any SPARQL
> >> feature provided by the used triple store.
> >> * update to the newest Jena versions (see also STANBOL-621; Peter
> >> Ansell's Clerezza fork on github [4] as well as Sebastian Schaffert's
> >> Jena bundle used for the Stanbol/LMF integration [5])
> >> * finish and release the SingleTdbDatasetTcProvider.java
> >> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
> >> component
> >> * move the Indexed in-memory graph (CLEREZZA-683) from the Stanbol
> >> code base to Clerezza and release it so that we can use it from their
> >> in Stanbol
> >> * provide an Clerezza JsonLD parser/serializer. This is critical for
> >> Stanbol as several CMS use this as preferred RDF serialization.
> >>
> >> [1]
> >>
> http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
> >> [2]
> >>
> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
> >> [3]
> >>
> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
> >> [4]
> >>
> https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
> >> [5]
> >>
> https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml
> >>
> >>
> >> RESTful API / Web Interface:
> >> =====================
> >>
> >> There are several shortcomings of the current implementation of the
> >> Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
> >> o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)
> >>
> >> * Jersey's use of java.util.ServiceLoader forces the use manual
> >> configuration of the JAX-RS components. A switch to an OSGI compatible
> >> implementation such as Apache Wink would be very welcome
> >> * The RESTful API documentation is currently written as HTML into
> >> Freemarker templates. This makes it really hard to maintain this
> >> documentation. I would really appreciate the possibility to use
> >> markdown (as used on the Webpage) for that
> >> * For Stanbol deployments of Stanbol it should be possible to exclude
> >> the WebUI so that only the RESTful services are available
> >>
> >> regarding :
> >>
> >> > Stanbol drops it's interretation of "REST" as "not for humans" and
> want
> >> to go to
> >> > allow integrating (wherever possible as modular and optional
> components)
> >> > media types designed for human consumptions and support REST
> approaches
> >> > there as well (thinking of the current back-button unfriendly UI).
> >>
> >> Adding support for a simple Table based representation of RDF data
> >> would indeed be an important feature. However having Resource (Entity)
> >> type specific rendering is out of the scope of Apache Stanbol (at
> >> least in my opinion). However AFAIK as soon as we switch to an OSGI
> >> compatible JAX-RS implementation users could add those easily by
> >> providing the according JAX-RS MessageBodyWriter.
> >>
> >> If there are people who would like to work it would be really great.
> >> If we could (re)use some stuff from Clerezza - even better. But things
> >> would need to keep simple as Stanbol is no semantic CMS.
> >>
> >> I would suggest to start development in an own branch and than have a
> >> discussion/vote based on an early prototype/demonstration.
> >>
> >>
> >> Other Topics
> >> =========
> >>
> >> ### Scala and jsr 223 (scripting in the JVM)
> >>
> >> I do have an issue with Scala as it adds >150MByte to the PermGen as
> >> soon as it is loaded. But as long as it is an optional dependency and
> >> users are aware of that when adding the dependency I am fine with it.
> >>
> >> ###  Shell
> >>
> >> Personally I do not find the shell very useful. For installing
> >> Bundles/Service configurations I prefer to use the Apache Sling
> >> FileInstaller. For deployment during development I like to use the
> >> Sling Maven Installer plugin. For creating new Stanbol Modules I
> >> rather suggest to create an extensive list of Maven Archetype (e.g.
> >> for Stanbol EnhancementEngines).
> >>
> >> As the Shell also depends on Scala the "+150MByte to the PermGen"
> >> issue also applies to the Shell.
> >>
> >> ### Security
> >>
> >> Having a security model in Apache Stanbol might be important for some
> >> use cases. Because of this I consider this an important topic. However
> >> one I have very little experience with.
> >>
> >> I would like to get rid of the dependencies to
> >> org.apache.clerezza:patform (AFAIK this is only needed for the
> >> configuration and this could be easily provided by the
> >> sling.properties file at runtime. Defaults can be provided in the
> >> commons.properties file already included in all Stanbol Launchers. I
> >> would also suggest to move the PermissionParser utility over to the
> >> Apache Stanbol Security modules.
> >> This two changes would allow to activate the security module also for
> >> the Stable (Stateless) launcher.
> >>
> >>
> >> best
> >> Rupert
> >>
> >>
> >> On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
> >> > Comments inline...
> >> >
> >> > On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org>
> >> wrote:
> >> >
> >> >> Ok, sorry for jumping into this discussion so late. I've been having
> >> >> quite some discussion on the matter here at ApacheCon EU. Also I had
> >> >> positive feedback from my presentation of Clerezza yesterday.
> >> >>
> >> >> I think two things:
> >> >> - For high-level platform components it is often not clear if they fit
> >> >> better into Stanbol or into Clerezza
> >> >> - The RDF API should actually be independent both from the triple store
> >> >> provider as well as from the consumer
> >> >>
> >> >> So I think a good solution would be to have the RDF libraries comprising:
> >> >> - A modular and very spec-oriented API for RDF and related standards
> >> >> - A set of serializing and parsing providers
> >> >> - Adapters to triple stores (where the API isn't provided by the triple
> >> >> store)
> >> >> basically that's what is in the org.apache.clerezza.rdf.* packages
> >> >>
> >> >> That's the stuff that would fit well into Stanbol. Provided that Stanbol
> >> >> drops its interpretation of "REST" as "not for humans" and wants to
> >> >> allow integrating (wherever possible as modular and optional components)
> >> >> media types designed for human consumption and support REST approaches
> >> >> there as well (thinking of the current back-button unfriendly UI).
> >> >>
> >> >
> >> > IMO, Clerezza is just too big for existing committers. If we could
> reduce
> >> > it to the
> >> > essential components dealing with rdf and leaving out templating and
> >> > rendering,
> >> > it may be easier to graduate.
> >> >
> >> > - Scala Server Pages
> >> >> - TypeRendering (selection of templates based on the rdf type of the
> >> >> returned response)
> >> >> - Security (already integrated to some degree, code based security to
> >> run
> >> >> bundles in a sandboxed manner is not)
> >> >> - Shell (already ships in the stanbol launcher, so here it's about
> >> >> 'adopting' the sources)
> >> >> - Dev tools: rapid development support (create sample projects, have
> >> source
> >> >> files as bundles)
> >> >>
> >> >> To the attic:
> >> >> - Triaxrs: The Clerezza JAX-RS implementation is no longer needed as the
> >> >> same support (JAX-RS components as OSGi services) is now provided by
> >> >> Apache Wink
> >> >> - JSR 223 support
> >> >>
> >> >> In my opinion there is no urgent need for action. It is true that there
> >> >> hasn't been a lot of action in Clerezza but IMHO the project is going on,
> >> >> even if at a low pace (like other projects, e.g. the recently graduated
> >> >> Wink).
> >> >>
> >> >
> >> > Not sure about no urgent need for action. Maybe we should list the
> >> > requirements
> >> > to fulfil in order to be able to graduate. Wonder if we are able to
> meet
> >> > them.
> >> >
> >> > Cheers
> >> > Hasan
> >> >
> >> >
> >> >>
> >> >> Cheers,
> >> >> Reto
> >> >>
> >> >> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <
> >> >> bdelacretaz@apache.org
> >> >> > wrote:
> >> >>
> >> >> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org>
> >> wrote:
> >> >> > > ...It's good to have the existing released artifacts remain -
> what
> >> >> about
> >> >> > after
> >> >> > > the donation?
> >> >> > >
> >> >> > > Presumably the moved modules will be released by the new host -
> will
> >> >> they
> >> >> > > use group id org.apache.clerezza? or move to the new host project
> >> group
> >> >> > id?
> >> >> > > I'd suggest renaming the group to the new project but realise it
> is
> >> a
> >> >> bit
> >> >> > > more disruptive...
> >> >> >
> >> >> > I think that's really up to whatever project adopts that code. In
> >> >> > theory package names should change but that's probably not
> convenient.
> >> >> >
> >> >> > Or maybe it's time to create a semantic module or two at
> >> >> > http://commons.apache.org/ ? If existing committers are willing to
> >> >> > support that with their work it should be easy to make it happen.
> >> >> >
> >> >> > -Bertrand
> >> >> >
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Future of Clerezza and Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Reto, all

ad (2) "Type-Based rendering":

You are right: MessageBodyWriters might not easily work, as responses for
different types will use the same Java class and Content-Type.
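
A minimal sketch of the problem, in plain Java with no JAX-RS dependency.
RdfNode, DispatchDemo, the template file names and the example URIs are
hypothetical names made up for illustration; they are not Stanbol or
Clerezza APIs. The sketch only shows that a lookup keyed on Java class plus
media type cannot tell two RDF resources apart, while a lookup keyed on the
rdf:type can.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an RDF resource returned by a root resource. */
final class RdfNode {
    final String uri;
    final String rdfType; // URI of the rdf:type of this resource

    RdfNode(String uri, String rdfType) {
        this.uri = uri;
        this.rdfType = rdfType;
    }
}

final class DispatchDemo {
    /** Key as seen by class-based dispatch: Java class plus media type. */
    static String classKey(Object entity, String mediaType) {
        return entity.getClass().getName() + "|" + mediaType;
    }

    /** Key as seen by type-based rendering: rdf:type plus media type. */
    static String typeKey(RdfNode node, String mediaType) {
        return node.rdfType + "|" + mediaType;
    }

    /** Looks up a template by rdf:type, falling back to a generic one. */
    static String selectTemplate(Map<String, String> templates,
                                 RdfNode node, String mediaType) {
        return templates.getOrDefault(typeKey(node, mediaType), "generic-table.ftl");
    }

    public static void main(String[] args) {
        RdfNode person = new RdfNode("urn:example:alice",
                "http://xmlns.com/foaf/0.1/Person");
        RdfNode doc = new RdfNode("urn:example:report",
                "http://xmlns.com/foaf/0.1/Document");

        // Both resources share one Java class, so the class-based keys collide.
        System.out.println(classKey(person, "text/html")
                .equals(classKey(doc, "text/html")));

        // Keyed on rdf:type, each resource can get its own template.
        Map<String, String> templates = new HashMap<>();
        templates.put(typeKey(person, "text/html"), "person.ftl");
        System.out.println(selectTemplate(templates, person, "text/html"));
        System.out.println(selectTemplate(templates, doc, "text/html"));
    }
}
```

The fallback to a generic template stands in for the simple table-based
representation discussed earlier in the thread.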

If it is not much work, I suggest activating both variants (LDViewable and the
template-based mechanism) in the suggested branch for refactoring the Stanbol
RESTful/Web stuff. Maybe a clever use of the OSGi Whiteboard Pattern
could allow us to include the hooks required for a template-based framework
without creating a strong dependency on any implementation.
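
A rough sketch of how such a whiteboard-style hook could look, again in plain
Java without any OSGi dependency. TypeRenderer, RendererRegistry and
WhiteboardDemo are hypothetical names; in a real deployment the bind/unbind
calls would be made by the OSGi framework (for example through a service
tracker or declarative services) as renderer services appear and disappear,
which is an assumption of this sketch and not existing Stanbol code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical hook that an optional UI bundle would publish as a service. */
interface TypeRenderer {
    String rdfType();                  // rdf:type this renderer handles
    String render(String resourceUri); // human-readable representation
}

/** Stand-in for the whiteboard: the core only knows the interface. */
final class RendererRegistry {
    private final Map<String, TypeRenderer> byType = new ConcurrentHashMap<>();

    /** Called when a renderer service is registered. */
    void bind(TypeRenderer renderer) {
        byType.put(renderer.rdfType(), renderer);
    }

    /** Called when a renderer service goes away. */
    void unbind(TypeRenderer renderer) {
        byType.remove(renderer.rdfType(), renderer);
    }

    /** Empty when no UI bundle is installed, so machine-only setups still work. */
    Optional<String> render(String rdfType, String resourceUri) {
        TypeRenderer renderer = byType.get(rdfType);
        return renderer == null
                ? Optional.empty()
                : Optional.of(renderer.render(resourceUri));
    }
}

final class WhiteboardDemo {
    public static void main(String[] args) {
        RendererRegistry registry = new RendererRegistry();
        TypeRenderer personHtml = new TypeRenderer() {
            public String rdfType() { return "http://xmlns.com/foaf/0.1/Person"; }
            public String render(String uri) { return "<h1>Person " + uri + "</h1>"; }
        };

        // Without any renderer bound, the core falls back to its default.
        System.out.println(registry.render(personHtml.rdfType(), "urn:example:alice")
                .orElse("(no renderer, serve RDF only)"));

        // Once the optional UI "bundle" registers, rendering becomes available.
        registry.bind(personHtml);
        System.out.println(registry.render(personHtml.rdfType(), "urn:example:alice")
                .orElse("(no renderer, serve RDF only)"));

        registry.unbind(personHtml);
    }
}
```

The point of the design is that the registry depends only on the small
interface, so neither the template-based mechanism nor LDViewable would have
to be a hard dependency of the web modules.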


ad Scala:

> Do you know
> about users having a concrete issue with the additional RAM requirement, or
> is it more the fact that it's not nice having memory used without clear reason
> that's bothering you?

It is mainly the second. Stanbol runs on about 70 MByte of PermGen, and as soon
as Scala starts this rises to ~200 MByte.

However, in the case of the dev.iks-project.eu server, where we had to run four
Stanbol instances on 9 GByte of RAM, memory was also a real concern.
In the meantime we were able to increase the RAM of the virtual host
to 15 GByte.


ad Security:

I was not aware that the user/pwd was loaded from an RDF graph. I
was thinking that just the system properties used to init the system
were loaded via Clerezza-specific keys. My point was that the dependencies
from the Stanbol Commons Security module(s) to Clerezza seem to be very weak
(in the sense that they use only very few things from the referenced Clerezza
modules). So my question was whether we can/should remove them or whether
there are good reasons to keep them.


ad "Stanbol is no semantic CMS": you are right

> [..] the vision of an ecosystem of modular semantic and
> restful osgi components [..]

describes it much better. The important thing is that Stanbol can be used with
any CMS and does not require users to replace an existing CMS stack.

best
Rupert

On Sun, Nov 11, 2012 at 2:09 AM, Reto Bachmann-Gmür <re...@apache.org> wrote:

> (2) Type-based rendering is [...] not something that can be implemented just by
> adding MessageBodyWriters, as different RDF resources do not result in
> different Java classes. For a framework providing resources as RDF, type-based
> rendering seems the straightforward approach to allow these
> resources to be rendered in non-RDF formats as well. For this we can still
> use Freemarker (with LDPath templates), but our legacy templates, which
> require the class with the application logic to provide special hooks to
> the templates, go against the concept of having a pluggable UI that can be
> left out for instances that are only to be used by machines. Keep in mind that
> an infrastructure for providing templates in a better way has already been there
> since the introduction of LDViewable. Type-based rendering goes one step
> further, as the JAX-RS root resource would no longer have to provide the
> abstract template path.
>
> (3)
> JSR-223 support: I suggested to drop this.
>
> Scala support: I'm wondering myself why there is such a big PermGen space
> need. I've just updated Clerezza trunk to use Scala 2.9.2; this might have
> improved things a bit. As the compiler classloading mechanism is changed in
> 2.10, I guess a bigger improvement might come with that version. Do you know
> about users having a concrete issue with the additional RAM requirement, or
> is it more the fact that it's not nice having memory used without clear reason
> that's bothering you?
>
> Shell: The Felix webconsole is there to install bundles, configure services
> and so on. What you can do with the shell is actually invoke these
> services' methods and explore exported package structures. Especially when
> exploring APIs I'm not yet familiar with, the shell has been of great
> benefit to me. Of course it's a module one can turn off.
>
> Bundle-Dev-Tools: (These aren't yet in Stanbol.) Basically maven skeletons
> can also be used as prototypes for the bundle-dev-tool (just some maven
> magic needed). Of course it's question of style and size of the module if
> one want the dynamic update and things working independently of the pom
> dependencies or prefers to compile and redeploy. In the trunk version of
> dev-tools there's also instant update for static files which makes it
> particularly convenient when editing css and javascript. As long as no
> duplication of archetype/skeleton is needed I don't see why not offer both
> maven archetypes and skeletons.
>
> Security:
> You're suggesting one should configure the users, their passwords and
> permissions in some config files rather than storing them in RDF and having
> a UI to edit them (OK, I'm embarrassed that the UI isn't there yet)? I think
> when we're talking about some launchers being stateless we mean that usage
> of the (main) functionality they offer doesn't alter the state of the
> system. If you interpret "stateless" very strictly then you would have to
> drop most parts of the Felix webconsole, as HTTP requests to install bundles
> or configure services aren't stateless. For the user configuration a simple
> file-based TcProvider would of course be enough, so no TDB is needed for
> that.
>
> I think we should see where we want to go as a community. For me the
> important thing is that Stanbol remains very modular. I think statements
> like "Stanbol is no semantic CMS" do not get us any further. It's important
> that the Stanbol services can be used as services and that many services
> are stateless. But the contenthub is a component to manage content (the
> entityhub to some degree as well). Do we want to mandate a horrible user
> interface just to comply with some catchphrase about what Stanbol is not?
> Or do we want to reduce Stanbol to be just the Enhancer and leave the
> other stuff to other projects?
>
> I'd rather go for the vision of an ecosystem of modular semantic and
> restful osgi components, but if the community wants to focus on the
> enhancer I think a clear statement should be made to avoid unnecessary
> arguments about memory consumption.
>
> Cheers,
> Reto
>
>
> On Fri, Nov 9, 2012 at 10:56 AM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
>> Hi all,
>>
>> let me share my thoughts. Because this mail is rather long I tried to
>> split it up into three separate sections: (1) RDF, (2) RESTful / Web
>> Interface and (3) other related topics
>>
>>
>> RDF libs:
>> ====
>>
>> From the viewpoint of Apache Stanbol one needs to ask the question
>> whether it makes sense to manage its own RDF API. I expect the Semantic Web
>> standards to evolve quite a bit in the coming years and I do have
>> concerns about whether the Clerezza RDF modules will be updated/extended to
>> provide implementations of those. One example of such a situation is
>> SPARQL 1.1, which has been around for quite some time and is still not
>> supported by Clerezza. While I do like the small API, the flexibility
>> to use different TripleStores and that Clerezza comes with OSGi
>> support, I think given the current situation we would need to discuss
>> all options, and those also include a switch to Apache Jena or
>> Sesame. Especially Sesame would be an attractive option as its RDF
>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>> counterparts (Model [2] and Graph [3]) are considerably different and
>> more complex interfaces. In addition Jena will only change to
>> org.apache packages with the next major release, so a switch before
>> that release would mean two incompatible API changes.
>>
>> My personal opinion is that we should keep using Clerezza for now,
>> invest some effort to improve the Clerezza RDF modules and then see
>> how it further develops. Such an effort should include
>>
>> * implement the SPARQL fast lane (as already discussed with Reto
>> during ApacheCon). Fast lane would allow Clerezza to use the native
>> SPARQL engine of the used TripleStore, meaning that Clerezza only
>> parses those parts of the SPARQL query needed to determine the RDF graph
>> to execute the query on. This information is then used to pass the query
>> to the native SPARQL engine via an extended interface of the
>> TcProvider. The Clerezza SPARQL implementation would only be used in
>> case the TcProvider does not provide a native SPARQL implementation or
>> if the query spans RDF graphs managed by different TcProvider
>> instances. That way Clerezza users would be able to use any SPARQL
>> feature provided by the used TripleStore.
>> * update to the newest Jena versions (see also STANBOL-621; Peter
>> Ansell's Clerezza fork on GitHub [4] as well as Sebastian Schaffert's
>> Jena bundle used for the Stanbol/LMF integration [5])
>> * finish and release the SingleTdbDatasetTcProvider.java
>> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
>> component
>> * move the indexed in-memory graph (CLEREZZA-683) from the Stanbol
>> code base to Clerezza and release it so that we can use it from there
>> in Stanbol
>> * provide a Clerezza JSON-LD parser/serializer. This is critical for
>> Stanbol as several CMSs use this as their preferred RDF serialization.
>>
>> [1]
>> http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
>> [2]
>> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
>> [3]
>> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
>> [4]
>> https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
>> [5]
>> https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml
>>
>>
>> RESTful API / Web Interface:
>> =====================
>>
>> There are several shortcomings of the current implementation of the
>> Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
>> o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)
>>
>> * Jersey's use of java.util.ServiceLoader forces manual
>> configuration of the JAX-RS components. A switch to an OSGi-compatible
>> implementation such as Apache Wink would be very welcome
>> * The RESTful API documentation is currently written as HTML in
>> Freemarker templates. This makes it really hard to maintain this
>> documentation. I would really appreciate the possibility to use
>> Markdown (as used on the web page) for that
>> * For Stanbol deployments it should be possible to exclude
>> the WebUI so that only the RESTful services are available
>>
>> regarding :
>>
>> > Stanbol drops its interpretation of "REST" as "not for humans" and wants to
>> > allow integrating (wherever possible as modular and optional components)
>> > media types designed for human consumption and support REST approaches
>> > there as well (thinking of the current back-button unfriendly UI).
>>
>> Adding support for a simple table-based representation of RDF data
>> would indeed be an important feature. However, having Resource (Entity)
>> type-specific rendering is out of the scope of Apache Stanbol (at
>> least in my opinion). However, AFAIK as soon as we switch to an
>> OSGi-compatible JAX-RS implementation, users could add those easily by
>> providing the corresponding JAX-RS MessageBodyWriter.
>>
>> If there are people who would like to work on this, it would be really great.
>> If we could (re)use some stuff from Clerezza - even better. But things
>> would need to be kept simple as Stanbol is no semantic CMS.
>>
>> I would suggest starting development in a separate branch and then having a
>> discussion/vote based on an early prototype/demonstration.
>>
>>
>> Other Topics
>> =========
>>
>> ### Scala and jsr 223 (scripting in the JVM)
>>
>> I do have an issue with Scala as it adds >150MByte to the PermGen as
>> soon as it is loaded. But as long as it is an optional dependency and
>> users are aware of that when adding the dependency I am fine with it.
>>
>> ###  Shell
>>
>> Personally I do not find the shell very useful. For installing
>> Bundles/Service configurations I prefer to use the Apache Sling
>> FileInstaller. For deployment during development I like to use the
>> Sling Maven Installer plugin. For creating new Stanbol Modules I
>> rather suggest to create an extensive list of Maven Archetype (e.g.
>> for Stanbol EnhancementEngines).
>>
>> As the Shell also depends on Scala the "+150MByte to the PermGen"
>> issue also applies to the Shell.
>>
>> ### Security
>>
>> Having a security model in Apache Stanbol might be important for some
>> use cases. Because of this I consider this an important topic. However
>> one I have very little experience with.
>>
>> I would like to get rid of the dependencies on
>> org.apache.clerezza:platform (AFAIK this is only needed for the
>> configuration, which could easily be provided by the
>> sling.properties file at runtime). Defaults can be provided in the
>> commons.properties file already included in all Stanbol Launchers. I
>> would also suggest moving the PermissionParser utility over to the
>> Apache Stanbol Security modules.
>> These two changes would allow activating the security module also for
>> the Stable (Stateless) launcher.
>>
>>
>> best
>> Rupert
>>
>>
>> On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
>> > Comments inline...
>> >
>> > On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org>
>> wrote:
>> >
>> >> Ok, sorry for jumping into this discussion so late. I've been having
>> >> quite some discussion on the matter here at ApacheCon EU. Also I had
>> >> positive feedback from my presentation of Clerezza yesterday.
>> >>
>> >> I think two things:
>> >> - For high-level platform components it is often not clear if they fit
>> >> better into Stanbol or into Clerezza
>> >> - The RDF API should actually be independent both from the triple store
>> >> provider as well as from the consumer
>> >>
>> >> So I think a good solution would be to have the RDF libraries comprising:
>> >> - A modular and very spec-oriented API for RDF and related standards
>> >> - A set of serializing and parsing providers
>> >> - Adapters to triple stores (where the API isn't provided by the triple
>> >> store)
>> >> basically that's what is in the org.apache.clerezza.rdf.* packages
>> >>
>> >> That's the stuff that would fit well into Stanbol. Provided that Stanbol
>> >> drops its interpretation of "REST" as "not for humans" and wants to
>> >> allow integrating (wherever possible as modular and optional components)
>> >> media types designed for human consumption and support REST approaches
>> >> there as well (thinking of the current back-button unfriendly UI).
>> >>
>> >
>> > IMO, Clerezza is just too big for existing committers. If we could reduce
>> > it to the
>> > essential components dealing with rdf and leaving out templating and
>> > rendering,
>> > it may be easier to graduate.
>> >
>> > - Scala Server Pages
>> >> - TypeRendering (selection of templates based on the rdf type of the
>> >> returned response)
>> >> - Security (already integrated to some degree, code based security to
>> run
>> >> bundles in a sandboxed manner is not)
>> >> - Shell (already ships in the stanbol launcher, so here it's about
>> >> 'adopting' the sources)
>> >> - Dev tools: rapid development support (create sample projects, have
>> source
>> >> files as bundles)
>> >>
>> >> To the attic:
>> >> - Triaxrs: The Clerezza JAX-RS implementation is no longer needed as the
>> >> same support (JAX-RS components as OSGi services) is now provided by
>> >> Apache Wink
>> >> - JSR 223 support
>> >>
>> >> In my opinion there is no urgent need for action. It is true that there
>> >> hasn't been a lot of action in Clerezza but IMHO the project is going on,
>> >> even if at a low pace (like other projects, e.g. the recently graduated
>> >> Wink).
>> >>
>> >
>> > Not sure about no urgent need for action. Maybe we should list the
>> > requirements
>> > to fulfil in order to be able to graduate. Wonder if we are able to meet
>> > them.
>> >
>> > Cheers
>> > Hasan
>> >
>> >
>> >>
>> >> Cheers,
>> >> Reto
>> >>
>> >> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <
>> >> bdelacretaz@apache.org
>> >> > wrote:
>> >>
>> >> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org>
>> wrote:
>> >> > > ...It's good to have the existing released artifacts remain - what
>> >> about
>> >> > after
>> >> > > the donation?
>> >> > >
>> >> > > Presumably the moved modules will be released by the new host - will
>> >> they
>> >> > > use group id org.apache.clerezza? or move to the new host project
>> group
>> >> > id?
>> >> > > I'd suggest renaming the group to the new project but realise it is
>> a
>> >> bit
>> >> > > more disruptive...
>> >> >
>> >> > I think that's really up to whatever project adopts that code. In
>> >> > theory package names should change but that's probably not convenient.
>> >> >
>> >> > Or maybe it's time to create a semantic module or two at
>> >> > http://commons.apache.org/ ? If existing committers are willing to
>> >> > support that with their work it should be easy to make it happen.
>> >> >
>> >> > -Bertrand
>> >> >
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Future of Clerezza and Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Reto, all

ad (2) "Type-Based rendering":

You are right: MessageBodyWriters might not easily work, as responses for
different types will use the same Java class and Content-Type.

If it is not much work, I suggest activating both variants (LDViewable and the
template-based mechanism) in the suggested branch for refactoring the Stanbol
RESTful/Web stuff. Maybe a clever use of the OSGi Whiteboard Pattern
could allow us to include the hooks required for a template-based framework
without creating a strong dependency on any implementation.


ad Scala:

> Do you know
> about users having a concrete issue with the additional RAM requirement, or
> is it more the fact that it's not nice having memory used without clear reason
> that's bothering you?

It is mainly the second. Stanbol runs on about 70 MByte of PermGen, and as soon
as Scala starts this rises to ~200 MByte.

However, in the case of the dev.iks-project.eu server, where we had to run four
Stanbol instances on 9 GByte of RAM, memory was also a real concern.
In the meantime we were able to increase the RAM of the virtual host
to 15 GByte.


ad Security:

I was not aware that the user/pwd was loaded from an RDF graph. I
was thinking that just the system properties used to init the system
were loaded via Clerezza-specific keys. My point was that the dependencies
from the Stanbol Commons Security module(s) to Clerezza seem to be very weak
(in the sense that they use only very few things from the referenced Clerezza
modules). So my question was whether we can/should remove them or whether
there are good reasons to keep them.


ad "Stanbol is no semantic CMS": you are right

> [..] the vision of an ecosystem of modular semantic and
> restful osgi components [..]

describes it much better. The important thing is that Stanbol can be used with
any CMS and does not require users to replace an existing CMS stack.

best
Rupert

On Sun, Nov 11, 2012 at 2:09 AM, Reto Bachmann-Gmür <re...@apache.org> wrote:

> (2) Type-based rendering is [...] not something that can be implemented just by
> adding MessageBodyWriters, as different RDF resources do not result in
> different Java classes. For a framework providing resources as RDF, type-based
> rendering seems the straightforward approach to allow these
> resources to be rendered in non-RDF formats as well. For this we can still
> use Freemarker (with LDPath templates), but our legacy templates, which
> require the class with the application logic to provide special hooks to
> the templates, go against the concept of having a pluggable UI that can be
> left out for instances that are only to be used by machines. Keep in mind that
> an infrastructure for providing templates in a better way has already been there
> since the introduction of LDViewable. Type-based rendering goes one step
> further, as the JAX-RS root resource would no longer have to provide the
> abstract template path.
>
> (3)
> JSR-223 support: I suggested to drop this.
>
> Scala support: I'm wondering myself why there is such a big PermGen space
> need. I've just updated Clerezza trunk to use Scala 2.9.2; this might have
> improved things a bit. As the compiler classloading mechanism is changed in
> 2.10, I guess a bigger improvement might come with that version. Do you know
> about users having a concrete issue with the additional RAM requirement, or
> is it more the fact that it's not nice having memory used without clear reason
> that's bothering you?
>
> Shell: The Felix webconsole is there to install bundles, configure services
> and so on. What you can do with the shell is actually invoke these
> services' methods and explore exported package structures. Especially when
> exploring APIs I'm not yet familiar with, the shell has been of great
> benefit to me. Of course it's a module one can turn off.
>
> Bundle-Dev-Tools: (These aren't yet in Stanbol.) Basically maven skeletons
> can also be used as prototypes for the bundle-dev-tool (just some maven
> magic needed). Of course it's question of style and size of the module if
> one want the dynamic update and things working independently of the pom
> dependencies or prefers to compile and redeploy. In the trunk version of
> dev-tools there's also instant update for static files which makes it
> particularly convenient when editing css and javascript. As long as no
> duplication of archetype/skeleton is needed I don't see why not offer both
> maven archetypes and skeletons.
>
> Security:
> You're suggesting one should configure the users, their passwords and
> permissions in some config files rather than storing them in RDF and having
> a UI to edit them (OK, I'm embarrassed that the UI isn't there yet)? I think
> when we're talking about some launchers being stateless we mean that usage
> of the (main) functionality they offer doesn't alter the state of the
> system. If you interpret "stateless" very strictly then you would have to
> drop most parts of the Felix webconsole, as HTTP requests to install bundles
> or configure services aren't stateless. For the user configuration a simple
> file-based TcProvider would of course be enough, so no TDB is needed for
> that.
>
> I think we should see where we want to go as a community. For me the
> important thing is that Stanbol remains very modular. I think statements
> like "Stanbol is no semantic CMS" do not get us any further. It's important
> that the Stanbol services can be used as services and that many services
> are stateless. But the contenthub is a component to manage content (the
> entityhub to some degree as well). Do we want to mandate a horrible user
> interface just to comply with some catchphrase about what Stanbol is not?
> Or do we want to reduce Stanbol to be just the Enhancer and leave the
> other stuff to other projects?
>
> I'd rather go for the vision of an ecosystem of modular semantic and
> restful osgi components, but if the community wants to focus on the
> enhancer I think a clear statement should be made to avoid unnecessary
> arguments about memory consumption.
>
> Cheers,
> Reto
>
>
> On Fri, Nov 9, 2012 at 10:56 AM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
>> Hi all,
>>
>> let me share my thoughts. Because this mail is rather long I tried to
>> split it up into three separate sections: (1) RDF, (2) RESTful / Web
>> Interface and (3) other related topics
>>
>>
>> RDF libs:
>> ====
>>
>> From the viewpoint of Apache Stanbol one needs to ask the question
>> whether it makes sense to manage its own RDF API. I expect the Semantic Web
>> standards to evolve quite a bit in the coming years and I do have
>> concerns about whether the Clerezza RDF modules will be updated/extended to
>> provide implementations of those. One example of such a situation is
>> SPARQL 1.1, which has been around for quite some time and is still not
>> supported by Clerezza. While I do like the small API, the flexibility
>> to use different TripleStores and that Clerezza comes with OSGi
>> support, I think given the current situation we would need to discuss
>> all options, and those also include a switch to Apache Jena or
>> Sesame. Especially Sesame would be an attractive option as its RDF
>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>> counterparts (Model [2] and Graph [3]) are considerably different and
>> more complex interfaces. In addition Jena will only change to
>> org.apache packages with the next major release, so a switch before
>> that release would mean two incompatible API changes.
>>
>> My personal opinion is that we should keep using Clerezza for now,
>> invest some effort to improve the Clerezza RDF modules and then see
>> how it further develops. Such an effort should include
>>
>> * implement the SPARQL fast lane (as already discussed with Reto
>> during ApacheCon). Fast lane would allow Clerezza to use the native
>> SPARQL engine of the used TripleStore, meaning that Clerezza only
>> parses those parts of the SPARQL query needed to determine the RDF graph
>> to execute the query on. This information is then used to pass the query
>> to the native SPARQL engine via an extended interface of the
>> TcProvider. The Clerezza SPARQL implementation would only be used in
>> case the TcProvider does not provide a native SPARQL implementation or
>> if the query spans RDF graphs managed by different TcProvider
>> instances. That way Clerezza users would be able to use any SPARQL
>> feature provided by the used TripleStore.
>> * update to the newest Jena versions (see also STANBOL-621; Peter
>> Ansell's Clerezza fork on GitHub [4] as well as Sebastian Schaffert's
>> Jena bundle used for the Stanbol/LMF integration [5])
>> * finish and release the SingleTdbDatasetTcProvider.java
>> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
>> component
>> * move the indexed in-memory graph (CLEREZZA-683) from the Stanbol
>> code base to Clerezza and release it so that we can use it from there
>> in Stanbol
>> * provide a Clerezza JSON-LD parser/serializer. This is critical for
>> Stanbol as several CMSs use this as their preferred RDF serialization.
>>
>> [1]
>> http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
>> [2]
>> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
>> [3]
>> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
>> [4]
>> https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
>> [5]
>> https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml
>>
>>
>> RESTful API / Web Interface:
>> =====================
>>
>> There are several shortcomings of the current implementation of the
>> Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
>> o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)
>>
>> * Jersey's use of java.util.ServiceLoader forces manual
>> configuration of the JAX-RS components. A switch to an OSGi-compatible
>> implementation such as Apache Wink would be very welcome
>> * The RESTful API documentation is currently written as HTML in
>> Freemarker templates. This makes it really hard to maintain this
>> documentation. I would really appreciate the possibility to use
>> Markdown (as used on the web page) for that
>> * For Stanbol deployments it should be possible to exclude
>> the WebUI so that only the RESTful services are available
>>
>> regarding :
>>
>> > Stanbol drops its interpretation of "REST" as "not for humans" and wants to
>> > allow integrating (wherever possible as modular and optional components)
>> > media types designed for human consumption and support REST approaches
>> > there as well (thinking of the current back-button unfriendly UI).
>>
>> Adding support for a simple table-based representation of RDF data
>> would indeed be an important feature. However, having Resource (Entity)
>> type-specific rendering is out of the scope of Apache Stanbol (at
>> least in my opinion). However, AFAIK as soon as we switch to an
>> OSGi-compatible JAX-RS implementation, users could add those easily by
>> providing the corresponding JAX-RS MessageBodyWriter.
>>
>> If there are people who would like to work on this, it would be really great.
>> If we could (re)use some stuff from Clerezza - even better. But things
>> would need to be kept simple as Stanbol is no semantic CMS.
>>
>> I would suggest starting development in a separate branch and then having a
>> discussion/vote based on an early prototype/demonstration.
>>
>>
>> Other Topics
>> =========
>>
>> ### Scala and jsr 223 (scripting in the JVM)
>>
>> I do have an issue with Scala as it adds >150MByte to the PermGen as
>> soon as it is loaded. But as long as it is an optional dependency and
>> users are aware of that when adding the dependency I am fine with it.
>>
>> ###  Shell
>>
>> Personally I do not find the shell very useful. For installing
>> Bundles/Service configurations I prefer to use the Apache Sling
>> FileInstaller. For deployment during development I like to use the
>> Sling Maven Installer plugin. For creating new Stanbol Modules I
>> rather suggest to create an extensive list of Maven Archetype (e.g.
>> for Stanbol EnhancementEngines).
>>
>> As the Shell also depends on Scala the "+150MByte to the PermGen"
>> issue also applies to the Shell.
>>
>> ### Security
>>
>> Having a security model in Apache Stanbol might be important for some
>> use cases. Because of this I consider this an important topic. However
>> one I have very little experience with.
>>
>> I would like to get rid of the dependencies to
>> org.apache.clerezza:patform (AFAIK this is only needed for the
>> configuration and this could be easily provided by the
>> sling.properties file at runtime. Defaults can be provided in the
>> commons.properties file already included in all Stanbol Launchers. I
>> would also suggest to move the PermissionParser utility over to the
>> Apache Stanbol Security modules.
>> This two changes would allow to activate the security module also for
>> the Stable (Stateless) launcher.
>>
>>
>> best
>> Rupert
>>
>>
>> On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
>> > Comments inline...
>> >
>> > On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org>
>> wrote:
>> >
>> >> Ok, sorry for jumping into this discussion so lately. I've been having
>> >> quite some discussion on the matter here at apacheconeu. Also I had
>> >> prositive feedback from my resentation of Clerezza yesterday.
>> >>
>> >> I think two things:
>> >> - For high level platform component it is often not clear if the fit
>> better
>> >> into Stanbol or into Clerezza
>> >> - The RDF Api shoud actually be independen both from triple store
>> provider
>> >> as well as from consumer
>> >>
>> >> So I think a good solution would be to have the RDF liraries comprising:
>> >> - A modular and very spec oriented API for RDF and related standards
>> >> - A set of serializing and parsing providers
>> >> - Adapters to triple stores (where the api isn't provided by the triple
>> >> store)
>> >> basically that's what in the org.apache.clerezza.rdf.* packages
>> >>
>> >> That's the stuff that would fit well into Stanbol. Provided that stanbol
>> >> drops it's interretation of "REST" as "not for humans" and want to go to
>> >> allow integrating (wherever possible as modular and optional components)
>> >> media types designed for human consumptions and support REST approaches
>> >> there as well (thinking of the current back-button unfriendly UI).
>> >>
>> >
>> > IMO, Clerezza is just too big for existing committers. If we could reduce
>> > it to the
>> > essential components dealing with rdf and leaving out templating and
>> > rendering,
>> > it may be easier to graduate.
>> >
>> > - Scala Server Pages
>> >> - TypeRendering (selection of templates based on the rdf type of the
>> >> returned response)
>> >> - Security (already integrated to some degree, code based security to
>> run
>> >> bundles in a sandboxed manner is not)
>> >> - Shell (already ships in the stanbol launcher, so here it's about
>> >> 'adopting' the sources)
>> >> - Dev tools: rapid development support (create sample projects, have
>> source
>> >> files as bundles)
>> >>
>> >> To the attic:
>> >> - Triaxrs: The Clerezza jax-rs implementation is no longer needed as the
>> >> same support (jax-rs components asosgi services) is now provided by
>> apache
>> >> wink
>> >> -  jssr 223 support
>> >>
>> >> In my opinion there is no urgent need for action, it is true that there
>> >> hasn't been a lot of action in clerezza but imho the project os going on
>> >> even at a low pace  (as other projects like e.g. the recently graduated
>> >> wink).
>> >>
>> >
>> > Not sure about no urgent need for action. Maybe we should list the
>> > requirements
>> > to fulfil in order to be able to graduate. Wonder if we are able to meet
>> > them.
>> >
>> > Cheers
>> > Hasan
>> >
>> >
>> >>
>> >> Cheers,
>> >> Reto
>> >>
>> >> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <
>> >> bdelacretaz@apache.org
>> >> > wrote:
>> >>
>> >> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org>
>> wrote:
>> >> > > ...It's good to have the existing released artifacts remain - what
>> >> about
>> >> > after
>> >> > > the donation?
>> >> > >
>> >> > > Presumably the moved modules will be released by the new host - will
>> >> they
>> >> > > use group id org.apache.clerezza? or move to the new host project
>> group
>> >> > id?
>> >> > > I'd suggest renaming the group to the new project but realise it is
>> a
>> >> bit
>> >> > > more disruptive...
>> >> >
>> >> > I think that's really up to whatever project adopts that code. In
>> >> > theory package names should change but that's probably not convenient.
>> >> >
>> >> > Or maybe it's time to create a semantic module or two at
>> >> > http://commons.apache.org/ ? If existing committers are willing to
>> >> > support that with their work it should be easy to make it happen.
>> >> >
>> >> > -Bertrand
>> >> >
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Hi Rupert and all,

(1) I agree with what you say regarding the RDF API. To keep this effort
sustainable without running the risk of polluting the API with
implementation-specific requirements, I think Clerezza should graduate as
Apache commons.rdf.

(2) Type-based rendering is not something that can be implemented just by
adding MessageBodyWriters, as different RDF resources do not result in
different Java classes. For a framework providing resources as RDF,
type-based rendering seems the straightforward approach to allow these
resources to be rendered in non-RDF formats as well. For this we can still
use Freemarker (with LDPath templates), but our legacy templates, which
require the class with the application logic to provide special hooks to
the templates, go against the concept of having a pluggable UI that can be
left out for instances only to be used by machines. Keep in mind that an
infrastructure for providing templates in a better way is already there
since the introduction of LDViewable. Type-based rendering goes one step
further as the jax-rs root resource would no longer have to provide the
abstract template-path.
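
To illustrate point (2), here is a minimal sketch of selecting a renderer by
rdf:type rather than by Java class. All names in it (RdfResource, Renderer,
TypeRenderers) are hypothetical stand-ins made up for this example; they are
not the Clerezza or Stanbol API, and real templates would replace the
lambdas. It only shows why a writer chosen by Java class cannot tell two RDF
resources apart, while a lookup keyed on the RDF type can.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class TypeRenderingSketch {

    /** Hypothetical stand-in: every RDF resource is an instance of this one Java class. */
    static final class RdfResource {
        final String iri;
        final Set<String> rdfTypes;

        RdfResource(String iri, String... rdfTypes) {
            this.iri = iri;
            this.rdfTypes = new LinkedHashSet<>(Arrays.asList(rdfTypes));
        }
    }

    /** Hypothetical renderer; in practice this would be backed by a template. */
    interface Renderer {
        String render(RdfResource resource);
    }

    /** Picks a renderer by rdf:type instead of by Java class. */
    static final class TypeRenderers {
        private final Map<String, Renderer> byRdfType = new LinkedHashMap<>();
        private final Renderer fallback;

        TypeRenderers(Renderer fallback) {
            this.fallback = fallback;
        }

        void register(String rdfType, Renderer renderer) {
            byRdfType.put(rdfType, renderer);
        }

        String render(RdfResource resource) {
            // The first rdf:type with a registered renderer wins.
            for (String type : resource.rdfTypes) {
                Renderer renderer = byRdfType.get(type);
                if (renderer != null) {
                    return renderer.render(resource);
                }
            }
            // No specific renderer: fall back to a generic representation.
            return fallback.render(resource);
        }
    }

    public static void main(String[] args) {
        TypeRenderers renderers = new TypeRenderers(r -> "generic view of " + r.iri);
        renderers.register("http://xmlns.com/foaf/0.1/Person", r -> "person page for " + r.iri);

        RdfResource alice = new RdfResource("urn:example:alice", "http://xmlns.com/foaf/0.1/Person");
        RdfResource other = new RdfResource("urn:example:thing", "http://example.org/Unmapped");

        // Both objects share one Java class, yet they are rendered differently.
        System.out.println(renderers.render(alice));
        System.out.println(renderers.render(other));
    }
}
```

The fallback renderer plays the role of the simple generic representation;
leaving out all type-specific registrations would correspond to a deployment
without the optional UI.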

(3)
JSR-223 support: I suggested to drop this.

Scala support: I'm wondering myself why there is such a big PermGen space
need. I've just updated Clerezza trunk to use Scala 2.9.2, which might have
improved things a bit. As the compiler classloading mechanism is changed in
2.10, I guess a bigger improvement might come with that version. Do you know
about users having a concrete issue with the additional RAM requirement, or
is it more the fact that it's not nice having memory used without a clear
reason that's bothering you?

Shell: The Felix webconsole is there to install bundles, configure services
and so on. What you can do with the shell is actually invoke these
services' methods and explore exported package structures. Especially when
exploring APIs I'm not yet familiar with, the shell has been of great
benefit to me. Of course it's a module one can turn off.

Bundle-Dev-Tools: (These aren't yet in Stanbol.) Basically maven skeletons
can also be used as prototypes for the bundle-dev-tool (just some maven
magic needed). Of course it's a question of style and size of the module
whether one wants the dynamic update and things working independently of
the pom dependencies or prefers to compile and redeploy. In the trunk
version of dev-tools there's also instant update for static files, which
makes it particularly convenient when editing CSS and JavaScript. As long
as no duplication of archetype/skeleton is needed I don't see why we
shouldn't offer both maven archetypes and skeletons.

Security:
You're suggesting one should configure the users, their passwords and
permissions in some config files rather than storing them in RDF and having
a UI to edit them (OK, I'm embarrassed that the UI isn't there yet)? I think
when we're talking about some launchers being stateless we mean that usage
of the (main) functionality they offer doesn't alter the state of the
system. If you interpret "stateless" very strictly then you would have to
drop most parts of the Felix webconsole, as HTTP requests to install bundles
or configure services aren't stateless. For the user configuration a simple
file-based TcProvider would of course be enough, so no TDB is needed for
that.

I think we should see where we want to go as a community. For me the
important thing is that Stanbol remains very modular. I think statements
like "Stanbol is no semantic CMS" do not bring us further. It's important
that the stanbol services can be used as services and that many services
are stateless. But the contenthub is a component to manage content (the
entityhub to some degree as well), do we want to mandate a horrible user
interface just to comply with some catchphrase about what Stanbol is not?
Or do we want to reduce Stanbol to be just the Enhancer and leave the
other stuff to other projects?

I'd rather go for the vision of an ecosystem of modular semantic and
restful osgi components, but if the community wants to focus on the
enhancer I think a clear statement should be made to avoid unnecessary
arguments about memory consumption.

Cheers,
Reto


On Fri, Nov 9, 2012 at 10:56 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi all,
>
> let me share my thoughts. Because this mail is rather long I tried to
> split it up into three separate sections: (1) RDF (2) RESTful/ Web
> Interface and (3) other related topics
>
>
> RDF libs:
> ====
>
> Out of the viewpoint of Apache Stanbol one needs to ask the Question
> if it makes sense to manage an own RDF API. I expect the Semantic Web
> Standards to evolve quite a bit in the coming years and I do have
> concern that the Clerezza RDF modules will be updated/extended to
> provide implementations of those. One example of such an situation is
> SPARQL 1.1 that is around for quite some time and is still not
> supported by Clerezza. While I do like the small API, the flexibility
> to use different TripleStores and that Clerezza comes with OSGI
> support I think given the current situation we would need to discuss
> all options and those do also include a switch to Apache Jena or
> Sesame. Especially Sesame would be an attractive option as their RDF
> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
> counterparts (Model [2] and Graph [3]) are considerable different and
> more complex interfaces. In addition Jena will only change to
> org.apache packages with the next major release so a switch before
> that release would mean two incompatible API changes.
>
> My personal opinion is that we should keep using Clerezza for now.
> Invest some effort to improve the Clerezza RDF modules and than see
> how it further develops. Such an Effort should include
>
> *  to implement SPARQL fast lane (as already discussed with Reto
> during ApacheCon). Fast lane would allow Clerezza to use the native
> SPARQL engine of the used Triplestore. Meaning that Clerezza only
> parses those parts of the SPARQL query to understand the RDF graph to
> execute the Query on. This information is then used to pass the query
> to the native SPARQL engine via an extended Interface of the
> TcProvider. The Clerezza SPARQL implementation would only be used in
> case the TcProvider does not provide a native SPARQL implementation or
> if the Query spans RDF graphs managed by different TcProvider
> instances. By that Clerezza users would be able to use any SPARQL
> feature provided by the used TripleStore.
> * update to the newest Jena versions (see also STANBOL-621; Peter
> Ansell's Clerezza fork on github [4] as well as Sebastian Schaffert's
> Jena bundle used for the Stanbol/LMF integration [5])
> * finish and release the SingleTdbDatasetTcProvider.java
> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
> component
> * move the Indexed in-memory graph (CLEREZZA-683) from the Stanbol
> code base to Clerezza and release it so that we can use it from there
> in Stanbol
> * provide a Clerezza JsonLD parser/serializer. This is critical for
> Stanbol as several CMS use this as preferred RDF serialization.
>
> [1]
> http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
> [2]
> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
> [3]
> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
> [4]
> https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
> [5]
> https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml
>
>
> RESTful API / Web Interface:
> =====================
>
> There are several shortcomings of the current implementation of the
> Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
> o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)
>
> * Jersey's use of java.util.ServiceLoader forces the use manual
> configuration of the JAX-RS components. A switch to an OSGI compatible
> implementation such as Apache Wink would be very welcome
> * The RESTful API documentation is currently written as HTML into
> Freemarker templates. This makes it really hard to maintain this
> documentation. I would really appreciate the possibility to use
> markdown (as used on the Webpage) for that
> * For Stanbol deployments of Stanbol it should be possible to exclude
> the WebUI so that only the RESTful services are available
>
> regarding :
>
> > Stanbol drops it's interretation of "REST" as "not for humans" and want
> to go to
> > allow integrating (wherever possible as modular and optional components)
> > media types designed for human consumptions and support REST approaches
> > there as well (thinking of the current back-button unfriendly UI).
>
> Adding support for a simple Table based representation of RDF data
> would indeed be an important feature. However having Resource (Entity)
> type specific rendering is out of the scope of Apache Stanbol (at
> least in my opinion). However AFAIK as soon as we switch to an OSGI
> compatible JAX-RS implementation users could add those easily by
> providing the according JAX-RS MessageBodyWriter.
>
> If there are people who would like to work it would be really great.
> If we could (re)use some stuff from Clerezza - even better. But things
> would need to keep simple as Stanbol is no semantic CMS.
>
> I would suggest to start development in an own branch and than have a
> discussion/vote based on an early prototype/demonstration.
>
>
> Other Topics
> =========
>
> ### Scala and jsr 223 (scripting in the JVM)
>
> I do have an issue with Scala as it adds >150MByte to the PermGen as
> soon as it is loaded. But as long as it is an optional dependency and
> users are aware of that when adding the dependency I am fine with it.
>
> ###  Shell
>
> Personally I do not find the shell very useful. For installing
> Bundles/Service configurations I prefer to use the Apache Sling
> FileInstaller. For deployment during development I like to use the
> Sling Maven Installer plugin. For creating new Stanbol Modules I
> rather suggest to create an extensive list of Maven Archetype (e.g.
> for Stanbol EnhancementEngines).
>
> As the Shell also depends on Scala the "+150MByte to the PermGen"
> issue also applies to the Shell.
>
> ### Security
>
> Having a security model in Apache Stanbol might be important for some
> use cases. Because of this I consider this an important topic. However
> one I have very little experience with.
>
> I would like to get rid of the dependencies to
> org.apache.clerezza:patform (AFAIK this is only needed for the
> configuration and this could be easily provided by the
> sling.properties file at runtime. Defaults can be provided in the
> commons.properties file already included in all Stanbol Launchers. I
> would also suggest to move the PermissionParser utility over to the
> Apache Stanbol Security modules.
> This two changes would allow to activate the security module also for
> the Stable (Stateless) launcher.
>
>
> best
> Rupert
>
>
> On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
> > Comments inline...
> >
> > On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org>
> wrote:
> >
> >> Ok, sorry for jumping into this discussion so lately. I've been having
> >> quite some discussion on the matter here at apacheconeu. Also I had
> >> prositive feedback from my resentation of Clerezza yesterday.
> >>
> >> I think two things:
> >> - For high level platform component it is often not clear if the fit
> better
> >> into Stanbol or into Clerezza
> >> - The RDF Api shoud actually be independen both from triple store
> provider
> >> as well as from consumer
> >>
> >> So I think a good solution would be to have the RDF liraries comprising:
> >> - A modular and very spec oriented API for RDF and related standards
> >> - A set of serializing and parsing providers
> >> - Adapters to triple stores (where the api isn't provided by the triple
> >> store)
> >> basically that's what in the org.apache.clerezza.rdf.* packages
> >>
> >> That's the stuff that would fit well into Stanbol. Provided that stanbol
> >> drops it's interretation of "REST" as "not for humans" and want to go to
> >> allow integrating (wherever possible as modular and optional components)
> >> media types designed for human consumptions and support REST approaches
> >> there as well (thinking of the current back-button unfriendly UI).
> >>
> >
> > IMO, Clerezza is just too big for existing committers. If we could reduce
> > it to the
> > essential components dealing with rdf and leaving out templating and
> > rendering,
> > it may be easier to graduate.
> >
> > - Scala Server Pages
> >> - TypeRendering (selection of templates based on the rdf type of the
> >> returned response)
> >> - Security (already integrated to some degree, code based security to
> run
> >> bundles in a sandboxed manner is not)
> >> - Shell (already ships in the stanbol launcher, so here it's about
> >> 'adopting' the sources)
> >> - Dev tools: rapid development support (create sample projects, have
> source
> >> files as bundles)
> >>
> >> To the attic:
> >> - Triaxrs: The Clerezza jax-rs implementation is no longer needed as the
> >> same support (jax-rs components asosgi services) is now provided by
> apache
> >> wink
> >> -  jssr 223 support
> >>
> >> In my opinion there is no urgent need for action, it is true that there
> >> hasn't been a lot of action in clerezza but imho the project os going on
> >> even at a low pace  (as other projects like e.g. the recently graduated
> >> wink).
> >>
> >
> > Not sure about no urgent need for action. Maybe we should list the
> > requirements
> > to fulfil in order to be able to graduate. Wonder if we are able to meet
> > them.
> >
> > Cheers
> > Hasan
> >
> >
> >>
> >> Cheers,
> >> Reto
> >>
> >> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <
> >> bdelacretaz@apache.org
> >> > wrote:
> >>
> >> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org>
> wrote:
> >> > > ...It's good to have the existing released artifacts remain - what
> >> about
> >> > after
> >> > > the donation?
> >> > >
> >> > > Presumably the moved modules will be released by the new host - will
> >> they
> >> > > use group id org.apache.clerezza? or move to the new host project
> group
> >> > id?
> >> > > I'd suggest renaming the group to the new project but realise it is
> a
> >> bit
> >> > > more disruptive...
> >> >
> >> > I think that's really up to whatever project adopts that code. In
> >> > theory package names should change but that's probably not convenient.
> >> >
> >> > Or maybe it's time to create a semantic module or two at
> >> > http://commons.apache.org/ ? If existing committers are willing to
> >> > support that with their work it should be easy to make it happen.
> >> >
> >> > -Bertrand
> >> >
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Future of Clerezza and Stanbol

Posted by Andy Seaborne <an...@apache.org>.
(which list is this discussion really on? :-)

On 09/11/12 09:56, Rupert Westenthaler wrote:
 > RDF libs:
 > ====
 >
 > Out of the viewpoint of Apache Stanbol one needs to ask the Question
 > if it makes sense to manage an own RDF API. I expect the Semantic Web
 > Standards to evolve quite a bit in the coming years and I do have
 > concern that the Clerezza RDF modules will be updated/extended to
 > provide implementations of those. One example of such an situation is
 > SPARQL 1.1 that is around for quite some time and is still not
 > supported by Clerezza. While I do like the small API, the flexibility
 > to use different TripleStores and that Clerezza comes with OSGI
 > support I think given the current situation we would need to discuss
 > all options and those do also include a switch to Apache Jena or
 > Sesame. Especially Sesame would be an attractive option as their RDF
 > Graph API [1] is very similar to what Clerezza uses. Apache Jena's
 > counterparts (Model [2] and Graph [3]) are considerable different and
 > more complex interfaces. In addition Jena will only change to
 > org.apache packages with the next major release so a switch before
 > that release would mean two incompatible API changes.

Jena isn't changing the packaging as such -- what we've discussed is 
providing a package for the current API and then a new, org.apache API. 
  The new API may be much the same as the existing one or it may be 
different - that depends on contributions made!

I'd like to hear more about your experiences, esp. with the Graph API, as
that is supposed to be quite simple - it's targeted at storage extensions
as well as supporting the richer Model API.  Personally, aside from the
fact that Clerezza enforces slot constraints (no literals as subjects),
the Jena Graph API and Clerezza RDF core API seem reasonably aligned.

(for generalised systems such as rules engine - and for SPARQL - triples 
can arise with extras like literals as subjects; they get removed later)
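
As a rough illustration of the slot-constraint difference mentioned above,
the following sketch contrasts a triple type whose constructor only accepts
a non-literal subject with a generalised triple that accepts any node in any
slot and is checked afterwards. The types (Node, NonLiteral, Iri, Literal,
StrictTriple, GeneralTriple) are simplified stand-ins invented for this
example; they are neither the Clerezza core classes nor the Jena Graph API.

```java
public class SlotConstraintSketch {

    /** Any RDF term. */
    interface Node {
        String label();
    }

    /** Terms allowed in the subject slot of a strict RDF triple. */
    interface NonLiteral extends Node {
    }

    static final class Iri implements NonLiteral {
        private final String value;

        Iri(String value) {
            this.value = value;
        }

        public String label() {
            return "<" + value + ">";
        }
    }

    static final class Literal implements Node {
        private final String lexicalForm;

        Literal(String lexicalForm) {
            this.lexicalForm = lexicalForm;
        }

        public String label() {
            return "\"" + lexicalForm + "\"";
        }
    }

    /** Strict triple: the compiler rejects a literal in the subject slot. */
    static final class StrictTriple {
        final NonLiteral subject;
        final Iri predicate;
        final Node object;

        StrictTriple(NonLiteral subject, Iri predicate, Node object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Generalised triple: any node in any slot, filtered later if needed. */
    static final class GeneralTriple {
        final Node subject;
        final Node predicate;
        final Node object;

        GeneralTriple(Node subject, Node predicate, Node object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        boolean isStrictRdf() {
            return subject instanceof NonLiteral && predicate instanceof Iri;
        }

        StrictTriple toStrict() {
            if (!isStrictRdf()) {
                throw new IllegalStateException("not a valid RDF triple: " + subject.label());
            }
            return new StrictTriple((NonLiteral) subject, (Iri) predicate, object);
        }
    }

    public static void main(String[] args) {
        Iri name = new Iri("urn:example:name");
        Literal alice = new Literal("Alice");

        // A rules engine or SPARQL evaluation may produce this intermediate triple.
        GeneralTriple intermediate = new GeneralTriple(alice, name, name);
        System.out.println(intermediate.isStrictRdf());

        // new StrictTriple(alice, name, name) would not compile.
        GeneralTriple valid = new GeneralTriple(name, name, alice);
        System.out.println(valid.toStrict().object.label());
    }
}
```

In the strict design the constraint is enforced at compile time; in the
generalised design it becomes a runtime filter applied before triples reach
a store.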

     Andy

Re: Future of Clerezza and Stanbol

Posted by adasal <ad...@gmail.com>.
Hello,

(Some points below made by others in interim while composing this.)

Whether or not I become a Stanbol/Clerezza consumer or developer I am at
the moment following the list and learning what I can.

For my own interest what was the solution here?

> The biggest design issue for me was that every graph was loaded into
> memory in bulk. The
> underlying reason for this seemed to be that java.util.Iterator does
> not have a close method, so there is no way of knowing when to release
> resources if an iterator is not used to completion.
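
For reference, one common way around the missing close method (not
necessarily what was adopted in Clerezza or in Peter's fork) is an iterator
that also implements AutoCloseable, so callers can release resources even
when they stop early. The sketch below uses hypothetical names
(CloseableIterator, TrackingIterator, firstWithPrefix) and an in-memory list
in place of a real store cursor.

```java
import java.util.Arrays;
import java.util.Iterator;

public class CloseableIteratorSketch {

    /** An iterator the caller is expected to close; close() throws no checked exception. */
    interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close();
    }

    /** Wraps a plain iterator and records whether it was closed. */
    static final class TrackingIterator<T> implements CloseableIterator<T> {
        private final Iterator<T> delegate;
        private boolean closed;

        TrackingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public T next() {
            if (closed) {
                throw new IllegalStateException("iterator already closed");
            }
            return delegate.next();
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("read-only iterator");
        }

        @Override
        public void close() {
            // A store-backed implementation would release its cursor or lock here.
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Stops at the first match; try-with-resources still closes the iterator. */
    static String firstWithPrefix(CloseableIterator<String> iterator, String prefix) {
        try (CloseableIterator<String> it = iterator) {
            while (it.hasNext()) {
                String value = it.next();
                if (value.startsWith(prefix)) {
                    return value;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        TrackingIterator<String> it =
                new TrackingIterator<>(Arrays.asList("a1", "b1", "b2").iterator());
        System.out.println(firstWithPrefix(it, "b"));
        System.out.println(it.isClosed());
    }
}
```

With this shape a store does not need to load a whole graph into memory just
to be safe against iterators that are abandoned half way.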


Skipping down a bit -

> OSGi cannot do magic and set private fields, the compiled classes do have
> bind and unbind
> methods for the private fields, these methods are added by the maven felix
> scr-plugin.


I can see how this could be misunderstood. An IoC container would set
private fields.
I'm wondering about the difference between the OSGi and non-OSGi approaches
in this thread. Has that contributed (in some way) to the way some of the
coding has been approached?

Peter is obviously making some very good points.
I should add that developers generally prefer the ability to configure and
initialise programmatically.
In the good (bad) old days Sun/IBM actually thought that the configuration
engineer was a separate bod from the integration and development engineer.
I have never actually come across this in practice.
What I have experienced is the irritation of dealing with external files
for IoC configuration, which is a context switch from Java.
There is also the issue of where those files should be found in a
horizontal scale-out.
I think that the move to all-in-code configuration from Spring 2.x to 3.x,
and also in the JEE environment, is evidence of this.
I honestly don't know enough about OSGi to comment further. I realise it
deals with classpath loading issues and dependencies.
But it does seem to me that the code should adopt the conventions now used
by non-OSGi deployables, that is, a balance of abstract classes, interfaces
and factory methods.
Again, I do not know about the possible role of an IoC container here. It
seems a possibility?

w.r.t. Peter's point about Subversion, Git and single release points this
seems like a helpful approach. Stanbol is mirrored on github, of course, so
does that not fulfil this point?
It seems a good point that as RDF2Go is not very active at the moment there
must be a gap in this area.
On the other hand Stanbol should be able to play nice with Sesame, perhaps
by concentrating on integration with either the repository interfaces or
SAIL - the goal of Stanbol is to be a used tool.

I take Reto's point about the difference between Clerezza and Sesame. So
the plus point about Clerezza is that it is an ideal fit for NoSQL stores
rather than dedicated triple stores. If I understand correctly, this is an
important adjunct to the RDF stack.



Best,


Adam

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Hi Peter,

I think it would be good to improve the RDF libraries for non-OSGi usage;
your suggestions sound very reasonable. The design was strongly driven by
the requirements in OSGi environments. Your interest in these libraries
reinforces my belief that they should be maintained as an independent
project.

There are also some naming issues to discuss.

URIRef -> IRI
TcProvider -> Dataset
TcManager -> DefaultDataset

I suggest to discuss improvement of the API outside this future of
clerezza/stanbol thread.

Cheers,
Reto

On Mon, Nov 12, 2012 at 2:15 AM, Peter Ansell <an...@gmail.com> wrote:

> On 12 November 2012 09:59, Reto Bachmann-Gmür <re...@apache.org> wrote:
> > Hi Peter and all,
> >
> > Good to read about your experiments.Just a first comment:
> >
> > In addition, I did not want to use OSGI, so I had to make changes in
> >> many cases to allow a completely programmatic instantiation of
> >> components, as some fields were left private with no mutator method
> >> and in some cases no public contructor that could be used to populate
> >> the field programmatically. For all of the good that OSGI may provide
> >> for otherwise complex systems, it is not good Java software
> >> engineering to make fields private.
> >>
> >
> > The clerezza.rdf package should all be usable withouth OSGi. OSGi cannot
> do
> > magic and set private fields, the compiled classes do have bind and
> unbind
> > methods for the private fields, these methods are added by the maven
> felix
> > scr-plugin.  For locating dependencies outside OSGi the META-INF/services
> > method is used so that for example one can add a serializitaion provider
> > seimply by adding it to the classpath without requiring and manual
> binding.
>
> Sorry, I was under the impression that OSGi could actually do Java
> reflection magic to inject dependencies directly into private fields
> based on annotations without having any alternative method of setting
> the field for regular plain old java users. :)
>
> In general I would like if OSGi classes that currently rely on
> bind/unbind, still offered public mutator methods and a public
> initialise/deinitialise method for any work that needs to be done
> after using the mutator methods. The bind/unbind methodology from
> memory when I was working on Clerezza/Stanbol, seemed to require that
> all of the mutators were run immediately and the initialise was
> automatically run, without offering any other possible sequence.
>
> Additionally, offering public mutators and a public initialise method
> gives the added benefit of compile-time typesafety for plain old java
> users, which a bind method taking a Dictionary<String, Object>
> parameter does not provide.
>
> In addition, from memory I think some of the bind methods were
> protected, and not public, which means they are not directly
> accessible, without resorting to using reflection or subclassing just
> to be able to call bind.
>
> I use META-INF/services heavily in my projects, and I rely on it when
> using Sesame and with my extensions to OWLAPI. I extended OWLAPI to
> use Sesame META-INF/services dependencies to find
> serialisation/parsing providers for OWLAPI based on the Sesame
> parser/writer services that are available on the classpath. However, I
> always try to make sure that the use of the automatically populated
> service registries is optional, so that users can populate their own
> registries from scratch using purely programmatic methods, and they do
> not have to resort to modifying global singleton registries as one
> does when using Jena.
>
> The services that I register in META-INF/services are always factories
> based on interfaces, so that dependencies can be passed into type-safe
> java "createServiceInstance" methods when creating instances of the
> service using the factory instance. This means that it does not matter
> if the java.util.ServiceLoader loads classes in a different order, as
> the actual objects are created from the factories explicitly by users,
> with or without a key to specify which instance of the service they
> require/prefer.
>
> Cheers,
>
> Peter
>

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Hi Peter,

I think it would be good to improve the RDF libraries for non-OSGi usage;
your suggestions sound very reasonable. The design was strongly driven by
the requirements in OSGi environments. Your interest in these libraries
reinforces my belief that they should be maintained as an independent
project.

There are also some naming issues to discuss.

URIRef -> IRI
TcProvider -> Dataset
TcManager -> DefaultDataset

I suggest we discuss improvements to the API outside this "Future of
Clerezza and Stanbol" thread.

Cheers,
Reto

Re: Future of Clerezza and Stanbol

Posted by Peter Ansell <an...@gmail.com>.
On 12 November 2012 09:59, Reto Bachmann-Gmür <re...@apache.org> wrote:
> Hi Peter and all,
>
> Good to read about your experiments. Just a first comment:
>
>> In addition, I did not want to use OSGI, so I had to make changes in
>> many cases to allow a completely programmatic instantiation of
>> components, as some fields were left private with no mutator method
>> and in some cases no public constructor that could be used to populate
>> the field programmatically. For all of the good that OSGI may provide
>> for otherwise complex systems, it is not good Java software
>> engineering to make fields private.
>>
>
> The clerezza.rdf package should all be usable without OSGi. OSGi cannot do
> magic and set private fields; the compiled classes do have bind and unbind
> methods for the private fields, and these methods are added by the Maven
> Felix scr-plugin. For locating dependencies outside OSGi, the
> META-INF/services method is used, so that for example one can add a
> serialization provider simply by adding it to the classpath without
> requiring any manual binding.

Sorry, I was under the impression that OSGi could actually do Java
reflection magic to inject dependencies directly into private fields
based on annotations without having any alternative method of setting
the field for regular plain old java users. :)

In general I would like it if OSGi classes that currently rely on
bind/unbind still offered public mutator methods and a public
initialise/deinitialise method for any work that needs to be done
after using the mutator methods. From memory, when I was working on
Clerezza/Stanbol, the bind/unbind methodology seemed to require that
all of the mutators were run immediately and that the initialise was
run automatically, without offering any other possible sequence.

Additionally, offering public mutators and a public initialise method
gives the added benefit of compile-time type safety for plain old Java
users, which a bind method taking a Dictionary<String, Object>
parameter does not provide.
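
To make the contrast concrete, here is a minimal sketch of what I mean. The
names (MutatorSketch, Parser, ParserRegistry, the "parser" dictionary key) are
made up for illustration and are not taken from Clerezza or Stanbol. The typed
mutator is checked by the compiler, while the Dictionary-based bind can only
complain at run time.

```java
import java.util.ArrayList;
import java.util.Dictionary;
import java.util.List;

public final class MutatorSketch {

    /** Hypothetical dependency type, not taken from Clerezza or Stanbol. */
    public interface Parser {
        String mediaType();
    }

    /** Hypothetical component that can be wired by hand or by a container. */
    public static final class ParserRegistry {
        private final List<Parser> parsers = new ArrayList<>();
        private boolean initialised;

        /** Type-safe mutator: passing anything but a Parser fails at compile time. */
        public void addParser(Parser parser) {
            if (initialised) {
                throw new IllegalStateException("call deinitialise() before changing parsers");
            }
            parsers.add(parser);
        }

        /** Explicit lifecycle step, run by the caller once all mutators have been used. */
        public void initialise() {
            initialised = true;
        }

        public void deinitialise() {
            initialised = false;
            parsers.clear();
        }

        /** Untyped alternative: a wrong value is only detected at run time. */
        public void bind(Dictionary<String, Object> properties) {
            Object candidate = properties.get("parser");
            if (!(candidate instanceof Parser)) {
                throw new IllegalArgumentException("'parser' must be a Parser");
            }
            addParser((Parser) candidate);
        }

        public boolean supports(String mediaType) {
            if (!initialised) {
                throw new IllegalStateException("initialise() has not been called");
            }
            for (Parser parser : parsers) {
                if (parser.mediaType().equals(mediaType)) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

A plain Java user would call addParser(...) as often as needed and then
initialise(), in whatever order suits the application, while a container could
still go through bind(...).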

In addition, from memory I think some of the bind methods were
protected rather than public, which means they are not directly
accessible without resorting to reflection or subclassing just
to be able to call bind.

I use META-INF/services heavily in my projects, and I rely on it when
using Sesame and with my extensions to OWLAPI. I extended OWLAPI to
use Sesame META-INF/services dependencies to find
serialisation/parsing providers for OWLAPI based on the Sesame
parser/writer services that are available on the classpath. However, I
always try to make sure that the use of the automatically populated
service registries is optional, so that users can populate their own
registries from scratch using purely programmatic methods, and they do
not have to resort to modifying global singleton registries as one
does when using Jena.
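
As a rough sketch of that "optional" registry idea (all names here are
hypothetical, this is not Sesame or OWLAPI code): the registry is an ordinary
object that users can fill by hand, and the java.util.ServiceLoader lookup is
only a convenience factory method on top of it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

public final class RegistrySketch {

    /** Hypothetical service interface that providers would list in META-INF/services. */
    public interface WriterFactory {
        String format();
    }

    /** An ordinary, non-global registry: every caller can own a separate instance. */
    public static final class WriterRegistry {
        private final Map<String, WriterFactory> factories = new LinkedHashMap<>();

        /** Purely programmatic registration, no classpath scanning involved. */
        public void register(WriterFactory factory) {
            factories.put(factory.format(), factory);
        }

        /** Returns the factory for the given format, or null if none is registered. */
        public WriterFactory get(String format) {
            return factories.get(format);
        }

        public int size() {
            return factories.size();
        }

        /** Optional convenience: pre-populate a fresh registry from the classpath. */
        public static WriterRegistry fromClasspath() {
            WriterRegistry registry = new WriterRegistry();
            for (WriterFactory factory : ServiceLoader.load(WriterFactory.class)) {
                registry.register(factory);
            }
            return registry;
        }
    }
}
```

Users who want the automatic behaviour call WriterRegistry.fromClasspath();
users who do not simply create a new WriterRegistry and register what they
need, without touching any shared singleton.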

The services that I register in META-INF/services are always factories
based on interfaces, so that dependencies can be passed into type-safe
Java "createServiceInstance" methods when creating instances of the
service using the factory instance. This means that it does not matter
if java.util.ServiceLoader loads classes in a different order, as
the actual objects are created from the factories explicitly by users,
with or without a key to specify which instance of the service they
require/prefer.
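
A small sketch of the factory pattern described above, assuming made-up
interfaces (Store, QueryService, QueryServiceFactory) rather than any real
library API. The factory is what would be listed in META-INF/services; the
service itself is only created when the user hands over its dependencies.

```java
public final class FactorySketch {

    /** Hypothetical dependency that a service needs at creation time. */
    public interface Store {
        String name();
    }

    /** Hypothetical service produced by the factories. */
    public interface QueryService {
        String describe();
    }

    /** The factory is the type that would be registered in META-INF/services. */
    public interface QueryServiceFactory {
        String key();

        QueryService createServiceInstance(Store store);
    }

    /** Trivial example factory, only here to make the sketch self-contained. */
    public static final class PrefixFactory implements QueryServiceFactory {
        private final String key;

        public PrefixFactory(String key) {
            this.key = key;
        }

        @Override
        public String key() {
            return key;
        }

        @Override
        public QueryService createServiceInstance(Store store) {
            return () -> key + " over " + store.name();
        }
    }

    /** Selection is by key, so the order in which factories were discovered is irrelevant. */
    public static QueryServiceFactory select(Iterable<QueryServiceFactory> factories, String key) {
        for (QueryServiceFactory factory : factories) {
            if (factory.key().equals(key)) {
                return factory;
            }
        }
        throw new IllegalArgumentException("no factory registered for key: " + key);
    }
}
```

Whether the factories come from a ServiceLoader or from a hand-built list,
select(factories, "disk").createServiceInstance(store) gives the same result.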

Cheers,

Peter

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Hi Peter and all,

Good to read about your experiments. Just a first comment:

> In addition, I did not want to use OSGI, so I had to make changes in
> many cases to allow a completely programmatic instantiation of
> components, as some fields were left private with no mutator method
> and in some cases no public constructor that could be used to populate
> the field programmatically. For all of the good that OSGI may provide
> for otherwise complex systems, it is not good Java software
> engineering to make fields private.
>

The clerezza.rdf package should all be usable without OSGi. OSGi cannot do
magic and set private fields; the compiled classes do have bind and unbind
methods for the private fields, and these methods are added by the Maven
Felix scr-plugin. For locating dependencies outside OSGi, the
META-INF/services method is used, so that for example one can add a
serialization provider simply by adding it to the classpath without
requiring any manual binding.

Cheers,
Reto

Re: Future of Clerezza and Stanbol

Posted by Peter Ansell <an...@gmail.com>.
On 9 November 2012 19:56, Rupert Westenthaler
<ru...@gmail.com> wrote:
> Hi all,
>
> let me share my thoughts. Because this mail is rather long I tried to
> split it up into three separate sections: (1) RDF, (2) RESTful/Web
> Interface and (3) other related topics
>
>
> RDF libs:
> ====
>
> From the viewpoint of Apache Stanbol, one needs to ask the question
> whether it makes sense to manage our own RDF API. I expect the Semantic Web
> standards to evolve quite a bit in the coming years, and I am concerned
> about whether the Clerezza RDF modules will be updated/extended to
> provide implementations of those. One example of such a situation is
> SPARQL 1.1, which has been around for quite some time and is still not
> supported by Clerezza. While I do like the small API, the flexibility
> to use different TripleStores, and that Clerezza comes with OSGI
> support, I think given the current situation we would need to discuss
> all options, and those also include a switch to Apache Jena or
> Sesame. Sesame would be an especially attractive option, as their RDF
> Graph API [1] is very similar to what Clerezza uses.

Sesame has three different APIs for RDF graph manipulation/querying.
The main API that users target in my experience is the Repository API.
Repository implementors are encouraged to target the SAIL API. There
are very few users or implementors who actually use the Graph API for
significant purposes, in my experience.

> Apache Jena's
> counterparts (Model [2] and Graph [3]) are considerably different and
> more complex interfaces. In addition, Jena will only change to
> org.apache packages with the next major release, so a switch before
> that release would mean two incompatible API changes.
>
> My personal opinion is that we should keep using Clerezza for now,
> invest some effort to improve the Clerezza RDF modules, and then see
> how it further develops. Such an effort should include
>
> * to implement a SPARQL fast lane (as already discussed with Reto
> during ApacheCon). The fast lane would allow Clerezza to use the native
> SPARQL engine of the underlying triple store. This means that Clerezza only
> parses those parts of the SPARQL query needed to determine the RDF graph to
> execute the query on. This information is then used to pass the query
> to the native SPARQL engine via an extended interface of the
> TcProvider. The Clerezza SPARQL implementation would only be used in
> case the TcProvider does not provide a native SPARQL implementation or
> if the query spans RDF graphs managed by different TcProvider
> instances. That way Clerezza users would be able to use any SPARQL
> feature provided by the triple store in use.

The SPARQL 1.1 specification is now a Proposed Recommendation, so it
would be a good time to implement it now without fearing more of the
large changes that have happened between each of the Working Drafts so
far.
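
For what it is worth, the dispatch rule in the quoted fast-lane proposal could
look roughly like the sketch below. This is only my reading of the proposal:
Provider and NativeSparqlProvider are hypothetical stand-ins, not the real
TcProvider interface, and the set of graph names is assumed to have already
been extracted from the query by some lightweight parsing step.

```java
import java.util.List;
import java.util.Set;

public final class FastLaneSketch {

    /** Hypothetical stand-in for a provider that manages a set of named graphs. */
    public interface Provider {
        Set<String> graphNames();
    }

    /** Hypothetical extended interface for providers with a native SPARQL engine. */
    public interface NativeSparqlProvider extends Provider {
        String executeNative(String query);
    }

    /**
     * Uses the native engine when a single native-capable provider manages every
     * graph the query touches; otherwise falls back to the generic implementation.
     */
    public static String dispatch(String query, Set<String> graphsInQuery, List<Provider> providers) {
        for (Provider provider : providers) {
            if (provider instanceof NativeSparqlProvider
                    && provider.graphNames().containsAll(graphsInQuery)) {
                return ((NativeSparqlProvider) provider).executeNative(query);
            }
        }
        return executeGeneric(query);
    }

    /** Placeholder for the store-independent query engine. */
    static String executeGeneric(String query) {
        return "generic:" + query;
    }
}
```

A query spanning graphs held by different providers never satisfies the
containsAll check for a single provider, so it ends up in the generic engine,
which matches the fallback described in the proposal.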

> * update to the newest Jena versions (see also STANBOL-621; Peter
> Ansell's Clerezza fork on GitHub [4] as well as Sebastian Schaffert's
> Jena bundle used for the Stanbol/LMF integration [5])

I made changes to Clerezza to experiment with a few things that I saw
as issues when I was experimenting with Stanbol. The biggest design
issue for me was that every graph was loaded into memory in bulk. The
underlying reason for this seemed to be that java.util.Iterator does
not have a close method, so there is no way of knowing when to release
resources if an iterator is not used to completion. The other issue
for me was that I wanted to use an underlying Sesame repository, and
the Sesame module had not been maintained, and had been left off the
parent reactor, so it was no longer compatible with the other modules
when I was experimenting with it. Given that those were my goals, I
removed all of the Clerezza CMS modules from my Git fork and focused
on the underlying libraries, which would be much easier to maintain if
they were separate.
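
To illustrate the iterator issue: one possible way around it (a sketch only,
not what my fork or Clerezza actually does) is an iterator type that also
implements java.lang.AutoCloseable, so that callers can release resources even
when they stop iterating early. CloseableIterator and TrackingIterator are
names invented for this example.

```java
import java.util.Iterator;

public final class CloseableIteratorSketch {

    /** An iterator that can tell its source when the caller has finished with it. */
    public interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // narrowed so that callers need not handle a checked exception
    }

    /** Wraps any iterator and records whether it has been closed. */
    public static final class TrackingIterator<T> implements CloseableIterator<T> {
        private final Iterator<T> delegate;
        private boolean closed;

        public TrackingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public T next() {
            if (closed) {
                throw new IllegalStateException("iterator has been closed");
            }
            return delegate.next();
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("read-only iterator");
        }

        /** A real store would release cursors or connections here. */
        @Override
        public void close() {
            closed = true;
        }

        public boolean isClosed() {
            return closed;
        }
    }
}
```

With try-with-resources, close() runs even if the loop body returns or throws
after reading only the first few results, so a store would not need to load
the whole graph into memory up front.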

In addition, I did not want to use OSGI, so I had to make changes in
many cases to allow a completely programmatic instantiation of
components, as some fields were left private with no mutator method
and in some cases no public constructor that could be used to populate
the field programmatically. For all of the good that OSGI may provide
for otherwise complex systems, it is not good Java software
engineering to make fields private.

> * finish and release the SingleTdbDatasetTcProvider.java
> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
> component
> * move the Indexed in-memory graph (CLEREZZA-683) from the Stanbol
> code base to Clerezza and release it so that we can use it from there
> in Stanbol
> * provide a Clerezza JsonLD parser/serializer. This is critical for
> Stanbol as several CMSs use this as their preferred RDF serialization.

I would focus on getting a single Java implementation of JSON-LD
working here, and I know that Reto has been working on this by
contributing a Clerezza serialiser/callback implementation to Tristan
King's JSONLD-Java library at GitHub. When I get a chance I am going
to suggest that the dependencies for Sesame/Jena/Clerezza are in
separate modules in the JSONLD-Java project, to make maven dependency
chaining simpler.

Given that Stanbol is already quite large, it is not viable to
transfer the RDF libraries there, but it does not look like it is
viable to leave them combined with the other Clerezza modules as they
are unrelated and will have a different release cycle, when and if the
CMS components are maintained. It would be useful IMO to split the
Clerezza project to make it simpler to maintain the reusable
libraries. In particular, if Clerezza split its RDF libraries it may
eventually gain a similar level of developer support as either Sesame
or Jena. Currently the only project I can name that uses Clerezza RDF
libraries at their core is Stanbol, which reduces the user base, and
hence reduces the developer support base. That doesn't mean that there
aren't other projects out there that use Clerezza, just that I have
not come across them. Almost all of the comments in this thread are
about fixing issues in the Clerezza RDF libraries based on experience
from Stanbol.

One issue that would be easier to solve if the project split would be
the issue of disparate version numbers between modules that make it
difficult to identify when to update dependencies. I have mentioned to
the Stanbol list about having a single version for each release, so
that people have a single figure in their head when they describe a
version for the stanbol components they are relying on. It would be
much easier to depend on Clerezza IMO, if there was a single current
release version number for all of the library components. Maven
properties make it insanely easy to migrate to new versions of
multi-module libraries. If there is a single version number for each
new release, and if the trunk is consistently stable, then releases
can be made at any time and users can bump a single version property
in Maven to upgrade their systems.

I know that in Subversion it is difficult to guarantee that the trunk
is stable at each point in time--because people are afraid to branch
due to the difficulty involved and will instead develop new features
on the trunk--but the good news is that there are a number of
distributed version control systems around now that make it easy to
develop features on lightweight branches separate from the trunk and
painlessly merge them back in when they are stable. The suggestion to
Stanbol to switch to a single version number for releases was knocked
back based on the premise that not all modules would be stable at the
same time. If people were able to easily use branches for new features
and they test them well before integrating them, then the trunk would
be fairly consistently stable. The resulting stability would remove
the difficulties that Stanbol is still having since my suggestion with
identifying all of the necessary dependencies to update when they want
to release a new version of a particular module, as all modules, even
those without significant changes since their last release would be
released at each stage based on their trunk/master tests consistently
passing in Jenkins.

Cheers,

Peter

Re: Future of Clerezza and Stanbol

Posted by Reto Bachmann-Gmür <re...@apache.org>.
Hi Rupert and all,

(1) I agree with what you say regarding the RDF API. To keep this effort
more sustainable without running the risk of polluting the API with
implementation-specific requirements, I think Clerezza should graduate as
Apache commons.rdf.

(2) Type-based rendering is not something that can be implemented just by
adding MessageBodyWriters, as different RDF resources do not result in
different Java classes. For a framework providing resources as RDF,
type-based rendering seems the straightforward approach to allow these
resources to be rendered in non-RDF formats as well. For this we can still
use Freemarker (with LDPath templates), but our legacy templates, which
require the class with the application logic to provide special hooks to
the templates, go against the concept of a pluggable UI that can be
left out for instances that are only to be used by machines. Keep in mind
that an infrastructure for providing templates in a better way has been
there since the introduction of LDViewable. Type-based rendering goes one
step further, as the JAX-RS root resource would no longer have to provide
the abstract template path.
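
As a very rough illustration of the idea (not the actual Clerezza
TypeRendering code; TemplateSelector and the template paths are invented for
this sketch), the template is picked from the rdf:type values of the returned
resource rather than from its Java class:

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public final class TypeRenderingSketch {

    /** Maps rdf:type IRIs to template paths, in order of registration. */
    public static final class TemplateSelector {
        private final Map<String, String> templatesByType = new LinkedHashMap<>();
        private final String fallbackTemplate;

        public TemplateSelector(String fallbackTemplate) {
            this.fallbackTemplate = fallbackTemplate;
        }

        /** UI bundles would register their templates here; the root resource stays unaware of them. */
        public void register(String rdfTypeIri, String templatePath) {
            templatesByType.put(rdfTypeIri, templatePath);
        }

        /** Returns the first registered template whose type the resource has, else the fallback. */
        public String select(Collection<String> rdfTypesOfResource) {
            for (Map.Entry<String, String> entry : templatesByType.entrySet()) {
                if (rdfTypesOfResource.contains(entry.getKey())) {
                    return entry.getValue();
                }
            }
            return fallbackTemplate;
        }
    }
}
```

Since every RDF resource arrives as the same Java type, a MessageBodyWriter
alone cannot make this distinction; the lookup has to look at the data.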

(3)
JSR-223 support: I suggested to drop this.

Scala support: I'm wondering myself why there is such a big PermGen space
need. I've just updated Clerezza trunk to use Scala 2.9.2; this might have
improved things a bit. As the compiler classloading mechanism is changed in
2.10, I guess a bigger improvement might come with that version. Do you know
about users having a concrete issue with the additional RAM requirement, or
is it more the fact that it's not nice having memory used without a clear
reason that's bothering you?

Shell: The Felix webconsole is there to install bundles, configure services
and so on. What you can do with the shell is actually invoke these
services' methods and explore exported package structures. Especially when
exploring APIs I'm not yet familiar with, the shell has been of great
benefit to me. Of course it's a module one can turn off.

Bundle-Dev-Tools: (These aren't yet in Stanbol.) Basically, Maven skeletons
can also be used as prototypes for the bundle-dev-tool (just some Maven
magic needed). Of course it's a question of style and size of the module
whether one wants the dynamic update and things working independently of the
pom dependencies or prefers to compile and redeploy. In the trunk version of
dev-tools there's also instant update for static files, which makes it
particularly convenient when editing CSS and JavaScript. As long as no
duplication of archetype/skeleton is needed, I don't see why we should not
offer both Maven archetypes and skeletons.

Security:
You're suggesting one should configure the users, their passwords and
permissions in some config files rather than storing them in RDF and having
a UI to edit them (OK, I'm embarrassed that the UI isn't there yet)? I think
when we're talking about some launchers being stateless, we mean that usage
of the (main) functionality they offer doesn't alter the state of the
system. If you interpret "stateless" very strictly, then you would have to
drop most parts of the Felix webconsole, as HTTP requests to install bundles
or configure services aren't stateless. For the user configuration a simple
file-based TcProvider would of course be enough, so no TDB is needed for
that.

I think we should see where we want to go as a community. For me the
important thing is that Stanbol remains very modular. I think statements
like "Stanbol is no semantic CMS" do not bring us further. It's important
that the Stanbol services can be used as services and that many services
are stateless. But the contenthub is a component to manage content (the
entityhub to some degree as well); do we want to mandate a horrible user
interface just to comply with some catchphrase about what Stanbol is not?
Or do we want to reduce Stanbol to be just the Enhancer and leave the
other stuff to other projects?

I'd rather go for the vision of an ecosystem of modular semantic and
RESTful OSGi components, but if the community wants to focus on the
Enhancer I think a clear statement should be made to avoid unnecessary
arguments about memory consumption.

Cheers,
Reto


On Fri, Nov 9, 2012 at 10:56 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi all,
>
> let me share my thoughts. Because this mail is rather long I tried to
> split it up into three separate sections: (1) RDF, (2) RESTful/Web
> Interface and (3) other related topics
>
>
> RDF libs:
> ====
>
> From the viewpoint of Apache Stanbol, one needs to ask the question
> whether it makes sense to manage our own RDF API. I expect the Semantic Web
> standards to evolve quite a bit in the coming years, and I am concerned
> about whether the Clerezza RDF modules will be updated/extended to
> provide implementations of those. One example of such a situation is
> SPARQL 1.1, which has been around for quite some time and is still not
> supported by Clerezza. While I do like the small API, the flexibility
> to use different TripleStores, and that Clerezza comes with OSGI
> support, I think given the current situation we would need to discuss
> all options, and those also include a switch to Apache Jena or
> Sesame. Sesame would be an especially attractive option, as their RDF
> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
> counterparts (Model [2] and Graph [3]) are considerably different and
> more complex interfaces. In addition, Jena will only change to
> org.apache packages with the next major release, so a switch before
> that release would mean two incompatible API changes.
>
> My personal opinion is that we should keep using Clerezza for now,
> invest some effort to improve the Clerezza RDF modules, and then see
> how it further develops. Such an effort should include
>
> * to implement a SPARQL fast lane (as already discussed with Reto
> during ApacheCon). The fast lane would allow Clerezza to use the native
> SPARQL engine of the underlying triple store. This means that Clerezza only
> parses those parts of the SPARQL query needed to determine the RDF graph to
> execute the query on. This information is then used to pass the query
> to the native SPARQL engine via an extended interface of the
> TcProvider. The Clerezza SPARQL implementation would only be used in
> case the TcProvider does not provide a native SPARQL implementation or
> if the query spans RDF graphs managed by different TcProvider
> instances. That way Clerezza users would be able to use any SPARQL
> feature provided by the triple store in use.
> * update to the newest Jena versions (see also STANBOL-621; Peter
> Ansell's Clerezza fork on GitHub [4] as well as Sebastian Schaffert's
> Jena bundle used for the Stanbol/LMF integration [5])
> * finish and release the SingleTdbDatasetTcProvider.java
> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
> component
> * move the Indexed in-memory graph (CLEREZZA-683) from the Stanbol
> code base to Clerezza and release it so that we can use it from there
> in Stanbol
> * provide a Clerezza JsonLD parser/serializer. This is critical for
> Stanbol as several CMSs use this as their preferred RDF serialization.
>
> [1]
> http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
> [2]
> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
> [3]
> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
> [4]
> https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
> [5]
> https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml
>
>
> RESTful API / Web Interface:
> =====================
>
> There are several shortcomings of the current implementation of the
> Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
> o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)
>
> * Jersey's use of java.util.ServiceLoader forces manual
> configuration of the JAX-RS components. A switch to an OSGI-compatible
> implementation such as Apache Wink would be very welcome
> * The RESTful API documentation is currently written as HTML into
> Freemarker templates. This makes it really hard to maintain this
> documentation. I would really appreciate the possibility to use
> Markdown (as used on the webpage) for that
> * For deployments of Stanbol it should be possible to exclude
> the WebUI so that only the RESTful services are available
>
> regarding :
>
> > Stanbol drops its interpretation of "REST" as "not for humans" and wants
> > to allow integrating (wherever possible as modular and optional components)
> > media types designed for human consumption and support REST approaches
> > there as well (thinking of the current back-button-unfriendly UI).
>
> Adding support for a simple table-based representation of RDF data
> would indeed be an important feature. However, having Resource (Entity)
> type-specific rendering is out of the scope of Apache Stanbol (at
> least in my opinion). AFAIK, as soon as we switch to an OSGI-compatible
> JAX-RS implementation, users could add those easily by
> providing the corresponding JAX-RS MessageBodyWriter.
>
> If there are people who would like to work on this, it would be really
> great. If we could (re)use some stuff from Clerezza, even better. But
> things would need to be kept simple, as Stanbol is no semantic CMS.
>
> I would suggest starting development in a separate branch and then having
> a discussion/vote based on an early prototype/demonstration.
>
>
> Other Topics
> =========
>
> ### Scala and jsr 223 (scripting in the JVM)
>
> I do have an issue with Scala as it adds >150MByte to the PermGen as
> soon as it is loaded. But as long as it is an optional dependency and
> users are aware of that when adding the dependency I am fine with it.
>
> ###  Shell
>
> Personally I do not find the shell very useful. For installing
> Bundles/Service configurations I prefer to use the Apache Sling
> FileInstaller. For deployment during development I like to use the
> Sling Maven Installer plugin. For creating new Stanbol Modules I
> rather suggest to create an extensive list of Maven Archetype (e.g.
> for Stanbol EnhancementEngines).
>
> As the Shell also depends on Scala the "+150MByte to the PermGen"
> issue also applies to the Shell.
>
> ### Security
>
> Having a security model in Apache Stanbol might be important for some
> use cases. Because of this I consider this an important topic. However
> one I have very little experience with.
>
> I would like to get rid of the dependencies to
> org.apache.clerezza:patform (AFAIK this is only needed for the
> configuration and this could be easily provided by the
> sling.properties file at runtime. Defaults can be provided in the
> commons.properties file already included in all Stanbol Launchers. I
> would also suggest to move the PermissionParser utility over to the
> Apache Stanbol Security modules.
> This two changes would allow to activate the security module also for
> the Stable (Stateless) launcher.
>
>
> best
> Rupert
>
>
> On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
> > Comments inline...
> >
> > On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org>
> wrote:
> >
> >> Ok, sorry for jumping into this discussion so lately. I've been having
> >> quite some discussion on the matter here at apacheconeu. Also I had
> >> prositive feedback from my resentation of Clerezza yesterday.
> >>
> >> I think two things:
> >> - For high level platform component it is often not clear if the fit
> better
> >> into Stanbol or into Clerezza
> >> - The RDF Api shoud actually be independen both from triple store
> provider
> >> as well as from consumer
> >>
> >> So I think a good solution would be to have the RDF liraries comprising:
> >> - A modular and very spec oriented API for RDF and related standards
> >> - A set of serializing and parsing providers
> >> - Adapters to triple stores (where the api isn't provided by the triple
> >> store)
> >> basically that's what in the org.apache.clerezza.rdf.* packages
> >>
> >> That's the stuff that would fit well into Stanbol. Provided that stanbol
> >> drops it's interretation of "REST" as "not for humans" and want to go to
> >> allow integrating (wherever possible as modular and optional components)
> >> media types designed for human consumptions and support REST approaches
> >> there as well (thinking of the current back-button unfriendly UI).
> >>
> >
> > IMO, Clerezza is just too big for existing committers. If we could reduce
> > it to the
> > essential components dealing with rdf and leaving out templating and
> > rendering,
> > it may be easier to graduate.
> >
> > - Scala Server Pages
> >> - TypeRendering (selection of templates based on the rdf type of the
> >> returned response)
> >> - Security (already integrated to some degree, code based security to
> run
> >> bundles in a sandboxed manner is not)
> >> - Shell (already ships in the stanbol launcher, so here it's about
> >> 'adopting' the sources)
> >> - Dev tools: rapid development support (create sample projects, have
> source
> >> files as bundles)
> >>
> >> To the attic:
> >> - Triaxrs: The Clerezza jax-rs implementation is no longer needed as the
> >> same support (jax-rs components asosgi services) is now provided by
> apache
> >> wink
> >> -  jssr 223 support
> >>
> >> In my opinion there is no urgent need for action, it is true that there
> >> hasn't been a lot of action in clerezza but imho the project os going on
> >> even at a low pace  (as other projects like e.g. the recently graduated
> >> wink).
> >>
> >
> > Not sure about no urgent need for action. Maybe we should list the
> > requirements
> > to fulfil in order to be able to graduate. Wonder if we are able to meet
> > them.
> >
> > Cheers
> > Hasan
> >
> >
> >>
> >> Cheers,
> >> Reto
> >>
> >> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <
> >> bdelacretaz@apache.org
> >> > wrote:
> >>
> >> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org>
> wrote:
> >> > > ...It's good to have the existing released artifacts remain - what
> >> about
> >> > after
> >> > > the donation?
> >> > >
> >> > > Presumably the moved modules will be released by the new host - will
> >> they
> >> > > use group id org.apache.clerezza? or move to the new host project
> group
> >> > id?
> >> > > I'd suggest renaming the group to the new project but realise it is
> a
> >> bit
> >> > > more disruptive...
> >> >
> >> > I think that's really up to whatever project adopts that code. In
> >> > theory package names should change but that's probably not convenient.
> >> >
> >> > Or maybe it's time to create a semantic module or two at
> >> > http://commons.apache.org/ ? If existing committers are willing to
> >> > support that with their work it should be easy to make it happen.
> >> >
> >> > -Bertrand
> >> >
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Future of Clerezza and Stanbol

Posted by Peter Ansell <an...@gmail.com>.
On 9 November 2012 19:56, Rupert Westenthaler
<ru...@gmail.com> wrote:
> Hi all,
>
> let me share my throughs. Because this mail is rather long I tried to
> split it up in three separate section (1) RDF (2) RESTful/ Web
> Interface and (3) other related topics
>
>
> RDF libs:
> ====
>
> Out of the viewpoint of Apache Stanbol one needs to ask the Question
> if it makes sense to manage an own RDF API. I expect the Semantic Web
> Standards to evolve quite a bit in the coming years and I do have
> concern that the Clerezza RDF modules will be updated/extended to
> provide implementations of those. One example of such an situation is
> SPARQL 1.1 that is around for quite some time and is still not
> supported by Clerezza. While I do like the small API, the flexibility
> to use different TripleStores and that Clerezza comes with OSGI
> support I think given the current situation we would need to discuss
> all options and those do also include a switch to Apache Jena or
> Sesame. Especially Sesame would be an attractive option as their RDF
> Graph API [1] is very similar to what Clerezza uses.

Sesame has three different APIs for RDF graph manipulation/querying.
The main API that users target in my experience is the Repository API.
Repository implementors are encouraged to target the SAIL API. There
are very few users or implementors who actually use the Graph API for
significant purposes, in my experience.

> Apache Jena's
> counterparts (Model [2] and Graph [3]) are considerable different and
> more complex interfaces. In addition Jena will only change to
> org.apache packages with the next major release so a switch before
> that release would mean two incompatible API changes.
>
> My personal opinion is that we should keep using Clerezza for now.
> Invest some effort to improve the Clerezza RDF modules and than see
> how it further develops. Such an Effort should include
>
> *  to implement SPQRAL fast lane (as already discussed with Reto
> during ApacheCon). Fast lane would allow Clerezza to use the native
> SPARQL engine of the used Triplestore. Meaning that Clerezza only
> parses those parts of the SPARQL query to understand the RDF graph to
> execute the Query on. This information is than used to parse the query
> to the native SPARQL engine via an extended Interface of the
> TcProvide. The Clerezza SPARQL implementation would only be used in
> case the TcProvider does not provide a native SPARQL implementation of
> if the Query spans RDF graphs managed by different TcProvider
> instances. By that Clerezza users would be able to use any SPARQL
> feature provided by the used TripleStore.

The SPARQL 1.1 specification is now a Proposed Recommendation, so it
would be a good time to implement it now without fearing more of the
large changes that have happened between each of the Working Drafts so
far.

> * update to the newest Jena versions (see also STANBOL-621; Peter
> Ansell's Clerezza fork on github [5] as well as Sebastian Schaffert's
> Jena bundle used for the Stanbol/LMF integration [5])

I made changes to Clerezza to experiment with a few things that I saw
as issues when I was experimenting with Stanbol. The biggest design
issue for me was that every graph was loaded into memory in bulk. The
underlying reason for this seemed to be that java.util.Iterator does
not have a close method, so there is no way of knowing when to release
resources if an iterator is not used to completion. The other issue
for me was that I wanted to use an underlying Sesame repository, and
the Sesame module had not been maintained, and had been left off the
parent reactor, so it was no longer compatible with the other modules
when I was experimenting with it. Given that those were my goals, I
removed all of the Clerezza CMS modules from my Git fork and focused
on the underlying libraries, which would be much easier to maintain if
they were separate.
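
To illustrate the point about java.util.Iterator having no close
method, here is a minimal sketch of an iterator that can be released
before it is exhausted. It is not Clerezza code: CloseableIterator,
CursorIterator and CloseableIteratorDemo are hypothetical names, and the
list-backed cursor only stands in for a real store cursor.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical interface: an Iterator that can release its resources early. */
interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // narrowed so that callers need not handle a checked exception
}

/** Hypothetical stand-in for a store-backed cursor; records whether it was released. */
final class CursorIterator<T> implements CloseableIterator<T> {
    private final Iterator<T> delegate;
    private boolean closed;

    CursorIterator(List<T> rows) {
        this.delegate = rows.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && delegate.hasNext();
    }

    @Override
    public T next() {
        if (closed) {
            throw new NoSuchElementException("iterator already closed");
        }
        return delegate.next();
    }

    @Override
    public void close() {
        closed = true; // a real implementation would release the store cursor here
    }

    boolean isClosed() {
        return closed;
    }
}

final class CloseableIteratorDemo {
    /** Reads only the first element; try-with-resources then closes the cursor. */
    static <T> T firstOrNull(CloseableIterator<T> it) {
        try (CloseableIterator<T> cursor = it) {
            return cursor.hasNext() ? cursor.next() : null;
        }
    }
}
```

With something along these lines a caller that stops iterating early
still gives the store a chance to free its resources, so a graph would
not have to be loaded into memory in bulk just to be safe.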

In addition, I did not want to use OSGI, so I had to make changes in
many cases to allow a completely programmatic instantiation of
components, as some fields were left private with no mutator method
and in some cases no public constructor that could be used to populate
the field programmatically. For all of the good that OSGI may provide
for otherwise complex systems, it is not good Java software
engineering to leave private fields with no programmatic way to set
them.

> * finish and release the SingleTdbDatasetTcProvider.java
> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
> component
> * move the Indexed in-memory graph (CLEREZZA-683) from the Stanbol
> code base to Clerezza and release it so that we can use it from their
> in Stanbol
> * provide an Clerezza JsonLD parser/serializer. This is critical for
> Stanbol as several CMS use this as preferred RDF serialization.

I would focus on getting a single Java implementation of JSON-LD
working here, and I know that Reto has been working on this by
contributing a Clerezza serialiser/callback implementation to Tristan
King's JSONLD-Java library at GitHub. When I get a chance I am going
to suggest that the dependencies for Sesame/Jena/Clerezza are in
separate modules in the JSONLD-Java project, to make maven dependency
chaining simpler.

Given that Stanbol is already quite large, it is not viable to
transfer the RDF libraries there, but it does not look like it is
viable to leave them combined with the other Clerezza modules as they
are unrelated and will have a different release cycle, when and if the
CMS components are maintained. It would be useful IMO to split the
Clerezza project to make it simpler to maintain the reusable
libraries. In particular, if Clerezza split its RDF libraries it may
eventually gain a similar level of developer support as either Sesame
or Jena. Currently the only project I can name that uses Clerezza RDF
libraries at their core is Stanbol, which reduces the user base, and
hence reduces the developer support base. That doesn't mean that there
aren't other projects out there that use Clerezza, just that I have
not come across them. Almost all of the comments in this thread are
about fixing issues in the Clerezza RDF libraries based on experience
from Stanbol.

One issue that would be easier to solve if the project split would be
the issue of disparate version numbers between modules that make it
difficult to identify when to update dependencies. I have suggested on
the Stanbol list having a single version for each release, so
that people have a single figure in their head when they describe a
version for the stanbol components they are relying on. It would be
much easier to depend on Clerezza IMO, if there was a single current
release version number for all of the library components. Maven
properties make it insanely easy to migrate to new versions of
multi-module libraries. If there is a single version number for each
new release, and if the trunk is consistently stable, then releases
can be made at any time and users can bump a single version property
in Maven to upgrade their systems.

I know that in Subversion it is difficult to guarantee that the trunk
is stable at each point in time--because people are afraid to branch
due to the difficulty involved and will instead develop new features
on the trunk--but the good news is that there are a number of
distributed version control systems around now that make it easy to
develop features on lightweight branches separate from the trunk and
painlessly merge them back in when they are stable. The suggestion to
Stanbol to switch to a single version number for releases was knocked
back based on the premise that not all modules would be stable at the
same time. If people were able to easily use branches for new features
and they test them well before integrating them, then the trunk would
be fairly consistently stable. The resulting stability would remove
the difficulties that Stanbol has continued to have since my
suggestion in identifying all of the dependencies to update when they
want to release a new version of a particular module. All modules,
even those without significant changes since their last release,
would be released at each stage, based on their trunk/master tests
consistently passing in Jenkins.

Cheers,

Peter

Re: Future of Clerezza and Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

let me share my thoughts. Because this mail is rather long I tried to
split it up into three separate sections: (1) RDF, (2) RESTful / Web
Interface and (3) other related topics.


RDF libs:
====

From the viewpoint of Apache Stanbol one needs to ask the question
whether it makes sense to maintain our own RDF API. I expect the
Semantic Web standards to evolve quite a bit in the coming years and I
am concerned about whether the Clerezza RDF modules will be
updated/extended to provide implementations of those. One example of
such a situation is SPARQL 1.1, which has been around for quite some
time and is still not supported by Clerezza. While I do like the small
API, the flexibility to use different TripleStores and the fact that
Clerezza comes with OSGI support, I think given the current situation
we need to discuss all options, and those include a switch to Apache
Jena or Sesame. Sesame would be an especially attractive option as its
RDF Graph API [1] is very similar to what Clerezza uses. Apache Jena's
counterparts (Model [2] and Graph [3]) are considerably different and
more complex interfaces. In addition Jena will only change to
org.apache packages with the next major release, so a switch before
that release would mean two incompatible API changes.

My personal opinion is that we should keep using Clerezza for now,
invest some effort to improve the Clerezza RDF modules and then see
how it develops further. Such an effort should include:

* implement a SPARQL fast lane (as already discussed with Reto
during ApacheCon). The fast lane would allow Clerezza to use the native
SPARQL engine of the underlying TripleStore. This means that Clerezza
only parses those parts of the SPARQL query needed to determine which
RDF graphs the query is executed on. This information is then used to
pass the query to the native SPARQL engine via an extended interface
of the TcProvider. The Clerezza SPARQL implementation would only be
used if the TcProvider does not provide a native SPARQL
implementation or if the query spans RDF graphs managed by different
TcProvider instances. That way Clerezza users would be able to use any
SPARQL feature provided by the underlying TripleStore.
* update to the newest Jena versions (see also STANBOL-621; Peter
Ansell's Clerezza fork on github [4] as well as Sebastian Schaffert's
Jena bundle used for the Stanbol/LMF integration [5])
* finish and release the SingleTdbDatasetTcProvider.java
(CLEREZZA-691) as this is important for the Stanbol Ontology Manager
component
* move the indexed in-memory graph (CLEREZZA-683) from the Stanbol
code base to Clerezza and release it so that we can use it from there
in Stanbol
* provide a Clerezza JSON-LD parser/serializer. This is critical for
Stanbol as several CMSs use this as their preferred RDF serialization.
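
The fast lane dispatch described in the first item above could be
sketched roughly as follows. This is only an illustration under
assumptions, not the Clerezza API: GraphProvider, NativeSparqlProvider
and FastLaneDispatcher are hypothetical names, query results are
reduced to plain strings, and the set of graphs a query refers to is
assumed to have been extracted by a separate (partial) query parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical minimal view of a provider that manages named graphs. */
interface GraphProvider {
    boolean manages(String graphName);
}

/** Hypothetical extension for providers whose store has a native SPARQL engine. */
interface NativeSparqlProvider extends GraphProvider {
    String executeNative(String query);
}

final class FastLaneDispatcher {
    private final List<GraphProvider> providers;

    FastLaneDispatcher(List<GraphProvider> providers) {
        this.providers = new ArrayList<>(providers);
    }

    /**
     * Hands the query to the native engine when all referenced graphs belong
     * to one provider that has such an engine; otherwise uses the fallback.
     */
    String execute(String query, Set<String> referencedGraphs) {
        Set<GraphProvider> owners = new LinkedHashSet<>();
        for (String graph : referencedGraphs) {
            owners.add(ownerOf(graph));
        }
        if (owners.size() == 1) {
            GraphProvider only = owners.iterator().next();
            if (only instanceof NativeSparqlProvider) {
                return ((NativeSparqlProvider) only).executeNative(query);
            }
        }
        return executeGeneric(query);
    }

    /** Returns the first provider that manages the given graph. */
    private GraphProvider ownerOf(String graphName) {
        for (GraphProvider provider : providers) {
            if (provider.manages(graphName)) {
                return provider;
            }
        }
        throw new IllegalArgumentException("no provider manages " + graphName);
    }

    /** Placeholder for the generic, store-independent SPARQL implementation. */
    private String executeGeneric(String query) {
        return "generic:" + query;
    }
}
```

The fallback path is taken both when the single owning provider has no
native engine and when the query spans graphs of different providers,
which matches the two cases named in the item.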

[1] http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
[2] http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
[3] http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
[4] https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
[5] https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml


RESTful API / Web Interface:
=====================

There are several shortcomings of the current implementation of the
Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)

* Jersey's use of java.util.ServiceLoader forces manual
configuration of the JAX-RS components. A switch to an OSGI compatible
implementation such as Apache Wink would be very welcome.
* The RESTful API documentation is currently written as HTML into
Freemarker templates. This makes it really hard to maintain this
documentation. I would really appreciate the possibility to use
markdown (as used on the Webpage) for that
* For deployments of Stanbol it should be possible to exclude
the WebUI so that only the RESTful services are available.

regarding :

> Stanbol drops it's interretation of "REST" as "not for humans" and want to go to
> allow integrating (wherever possible as modular and optional components)
> media types designed for human consumptions and support REST approaches
> there as well (thinking of the current back-button unfriendly UI).

Adding support for a simple table-based representation of RDF data
would indeed be an important feature. However, Resource (Entity)
type-specific rendering is out of the scope of Apache Stanbol (at
least in my opinion). That said, AFAIK as soon as we switch to an OSGI
compatible JAX-RS implementation, users could easily add such
renderings by providing the corresponding JAX-RS MessageBodyWriter.
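
As a rough idea of what such a simple table-based representation
might look like, here is a small sketch of the rendering logic only.
SimpleTriple and TripleTableRenderer are hypothetical names and triples
are reduced to plain strings; in a JAX-RS setup a MessageBodyWriter
could presumably call something like toHtmlTable and write the result
to the response stream, but that wiring is left out here.

```java
import java.util.List;

/** Hypothetical value type: one RDF statement kept as three plain strings. */
final class SimpleTriple {
    final String subject;
    final String predicate;
    final String object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class TripleTableRenderer {
    /** Escapes the characters that are significant in HTML text. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Renders one table row per triple, with a fixed three-column header. */
    static String toHtmlTable(List<SimpleTriple> triples) {
        StringBuilder html = new StringBuilder();
        html.append("<table>\n");
        html.append("<tr><th>Subject</th><th>Predicate</th><th>Object</th></tr>\n");
        for (SimpleTriple t : triples) {
            html.append("<tr><td>").append(escape(t.subject))
                .append("</td><td>").append(escape(t.predicate))
                .append("</td><td>").append(escape(t.object))
                .append("</td></tr>\n");
        }
        return html.append("</table>\n").toString();
    }
}
```

Because the table does not depend on the rdf:type of the resource, it
stays within the "simple" scope discussed above; type-specific
templates would be a separate, optional component.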

If there are people who would like to work on this it would be really
great. If we could (re)use some stuff from Clerezza - even better. But
things would need to be kept simple, as Stanbol is no semantic CMS.

I would suggest starting development in a separate branch and then
having a discussion/vote based on an early prototype/demonstration.


Other Topics
=========

### Scala and jsr 223 (scripting in the JVM)

I do have an issue with Scala as it adds >150MByte to the PermGen as
soon as it is loaded. But as long as it is an optional dependency and
users are aware of that when adding the dependency I am fine with it.

###  Shell

Personally I do not find the shell very useful. For installing
Bundles/Service configurations I prefer to use the Apache Sling
FileInstaller. For deployment during development I like to use the
Sling Maven Installer plugin. For creating new Stanbol Modules I
would rather suggest creating an extensive set of Maven archetypes
(e.g. for Stanbol EnhancementEngines).

As the Shell also depends on Scala the "+150MByte to the PermGen"
issue also applies to the Shell.

### Security

Having a security model in Apache Stanbol might be important for some
use cases. Because of this I consider this an important topic, although
it is one I have very little experience with.

I would like to get rid of the dependencies on
org.apache.clerezza:platform (AFAIK this is only needed for the
configuration, and this could easily be provided by the
sling.properties file at runtime. Defaults can be provided in the
commons.properties file already included in all Stanbol Launchers). I
would also suggest moving the PermissionParser utility over to the
Apache Stanbol Security modules.
These two changes would also make it possible to activate the security
module for the Stable (Stateless) launcher.


best
Rupert


On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
> Comments inline...
>
> On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org> wrote:
>
>> Ok, sorry for jumping into this discussion so lately. I've been having
>> quite some discussion on the matter here at apacheconeu. Also I had
>> prositive feedback from my resentation of Clerezza yesterday.
>>
>> I think two things:
>> - For high level platform component it is often not clear if the fit better
>> into Stanbol or into Clerezza
>> - The RDF Api shoud actually be independen both from triple store provider
>> as well as from consumer
>>
>> So I think a good solution would be to have the RDF liraries comprising:
>> - A modular and very spec oriented API for RDF and related standards
>> - A set of serializing and parsing providers
>> - Adapters to triple stores (where the api isn't provided by the triple
>> store)
>> basically that's what in the org.apache.clerezza.rdf.* packages
>>
>> That's the stuff that would fit well into Stanbol. Provided that stanbol
>> drops it's interretation of "REST" as "not for humans" and want to go to
>> allow integrating (wherever possible as modular and optional components)
>> media types designed for human consumptions and support REST approaches
>> there as well (thinking of the current back-button unfriendly UI).
>>
>
> IMO, Clerezza is just too big for existing committers. If we could reduce
> it to the
> essential components dealing with rdf and leaving out templating and
> rendering,
> it may be easier to graduate.
>
> - Scala Server Pages
>> - TypeRendering (selection of templates based on the rdf type of the
>> returned response)
>> - Security (already integrated to some degree, code based security to run
>> bundles in a sandboxed manner is not)
>> - Shell (already ships in the stanbol launcher, so here it's about
>> 'adopting' the sources)
>> - Dev tools: rapid development support (create sample projects, have source
>> files as bundles)
>>
>> To the attic:
>> - Triaxrs: The Clerezza jax-rs implementation is no longer needed as the
>> same support (jax-rs components asosgi services) is now provided by apache
>> wink
>> -  jssr 223 support
>>
>> In my opinion there is no urgent need for action, it is true that there
>> hasn't been a lot of action in clerezza but imho the project os going on
>> even at a low pace  (as other projects like e.g. the recently graduated
>> wink).
>>
>
> Not sure about no urgent need for action. Maybe we should list the
> requirements
> to fulfil in order to be able to graduate. Wonder if we are able to meet
> them.
>
> Cheers
> Hasan
>
>
>>
>> Cheers,
>> Reto
>>
>> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <
>> bdelacretaz@apache.org
>> > wrote:
>>
>> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org> wrote:
>> > > ...It's good to have the existing released artifacts remain - what
>> about
>> > after
>> > > the donation?
>> > >
>> > > Presumably the moved modules will be released by the new host - will
>> they
>> > > use group id org.apache.clerezza? or move to the new host project group
>> > id?
>> > > I'd suggest renaming the group to the new project but realise it is a
>> bit
>> > > more disruptive...
>> >
>> > I think that's really up to whatever project adopts that code. In
>> > theory package names should change but that's probably not convenient.
>> >
>> > Or maybe it's time to create a semantic module or two at
>> > http://commons.apache.org/ ? If existing committers are willing to
>> > support that with their work it should be easy to make it happen.
>> >
>> > -Bertrand
>> >
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Future of Clerezza and Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

let me share my thoughts. Because this mail is rather long I tried to
split it up into three separate sections: (1) RDF, (2) RESTful / Web
Interface and (3) other related topics.


RDF libs:
====

From the viewpoint of Apache Stanbol one needs to ask the question
whether it makes sense to maintain our own RDF API. I expect the
Semantic Web standards to evolve quite a bit in the coming years and I
am concerned about whether the Clerezza RDF modules will be
updated/extended to provide implementations of those. One example of
such a situation is SPARQL 1.1, which has been around for quite some
time and is still not supported by Clerezza. While I do like the small
API, the flexibility to use different TripleStores and the fact that
Clerezza comes with OSGI support, I think given the current situation
we need to discuss all options, and those include a switch to Apache
Jena or Sesame. Sesame would be an especially attractive option as its
RDF Graph API [1] is very similar to what Clerezza uses. Apache Jena's
counterparts (Model [2] and Graph [3]) are considerably different and
more complex interfaces. In addition Jena will only change to
org.apache packages with the next major release, so a switch before
that release would mean two incompatible API changes.

My personal opinion is that we should keep using Clerezza for now,
invest some effort to improve the Clerezza RDF modules and then see
how it develops further. Such an effort should include:

* implement a SPARQL fast lane (as already discussed with Reto
during ApacheCon). The fast lane would allow Clerezza to use the native
SPARQL engine of the underlying TripleStore. This means that Clerezza
only parses those parts of the SPARQL query needed to determine which
RDF graphs the query is executed on. This information is then used to
pass the query to the native SPARQL engine via an extended interface
of the TcProvider. The Clerezza SPARQL implementation would only be
used if the TcProvider does not provide a native SPARQL
implementation or if the query spans RDF graphs managed by different
TcProvider instances. That way Clerezza users would be able to use any
SPARQL feature provided by the underlying TripleStore.
* update to the newest Jena versions (see also STANBOL-621; Peter
Ansell's Clerezza fork on github [4] as well as Sebastian Schaffert's
Jena bundle used for the Stanbol/LMF integration [5])
* finish and release the SingleTdbDatasetTcProvider.java
(CLEREZZA-691) as this is important for the Stanbol Ontology Manager
component
* move the indexed in-memory graph (CLEREZZA-683) from the Stanbol
code base to Clerezza and release it so that we can use it from there
in Stanbol
* provide a Clerezza JSON-LD parser/serializer. This is critical for
Stanbol as several CMSs use this as their preferred RDF serialization.

[1] http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
[2] http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
[3] http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
[4] https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
[5] https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml


RESTful API / Web Interface:
=====================

There are several shortcomings of the current implementation of the
Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)

* Jersey's use of java.util.ServiceLoader forces manual
configuration of the JAX-RS components. A switch to an OSGI compatible
implementation such as Apache Wink would be very welcome.
* The RESTful API documentation is currently written as HTML into
Freemarker templates. This makes it really hard to maintain this
documentation. I would really appreciate the possibility to use
markdown (as used on the Webpage) for that
* For deployments of Stanbol it should be possible to exclude
the WebUI so that only the RESTful services are available.

regarding :

> Stanbol drops it's interretation of "REST" as "not for humans" and want to go to
> allow integrating (wherever possible as modular and optional components)
> media types designed for human consumptions and support REST approaches
> there as well (thinking of the current back-button unfriendly UI).

Adding support for a simple table-based representation of RDF data
would indeed be an important feature. However, Resource (Entity)
type-specific rendering is out of the scope of Apache Stanbol (at
least in my opinion). That said, AFAIK as soon as we switch to an OSGI
compatible JAX-RS implementation, users could easily add such
renderings by providing the corresponding JAX-RS MessageBodyWriter.

If there are people who would like to work on this it would be really
great. If we could (re)use some stuff from Clerezza - even better. But
things would need to be kept simple, as Stanbol is no semantic CMS.

I would suggest starting development in a separate branch and then
having a discussion/vote based on an early prototype/demonstration.


Other Topics
=========

### Scala and jsr 223 (scripting in the JVM)

I do have an issue with Scala as it adds >150MByte to the PermGen as
soon as it is loaded. But as long as it is an optional dependency and
users are aware of that when adding the dependency I am fine with it.

###  Shell

Personally I do not find the shell very useful. For installing
Bundles/Service configurations I prefer to use the Apache Sling
FileInstaller. For deployment during development I like to use the
Sling Maven Installer plugin. For creating new Stanbol Modules I
would rather suggest creating an extensive set of Maven archetypes
(e.g. for Stanbol EnhancementEngines).

As the Shell also depends on Scala the "+150MByte to the PermGen"
issue also applies to the Shell.

### Security

Having a security model in Apache Stanbol might be important for some
use cases. Because of this I consider this an important topic, although
it is one I have very little experience with.

I would like to get rid of the dependencies on
org.apache.clerezza:platform (AFAIK this is only needed for the
configuration, and this could easily be provided by the
sling.properties file at runtime. Defaults can be provided in the
commons.properties file already included in all Stanbol Launchers). I
would also suggest moving the PermissionParser utility over to the
Apache Stanbol Security modules.
These two changes would also make it possible to activate the security
module for the Stable (Stateless) launcher.


best
Rupert


On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <ha...@trialox.org> wrote:
> Comments inline...
>
> On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org> wrote:
>
>> Ok, sorry for jumping into this discussion so late. I've been having
>> quite some discussion on the matter here at apacheconeu. Also I had
>> positive feedback from my presentation of Clerezza yesterday.
>>
>> I think two things:
>> - For high level platform components it is often not clear if they fit better
>> into Stanbol or into Clerezza
>> - The RDF API should actually be independent both from the triple store provider
>> as well as from the consumer
>>
>> So I think a good solution would be to have the RDF libraries comprising:
>> - A modular and very spec oriented API for RDF and related standards
>> - A set of serializing and parsing providers
>> - Adapters to triple stores (where the API isn't provided by the triple
>> store)
>> basically that's what is in the org.apache.clerezza.rdf.* packages
>>
>> That's the stuff that would fit well into Stanbol. Provided that Stanbol
>> drops its interpretation of "REST" as "not for humans" and wants to
>> allow integrating (wherever possible as modular and optional components)
>> media types designed for human consumption and support REST approaches
>> there as well (thinking of the current back-button unfriendly UI).
>>
>
> IMO, Clerezza is just too big for existing committers. If we could reduce
> it to the essential components dealing with rdf and leaving out templating
> and rendering, it may be easier to graduate.
>
>> - Scala Server Pages
>> - TypeRendering (selection of templates based on the RDF type of the
>> returned response)
>> - Security (already integrated to some degree, code based security to run
>> bundles in a sandboxed manner is not)
>> - Shell (already ships in the Stanbol launcher, so here it's about
>> 'adopting' the sources)
>> - Dev tools: rapid development support (create sample projects, have source
>> files as bundles)
>>
>> To the attic:
>> - Triaxrs: The Clerezza JAX-RS implementation is no longer needed as the
>> same support (JAX-RS components as OSGi services) is now provided by Apache
>> Wink
>> - JSR 223 support
>>
>> In my opinion there is no urgent need for action. It is true that there
>> hasn't been a lot of action in Clerezza, but imho the project is going on,
>> even if at a low pace (as with other projects, e.g. the recently graduated
>> Wink).
>>
>
> Not sure about no urgent need for action. Maybe we should list the
> requirements to fulfil in order to be able to graduate. Wonder if we are
> able to meet them.
>
> Cheers
> Hasan
>
>
>>
>> Cheers,
>> Reto
>>
>> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <bdelacretaz@apache.org> wrote:
>>
>> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org> wrote:
>> > > ...It's good to have the existing released artifacts remain - what about
>> > > after the donation?
>> > >
>> > > Presumably the moved modules will be released by the new host - will they
>> > > use group id org.apache.clerezza? or move to the new host project group id?
>> > > I'd suggest renaming the group to the new project but realise it is a bit
>> > > more disruptive...
>> >
>> > I think that's really up to whatever project adopts that code. In
>> > theory package names should change but that's probably not convenient.
>> >
>> > Or maybe it's time to create a semantic module or two at
>> > http://commons.apache.org/ ? If existing committers are willing to
>> > support that with their work it should be easy to make it happen.
>> >
>> > -Bertrand
>> >
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Future of Clerezza and Stanbol

Posted by Hasan Hasan <ha...@trialox.org>.
Comments inline...

On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <re...@apache.org> wrote:

> Ok, sorry for jumping into this discussion so late. I've been having
> quite some discussion on the matter here at apacheconeu. Also I had
> positive feedback from my presentation of Clerezza yesterday.
>
> I think two things:
> - For high level platform components it is often not clear if they fit better
> into Stanbol or into Clerezza
> - The RDF API should actually be independent both from the triple store provider
> as well as from the consumer
>
> So I think a good solution would be to have the RDF libraries comprising:
> - A modular and very spec oriented API for RDF and related standards
> - A set of serializing and parsing providers
> - Adapters to triple stores (where the API isn't provided by the triple
> store)
> basically that's what is in the org.apache.clerezza.rdf.* packages
>
> That's the stuff that would fit well into Stanbol. Provided that Stanbol
> drops its interpretation of "REST" as "not for humans" and wants to
> allow integrating (wherever possible as modular and optional components)
> media types designed for human consumption and support REST approaches
> there as well (thinking of the current back-button unfriendly UI).
>

IMO, Clerezza is just too big for the existing committers. If we could reduce
it to the essential components dealing with RDF, leaving out templating and
rendering, it may be easier to graduate.

> - Scala Server Pages
> - TypeRendering (selection of templates based on the RDF type of the
> returned response)
> - Security (already integrated to some degree, code based security to run
> bundles in a sandboxed manner is not)
> - Shell (already ships in the Stanbol launcher, so here it's about
> 'adopting' the sources)
> - Dev tools: rapid development support (create sample projects, have source
> files as bundles)
>
> To the attic:
> - Triaxrs: The Clerezza JAX-RS implementation is no longer needed as the
> same support (JAX-RS components as OSGi services) is now provided by Apache
> Wink
> - JSR 223 support
>
> In my opinion there is no urgent need for action. It is true that there
> hasn't been a lot of action in Clerezza, but imho the project is going on,
> even if at a low pace (as with other projects, e.g. the recently graduated
> Wink).
>

I'm not sure there is no urgent need for action. Maybe we should list the
requirements we have to fulfil in order to graduate. I wonder whether we
are able to meet them.

Cheers
Hasan


>
> Cheers,
> Reto
>
> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <bdelacretaz@apache.org> wrote:
>
> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <an...@apache.org> wrote:
> > > ...It's good to have the existing released artifacts remain - what about
> > > after the donation?
> > >
> > > Presumably the moved modules will be released by the new host - will they
> > > use group id org.apache.clerezza? or move to the new host project group id?
> > > I'd suggest renaming the group to the new project but realise it is a bit
> > > more disruptive...
> >
> > I think that's really up to whatever project adopts that code. In
> > theory package names should change but that's probably not convenient.
> >
> > Or maybe it's time to create a semantic module or two at
> > http://commons.apache.org/ ? If existing committers are willing to
> > support that with their work it should be easy to make it happen.
> >
> > -Bertrand
> >
>
