Posted to dev@stanbol.apache.org by Andy Seaborne <an...@apache.org> on 2012/11/15 13:43:45 UTC

Features

(NB multiple dev@ mailing lists)

On 13/11/12 12:13, Rupert Westenthaler wrote:
> Hi all,
>
> I would like to share some thoughts/comments and suggestions from my side:

Thanks - these are interesting to hear.

>
> ResourceFactory: Clerezza is missing a Factory for RDF resources. I
> would like to have such a Factory. The Factory should be obtainable
> via the Graph - the Collection of Triples. IMO such a Factory is
> required if all resource types (IRI, Bnode, Literal) are represented
> by interfaces.

Yes - a factory is needed if the terms are interfaces.

Whether they should be interfaces or fixed classes is an 
interesting design point.  I can see arguments both ways.

The argument for interfaces is presumably different implementations for 
different storage layers (e.g. with hidden internal pointers related to 
the storage).  It is also a case of "it's the Java way".

But two RDF terms (resources) are equal by value: if they have the same 
IRI they are equal, and equality governs how they behave in Java collections.
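
A minimal sketch of what value equality across different implementations 
could look like. All type names here (Iri, AbstractIri, SimpleIri, 
StoreBackedIri) are hypothetical and are not the Clerezza or Jena API; the 
internal id stands in for a storage-layer pointer.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface for illustration; not the Clerezza or Jena API.
interface Iri {
    String getIriString();
}

// Shared base class that pins equals/hashCode to the IRI string only.
abstract class AbstractIri implements Iri {
    @Override
    public final boolean equals(Object other) {
        return other instanceof Iri
                && getIriString().equals(((Iri) other).getIriString());
    }

    @Override
    public final int hashCode() {
        return getIriString().hashCode();
    }
}

// Plain in-memory implementation.
final class SimpleIri extends AbstractIri {
    private final String iri;

    SimpleIri(String iri) { this.iri = iri; }

    @Override
    public String getIriString() { return iri; }
}

// Implementation with a hidden storage handle that equality ignores.
final class StoreBackedIri extends AbstractIri {
    private final String iri;
    private final long internalId; // assumed storage-layer pointer

    StoreBackedIri(String iri, long internalId) {
        this.iri = iri;
        this.internalId = internalId;
    }

    long getInternalId() { return internalId; }

    @Override
    public String getIriString() { return iri; }
}

final class IriEqualityDemo {
    // Number of distinct terms once they are placed in a HashSet.
    static int distinctTerms(Iri... terms) {
        Set<Iri> set = new HashSet<>();
        for (Iri t : terms) {
            set.add(t);
        }
        return set.size();
    }

    public static void main(String[] args) {
        Iri a = new SimpleIri("http://example.org/s");
        Iri b = new StoreBackedIri("http://example.org/s", 42L);
        System.out.println(a.equals(b) && b.equals(a));
        System.out.println(distinctTerms(a, b));
    }
}
```

The sketch only keeps equals symmetric if every implementation follows the 
same contract (here, by extending the shared base class), which is one of 
the costs of the interface approach.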

I think the consequence is that a specific subsystem can't assume that 
RDF terms passed to it must have come from that component. 
  There's not

And some RDF terms are

[[RDF Term is the term invented in SPARQL to cover 
resources/bnodes/literals because there wasn't one in RDF : resource is 
used for "web resource" so either the thing being described, not its 
name, and/or as a general concept, not specific to RDF ]]

An interesting design point for literals is value vs lexical form/datatype. 
It is the value that matters (OK - should matter), whether it's written 
"+1"^^xsd:integer  or "01"^^xsd:byte.  Does anyone have a use case 
example where the derived datatype matters semantically?
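
To make the value vs lexical form distinction concrete, here is a small 
sketch. The helper names are hypothetical, and it assumes only 
integer-derived XSD datatypes, without checking datatype ranges such as 
the bounds of xsd:byte.

```java
import java.math.BigInteger;

final class LiteralValueDemo {
    // Hypothetical helper: maps the lexical form of an integer-derived
    // XSD literal to its value. No range check for the datatype is done.
    static BigInteger integerValue(String lexicalForm) {
        return new BigInteger(lexicalForm.trim());
    }

    // Compare two literals by value rather than by lexical form.
    static boolean sameValue(String lexicalA, String lexicalB) {
        return integerValue(lexicalA).equals(integerValue(lexicalB));
    }

    public static void main(String[] args) {
        String a = "+1"; // as in "+1"^^xsd:integer
        String b = "01"; // as in "01"^^xsd:byte
        System.out.println("same lexical form: " + a.equals(b));
        System.out.println("same value: " + sameValue(a, b));
    }
}
```

Comparing by lexical form and datatype treats the two literals as 
different terms; comparing by value treats them as the same number.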

> BNodes: If Bnode is an interface then any implementation is free to
> internally use a "bnode-id". One argument pro such ids (that was not
> yet mentioned) is that such ids allow you to avoid in-memory mappings
> for bnodes when wrapping a native implementation. In Clerezza you
> currently need to have these Bidi maps.
>
> Triple, Quads: While for some use cases the Triple-in-Graph based API
> (Quad := Triple t =
> TripleStore#getGraph(context).filter(subject,predicate,object)) is
> sufficient this is no longer the case as soon as Applications want to
> work with a Graph that contains Quads with several contexts. So I
> would vote for having support for Quads.

That is what an RDF dataset is supposed to be, but it's not completely 
transparent - just working with the default graph is very much like 
working with one graph.

The full-blown quads-in-graph would be N3-style formulae, where a graph 
node can itself be a graph.  Also called "graph literals".

At this point, they are not going to happen for RDF but if building an 
API or component, I would at least put the hooks in for it to prepare 
for a possible future.

> Dataset, Graph: From a user perspective Dataset (how the TripleStore
> looks at the Triples) and Graph (how RDF looks at the Triples) are not
> so different. Because of that I would like to have a single domain
> object fitting for both. The API should focus on the Graph aspects (as
> Clerezza does) while still allowing efficient implementations that do
> not load all triples into memory (e.g. use closeable iterators)
>
> Immutable Graphs: I really had problems getting this right and the
> current Clerezza API does not help with that task (resulting in things
> like read-only mutable graphs that are not Graphs, as they only provide
> a read-only view on a Graph that might still be changed by other
> means). I think read-only Graphs (like
> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
> use case of protecting a returned graph from modifications by the caller
> of the method is much more prominent than truly immutable graphs.
>
> SPARQL: I would not deal with parsing SPARQL queries but rather
> forward them as is to the underlying implementation. If doing so the
> API would only need to bother with result sets. This would also avoid
> the need to deal with "Datasets". This is not arguing against a
> fallback (e.g. the trick Clerezza does by using the Jena SPARQL
> implementation) but in practice efficient SPARQL executions can only
> happen natively within the TripleStore. Trying to do otherwise will
> only trick users into use cases that will not scale.

Agreed - and memory is a precious resource at scale.  It's usually 
better to give it to the data storage to avoid I/O.  Too much overhead 
in higher level APIs keeping state competes with the I/O caching.

	Andy

>
> best
> Rupert

Re: Features

Posted by Reto Bachmann-Gmür <re...@apache.org>.

In Clerezza the graph checks for read permission on read and
readwrite permission on add and delete. As the name suggests, readwrite
permission implies read permission.

The reason for that is that the add method returns false if a triple was
already in the graph, so even a pure write permission would leak information
on what's in the graph.
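
A small sketch of that leak, using a plain java.util.Set of strings as a
stand-in for a graph of triples. The WriteOnlyGraph class is hypothetical
and is not Clerezza or Jena code.

```java
import java.util.HashSet;
import java.util.Set;

final class WriteOnlyLeakDemo {
    // Hypothetical write-only facade: exposes add() but no read methods.
    static final class WriteOnlyGraph {
        private final Set<String> triples;

        WriteOnlyGraph(Set<String> triples) { this.triples = triples; }

        // Same contract as Collection.add: false if already present.
        boolean add(String triple) { return triples.add(triple); }
    }

    // A caller with write access only can still test for membership.
    // Side effect: the probed triple is added if it was absent.
    static boolean alreadyPresent(WriteOnlyGraph graph, String triple) {
        return !graph.add(triple);
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>();
        store.add("ex:alice ex:knows ex:bob");
        WriteOnlyGraph graph = new WriteOnlyGraph(store);
        System.out.println(alreadyPresent(graph, "ex:alice ex:knows ex:bob"));
        System.out.println(alreadyPresent(graph, "ex:alice ex:knows ex:carol"));
    }
}
```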

I think security should be done with standard JAAS and is independent of
particular interfaces in the API.

Cheers,
Reto
On Nov 16, 2012 2:45 PM, "Claude Warren" <cl...@xenei.com> wrote:

> >> Immutable Graphs: I really had problems getting this right and the
> >> current Clerezza API does not help with that task (resulting in things
> >> like read-only mutable graphs that are not Graphs, as they only provide
> >> a read-only view on a Graph that might still be changed by other
> >> means). I think read-only Graphs (like
> >> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
> >> use case of protecting a returned graph from modifications by the caller
> >> of the method is much more prominent than truly immutable graphs.
>
> I am currently working on a set of dynamic proxies in an attempt to
> add security at all layers of the Jena stack.  I currently have the
> graph layer complete and the model layer 50% done.
>
> My thought is that in addition to having read only you might want to
> have write only (I know that sounds strange but I've seen such in
> DBs).  The upshot is that I would put full CRUD restriction
> capabilities within the system.
>
> I'm not sure that it will work but I thought I would give it a try.  I
> think that something needs to be done in this arena to go along with
> the Fuseki security discussion I saw awhile back.
>
> Claude
>

Re: Features

Posted by Andy Seaborne <an...@apache.org>.

On 16/11/12 13:44, Claude Warren wrote:
>>> Immutable Graphs: I really had problems getting this right and the
>>> current Clerezza API does not help with that task (resulting in things
>>> like read-only mutable graphs that are not Graphs, as they only provide
>>> a read-only view on a Graph that might still be changed by other
>>> means). I think read-only Graphs (like
>>> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
>>> use case of protecting a returned graph from modifications by the caller
>>> of the method is much more prominent than truly immutable graphs.
>
> I am currently working on a set of dynamic proxies in an attempt to
> add security at all layers of the Jena stack.  I currently have the
> graph layer complete and the model layer 50% done.
>
> My thought is that in addition to having read only you might want to
> have write only (I know that sounds strange but I've seen such in
> DBs).  The upshot is that I would put full CRUD restriction
> capabilities within the system.
>
> I'm not sure that it will work but I thought I would give it a try.  I
> think that something needs to be done in this arena to go along with
> the Fuseki security discussion I saw awhile back.
>
> Claude
>

Excellent.

In case it helps:

com.hp.hpl.jena.sparql.core.DatasetGraphReadOnly

and friend:

com.hp.hpl.jena.sparql.graph.GraphReadOnly

Not perfect.
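
The difference between a read-only view and an immutable graph, as raised
in the quoted text, can be sketched with plain Java collections. This is
only an analogy using java.util.Collections; it does not show how
GraphReadOnly or DatasetGraphReadOnly are implemented, and the helper
names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ReadOnlyViewDemo {
    // Read-only view: rejects writes, but still reflects later changes
    // made through the backing set.
    static Set<String> readOnlyView(Set<String> backing) {
        return Collections.unmodifiableSet(backing);
    }

    // Snapshot: a private copy wrapped read-only, so it never changes.
    static Set<String> snapshot(Set<String> backing) {
        return Collections.unmodifiableSet(new HashSet<>(backing));
    }

    // True if the given set refuses modification through its own API.
    static boolean rejectsWrites(Set<String> set) {
        try {
            set.add("probe");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> backing = new HashSet<>();
        backing.add("t1");
        Set<String> view = readOnlyView(backing);
        Set<String> snap = snapshot(backing);

        backing.add("t2"); // change "by other means"

        System.out.println("view size: " + view.size());
        System.out.println("snapshot size: " + snap.size());
        System.out.println("view rejects writes: " + rejectsWrites(view));
    }
}
```

The view protects the graph from the caller, which is the use case the
quoted text calls the more common one; only the snapshot is immutable.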


TDB, for read transactions, enforces read-only at a lower level: 
BlockMgrBuilderReadonly and NodeTableBuilderReadonly.

As BlockMgrs and NodeTables are all the data structures for TDB, if those 
two are read-only, the DB is immutable (for a view on to it).

	Andy

Re: Features

Posted by Claude Warren <cl...@xenei.com>.

>> Immutable Graphs: I really had problems getting this right and the
>> current Clerezza API does not help with that task (resulting in things
>> like read-only mutable graphs that are not Graphs, as they only provide
>> a read-only view on a Graph that might still be changed by other
>> means). I think read-only Graphs (like
>> Collections.unmodifiableCollection(..)) should be sufficient. IMHO the
>> use case of protecting a returned graph from modifications by the caller
>> of the method is much more prominent than truly immutable graphs.

I am currently working on a set of dynamic proxies in an attempt to
add security at all layers of the Jena stack.  I currently have the
graph layer complete and the model layer 50% done.

My thought is that in addition to having read only you might want to
have write only (I know that sounds strange but I've seen such in
DBs).  The upshot is that I would put full CRUD restriction
capabilities within the system.

I'm not sure that it will work but I thought I would give it a try.  I
think that something needs to be done in this arena to go along with
the Fuseki security discussion I saw awhile back.

Claude