Posted to dev@stanbol.apache.org by Rupert Westenthaler <rw...@apache.org> on 2011/01/25 10:40:09 UTC

FISE JCR Store outdated

Hi all,

While reviewing all the components as part of the work on STANBOL-40
(renaming FISE to the Apache Stanbol namespace) I discovered that the
JCR-based store is not included in the mvn build process. In
addition, this implementation is outdated because it does not implement
the "MGraph getEnhancementGraph()" method that was added to the Store
interface. The getEnhancementGraph() method needs to return the RDF graph
with all the RDF statements of all the content items sent to the
stateful "/store" interface of FISE. This graph is needed for the
implementation of the SPARQL endpoint "/sparql".

Some facts about the current implementation:
 - Each content item is represented by a node in the JCR repository.
 - All content items share a common parent with the path "/fise".
 - The content and mimeType are stored in the "jcr:data" and
"jcr:mimeType" properties.
 - The metadata (RDF data) are stored as triples. Each triple is
represented as a child node of the content item node with the properties
"subject", "property" and "object".
 - In addition, the metadata are also kept in memory (within a
SimpleMGraph instance).
 - The JCRContentItem registers a GraphListener with the in-memory
representation and updates the JCR store on every change to the
in-memory graph.
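The storage layout and the listener-based write-through described above can be sketched in plain Java. To keep the example self-contained it does not use the javax.jcr API: `SimpleNode` is a hypothetical stand-in for a JCR node, `Triple` is a simplified stand-in for Clerezza's triple, and `addMetadata`/`removeMetadata` play the role of the GraphListener callback. Only the property names (`jcr:data`, `jcr:mimeType`, `subject`, `property`, `object`) and the `/fise` parent come from the description; child node names such as `triple0` are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Simplified stand-in for an RDF triple with string terms (not Clerezza's Triple). */
final class Triple {
    final String subject, property, object;
    Triple(String subject, String property, String object) {
        this.subject = subject; this.property = property; this.object = object;
    }
    @Override public boolean equals(Object o) {
        if (!(o instanceof Triple)) return false;
        Triple t = (Triple) o;
        return subject.equals(t.subject) && property.equals(t.property) && object.equals(t.object);
    }
    @Override public int hashCode() { return Objects.hash(subject, property, object); }
}

/** Hypothetical stand-in for a JCR node: named properties plus named child nodes. */
final class SimpleNode {
    final Map<String, Object> properties = new LinkedHashMap<>();
    final Map<String, SimpleNode> children = new LinkedHashMap<>();
}

/** Content item whose in-memory metadata is mirrored into child nodes on every change. */
final class MirroredContentItem {
    final SimpleNode node = new SimpleNode();
    private final Set<Triple> metadata = new LinkedHashSet<>();  // the in-memory graph
    private int nextChildId = 0;

    MirroredContentItem(byte[] data, String mimeType) {
        node.properties.put("jcr:data", data);
        node.properties.put("jcr:mimeType", mimeType);
    }

    /** Adds to the in-memory graph, then persists the triple as one child node. */
    boolean addMetadata(Triple t) {
        if (!metadata.add(t)) return false;
        SimpleNode child = new SimpleNode();
        child.properties.put("subject", t.subject);
        child.properties.put("property", t.property);
        child.properties.put("object", t.object);
        node.children.put("triple" + nextChildId++, child);
        return true;
    }

    /** Removes from the in-memory graph, then deletes the matching child node. */
    boolean removeMetadata(Triple t) {
        if (!metadata.remove(t)) return false;
        node.children.values().removeIf(c -> t.subject.equals(c.properties.get("subject"))
                && t.property.equals(c.properties.get("property"))
                && t.object.equals(c.properties.get("object")));
        return true;
    }
}

class JcrLayoutDemo {
    public static void main(String[] args) {
        SimpleNode fise = new SimpleNode();  // the common parent "/fise"
        MirroredContentItem item = new MirroredContentItem(
                "Hello Stanbol".getBytes(StandardCharsets.UTF_8), "text/plain");
        fise.children.put("item1", item.node);
        item.addMetadata(new Triple("urn:item1", "urn:example:language", "urn:example:en"));
        System.out.println(item.node.children.keySet());  // [triple0]
    }
}
```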

Given this implementation, it is not trivial to implement the
getEnhancementGraph() method, because it would require providing a
Clerezza Graph instance that contains all triples from all content
items stored in the JCRStore.

One possibility would be to use something like a "semantic index":
a graph that contains all the metadata of all the content items
in the JCR store. Every time a new version of a content item is
stored in the JCR store, the old version of the content item would
need to be deleted from this index and the new version added to it.
In addition, one would need to provide a bootstrapping functionality
to build up or restore the semantic index if necessary.
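A minimal sketch of such a semantic index, assuming triples are represented as N-Triples strings and that the JCR store can be enumerated as a map from content item id to that item's triples (both are assumptions; `SemanticIndex`, `store`, `remove`, `rebuild` and `unionGraph` are hypothetical names). One detail the description leaves open: if two content items assert the same triple, replacing one item's old version must not drop that triple from the shared graph, so the sketch keeps a reference count per triple.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical semantic index: the union of the metadata of all content items. */
final class SemanticIndex {
    // Triples per content item, so the old version can be removed when a new one arrives.
    private final Map<String, Set<String>> itemTriples = new HashMap<>();
    // How many items assert each triple; a triple leaves the union when its count hits 0.
    private final Map<String, Integer> counts = new HashMap<>();

    /** Stores a new version of an item, replacing whatever the index held for it. */
    void store(String itemId, Set<String> triples) {
        remove(itemId);
        Set<String> copy = new HashSet<>(triples);
        itemTriples.put(itemId, copy);
        for (String t : copy) counts.merge(t, 1, Integer::sum);
    }

    /** Removes all triples contributed by an item (e.g. when it is deleted from the store). */
    void remove(String itemId) {
        Set<String> old = itemTriples.remove(itemId);
        if (old == null) return;
        for (String t : old) counts.computeIfPresent(t, (k, n) -> n == 1 ? null : n - 1);
    }

    /** Bootstrapping: discard the index and rebuild it from the store's current contents. */
    void rebuild(Map<String, Set<String>> storeContents) {
        itemTriples.clear();
        counts.clear();
        storeContents.forEach(this::store);
    }

    /** Read-only view of the union graph, i.e. what getEnhancementGraph() would expose. */
    Set<String> unionGraph() { return Collections.unmodifiableSet(counts.keySet()); }
}

class SemanticIndexDemo {
    public static void main(String[] args) {
        SemanticIndex index = new SemanticIndex();
        index.store("item1", new HashSet<>(Arrays.asList("<urn:item1> <urn:p> \"v1\" .")));
        index.store("item1", new HashSet<>(Arrays.asList("<urn:item1> <urn:p> \"v2\" .")));
        System.out.println(index.unionGraph());  // only the v2 triple remains
    }
}
```

The reference count is what makes "delete the old version, add the new one" safe when content items overlap; without it, a plain set union would lose shared triples.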

Another possibility would be to implement the Clerezza Graph
interface. This could be done by extending the AbstractMGraph
class, which usually requires implementing
 - add(Triple),
 - remove(Triple), and
 - Iterator<Triple> filter(NonLiteral subject, UriRef predicate,
Resource object), which must accept null as a wildcard for
each of the three arguments.
A really nice side effect of doing so is that it would also allow
SPARQL queries to be executed directly on the JCR store
(because Clerezza provides SPARQL functionality on top of any graph
implementation).
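The contract of these three methods, in particular null acting as a wildcard in `filter`, can be illustrated with a simplified in-memory stand-in. This is not Clerezza's `AbstractMGraph`: terms are plain strings, `Triple` and `WildcardGraph` are hypothetical, and a JCR-backed version would read and write the per-triple child nodes instead of a `Set`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Simplified stand-in for an RDF triple; NOT Clerezza's Triple class. */
final class Triple {
    final String subject, predicate, object;
    Triple(String subject, String predicate, String object) {
        this.subject = subject; this.predicate = predicate; this.object = object;
    }
    @Override public boolean equals(Object o) {
        if (!(o instanceof Triple)) return false;
        Triple t = (Triple) o;
        return subject.equals(t.subject) && predicate.equals(t.predicate) && object.equals(t.object);
    }
    @Override public int hashCode() { return Objects.hash(subject, predicate, object); }
}

/** Hypothetical graph exposing add, remove and a wildcard-aware filter. */
final class WildcardGraph {
    private final Set<Triple> triples = new LinkedHashSet<>();  // stand-in for JCR storage

    boolean add(Triple t) { return triples.add(t); }

    boolean remove(Triple t) { return triples.remove(t); }

    /** Returns all triples matching the pattern; null matches any value in that position. */
    Iterator<Triple> filter(String subject, String predicate, String object) {
        List<Triple> matches = new ArrayList<>();
        for (Triple t : triples) {
            if (matches(subject, t.subject) && matches(predicate, t.predicate)
                    && matches(object, t.object)) {
                matches.add(t);
            }
        }
        // Copying into a list lets callers modify the graph while iterating the result.
        return matches.iterator();
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}

class WildcardGraphDemo {
    public static void main(String[] args) {
        WildcardGraph g = new WildcardGraph();
        g.add(new Triple("urn:doc1", "urn:mentions", "urn:Paris"));
        g.add(new Triple("urn:doc2", "urn:mentions", "urn:Paris"));
        // Subject is the wildcard: which documents mention urn:Paris?
        for (Iterator<Triple> it = g.filter(null, "urn:mentions", "urn:Paris"); it.hasNext(); ) {
            System.out.println(it.next().subject);
        }
    }
}
```

The eager copy in `filter` keeps the example simple; over a large repository a lazy iterator, or translating the non-null positions into a query on the `subject`, `property` and `object` node properties, would presumably scale better.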

In addition, I have discovered that the Triple <-> JCR node mapping
as currently implemented only works for URIs, not for blank nodes
or literals (see getMetadata() and persistTriple() in
JCRContentItem).
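One way to make the mapping work for all three kinds of RDF term is to store each term in a single string property using an N-Triples-like syntax: `<uri>` for URIs, `_:label` for blank nodes, and `"lexical"@lang` or `"lexical"^^<datatype>` for literals. The sketch below is an assumption, not what JCRContentItem does; `TermCodec` and its methods are hypothetical and implement only a subset of the N-Triples escapes (backslash, double quote, newline, carriage return). Blank node labels are only meaningful within a single graph, so an implementation would also need to keep them scoped to their content item when building the shared enhancement graph.

```java
/** Hypothetical codec: RDF term <-> N-Triples-like string for a single JCR property. */
final class TermCodec {
    static String uri(String uri) { return "<" + uri + ">"; }  // assumes uri contains no '>'

    static String blank(String label) { return "_:" + label; }

    /** lang takes precedence over datatype; pass null for both to get a plain literal. */
    static String literal(String lexical, String lang, String datatype) {
        StringBuilder sb = new StringBuilder("\"").append(escape(lexical)).append('"');
        if (lang != null) sb.append('@').append(lang);
        else if (datatype != null) sb.append("^^<").append(datatype).append('>');
        return sb.toString();
    }

    /** Tells the three kinds of term apart by their leading characters. */
    static String kind(String enc) {
        if (enc.startsWith("<")) return "uri";
        if (enc.startsWith("_:")) return "bnode";
        if (enc.startsWith("\"")) return "literal";
        throw new IllegalArgumentException("unknown term: " + enc);
    }

    static String uriOf(String enc) { return enc.substring(1, enc.length() - 1); }

    static String blankLabel(String enc) { return enc.substring(2); }

    static String lexicalForm(String enc) { return unescape(enc.substring(1, closingQuote(enc))); }

    static String languageTag(String enc) {
        String rest = enc.substring(closingQuote(enc) + 1);
        return rest.startsWith("@") ? rest.substring(1) : null;
    }

    static String datatype(String enc) {
        String rest = enc.substring(closingQuote(enc) + 1);
        return rest.startsWith("^^<") ? rest.substring(3, rest.length() - 1) : null;
    }

    // Index of the unescaped quote that ends the lexical form.
    private static int closingQuote(String enc) {
        for (int i = 1; i < enc.length(); i++) {
            char c = enc.charAt(i);
            if (c == '\\') i++;                // skip the escaped character
            else if (c == '"') return i;
        }
        throw new IllegalArgumentException("unterminated literal: " + enc);
    }

    private static String escape(String s) {
        // Backslash first, so the backslashes added afterwards are not escaped twice.
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }

    private static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') { sb.append(c); continue; }
            char next = s.charAt(++i);
            if (next == 'n') sb.append('\n');
            else if (next == 'r') sb.append('\r');
            else sb.append(next);              // covers \" and \\
        }
        return sb.toString();
    }
}

class TermCodecDemo {
    public static void main(String[] args) {
        String obj = TermCodec.literal("Salzburg", "de", null);
        System.out.println(obj);                         // "Salzburg"@de
        System.out.println(TermCodec.kind(obj));         // literal
        System.out.println(TermCodec.languageTag(obj));  // de
    }
}
```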

The reason I am writing all this is that I would very much like to
update and improve the current implementation of the JCR store. I
could volunteer to improve the Triple <-> JCR node mapping; however,
working on that only makes sense if someone can work on the
implementation of the getEnhancementGraph() method (preferably by
implementing the Clerezza Graph interface, to try out how well SPARQL
queries perform on such an implementation).

For the work on STANBOL-40, my plan is to apply all the required
namespace changes to this module even though it cannot be used as the
Store for the current version of FISE.

best
Rupert Westenthaler


-- 
| Rupert Westenthaler                            rwesten@apache.org
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen