Posted to commits@jena.apache.org by ij...@apache.org on 2011/11/20 19:31:29 UTC
svn commit: r1204205 - in /incubator/jena/site/trunk/content/jena/documentation/notes: index.mdtext jena-internals.mdtext

Author: ijd
Date: Sun Nov 20 18:31:28 2011
New Revision: 1204205

URL: http://svn.apache.org/viewvc?rev=1204205&view=rev
Log:
Added old notes on Jena internals

Added:
    incubator/jena/site/trunk/content/jena/documentation/notes/jena-internals.mdtext
Modified:
    incubator/jena/site/trunk/content/jena/documentation/notes/index.mdtext

Modified: incubator/jena/site/trunk/content/jena/documentation/notes/index.mdtext
URL: http://svn.apache.org/viewvc/incubator/jena/site/trunk/content/jena/documentation/notes/index.mdtext?rev=1204205&r1=1204204&r2=1204205&view=diff
==============================================================================
--- incubator/jena/site/trunk/content/jena/documentation/notes/index.mdtext (original)
+++ incubator/jena/site/trunk/content/jena/documentation/notes/index.mdtext Sun Nov 20 18:31:28 2011
@@ -2,3 +2,9 @@ Title: General notes and how-to's
 
 - [Concurrency how-to](concurrency-howto.mdtext) Handling concurrent access to Jena models
 - [Event handler how-to](event-handler-howto.mdtext) Responding to events
+- [File manager how-to](file-manager.html) Convenient access to RDF files
+- [Jena internals](jena-internals.html) Notes on internal architecture (now rather old)
+- [Model factory](model-factory.html) Creating Jena models of various kinds
+- [RDF frames](rdf-frames.html) Viewing RDF statements as frame-like objects
+- [Reification](reification.html) API support for RDF reification
+- [Typed literals](typed-literals.html) Creating and extracting RDF typed literals

Added: incubator/jena/site/trunk/content/jena/documentation/notes/jena-internals.mdtext
URL: http://svn.apache.org/viewvc/incubator/jena/site/trunk/content/jena/documentation/notes/jena-internals.mdtext?rev=1204205&view=auto
==============================================================================
--- incubator/jena/site/trunk/content/jena/documentation/notes/jena-internals.mdtext (added)
+++ incubator/jena/site/trunk/content/jena/documentation/notes/jena-internals.mdtext Sun Nov 20 18:31:28 2011
@@ -0,0 +1,545 @@
+Title: Notes on Jena internals
+
+**Note:** These notes are quite old now, but may still be of some interest in the design
+and architecture of Jena.
+
+## Enhanced Nodes
+
+This note is a development of the original note on the enhanced
+node and graph design of Jena 2.
+
+### Key objectives for the enhanced node design
+
+One problem with the Jena 1 design was that both the DAML layer and
+the RDB layer independently extended Resource with domain-specific
+information. That made it impossible to have a DAML-over-RDB
+implementation. While this could have been fixed by using the
+"enhanced resource" mechanism of Jena 1, that would have left a
+second problem.
+
+In Jena 1.0, once a resource has been determined to be a DAML Class
+(for instance), that remains true for the lifetime of the model. If
+a resource starts out not qualifying as a DAML Class (no
+`rdf:type daml:Class`) then adding the type assertion later doesn't
+make it a Class. Similarly, if a resource is a DAML Class, but then
+the type assertion is retracted, the resource is still apparently a
+class.
+
+Hence being a DAMLClass is a *view* of the resource that may change
+over time. Moreover, a given resource may validly have a number of
+different views simultaneously. Using the current `DAMLClass`
+implementation method means that a given resource is limited to a
+single such view.
+
+A key objective of the new design is to allow different views, or
+*facets*, to be used dynamically when accessing a node. The new
+design allows nodes to be polymorphic, in the sense that the same
+underlying node from the graph can present different encapsulations
+(and thus different affordances to the programmer) on request.
+
+In summary, the enhanced node design in Jena 2.0 allows programmers
+to:
+
+-   provide alternative perspectives onto a node from a graph,
+    supporting additional functionality particular to that perspective;
+-   dynamically convert between perspectives on nodes;
+-   register implementation classes that present
+    the node as an alternative perspective.
+
+### Terminology
+
+To assist the following discussion, the key terms are introduced
+first.
+
+node
+  ~ A subject or object from a triple in the underlying graph
+graph
+  ~ The underlying container of RDF triples that simplifies the
+    previous abstraction Model
+enhanced node
+  ~ An encapsulation of a node that adds additional state or
+    functionality to the interface defined for node. For example, a bag
+    is a resource that contains a number of other resources; an
+    enhanced node encapsulating a bag might provide simplified
+    programmatic access to the members of the bag.
+enhanced graph
+  ~ Just as an enhanced node encapsulates a node and adds extra
+    functionality, an enhanced graph encapsulates an underlying graph
+    and provides additional features. For example, both Model and
+    DAMLModel can be thought of as enhancements to the (deliberately
+    simple) interface to graphs.
+polymorphic
+  ~ An abstract super-class of enhanced graph and enhanced node
+    that exists purely to provide shared implementation.
+personality
+  ~ An abstraction that circumscribes the set of alternative views
+    that are available in a given context. In particular, defines a
+    mapping from types (q.v.) to implementations (q.v.). This seems to
+    be taken to be closed for graphs.
+implementation
+  ~ A factory object that is able to generate polymorphic objects
+    that present a given enhanced node according to a given type. For
+    example, an alt implementation can produce a sub-class of enhanced
+    node that provides accessors for the members of the alt.
+
+#### Key points
+
+Some key features of the design are:
+
+-   every enhanced graph has a single graph personality, which
+    represents the types of all the enhanced nodes that can be created
+    in this graph;
+-   every enhanced node refers to that personality;
+-   different kinds of enhanced graph can have different
+    personalities, for example, they may implement interfaces in different
+    ways, or not implement some at all;
+-   enhanced nodes wrap information in the graph, but keep no
+    independent state; they may be discarded and regenerated at whim.
+
+### How an enhanced node is created
+
+#### Creation from another enhanced node
+
+If `en` is an enhanced node representing some resource we wish to
+be able to view as being of some (Java) class/interface `T`, the
+expression `en.as(T.class)` will either deliver an EnhNode of type
+`T`, if it is possible to do so, or throw an exception if not.
+
+To check if the conversion is allowed, without having to catch
+exceptions, the expression `en.canAs(T.class)` delivers `true` iff
+the conversion is possible.
+
+#### Creation from a base node
+
+Somehow, some seed enhanced node must be created, otherwise `as()`
+would have nothing to work on. Subclasses of enhanced node provide
+constructors (perhaps hidden behind factories) which wrap plain
+nodes up in enhanced graphs. Eventually these invoke the
+constructor `EnhNode(Node,EnhGraph)`.
+
+It's up to the constructors for the enhanced node subclasses to
+ensure that they are called with appropriate arguments.
+
+#### Internal operation of the conversion
+
+`as(Class T)` is defined on EnhNode to invoke `asInternal(T)` in
+`Polymorphic`. If the original enhanced node `en` is already a valid
+instance of `T`, it is returned as the result. Validity is checked
+by the method `isValid()`.
+
+If `en` is not already of type `T`, then a cache of alternative
+views of `en` is consulted to see if a suitable alternative exists.
+The cache is implemented as a *sibling ring* of enhanced nodes -
+each enhanced node has a link to its next sibling, and the "last"
+node links back to the "first". This makes it cheap to find
+alternative views if there are not too many of them, and avoids
+caches filling up with dead nodes and having to be flushed.
+
+If there is no existing suitable enhanced node, the node's
+personality is consulted. The personality maps the desired class
+type to an `Implementation` object, which is a factory with a
+`wrap` method which takes a (plain) node and an enhanced graph and
+delivers the new enhanced node after checking that its conditions
+apply. The new enhanced node is then linked into the sibling ring.
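+
+The lookup order described above (walk the sibling ring first, then
+ask the personality) can be sketched in plain Java. This is a toy
+model, not Jena's code: `View`, `BagView`, `Personality` and
+`factoryFor` are made-up names, a string stands in for the plain
+node, and the validity and condition checks are left out.
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+
+/** Toy model of view conversion; every name here is hypothetical, not Jena's API. */
+public class SiblingRingSketch {
+
+    /** A view of a plain node; views of the same node form a circular list. */
+    static class View {
+        final String node;      // stands in for the plain graph node
+        View next = this;       // a lone view is a ring of one
+
+        View(String node) { this.node = node; }
+
+        /** Returns a view of type T, reusing a sibling if one already exists. */
+        <T extends View> T as(Class<T> type, Personality p) {
+            View v = this;
+            do {                                   // walk the ring once
+                if (type.isInstance(v)) return type.cast(v);
+                v = v.next;
+            } while (v != this);
+            T made = type.cast(p.factoryFor(type).apply(node));
+            made.next = this.next;                 // splice the new view into the ring
+            this.next = made;
+            return made;
+        }
+    }
+
+    static class BagView extends View {
+        BagView(String node) { super(node); }
+    }
+
+    /** Maps a view type to the factory that builds it. */
+    static class Personality {
+        private final Map<Class<?>, Function<String, View>> factories = new HashMap<>();
+
+        Personality add(Class<?> type, Function<String, View> factory) {
+            factories.put(type, factory);
+            return this;
+        }
+
+        Function<String, View> factoryFor(Class<?> type) {
+            Function<String, View> f = factories.get(type);
+            if (f == null) throw new UnsupportedOperationException("no view for " + type);
+            return f;
+        }
+    }
+}
+```
+
+In this sketch, asking the same base view for a `BagView` twice
+returns the same object the second time, because the first request
+linked the new view into the ring.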
+
+### How to build an enhanced node & graph
+
+What you have to do to define an enhanced node/graph
+implementation:
+
+1.  define an interface `I` for the new enhanced node. (You could
+    use just the implementation class, but we've stuck with the
+    interface, because there might be different implementations)
+2.  define the implementation class `C`. This is just a front for
+    the enhanced node. All the state of `C` is reflected in the graph
+    (except for caching; but beware that the graph can change without
+    notice).
+3.  define an `Implementation` class for the factory. This class
+    defines methods `canWrap` and `wrap`, which test a node to see if
+    it is allowed to represent `I` and construct an implementation of
+    `C` respectively.
+4.  Arrange that the personality of the graph maps the class of `I`
+    to the factory. At the moment we do this by using (a copy of) the
+    built-in graph personality as the personality for the enhanced
+    graph.
+
+For an example, see the code for `ReifiedStatementImpl`.
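+
+As a rough illustration of the four steps, here is a self-contained
+toy in plain Java. It does not use Jena's classes: `Bag`, `BagImpl`,
+`BAG_FACTORY` and `PERSONALITY` are invented for the example, nodes
+are strings, and the graph is a set of triple strings. Only the
+method names `canWrap` and `wrap` come from the text above.
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+/** Toy walk-through of the four steps; none of these types are Jena's. */
+public class EnhancedNodeRecipe {
+
+    // Step 1: the interface I for the new view.
+    interface Bag { String node(); }
+
+    // Step 2: the implementation class C; it keeps only the node and the graph.
+    static class BagImpl implements Bag {
+        private final String node;
+        private final Set<String> graph;
+        BagImpl(String node, Set<String> graph) { this.node = node; this.graph = graph; }
+        public String node() { return node; }
+    }
+
+    // Step 3: the factory, with canWrap to test a node and wrap to build the view.
+    interface Implementation<T> {
+        boolean canWrap(String node, Set<String> graph);
+        T wrap(String node, Set<String> graph);
+    }
+
+    static final Implementation<Bag> BAG_FACTORY = new Implementation<Bag>() {
+        public boolean canWrap(String node, Set<String> graph) {
+            return graph.contains(node + " rdf:type rdf:Bag");
+        }
+        public Bag wrap(String node, Set<String> graph) {
+            if (!canWrap(node, graph)) throw new IllegalArgumentException(node + " is not a bag");
+            return new BagImpl(node, graph);
+        }
+    };
+
+    // Step 4: the personality maps the class of I to the factory.
+    static final Map<Class<?>, Implementation<?>> PERSONALITY = new HashMap<>();
+    static { PERSONALITY.put(Bag.class, BAG_FACTORY); }
+}
+```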
+
+
+## Reification API
+
+### Introduction
+
+This document describes the reification API in Jena2, following
+discussions based on the 0.5a document. The essential decision made
+during that discussion is that reification triples are captured and
+dealt with by the Model transparently and appropriately.
+
+### Context
+
+The first Jena implementation made some attempt to optimise the
+representation of reification. In particular it tried to avoid so
+called 'triple bloat', *ie* requiring four triples to represent the
+reification of a statement. The approach taken was to make a
+*Statement* a subclass of *Resource* so that properties could be
+directly attached to statement objects.
+
+There are a number of defects in the Jena 1 approach.
+
+-   Not everyone in the team was bought in to the approach
+-   The *.equals()* method for *Statement*s was arguably wrong and
+    also violated the Java requirements on a *.equals()*
+-   The implied triples of a reification were not present so could
+    not be searched for
+-   There was confusion between the optimised representation and
+    explicit representation of reification using triples
+-   The optimisation did not round trip through RDF/XML using the
+    writers and ARP.
+
+However, there are some supporters of the approach. They liked:
+
+-   the avoidance of triple bloat
+-   that the extra reification statements are not there to be
+    found on queries or *listStatements* and do not affect the *size()*
+    method.
+
+Since Jena was first written the RDFCore WG have clarified the
+meaning of a reified statement. Whilst Jena 1 took a reified
+statement to denote a statement, RDFCore have decided that a
+reified statement denotes an occurrence of a statement, otherwise
+called a stating. The Jena 1 *.equals()* method for *Statement*s
+is thus inappropriate for comparing reified statements.
+
+The goals of reification support in the Jena 2 implementation are:
+
+-   to conform to the revised RDF specifications
+-   to maintain the expectations of Jena 1 users; *ie* they should still be
+    able to reify everything without worrying about triple bloat if
+    they want to
+-   as far as is consistent with 2, to not break existing code, or
+    at least make it easy to transition old code to Jena 2.
+-   to enable round tripping through RDF/XML and other RDF
+    representation languages
+-   to enable a complete standards-compliant implementation, but not
+    necessarily as the default
+
+### Presentation API
+
+*Statement* will no longer be a subclass of *Resource*. Thus a
+statement may not be used where a resource is expected. Instead, a
+new interface *ReifiedStatement* will be defined:
+
+    public interface ReifiedStatement extends Resource
+        {
+        public Statement getStatement();
+        // could call it a day at that or could duplicate convenience
+        // methods from Statement, eg getSubject(), getInt().
+        ...
+        }
+
+The *Statement* interface will be extended with the following
+methods:
+
+    public interface Statement
+        ...
+        public ReifiedStatement createReifiedStatement();
+        public ReifiedStatement createReifiedStatement(String URI);
+    /* */
+        public boolean isReified();
+        public ReifiedStatement getAnyReifiedStatement();
+    /* */
+        public RSIterator listReifiedStatements();
+    /* */
+        public void removeAllReifications();
+        ...
+
+*RSIterator* is a new iterator which returns *ReifiedStatement*s.
+It is an extension of *ResourceIterator*.
+
+The *Model* interface will be extended with the following methods:
+
+    public interface Model
+        ...
+        public ReifiedStatement createReifiedStatement(Statement stmt);
+        public ReifiedStatement createReifiedStatement(String URI, Statement stmt);
+    /* */
+        public boolean isReified(Statement st);
+        public ReifiedStatement getAnyReifiedStatement(Statement stmt);
+    /* */
+        public RSIterator listReifiedStatements();
+        public RSIterator listReifiedStatements(Statement stmt);
+    /* */
+        public void removeReifiedStatement(ReifiedStatement rs);
+        public void removeAllReifications(Statement st);
+        ...
+
+The methods in *Statement* are defined to be the obvious calls of
+methods in *Model*. The interaction of those methods is expressed
+below. Reification operates over statements in the model which use
+predicates **rdf:subject**, **rdf:predicate**, **rdf:object**, and
+**rdf:type** with object **rdf:Statement**.
+
+*Statements with those predicates are, by default, invisible*. They
+do not appear in calls of *listStatements*, *contains*, or uses of
+the *Query* mechanism. Adding them to the model will not affect
+*size()*. Models that do not hide reification quads will also be
+available.
+
+### Retrieval
+
+The *Model::as()* mechanism will allow the retrieval of reified
+statements.
+
+    someResource.as( ReifiedStatement.class )
+
+If *someResource* has an associated reification quad, then this
+will deliver an instance *rs* of *ReifiedStatement* such that
+*rs.getStatement()* will be the statement *rs* reifies. Otherwise a
+*DoesNotReifyException* will be thrown. (Use the predicate
+*canAs()* to test if the conversion is possible.)
+
+It does not matter how the quad components have arrived in the
+model; explicitly asserted or by the *create* mechanisms described
+below. If quad components are removed from the model, existing
+*ReifiedStatement* objects will continue to function, but
+conversions using *as()* will fail.
+
+### Creation
+
+*createReifiedStatement(Statement stmt)* creates a new
+*ReifiedStatement* object that reifies *stmt*; the appropriate
+quads are inserted into the model. The resulting resource is a
+blank node.
+
+*createReifiedStatement(String URI, Statement stmt)* creates a new
+*ReifiedStatement* object that reifies *stmt*; the appropriate
+quads are inserted into the model. The resulting resource is a
+*Resource* with the URI given.
+
+### Equality
+
+Two reified statements are *.equals()* iff they reify the same
+statement and have *.equals()* resources. Thus it is possible for
+equal *Statement*s to have unequal reifications.
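+
+The equality rule can be illustrated with a small stand-alone class.
+This is only a sketch: `Reified` is a hypothetical stand-in for a
+reified statement, and plain strings stand in for the resource and
+for the *Statement* it reifies.
+
+```java
+import java.util.Objects;
+
+/** Toy illustration of the equality rule; class names are hypothetical. */
+public class ReifiedEqualitySketch {
+
+    /** A resource identifier paired with the statement it reifies. */
+    static final class Reified {
+        final String resource;   // URI or blank-node label
+        final String statement;  // stands in for a Statement
+
+        Reified(String resource, String statement) {
+            this.resource = resource;
+            this.statement = statement;
+        }
+
+        /** Equal only if both the resource and the reified statement are equal. */
+        @Override public boolean equals(Object o) {
+            if (!(o instanceof Reified)) return false;
+            Reified r = (Reified) o;
+            return resource.equals(r.resource) && statement.equals(r.statement);
+        }
+
+        @Override public int hashCode() {
+            return Objects.hash(resource, statement);
+        }
+    }
+}
+```
+
+Two `Reified` objects built over the same statement but with
+different resources compare unequal, which is the case of equal
+statements having unequal reifications.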
+
+### IsReified
+
+*isReified(Statement st)* is true iff in the *Model* of this
+*Statement* there is a reification quad for this *Statement*. It
+does not matter if the quad was inserted piece-by-piece or all at
+once using a *create* method.
+
+### Fetching
+
+*getAnyReifiedStatement(Statement st)* delivers an existing
+*ReifiedStatement* object that reifies *st*, if there is one;
+otherwise it creates a new one. If there are multiple reifications
+for *st*, it is not specified which one will be returned.
+
+### Listing
+
+*listReifiedStatements()* will return an *RSIterator* which will
+deliver all the reified statements in the model.
+
+*listReifiedStatements( Statement st )* will return an *RSIterator*
+which will deliver all the reified statements in the model that
+reify *st*.
+
+### Removal
+
+*removeReifiedStatement(ReifiedStatement rs)* will remove the
+reification *rs* from the model by removing the reification quad.
+Other reified statements with different resources will remain.
+
+*removeAllReifications(Statement st)* will remove all the
+reifications in this model which reify *st*.
+
+### Input and output
+
+The writers will have access to the complete set of *Statement*s
+and will be able to write out the quad components.
+
+The readers need have no special machinery, but it would be
+efficient for them to be able to call *createReifiedStatement* when
+detecting a reification.
+
+### Performance
+
+Jena1's "statements as resources" approach avoided triple bloat by
+not storing the reification quads. How, then, do we avoid triple
+bloat in Jena2?
+
+The underlying machinery is intended to capture the reification
+quad components and store them in a form optimised for reification.
+In particular, in the case where a statement is completely reified,
+it is expected to store only the implementation representation of
+the *Statement*.
+
+*createReifiedStatement* is expected to bypass the construction and
+detection of the quad components, so that in the "usual case" they
+will never come into existence.
+
+
+## The Reification SPI
+
+### Introduction
+
+This document describes the reification SPI, the mechanisms by
+which the Graph family supports the Model API reification
+interface.
+
+Graphs handle reification at two levels. First, their reifier
+supports requests to reify triples and to search for reifications.
+The reifier is responsible for managing the reification information
+it adds and removes - the graph is not involved.
+
+Second, a graph may optionally allow all triples added and removed
+through its normal operations (including the bulk update
+interfaces) to be monitored by its reifier. If so, all appropriate
+triples become the property of the reifier - they are no longer
+visible through the graph.
+
+A graph may also have a reifier that doesn't do any reification.
+This is useful for internal graphs that are not exposed as models.
+So there are three kinds of `Graph`:
+
+-   graphs that do no reification;
+-   graphs that only do explicit reification;
+-   graphs that do implicit reification.
+
+### Graph operations for reification
+
+The primary reification operation on graphs is to extract their
+`Reifier` instance. Handing reification off to a different class
+allows reification to be handled independently of other Graph
+issues, eg query handling, bulk update.
+
+#### Graph.getReifier() -\> Reifier
+
+Returns the `Reifier` for this `Graph`. Each graph has a single
+reifier during its lifetime. The reifier object need not be
+allocated until the first call of `getReifier()`.
+
+#### add(Triple), delete(Triple)
+
+These two operations may defer their triples to the graph's reifier
+using `handledAdd(Triple)` and `handledRemove(Triple)`; see below
+for details.
+
+### Interface Reifier
+
+Instances of `Reifier` handle reification requests from their
+`Graph` and from the API level code (issued by the API class
+`ModelReifier`).
+
+#### reifier.getHiddenTriples() -\> Graph
+
+The reifier may keep reification triples to itself, coded in some
+special way, rather than having them stored in the parent `Graph`.
+This method exposes those triples as another `Graph`. This is a
+dynamic graph - it changes as the underlying reifications change.
+However, it is read-only; triples cannot be added to or removed
+from it.
+
+**Note:** the `SimpleReifier` implementation currently does not implement a
+dynamic graph. This is a bug that will need fixing.
+
+#### reifier.getParentGraph() -\> Graph
+
+Get the `Graph` that this reifier serves; the result is never
+`null`. (Thus the observable relationship between graphs and
+reifiers is 1-1.)
+
+#### class AlreadyReifiedException
+
+This class extends `RDFException`; it is the exception that may be
+thrown by `reifyAs`.
+
+#### reifier.reifyAs( Triple t, Node n ) -\> Node
+
+Records `t` as reified in the parent `Graph` by the given `n`
+and returns `n`. If `n` already reifies a different `Triple`, throws
+an `AlreadyReifiedException`.
+
+Calling `reifyAs(t,n)` is like adding the triples:
+
+-   `n rdf:type rdf:Statement`
+-   `n rdf:subject t.getSubject()`
+-   `n rdf:predicate t.getPredicate()`
+-   `n rdf:object t.getObject()`
+
+to the associated Graph; however, it is intended that it is
+efficient in both time and space.
+
+#### reifier.hasTriple( Triple t ) -\> boolean
+
+Returns true iff some `Node n` reifies `t` in this `Reifier`,
+typically by an unretracted call of `reifyAs(t,n)`.
+
+The intended (and actual) use for `hasTriple(Triple)` is in the
+implementation of `isReified(Statement)` in `Model`.
+
+#### reifier.getTriple( Node n ) -\> Triple
+
+Get the single `Triple` associated with `n`, if there is one. If
+there isn't, return `null`.
+
+A node reifies at most one triple. If `reifyAs`, with its explicit
+check, is bypassed, and extra reification triples are asserted into
+the parent graph, then `getTriple()` will simply return `null`.
+
+#### reifier.allNodes() -\> ExtendedIterator
+
+Returns an (extended) iterator over all the nodes that (still)
+reify something in this reifier.
+
+This is intended for the implementation of `listReifiedStatements`
+in `Model`.
+
+#### reifier.allNodes( Triple t ) -\> ClosableIterator
+
+Returns an iterator over all the nodes that (still) reify the
+triple `t`.
+
+#### reifier.remove( Node n, Triple t )
+
+Remove the association between `n` and the triple `t`. Subsequently,
+`hasNode(n)` will return false and `getTriple(n)` will return
+`null`.
+
+This method is used to implement `removeReifiedStatement(ReifiedStatement)` in
+`Model`.
+
+#### reifier.remove( Triple t )
+
+Remove all the associations between any node `n` and `t`; ie, for
+all `n` do `remove(n,t)`.
+
+This method is used to implement `removeAllReifications` in
+`Model`.
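+
+The reifier operations described so far can be pictured as
+bookkeeping over a map from nodes to triples. The following is a
+minimal sketch under that assumption, not the real `Reifier`:
+`ToyReifier` is a made-up class, strings stand in for `Node` and
+`Triple`, and `IllegalStateException` stands in for
+`AlreadyReifiedException`.
+
+```java
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/** Toy reifier keyed by node; names and types are illustrative, not Jena's classes. */
+public class ToyReifier {
+    private final Map<String, String> tripleByNode = new HashMap<>();
+
+    /** Records that node n reifies triple t; rejects a second, different triple. */
+    public String reifyAs(String t, String n) {
+        String existing = tripleByNode.get(n);
+        if (existing != null && !existing.equals(t)) {
+            throw new IllegalStateException(n + " already reifies " + existing);
+        }
+        tripleByNode.put(n, t);
+        return n;
+    }
+
+    /** True iff some node currently reifies t. */
+    public boolean hasTriple(String t) { return tripleByNode.containsValue(t); }
+
+    /** Returns the triple n reifies, or null if there is none. */
+    public String getTriple(String n) { return tripleByNode.get(n); }
+
+    /** All nodes that currently reify t. */
+    public List<String> allNodes(String t) {
+        List<String> nodes = new ArrayList<>();
+        for (Map.Entry<String, String> e : tripleByNode.entrySet()) {
+            if (e.getValue().equals(t)) nodes.add(e.getKey());
+        }
+        return nodes;
+    }
+
+    /** Removes the single association between n and t, if present. */
+    public void remove(String n, String t) { tripleByNode.remove(n, t); }
+
+    /** Removes every association with t. */
+    public void remove(String t) { tripleByNode.values().removeIf(t::equals); }
+}
+```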
+
+#### handledAdd( Triple t ) -\> boolean
+
+A graph doing reification may choose to monitor the triples being
+added to it and have the reifier handle reification triples. In
+this case, the graph's `add(t)` should call `handledAdd(t)` and
+only proceed with its add if the result is `false`.
+
+A graph that does not use `handledAdd()` (and `handledRemove()`)
+can only use the explicit reification supplied by its reifier.
+
+#### handledRemove( Triple t )
+
+As for `handledAdd(t)`, but applied to `delete`.
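+
+A rough sketch of this handshake, with the graph and its reifier
+folded into one toy class for brevity. Everything here is an
+assumption made for illustration: triples are passed as three
+strings, the `visible` and `hidden` sets are invented, and
+`handledRemove` is given a boolean result to mirror `handledAdd`.
+
+```java
+import java.util.HashSet;
+import java.util.Set;
+
+/** Toy graph showing the handledAdd/handledRemove handshake; all names are hypothetical. */
+public class InterceptingGraphSketch {
+    final Set<String> visible = new HashSet<>();   // triples stored by the graph
+    final Set<String> hidden = new HashSet<>();    // triples claimed by the reifier
+
+    /** Reifier side: claims a triple if it is part of a reification quad. */
+    boolean handledAdd(String s, String p, String o) {
+        boolean isQuadPart = p.equals("rdf:subject") || p.equals("rdf:predicate")
+            || p.equals("rdf:object") || (p.equals("rdf:type") && o.equals("rdf:Statement"));
+        if (isQuadPart) hidden.add(s + " " + p + " " + o);
+        return isQuadPart;
+    }
+
+    /** Reifier side: true if the triple was one it had claimed. */
+    boolean handledRemove(String s, String p, String o) {
+        return hidden.remove(s + " " + p + " " + o);
+    }
+
+    /** Graph side: stores the triple only if the reifier declined it. */
+    void add(String s, String p, String o) {
+        if (!handledAdd(s, p, o)) visible.add(s + " " + p + " " + o);
+    }
+
+    /** Graph side: deletes from its own store only if the reifier declined. */
+    void delete(String s, String p, String o) {
+        if (!handledRemove(s, p, o)) visible.remove(s + " " + p + " " + o);
+    }
+}
+```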
+
+### SimpleReifier
+
+`SimpleReifier` is an implementation of `Reifier` suitable for
+in-memory `Graph`s built over `GraphBase`. It operates in either of
+two modes: with and without triple interception. With interception
+enabled, reification triples fed to (or removed from) its parent
+graph are captured using `handledAdd()` and `handledRemove()`;
+otherwise they are ignored and the graph must store them itself.
+
+`SimpleReifier` keeps a map from nodes to the reification
+information about that node. Nodes which have no reification
+information (most of them, in the usual case) do not appear in the
+map at all.
+
+Nodes with partial or excessive reification information are
+associated with `Fragments`. A `Fragments` for a node `n` records
+separately
+
+-   the `S`s of all `n rdf:subject S` triples
+-   the `P`s of all `n rdf:predicate P` triples
+-   the `O`s of all `n rdf:object O` triples
+-   the `T`s of all `n rdf:type T[Statement]` triples
+
+If the `Fragments` becomes *singular*, ie each of these sets
+contains exactly one element, then `n` represents a reification of
+the triple `(S, P, O)`, and the `Fragments` object is replaced by
+that triple.
+
+(If another reification triple for `n` arrives, then the triple is
+re-exploded into `Fragments`.)
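+
+The singularity test can be sketched as four sets and a size check.
+This is an illustrative toy rather than the actual `Fragments`
+class: the field names, `add`, `isSingular` and `asTriple` are
+assumptions, and strings stand in for nodes.
+
+```java
+import java.util.HashSet;
+import java.util.Set;
+
+/** Toy version of the per-node fragment record; field and method names are hypothetical. */
+public class FragmentsSketch {
+    final Set<String> subjects = new HashSet<>();
+    final Set<String> predicates = new HashSet<>();
+    final Set<String> objects = new HashSet<>();
+    final Set<String> types = new HashSet<>();
+
+    /** Files the object of a reification triple under the slot named by its predicate. */
+    void add(String predicate, String value) {
+        switch (predicate) {
+            case "rdf:subject":   subjects.add(value);   break;
+            case "rdf:predicate": predicates.add(value); break;
+            case "rdf:object":    objects.add(value);    break;
+            case "rdf:type":      types.add(value);      break;
+            default: throw new IllegalArgumentException("not a reification predicate: " + predicate);
+        }
+    }
+
+    /** True when each slot holds exactly one value, so the node reifies one triple. */
+    boolean isSingular() {
+        return subjects.size() == 1 && predicates.size() == 1
+            && objects.size() == 1 && types.size() == 1;
+    }
+
+    /** The reified triple as "S P O", or null while the record is partial or excessive. */
+    String asTriple() {
+        if (!isSingular()) return null;
+        return subjects.iterator().next() + " " + predicates.iterator().next()
+            + " " + objects.iterator().next();
+    }
+}
+```
+
+In this sketch a second `rdf:subject` value makes the record
+excessive, so `asTriple()` goes back to returning `null`, which
+corresponds to the re-explosion described above.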
+
+