Posted to commits@marmotta.apache.org by ss...@apache.org on 2013/02/25 21:06:57 UTC

svn commit: r1449866 - /incubator/marmotta/site/trunk/content/markdown/kiwi-triplestore.md.vm

Author: sschaffert
Date: Mon Feb 25 20:06:57 2013
New Revision: 1449866

URL: http://svn.apache.org/r1449866
Log:
finished documentation for the KiWi triple store

Modified:
    incubator/marmotta/site/trunk/content/markdown/kiwi-triplestore.md.vm

Modified: incubator/marmotta/site/trunk/content/markdown/kiwi-triplestore.md.vm
URL: http://svn.apache.org/viewvc/incubator/marmotta/site/trunk/content/markdown/kiwi-triplestore.md.vm?rev=1449866&r1=1449865&r2=1449866&view=diff
==============================================================================
--- incubator/marmotta/site/trunk/content/markdown/kiwi-triplestore.md.vm (original)
+++ incubator/marmotta/site/trunk/content/markdown/kiwi-triplestore.md.vm Mon Feb 25 20:06:57 2013
@@ -154,3 +154,233 @@ The TransactionListener interface define
   (i.e. you can rely on the data being persistent)
 * _rollback(TransactionData data)_ is called when the transaction is rolled back (e.g. in case of an error)
 
+
+KiWi Triple Table (kiwi-tripletable)
+------------------------------------
+
+The KiWi Triple Table offers efficient Java Collections over OpenRDF Statements. It implements the Java Set interface,
+but offers query support (listing triples with wildcards) with in-memory SPOC and CSPO indexes. This is useful if you
+want to keep large temporary in-memory collections of triples and is e.g. used by the kiwi-transactions module for
+keeping track of added and removed triples in the transaction data. It can also be used for caching purposes.
+
+<h3> Maven Artifact </h3>
+
+The KiWi Triple Table can be used with any OpenRDF repository; it is merely a container for triples. To use the library
+in your own project, add the following Maven dependency:
+
+     <dependency>
+         <groupId>org.apache.marmotta</groupId>
+         <artifactId>kiwi-tripletable</artifactId>
+         <version>${projectVersion}</version>
+     </dependency>
+
+<h3> Code Usage </h3>
+
+As the triple table implements the Set interface, usage is very simple. The following code block illustrates how:
+
+    TripleTable<Statement> triples = new TripleTable<Statement>();
+
+    // add some triples
+    triples.add(valueFactory.createStatement(s1,p1,o1));
+    triples.add(valueFactory.createStatement(s2,p2,o2));
+    ...
+
+    // iterate over all triples
+    for(Statement t : triples) {
+        // do something with t
+    }
+
+    // list only triples with subject s1
+    for(Statement t : triples.listTriples(s1,null,null,null)) {
+        // do something with t
+    }
+
+Note that the KiWi Triple Table does not implement a complete repository and therefore neither offers its own value
+factory nor allows persistence of statements or connection management. In case you need an in-memory repository with
+support for all these features, consider using an OpenRDF memory sail.
+
+
+KiWi Reasoner (kiwi-reasoner)
+-----------------------------
+
+The KiWi reasoner is a powerful and flexible rule-based reasoner that can be used on top of a KiWi Triple Store. Its
+expressivity is more or less the same as Datalog, i.e. it will always terminate and can be evaluated in polynomial
+time (data complexity, i.e. not taking into account the number of rules). In the context of triple stores, the KiWi
+reasoner can be used to easily implement the implicit semantics of different domain vocabularies. For example, the
+following rule program expresses SKOS semantics:
+
+    @prefix skos: <http://www.w3.org/2004/02/skos/core#>
+
+    ($1 skos:broader $2) -> ($1 skos:broaderTransitive $2)
+    ($1 skos:narrower $2) -> ($1 skos:narrowerTransitive $2)
+
+    ($1 skos:broaderTransitive $2), ($2 skos:broaderTransitive $3) -> ($1 skos:broaderTransitive $3)
+    ($1 skos:narrowerTransitive $2), ($2 skos:narrowerTransitive $3) -> ($1 skos:narrowerTransitive $3)
+
+    ($1 skos:broader $2) -> ($2 skos:narrower $1)
+    ($1 skos:narrower $2) -> ($2 skos:broader $1)
+
+    ($1 skos:broader $2) -> ($1 skos:related $2)
+    ($1 skos:narrower $2) -> ($1 skos:related $2)
+    ($1 skos:related $2) -> ($2 skos:related $1)
+
+Similarly, the reasoner can be used for expressing RDFS subclass and domain inference, as well as a subset of OWL
+semantics (the one that is most interesting :-P ). Beyond RDFS and OWL, it also allows implementing domain-specific
+rule semantics. Additional examples for programs can be found in the source code.
+
+The reasoner is implemented as an incremental forward-chaining reasoner with truth maintenance. In practice, this
+means that:
+
+* incremental reasoning is triggered after a transaction commits successfully; the reasoner will then apply those
+  rules that match with at least one of the newly added triples
+* inferred triples are then materialized in the triple store in the inferred context (see the configuration of
+  the triple store above) and are thus available in the same way as base triples
+* truth maintenance keeps track of the reasons (i.e. rules and triples) why an inferred triple exists; this makes
+  updates (especially removals of rules and triples) very efficient, because the reasoner does not have to
+  recompute all inferred triples from scratch
+
+<h3> Maven Artifact </h3>
+
+The KiWi Reasoner can only be used in conjunction with the KiWi Triple Store, because it maintains most of its
+information in the relational database (e.g. the data structures for truth maintenance) and directly translates
+rule body query patterns into SQL. To include it in a project that uses the KiWi Triple Store, add the following
+dependency to your Maven project:
+
+     <dependency>
+         <groupId>org.apache.marmotta</groupId>
+         <artifactId>kiwi-reasoner</artifactId>
+         <version>${projectVersion}</version>
+     </dependency>
+
+
+<h3> Code Usage </h3>
+
+The KiWi Reasoner can be stacked into any sail stack with a transactional sail (see kiwi-transactions) and a
+KiWi Store at its root. The relevant database tables are created automatically when the repository is initialised.
+A simple repository with reasoner is initialized as follows:
+
+    KiWiStore store = new KiWiStore("test",jdbcUrl,jdbcUser,jdbcPass,dialect, "http://localhost/context/default", "http://localhost/context/inferred");
+    KiWiTransactionalSail tsail = new KiWiTransactionalSail(store);
+    KiWiReasoningSail rsail = new KiWiReasoningSail(tsail, new ReasoningConfiguration());
+    repository = new SailRepository(rsail);
+    repository.initialize();
+
+    // add a reasoning program
+    rsail.addProgram("simple", this.getClass().getResourceAsStream("simple.kwrl"));
+
+    // update an existing reasoning program
+    rsail.updateProgram("simple", ...);
+
+    // run full reasoning (delete all existing inferred triples and re-create them)
+    rsail.reRunPrograms();
+
+The reasoner can have any number of reasoning programs. The concept of a program is merely introduced to group
+different tasks. Internally, all reasoning rules are considered as an unordered collection, regardless of which
+program they belong to.
+
+<h3> Performance Considerations </h3>
+
+Even though the reasoner is efficient compared with many other reasoners, there are a number of things to take into
+account, because reasoning is always a potentially expensive operation:
+
+* reasoning will always terminate, but the upper bound for inferred triples is in theory the set of all combinations
+  of nodes occurring in base triples in the database used as subject, predicate, or object, i.e. n^3 for n nodes
+* specific query patterns with many ground values are more efficient than patterns with many variables, as fixed
+  values can considerably reduce the candidate results in the SQL queries while variables are translated into SQL
+  joins
+* re-running a full reasoning can be extremely costly on large databases, so it is better to configure the reasoning
+  programs before importing large datasets (large being in the range of millions of triples)
+* updating a program is more efficient than first deleting the old version and then adding the new version,
+  because the reasoner compares the old and new programs and only updates the changed rules
+
+In addition, the reasoner is currently executed in a single worker thread. The main reason is that otherwise there
+are potentially many transaction conflicts. We are working on an improved version that could benefit more from
+multi-core processors.
+
+KiWi Versioning (kiwi-versioning)
+---------------------------------
+
+The KiWi Versioning module allows logging of updates to the triple store as well as accessing snapshots of the
+triple store at any given time in history. In many ways, it is similar to the history functionality offered by
+Wiki systems. KiWi Versioning can be useful for many purposes:
+
+* for tracking changes to resources and the whole repository and identifying the source (provenance) of certain
+  triples
+* for creating snapshots of the repository that are "known to be good" and referring to these snapshots later
+  while still updating the repository with new data
+* for more easily reverting erroneous changes to the triple store, in a similar way to a wiki; this can e.g. be
+  used in a "data wiki"
+
+Currently, the KiWi Versioning module allows tracking changes and creating snapshots. Reverting changes has not
+yet been implemented and will be added later (together with support for pruning old versions).
+
+Versioning is tightly bound to the transaction support: a version is more or less the transaction data after
+commit time. This corresponds to the concept of "unit of work": a unit of work is finished when the user
+explicitly commits the transaction (e.g. when the entity has been completely added with all its triples or
+the ontology has been completely imported).
+
+<h3> Maven Artifact </h3>
+
+The KiWi Versioning module can only be used in conjunction with the KiWi Triple Store, because it maintains most of
+its information in the relational database (e.g. the data structures for change tracking). To include it in a
+project that uses the KiWi Triple Store, add the following dependency to your Maven project:
+
+     <dependency>
+         <groupId>org.apache.marmotta</groupId>
+         <artifactId>kiwi-versioning</artifactId>
+         <version>${projectVersion}</version>
+     </dependency>
+
+<h3> Code Usage </h3>
+
+You can use the KiWi Versioning module in your own code in a sail stack with a KiWi transactional sail and a KiWi
+Store at the root. The basic usage is as follows:
+
+    KiWiStore store = new KiWiStore("test",jdbcUrl,jdbcUser,jdbcPass,dialect, "http://localhost/context/default", "http://localhost/context/inferred");
+    KiWiTransactionalSail tsail = new KiWiTransactionalSail(store);
+    KiWiVersioningSail vsail = new KiWiVersioningSail(tsail);
+    repository = new SailRepository(vsail);
+    repository.initialize();
+
+    // do something with the repository (e.g. add data)
+    ...
+
+    // list all versions (note that there are many methods with different parameters, including a subject resource,
+    // a date range, or both)
+    RepositoryResult<Version> versions = vsail.listVersions();
+    try {
+        while(versions.hasNext()) {
+            Version v = versions.next();
+
+            // do something with v
+        }
+    } finally {
+        versions.close();
+    }
+
+    // get a snapshot connection for a certain date
+    Date snapshotDate = new Date(...);
+    RepositoryConnection snapshotConnection = vsail.getSnapshot(snapshotDate);
+    try {
+        // access the triples in the snapshot as they were at the time of the snapshot
+        ...
+    } finally {
+        snapshotConnection.close();
+    }
+
+Note that for obvious reasons (you cannot change history!), a snapshot connection is read-only. Accessing any update
+functionality of the connection will throw a RepositoryException. However, you can of course even run SPARQL queries
+over a snapshot connection (SPARQLing the past...).
+
+
+<h3> Performance Considerations </h3>
+
+When versioning is enabled, bear in mind that nothing is ever really deleted in the triple store. Triples that are
+removed in one of the updates are simply marked as "deleted" and added to the version information
+for removed triples.
+
+Otherwise there is no considerable performance impact. Accessing snapshots at any date is essentially as efficient
+as any ordinary triple access (but it does not do triple caching).
+
+
+
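The wildcard listing described for the KiWi Triple Table above (a null argument matches any value) can be illustrated without any OpenRDF dependency. The following is only a minimal sketch under stated assumptions: the class names `WildcardTripleSet` and `Triple` are hypothetical, plain strings stand in for OpenRDF resources and values, and the lookup is a linear scan rather than the SPOC/CSPO indexes the documentation mentions. It is not the kiwi-tripletable implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a set of triples with wildcard lookup.
// NOT the kiwi-tripletable code: no indexes, just a linear scan.
final class WildcardTripleSet {

    // Simplified statement; strings replace OpenRDF Resource/URI/Value (assumption).
    static final class Triple {
        final String subject, predicate, object, context;

        Triple(String subject, String predicate, String object, String context) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.context = context;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Triple)) {
                return false;
            }
            Triple t = (Triple) other;
            return Objects.equals(subject, t.subject)
                    && Objects.equals(predicate, t.predicate)
                    && Objects.equals(object, t.object)
                    && Objects.equals(context, t.context);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object, context);
        }
    }

    // Set semantics: adding the same triple twice keeps a single copy.
    private final Set<Triple> triples = new LinkedHashSet<>();

    boolean add(Triple t) {
        return triples.add(t);
    }

    int size() {
        return triples.size();
    }

    // Returns all triples matching the pattern; a null argument is a wildcard.
    List<Triple> listTriples(String s, String p, String o, String c) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if (matches(s, t.subject) && matches(p, t.predicate)
                    && matches(o, t.object) && matches(c, t.context)) {
                result.add(t);
            }
        }
        return result;
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    public static void main(String[] args) {
        WildcardTripleSet table = new WildcardTripleSet();
        table.add(new Triple("s1", "p1", "o1", "c1"));
        table.add(new Triple("s1", "p2", "o2", "c1"));
        table.add(new Triple("s2", "p1", "o1", "c1"));

        // All triples with subject s1, any predicate, object and context.
        System.out.println(table.listTriples("s1", null, null, null).size());
    }
}
```

A real triple table would answer such patterns from its indexes instead of scanning every entry; the sketch only shows the matching semantics.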