Posted to commits@marmotta.apache.org by ss...@apache.org on 2013/02/28 16:16:17 UTC

svn commit: r1451227 - in /incubator/marmotta/site/trunk/content/markdown/ldcache: backends.md.vm usage.md.vm

Author: sschaffert
Date: Thu Feb 28 15:16:16 2013
New Revision: 1451227

URL: http://svn.apache.org/r1451227
Log:
added backend documentation for LDCache

Modified:
    incubator/marmotta/site/trunk/content/markdown/ldcache/backends.md.vm
    incubator/marmotta/site/trunk/content/markdown/ldcache/usage.md.vm

Modified: incubator/marmotta/site/trunk/content/markdown/ldcache/backends.md.vm
URL: http://svn.apache.org/viewvc/incubator/marmotta/site/trunk/content/markdown/ldcache/backends.md.vm?rev=1451227&r1=1451226&r2=1451227&view=diff
==============================================================================
--- incubator/marmotta/site/trunk/content/markdown/ldcache/backends.md.vm (original)
+++ incubator/marmotta/site/trunk/content/markdown/ldcache/backends.md.vm Thu Feb 28 15:16:16 2013
@@ -1,3 +1,82 @@
 # LDCache Backends
 
-@TODO@
\ No newline at end of file
+LDCache implements a modular architecture which allows using different kinds of backends for storing cache entries.
+Some of these backends are purely in-memory (i.e. don't survive a restart and might even expire when memory runs out),
+others are file-based or even database-backed. In principle, an LDCache backend needs to be able to store the following
+two kinds of data:
+
+* **cached triples**: the triples retrieved from a remote Linked Data resource; triples can be stored either per-resource
+  or in a common triple store; the way the backend solves it is up to the developer; on the interface level, developers
+  always have to access the cached triples on a per-resource level (using the `getCacheConnection(resource)` method).
+* **caching metadata**: information about the caching process itself, most importantly expiry information and the date
+  when the resource has last been retrieved
+
+The following sections describe the LDCache backends that are currently available or will be available in the near future.
+
+KiWi Backend
+------------
+
+The KiWi backend for LDCache relies on an underlying KiWi triple store to store caching information. It will use the
+KiWi store's JDBC connection to add additional tables and information to the database. The two kinds of caching data
+are stored as follows:
+
+* **cached triples** are stored together with all other triples in the KiWi triple store, but in a separate context
+  (named graph) to be able to distinguish them from local triples; the advantage of this approach is that it allows
+  storing all data in one place and makes the implementation of connection wrappers trivial and efficient
+* **caching metadata** is stored in an additional table in the database; the table is created when the backend is
+  initialized; each time a resource is refreshed, its entry in the table is updated
+
+The KiWi backend is the backend you should choose when you are using a KiWi triple store, unless you want to
+keep your local and your cached data completely separate. You can include the KiWi backend in your project using the
+following Maven artifact:
+
+    <dependency>
+        <groupId>org.apache.marmotta</groupId>
+        <artifactId>ldcache-backend-kiwi</artifactId>
+        <version>${projectVersion}</version>
+    </dependency>
+
+Setting up an LDCache instance with a KiWi backend requires the following configuration steps:
+
+    KiWiStore store = new KiWiStore("test",jdbcUrl,jdbcUser,jdbcPass,dialect, "http://localhost/context/default", "http://localhost/context/inferred");
+    Repository repository = new SailRepository(store);
+    repository.initialize();
+
+    LDCachingBackend backend = new LDCachingKiWiBackend(store, CACHE_CONTEXT);
+    backend.initialize();
+
+    LDCache ldcache = new LDCache(new CacheConfiguration(),backend);
+
+Note that the underlying KiWi repository must be initialized before using it in the LDCachingKiWiBackend, because
+otherwise the necessary database tables might not be present. The argument `CACHE_CONTEXT` is the URI of the resource
+to use as context (named graph) for storing and accessing cached triples.
+
+
+EHCache Backend (under development)
+-----------------------------------
+
+The EHCache backend for LDCache relies on an [EHCache](http://www.ehcache.org) caching infrastructure for storing
+caching information. When using the Open Source version of EHCache, this usually means in-memory caching only. However,
+for enterprise systems it is also possible to build a high-performance caching cluster that can be used by the LDCache
+backend (please refer to the EHCache documentation on how to set this up).
+
+In the EHCache backend, both the caching metadata and the cached triples for a resource are stored in the same cache
+entry. To allow EHCache to serialize cache entries for distribution over the cluster or for swapping to disk, the
+triples are represented in a serializable in-memory representation.
+
+**Note:** the EHCache backend is currently still under development. We therefore don't publish Maven artifacts for it
+yet.
+
+
+MapDB Backend (under development)
+---------------------------------
+
+The MapDB backend uses the embedded NoSQL database [MapDB](http://www.mapdb.org) (formerly known as JDBM) for storing
+cache information in a persistent disk-based hash map.
+
+In the MapDB backend, both the caching metadata and the cached triples for a resource are stored in the same cache
+entry. To allow MapDB to serialize cache entries when persisting the hash map to disk, the triples are represented in a
+serializable in-memory representation.
+
+**Note:** the MapDB backend is currently still under development. We therefore don't publish Maven artifacts for it
+yet.

Modified: incubator/marmotta/site/trunk/content/markdown/ldcache/usage.md.vm
URL: http://svn.apache.org/viewvc/incubator/marmotta/site/trunk/content/markdown/ldcache/usage.md.vm?rev=1451227&r1=1451226&r2=1451227&view=diff
==============================================================================
--- incubator/marmotta/site/trunk/content/markdown/ldcache/usage.md.vm (original)
+++ incubator/marmotta/site/trunk/content/markdown/ldcache/usage.md.vm Thu Feb 28 15:16:16 2013
@@ -18,6 +18,10 @@ To use the Linked Data Cache in your own
         <version>${projectVersion}</version>
     </dependency>
 
+Since LDCache is internally using LDClient to retrieve resource data, you should also add the LDClient backends you
+need. Please refer to the [LDClient documentation](../ldclient/dataproviders.html) to see the list of available modules
+and how to configure and use them.
+
 In order to use the Linked Data Cache, you'll also need to add at least one caching backend to the project. Assuming
 you are using the KiWi triple store, you would add the KiWi caching backend as follows:
 
@@ -48,10 +52,96 @@ connection wrappers in your project. If 
 Retrieving Cache Entries
 ------------------------
 
+Retrieving Linked Data resources through the cache works almost exactly like retrieving them through a raw LDClient
+instance. To initialise an LDCache instance, you use the following basic procedure:
+
+    CacheConfiguration config = new CacheConfiguration();
+
+    LDCache ldcache = new LDCache(config,backend);
+
+    // do stuff
+
+    ldcache.shutdown();
+
+The CacheConfiguration consists of an LDClient configuration and some additional configuration values that are relevant
+for caching only (currently only the default expiry time in case none is given in the response). The `backend` parameter
+is an instance of LDCachingBackend and depends on the actual backend implementation used (see [backends](backends.html)).
+
+To retrieve a resource into the cache, you would add the following statement:
 
+    ldcache.refreshResource(resource, false);
+
+The first argument is the Sesame URI of the resource you want to refresh. The second argument is a boolean value
+indicating whether you want to force the refresh or honor the expiry time (i.e. not perform a refresh if the
+resource is not yet expired).
+
+In order to access the cached triple content of a resource, you can request a Sesame RepositoryConnection to the
+cached content as follows:
+
+    LDCachingConnection con = ldcache.getCacheConnection(resource_uri);
+
+LDCachingConnection is a standard RepositoryConnection with some additional methods for getting cache information. Note
+that the repository might also be shared between many resources, so when querying for cached triples, set the subject
+parameter to the requested resource.
+
+In addition to the basic functionality, LDCache offers a number of additional methods to support typical operations. The
+most important methods are:
+
+* `listCacheEntries()` returns a (lazy, closeable) iterator over all cache entries managed by the LDCache instance
+* `listExpiredEntries()` returns a (lazy, closeable) iterator over all **expired** cache entries managed by the LDCache
+  instance
+* `expire(URI resource)` forces the expiry of the cache entry for the resource given as argument
+* `expireAll()` forces the expiry of all cache entries managed by the LDCache instance
+* `refreshExpired()` updates all expired entries with the latest content from the source
 
 
 
 Transparent Linked Data Access
 ------------------------------
 
+Probably the most attractive feature of LDCache is transparent Linked Data access. It lets you access Linked
+Data resources through a Sesame RepositoryConnection as if they were stored in a local triple store. This essentially
+gives you a Sesame Repository view of the Linked Data Cloud. To make use of this feature, you simply add one of the
+LDCache sail connection wrappers to your sail stack, e.g. as follows (example for the KiWi LDCache sail, see
+[backends](backends.html) for more details):
+
+    // create LDClient client configuration
+    ClientConfiguration config = new ClientConfiguration();
+
+    // add LDClient endpoints if needed
+    config.addEndpoint(...);
+
+    // configure a filter to indicate which resources are considered as "remote"
+    ResourceFilter cacheFilter = new UriPrefixFilter("http://remote/");
+
+    KiWiStore store = new KiWiStore("test",jdbcUrl,jdbcUser,jdbcPass,dialect, "http://localhost/context/default", "http://localhost/context/inferred");
+    KiWiLinkedDataSail lsail = new KiWiLinkedDataSail(store,cacheFilter,CACHE_CONTEXT, config);
+
+    Repository repository = new SailRepository(lsail);
+    repository.initialize();
+
+    RepositoryConnection con = repository.getConnection();
+    try {
+        URI subject = repository.getValueFactory().createURI("http://remote/testresource");
+
+        // transparently access the triples of "subject"
+        RepositoryResult<Statement> triples = con.getStatements(subject,null,null,true);
+        while(triples.hasNext()) {
+            Statement t = triples.next();
+
+            // do something
+            ...
+        }
+        triples.close();
+    } finally {
+        con.close();
+    }
+
+    repository.shutdown();
+
+
+Obviously, there are some restrictions, the most important being that you cannot use wildcards on the subjects of
+triple queries. So, sorry folks, no full SPARQL over the Linked Data Cloud; however, in combination with the
+[LDPath](../ldpath/language.html) query language, transparent caching gives you a very powerful tool for accessing
+Linked Data resources.
+