Posted to commits@jena.apache.org by an...@apache.org on 2015/03/25 17:00:11 UTC

svn commit: r1669138 - /jena/site/trunk/content/documentation/query/text-query.mdtext

Author: andy
Date: Wed Mar 25 16:00:11 2015
New Revision: 1669138

URL: http://svn.apache.org/r1669138
Log:
JENA-686 documentation

Modified:
    jena/site/trunk/content/documentation/query/text-query.mdtext

Modified: jena/site/trunk/content/documentation/query/text-query.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/query/text-query.mdtext?rev=1669138&r1=1669137&r2=1669138&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/query/text-query.mdtext (original)
+++ jena/site/trunk/content/documentation/query/text-query.mdtext Wed Mar 25 16:00:11 2015
@@ -43,6 +43,7 @@ the actual label.  More details are give
 - [Working with Fuseki](#working-with-fuseki)
 - [Building a Text Index](#building-a-text-index)
 - [Deletion of Indexed Entities](#deletion-of-indexed-entities)
+- [Configuring Alternative TextDocProducers](#configuring-alternative-textdocproducers)
 - [Maven Dependency](#maven-dependency)
 
 ## Architecture
@@ -418,6 +419,72 @@ By only indexing but not storing the lit
 It may be necessary to periodically rebuild the index if a large proportion
 of the RDF data changes.
 
+# Configuring Alternative TextDocProducers
+
+The default behaviour when text indexing is to index a single
+property as a single field, generating a different `Document` 
+for each indexed triple. To change this behaviour requires 
+writing and configuring an alternative `TextDocProducer`.
+
+To configure a `TextDocProducer`, say `net.code.MyProducer`, in a dataset assembly,
+use the property `text:textDocProducer`, e.g.:
+
+	<#ds-with-lucene> rdf:type text:TextDataset;
+		text:index <#indexLucene> ;
+		text:dataset <#ds> ;
+		text:textDocProducer <java:net.code.MyProducer> ;
+		.
+
+where the name after `java:` (here `net.code.MyProducer`) is the fully
+qualified Java class name. The class must have either a single-argument
+constructor of type `TextIndex`, or a two-argument constructor
+`(DatasetGraph, TextIndex)`. The `TextIndex` argument will be the configured
+text index, and the `DatasetGraph` argument will be that of the configured dataset.
+
+For example, to explicitly create the default `TextDocProducer` use:
+
+	...
+	    text:textDocProducer <java:org.apache.jena.query.text.TextDocProducerTriples> ;
+	...
+
+`TextDocProducerTriples` produces a new document for each subject/field
+added to the dataset, using `TextIndex.addEntity(Entity)`. 
+
+## Example 
+
+The example class below is a `TextDocProducer` that indexes the
+`ADD` of a quad only when its subject already had at least one
+value for that property. It uses the two-argument constructor to
+get access to the dataset so that it can count the `(?G, S, P, ?O)`
+quads with that subject and predicate, and delegates the indexing
+to `TextDocProducerTriples` if there are at least two values for
+that property (one of those values, of course, is the one that
+gives rise to this `change()`).
+
+      public class Example extends TextDocProducerTriples {
+      
+          final DatasetGraph dg;
+          
+          public Example(DatasetGraph dg, TextIndex indexer) {
+              super(indexer);
+              this.dg = dg;
+          }
+          
+          @Override
+          public void change(QuadAction qaction, Node g, Node s, Node p, Node o) {
+              if (qaction == QuadAction.ADD && alreadyHasOne(s, p))
+                  super.change(qaction, g, s, p, o);
+          }
+      
+          private boolean alreadyHasOne(Node s, Node p) {
+              int count = 0;
+              Iterator<Quad> quads = dg.find( null, s, p, null );
+              while (quads.hasNext()) { quads.next(); count += 1; }
+              return count > 1;  // the quad being added is itself one match
+          }
+      }
+
+
 ## Maven Dependency
 
 The <code>jena-text</code> module is included in Fuseki.  To use it within application code,
@@ -430,4 +497,4 @@ then use the following maven dependency:
     </dependency>
 
 adjusting the version <code>X.Y.Z</code> as necessary.  This will automatically
-include a compatible version of Lucene and the Solr java client, but not Solr server.
\ No newline at end of file
+include a compatible version of Lucene and the Solr java client, but not Solr server.
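
The new section in the diff above requires a configured producer class to have either a `(TextIndex)` constructor or a `(DatasetGraph, TextIndex)` constructor. As a rough illustration of what that contract implies, here is a minimal, self-contained sketch of a reflective lookup. It is not Jena's actual assembler code: `ProducerLoader`, `TwoArgProducer` and `OneArgProducer` are hypothetical names, the nested `DatasetGraph`, `TextIndex` and `TextDocProducer` interfaces are empty stand-ins for the real Jena types, and trying the two-argument form first is an assumption, since the documentation does not say which form takes precedence.

```java
import java.lang.reflect.Constructor;

// Illustrative only: stand-in types, not the real Jena classes.
class ProducerLoader {
    public interface DatasetGraph {}
    public interface TextIndex {}
    public interface TextDocProducer {}

    // Hypothetical producer using the two-argument constructor form.
    public static class TwoArgProducer implements TextDocProducer {
        public final DatasetGraph dg;
        public final TextIndex index;
        public TwoArgProducer(DatasetGraph dg, TextIndex index) {
            this.dg = dg;
            this.index = index;
        }
    }

    // Hypothetical producer using the single-argument constructor form.
    public static class OneArgProducer implements TextDocProducer {
        public final TextIndex index;
        public OneArgProducer(TextIndex index) {
            this.index = index;
        }
    }

    // Instantiate the named class, trying (DatasetGraph, TextIndex) first
    // (an assumed order) and falling back to (TextIndex).
    public static TextDocProducer instantiate(String className, DatasetGraph dg, TextIndex index) {
        try {
            Class<?> cls = Class.forName(className);
            try {
                Constructor<?> twoArg = cls.getConstructor(DatasetGraph.class, TextIndex.class);
                return (TextDocProducer) twoArg.newInstance(dg, index);
            } catch (NoSuchMethodException e) {
                // No two-argument constructor: fall back to the single-argument form.
                Constructor<?> oneArg = cls.getConstructor(TextIndex.class);
                return (TextDocProducer) oneArg.newInstance(index);
            }
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing constructor, and constructor failures.
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

Calling `ProducerLoader.instantiate(...)` with the binary name of either nested producer class returns an instance built through whichever constructor that class declares. A class with neither constructor form is rejected with an `IllegalArgumentException`.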