Posted to commits@jena.apache.org by an...@apache.org on 2015/03/25 17:00:11 UTC
svn commit: r1669138 -
/jena/site/trunk/content/documentation/query/text-query.mdtext
Author: andy
Date: Wed Mar 25 16:00:11 2015
New Revision: 1669138
URL: http://svn.apache.org/r1669138
Log:
JENA-686 documentation
Modified:
jena/site/trunk/content/documentation/query/text-query.mdtext
Modified: jena/site/trunk/content/documentation/query/text-query.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/query/text-query.mdtext?rev=1669138&r1=1669137&r2=1669138&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/query/text-query.mdtext (original)
+++ jena/site/trunk/content/documentation/query/text-query.mdtext Wed Mar 25 16:00:11 2015
@@ -43,6 +43,7 @@ the actual label. More details are give
- [Working with Fuseki](#working-with-fuseki)
- [Building a Text Index](#building-a-text-index)
- [Deletion of Indexed Entities](#deletion-of-indexed-entities)
+- [Configuring Alternative TextDocProducers](#configuring-alternative-textdocproducers)
- [Maven Dependency](#maven-dependency)
## Architecture
@@ -418,6 +419,72 @@ By only indexing but not storing the lit
It may be necessary to periodically rebuild the index if a large proportion
of the RDF data changes.
+# Configuring Alternative TextDocProducers
+
+The default behaviour when text indexing is to index a single
+property as a single field, generating a different `Document`
+for each indexed triple. Changing this behaviour requires
+writing and configuring an alternative `TextDocProducer`.
+
+To configure a `TextDocProducer`, say `net.code.MyProducer`, in a dataset assembly,
+use the property `text:textDocProducer`, e.g.:
+
+    <#ds-with-lucene> rdf:type text:TextDataset;
+        text:index <#indexLucene> ;
+        text:dataset <#ds> ;
+        text:textDocProducer <java:net.code.MyProducer> ;
+        .
+
+where `net.code.MyProducer` is the fully qualified Java class name. It must have either
+a single-argument constructor of type `TextIndex`, or a two-argument
+constructor `(DatasetGraph, TextIndex)`. The `TextIndex` argument
+will be the configured text index, and the `DatasetGraph` argument
+will be the graph of the configured dataset.
+
+For example, to explicitly create the default `TextDocProducer` use:
+
+    ...
+        text:textDocProducer <java:org.apache.jena.query.text.TextDocProducerTriples> ;
+    ...
+
+`TextDocProducerTriples` produces a new document for each subject/field
+added to the dataset, using `TextIndex.addEntity(Entity)`.
+
+## Example
+
+The example class below is a `TextDocProducer` that only indexes
+`ADD`s of quads for which the subject already had at least one
+property-value. It uses the two-argument constructor to give it
+access to the dataset so that it can count the `(?G, S, P, ?O)` quads
+with that subject and predicate, and delegates the indexing to
+`TextDocProducerTriples` if there are at least two values for
+that property (one of those values, of course, is the one that
+gives rise to this `change()`).
+
+    public class Example extends TextDocProducerTriples {
+
+        final DatasetGraph dg;
+
+        public Example(DatasetGraph dg, TextIndex indexer) {
+            super(indexer);
+            this.dg = dg;
+        }
+
+        public void change(QuadAction qaction, Node g, Node s, Node p, Node o) {
+            if (qaction == QuadAction.ADD) {
+                if (alreadyHasOne(s, p)) super.change(qaction, g, s, p, o);
+            }
+        }
+
+        private boolean alreadyHasOne(Node s, Node p) {
+            int count = 0;
+            Iterator<Quad> quads = dg.find( null, s, p, null );
+            while (quads.hasNext()) { quads.next(); count += 1; }
+            return count > 1;
+        }
+    }
+
+
## Maven Dependency
The <code>jena-text</code> module is included in Fuseki. To use it within application code,
@@ -430,4 +497,4 @@ then use the following maven dependency:
</dependency>
adjusting the version <code>X.Y.Z</code> as necessary. This will automatically
-include a compatible version of Lucene and the Solr java client, but not Solr server.
\ No newline at end of file
+include a compatible version of Lucene and the Solr java client, but not Solr server.
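
The constructor rule in the added section (a one-argument `TextIndex` constructor, or a two-argument `(DatasetGraph, TextIndex)` constructor) can be illustrated with a small reflection sketch. This is not Jena's loading code: the `TextIndex`, `DatasetGraph` and `TextDocProducer` types below are empty stand-ins, the two producer classes and the `instantiate` helper are hypothetical, and trying the two-argument constructor first is an assumption made for the sketch only.

```java
import java.lang.reflect.Constructor;

class ProducerLoaderSketch {

    // Empty stand-ins for the Jena types named in the text (NOT the real interfaces).
    public interface TextIndex {}
    public interface DatasetGraph {}
    public interface TextDocProducer {}

    // Hypothetical producer offering only the one-argument constructor.
    public static class OneArgProducer implements TextDocProducer {
        final TextIndex index;
        public OneArgProducer(TextIndex index) { this.index = index; }
    }

    // Hypothetical producer offering the two-argument constructor.
    public static class TwoArgProducer implements TextDocProducer {
        final DatasetGraph dataset;
        final TextIndex index;
        public TwoArgProducer(DatasetGraph dataset, TextIndex index) {
            this.dataset = dataset;
            this.index = index;
        }
    }

    // Hypothetical helper: look for (DatasetGraph, TextIndex) first (an assumed
    // order), then fall back to (TextIndex). If neither public constructor
    // exists, the second lookup throws NoSuchMethodException.
    public static TextDocProducer instantiate(Class<? extends TextDocProducer> cls,
                                              DatasetGraph dataset,
                                              TextIndex index)
            throws ReflectiveOperationException {
        try {
            Constructor<? extends TextDocProducer> c =
                cls.getConstructor(DatasetGraph.class, TextIndex.class);
            return c.newInstance(dataset, index);
        } catch (NoSuchMethodException e) {
            Constructor<? extends TextDocProducer> c =
                cls.getConstructor(TextIndex.class);
            return c.newInstance(index);
        }
    }

    public static void main(String[] args) throws Exception {
        TextIndex index = new TextIndex() {};
        DatasetGraph dataset = new DatasetGraph() {};
        // Both shapes of producer are constructed through the same helper.
        System.out.println(instantiate(OneArgProducer.class, dataset, index)
                .getClass().getSimpleName());
        System.out.println(instantiate(TwoArgProducer.class, dataset, index)
                .getClass().getSimpleName());
    }
}
```

A class with neither constructor fails at load time in this sketch, which is the practical reason to check a custom producer's constructor signature before naming it in the assembly.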