Posted to commits@jena.apache.org by bu...@apache.org on 2015/03/25 17:00:32 UTC

svn commit: r945028 - in /websites/staging/jena/trunk/content: ./ documentation/query/text-query.html

Author: buildbot
Date: Wed Mar 25 16:00:31 2015
New Revision: 945028

Log:
Staging update by buildbot for jena

Modified:
    websites/staging/jena/trunk/content/   (props changed)
    websites/staging/jena/trunk/content/documentation/query/text-query.html

Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Mar 25 16:00:31 2015
@@ -1 +1 @@
-1668256
+1669138

Modified: websites/staging/jena/trunk/content/documentation/query/text-query.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/query/text-query.html (original)
+++ websites/staging/jena/trunk/content/documentation/query/text-query.html Wed Mar 25 16:00:31 2015
@@ -183,6 +183,7 @@ the actual label.  More details are give
 <li><a href="#working-with-fuseki">Working with Fuseki</a></li>
 <li><a href="#building-a-text-index">Building a Text Index</a></li>
 <li><a href="#deletion-of-indexed-entities">Deletion of Indexed Entities</a></li>
+<li><a href="#configuring-alternative-textdocproducers">Configuring Alternative TextDocProducers</a></li>
 <li><a href="#maven-dependency">Maven Dependency</a></li>
 </ul>
 <h2 id="architecture">Architecture</h2>
@@ -197,11 +198,11 @@ or
 properties work with.  When data is added, any properties matching the
 description cause an entry to be added from analysed text from the triple
 object and mapping to the subject.</p>
-<h3 id="pattern-a-wzxhzdk19-rdf-data">Pattern A &ndash; RDF data</h3>
+<h3 id="pattern-a-wzxhzdk22-rdf-data">Pattern A &ndash; RDF data</h3>
 <p>In this pattern, the text index indexes literals in the RDF data.<br />
 Additions to the RDF data are reflected in additions to the index.</p>
 <p>(Deletes do not remove text index entries - <a href="#deletion-of-indexed-entities">see below</a>)</p>
-<h3 id="pattern-b-wzxhzdk20-external-content">Pattern B &ndash; External content</h3>
+<h3 id="pattern-b-wzxhzdk23-external-content">Pattern B &ndash; External content</h3>
 <p>There is no requirement that the text data indexed is present in the RDF
 data.  As long as the index contains the index text documents to match the
 index description, then text search can be performed.</p>
@@ -264,7 +265,7 @@ surrounding <code>( )</code> can be omit
 <h3 id="good-practice">Good practice</h3>
 <p>The query execution does not know the selectivity of the text index.  It is
 better to use one of two styles.</p>
-<h4 id="query-pattern-1-wzxhzdk21-find-in-the-text-index-and-enhance-results">Query pattern 1 &ndash; Find in the text index and enhance results</h4>
+<h4 id="query-pattern-1-wzxhzdk24-find-in-the-text-index-and-enhance-results">Query pattern 1 &ndash; Find in the text index and enhance results</h4>
 <p>Access to the index is first in the query and used to find a number of
 items of interest; further information is obtained about these items from
 the RDF data.</p>
@@ -278,7 +279,7 @@ the RDF data.</p>
 
 <p>Limit is useful here when working with large indexes to limit results to the
 higher scoring results.</p>
-<h4 id="query-pattern-2-wzxhzdk22-filter">Query pattern 2 &ndash; Filter</h4>
+<h4 id="query-pattern-2-wzxhzdk25-filter">Query pattern 2 &ndash; Filter</h4>
 <p>By finding items of interest first in the RDF data, the text search can be
 used to restrict the items found still further.</p>
 <div class="codehilite"><pre><span class="n">SELECT</span> ?<span class="n">s</span>
@@ -538,6 +539,69 @@ does, returning the whole label.</p>
 <p>By only indexing but not storing the literals themselves, the index is kept smaller.
 It may be necessary to periodically rebuild the index if a large proportion
 of the RDF data changes.</p>
+<h1 id="configuring-alternative-textdocproducers">Configuring Alternative TextDocProducers</h1>
+<p>The default behaviour when text indexing is to index a single
+property as a single field, generating a different <code>Document</code> 
+for each indexed triple. Changing this behaviour requires
+writing and configuring an alternative <code>TextDocProducer</code>.</p>
+<p>To configure a <code>TextDocProducer</code>, say <code>net.code.MyProducer</code>, in a dataset assembly,
+use the property <code>textDocProducer</code>, e.g.:</p>
+<div class="codehilite"><pre><span class="o">&lt;</span>#<span class="n">ds</span><span class="o">-</span><span class="n">with</span><span class="o">-</span><span class="n">lucene</span><span class="o">&gt;</span> <span class="n">rdf</span><span class="p">:</span><span class="n">type</span> <span class="n">text</span><span class="p">:</span><span class="n">TextDataset</span><span class="p">;</span>
+    <span class="n">text</span><span class="p">:</span><span class="n">index</span> <span class="o">&lt;</span>#<span class="n">indexLucene</span><span class="o">&gt;</span> <span class="p">;</span>
+    <span class="n">text</span><span class="p">:</span><span class="n">dataset</span> <span class="o">&lt;</span>#<span class="n">ds</span><span class="o">&gt;</span> <span class="p">;</span>
+    <span class="n">text</span><span class="p">:</span><span class="n">textDocProducer</span> <span class="o">&lt;</span><span class="n">java</span><span class="p">:</span><span class="n">net</span><span class="p">.</span><span class="n">code</span><span class="p">.</span><span class="n">MyProducer</span><span class="o">&gt;</span> <span class="p">;</span>
+    <span class="p">.</span>
+</pre></div>
+
+
+<p>where <code>net.code.MyProducer</code> is the fully qualified Java class name. It must have either
+a single-argument constructor of type <code>TextIndex</code>, or a two-argument
+constructor <code>(DatasetGraph, TextIndex)</code>. The <code>TextIndex</code> argument
+will be the configured text index, and the <code>DatasetGraph</code> argument
+will be the graph of the configured dataset.</p>
+<p>For example, to explicitly create the default <code>TextDocProducer</code> use:</p>
+<div class="codehilite"><pre><span class="p">...</span>
+    <span class="n">text</span><span class="p">:</span><span class="n">textDocProducer</span> <span class="o">&lt;</span><span class="n">java</span><span class="p">:</span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">jena</span><span class="p">.</span><span class="n">query</span><span class="p">.</span><span class="n">text</span><span class="p">.</span><span class="n">TextDocProducerTriples</span><span class="o">&gt;</span> <span class="p">;</span>
+<span class="p">...</span>
+</pre></div>
+
+
+<p><code>TextDocProducerTriples</code> produces a new document for each subject/field
+added to the dataset, using <code>TextIndex.addEntity(Entity)</code>.</p>
+<h2 id="example">Example</h2>
+<p>The example class below is a <code>TextDocProducer</code> that only indexes
+<code>ADD</code>s of quads for which the subject already had at least one
+property-value. It uses the two-argument constructor to give it
+access to the dataset so that it can count the <code>(?G, S, P, ?O)</code> quads
+with that subject and predicate, and delegates the indexing to
+<code>TextDocProducerTriples</code> if there are at least two values for
+that property (one of those values, of course, is the one that
+gives rise to this <code>change()</code>).</p>
+<div class="codehilite"><pre>  <span class="n">public</span> <span class="n">class</span> <span class="n">Example</span> <span class="n">extends</span> <span class="n">TextDocProducerTriples</span> <span class="p">{</span>
+
+      <span class="n">final</span> <span class="n">DatasetGraph</span> <span class="n">dg</span><span class="p">;</span>
+
+      <span class="n">public</span> <span class="n">Example</span><span class="p">(</span><span class="n">DatasetGraph</span> <span class="n">dg</span><span class="p">,</span> <span class="n">TextIndex</span> <span class="n">indexer</span><span class="p">)</span> <span class="p">{</span>
+          <span class="n">super</span><span class="p">(</span><span class="n">indexer</span><span class="p">);</span>
+          <span class="n">this</span><span class="p">.</span><span class="n">dg</span> <span class="p">=</span> <span class="n">dg</span><span class="p">;</span>
+      <span class="p">}</span>
+
+      <span class="n">public</span> <span class="n">void</span> <span class="n">change</span><span class="p">(</span><span class="n">QuadAction</span> <span class="n">qaction</span><span class="p">,</span> <span class="n">Node</span> <span class="n">g</span><span class="p">,</span> <span class="n">Node</span> <span class="n">s</span><span class="p">,</span> <span class="n">Node</span> <span class="n">p</span><span class="p">,</span> <span class="n">Node</span> <span class="n">o</span><span class="p">)</span> <span class="p">{</span>
+          <span class="k">if</span> <span class="p">(</span><span class="n">qaction</span> <span class="o">==</span> <span class="n">QuadAction</span><span class="p">.</span><span class="n">ADD</span><span class="p">)</span> <span class="p">{</span>
+              <span class="k">if</span> <span class="p">(</span><span class="n">alreadyHasOne</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">p</span><span class="p">))</span> <span class="n">super</span><span class="p">.</span><span class="n">change</span><span class="p">(</span><span class="n">qaction</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">o</span><span class="p">);</span>
+          <span class="p">}</span>
+      <span class="p">}</span>
+
+      <span class="n">private</span> <span class="n">boolean</span> <span class="n">alreadyHasOne</span><span class="p">(</span><span class="n">Node</span> <span class="n">s</span><span class="p">,</span> <span class="n">Node</span> <span class="n">p</span><span class="p">)</span> <span class="p">{</span>
+          <span class="n">int</span> <span class="n">count</span> <span class="p">=</span> 0<span class="p">;</span>
+          <span class="n">Iterator</span><span class="o">&lt;</span><span class="n">Quad</span><span class="o">&gt;</span> <span class="n">quads</span> <span class="p">=</span> <span class="n">dg</span><span class="p">.</span><span class="nb">find</span><span class="p">(</span> <span class="n">null</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">null</span> <span class="p">);</span>
+          <span class="k">while</span> <span class="p">(</span><span class="n">quads</span><span class="p">.</span><span class="n">hasNext</span><span class="p">())</span> <span class="p">{</span> <span class="n">quads</span><span class="p">.</span><span class="n">next</span><span class="p">();</span> <span class="n">count</span> <span class="o">+</span><span class="p">=</span> 1<span class="p">;</span> <span class="p">}</span>
+          <span class="k">return</span> <span class="n">count</span> <span class="o">&gt;</span> 1<span class="p">;</span>
+      <span class="p">}</span>
+  <span class="p">}</span>
+</pre></div>
+
+
 <h2 id="maven-dependency">Maven Dependency</h2>
 <p>The <code>jena-text</code> module is included in Fuseki.  To use it within application code,
 use the following Maven dependency:</p>