Posted to commits@jena.apache.org by bu...@apache.org on 2018/06/30 16:19:58 UTC

svn commit: r1031934 - in /websites/staging/jena/trunk/content: ./ documentation/query/text-query.html

Author: buildbot
Date: Sat Jun 30 16:19:58 2018
New Revision: 1031934

Log:
Staging update by buildbot for jena

Modified:
    websites/staging/jena/trunk/content/   (props changed)
    websites/staging/jena/trunk/content/documentation/query/text-query.html

Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sat Jun 30 16:19:58 2018
@@ -1 +1 @@
-1834748
+1834749

Modified: websites/staging/jena/trunk/content/documentation/query/text-query.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/query/text-query.html (original)
+++ websites/staging/jena/trunk/content/documentation/query/text-query.html Sat Jun 30 16:19:58 2018
@@ -1531,9 +1531,9 @@ indexing and search.</p>
 </ul>
 <p>The first situation arises when entering triples that include languages with multiple encodings that for various reasons are not normalized to a single encoding. In this situation it is helpful to be able to retrieve appropriate result sets without regard for the encodings used at the time that the triples were inserted into the dataset.</p>
 <p>There are several such languages of interest: Chinese, Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and ideographic variants.</p>
-<p>Encodings may not normalized when inserting triples for a variety of reasons. A principle one is that the <code>rdf:langString</code> object often must be entered in the same encoding that it occurs in some physical text that is being catalogued. Another is that metadata may be imported from sources that use different encoding conventions and it is desireable to preserve the original form.</p>
+<p>Encodings may not be normalized when inserting triples for a variety of reasons. A principle one is that the <code>rdf:langString</code> object often must be entered in the same encoding that it occurs in some physical text that is being catalogued. Another is that metadata may be imported from sources that use different encoding conventions and it is desireable to preserve the original form.</p>
 <p>The second situation arises to provide simple support for phonetic or other forms of lossy search at the time that triples are indexed directly in the Lucene system.</p>
-<p>To handle the first situation a <code>text</code> assembler predicate, <code>text:searchFor</code>, is introduced that specifies a list of language tags that provides a list of language variants that should be searched whenever a query string of a given encoding (language tag) is used. For example, the following <code>text:TextIndexLucene/text:defineAnalyzers</code> fragment :</p>
+<p>To handle the first situation a <code>text</code> assembler predicate, <code>text:searchFor</code>, is introduced that specifies a list of language tags that provides a list of language variants that should be searched whenever a query string of a given encoding (language tag) is used. For example, the following <code>text:defineAnalyzers</code> fragment :</p>
 <div class="codehilite"><pre>    <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">addLang</span> &quot;<span class="n">bo</span>&quot; <span class="p">;</span> 
       <span class="n">text</span><span class="p">:</span><span class="n">searchFor</span> <span class="p">(</span> &quot;<span class="n">bo</span>&quot; &quot;<span class="n">bo</span><span class="o">-</span><span class="n">x</span><span class="o">-</span><span class="n">ewts</span>&quot; &quot;<span class="n">bo</span><span class="o">-</span><span class="n">alalc97</span>&quot; <span class="p">)</span> <span class="p">;</span>
       <span class="n">text</span><span class="p">:</span><span class="n">analyzer</span> <span class="p">[</span> 
@@ -1575,8 +1575,46 @@ indexing and search.</p>
 
 <p>which reflects the underlying Tibetan Unicode term encoding. During <code>IndexSearcher.search</code> all documents with one of the three fields in the index for term, "རྗེ", will be returned even though the value in the fields <code>label_bo-x-ewts</code> and <code>label_bo-alalc97</code> for the returned documents will be the original value "rje".</p>
 <p>This support simplifies applications by permitting encoding independent retrieval without additional layers of transcoding and so on. It's all done under the covers in Lucene.</p>
-<p>Solving the second situation simplifies applications by adding appropriate fields and indexing via configuration in the <code>text:TextIndexLucene/text:defineAnalyzers</code>. For example, the following fragment</p>
-<div class="codehilite"><pre>    <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">addLang</span> &quot;<span class="n">zh</span><span class="o">-</span><span class="n">hans</span>&quot; <span class="p">;</span> 
+<p>Solving the second situation simplifies applications by adding appropriate fields and indexing via configuration in the <code>text:defineAnalyzers</code>. For example, the following fragment:</p>
+<div class="codehilite"><pre>    <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">defineAnalyzer</span> <span class="p">:</span><span class="n">hanzAnalyzer</span> <span class="p">;</span> 
+      <span class="n">text</span><span class="p">:</span><span class="n">analyzer</span> <span class="p">[</span> 
+        <span class="n">a</span> <span class="n">text</span><span class="p">:</span><span class="n">GenericAnalyzer</span> <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span class="n">class</span> &quot;<span class="n">io</span><span class="p">.</span><span class="n">bdrc</span><span class="p">.</span><span class="n">lucene</span><span class="p">.</span><span class="n">zh</span><span class="p">.</span><span class="n">ChineseAnalyzer</span>&quot; <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span class="n">params</span> <span class="p">(</span>
+            <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">paramName</span> &quot;<span class="n">profile</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span class="n">paramValue</span> &quot;<span class="n">TC2SC</span>&quot; <span class="p">]</span>
+            <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">paramName</span> &quot;<span class="n">stopwords</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span class="n">paramValue</span> <span class="n">false</span> <span class="p">]</span>
+            <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">paramName</span> &quot;<span class="n">filterChars</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span class="n">paramValue</span> 0 <span class="p">]</span>
+            <span class="p">)</span>
+        <span class="p">]</span> <span class="p">;</span> 
+      <span class="p">]</span>  
+    <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">defineAnalyzer</span> <span class="p">:</span><span class="n">han2pinyin</span> <span class="p">;</span> 
+      <span class="n">text</span><span class="p">:</span><span class="n">analyzer</span> <span class="p">[</span> 
+        <span class="n">a</span> <span class="n">text</span><span class="p">:</span><span class="n">GenericAnalyzer</span> <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span class="n">class</span> &quot;<span class="n">io</span><span class="p">.</span><span class="n">bdrc</span><span class="p">.</span><span class="n">lucene</span><span class="p">.</span><span class="n">zh</span><span class="p">.</span><span class="n">ChineseAnalyzer</span>&quot; <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span class="n">params</span> <span class="p">(</span>
+            <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">paramName</span> &quot;<span class="n">profile</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span class="n">paramValue</span> &quot;<span class="n">TC2PYstrict</span>&quot; <span class="p">]</span>
+            <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">paramName</span> &quot;<span class="n">stopwords</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span class="n">paramValue</span> <span class="n">false</span> <span class="p">]</span>
+            <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">paramName</span> &quot;<span class="n">filterChars</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span class="n">paramValue</span> 0 <span class="p">]</span>
+            <span class="p">)</span>
+        <span class="p">]</span> <span class="p">;</span> 
+      <span class="p">]</span>
+    <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">defineAnalyzer</span> <span class="p">:</span><span class="n">pinyin</span> <span class="p">;</span> 
+      <span class="n">text</span><span class="p">:</span><span class="n">analyzer</span> <span class="p">[</span> 
+        <span class="n">a</span> <span class="n">text</span><span class="p">:</span><span class="n">GenericAnalyzer</span> <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span class="n">class</span> &quot;<span class="n">io</span><span class="p">.</span><span class="n">bdrc</span><span class="p">.</span><span class="n">lucene</span><span class="p">.</span><span class="n">zh</span><span class="p">.</span><span class="n">ChineseAnalyzer</span>&quot; <span class="p">;</span>
+        <span class="n">text</span><span class="p">:</span><span class="n">params</span> <span class="p">(</span>
+            <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">paramName</span> &quot;<span class="n">profile</span>&quot; <span class="p">;</span>
+              <span class="n">text</span><span class="p">:</span><span class="n">paramValue</span> &quot;<span class="n">PYstrict</span>&quot; <span class="p">]</span>
+            <span class="p">)</span>
+        <span class="p">]</span> <span class="p">;</span> 
+      <span class="p">]</span>
+    <span class="p">[</span> <span class="n">text</span><span class="p">:</span><span class="n">addLang</span> &quot;<span class="n">zh</span><span class="o">-</span><span class="n">hans</span>&quot; <span class="p">;</span> 
       <span class="n">text</span><span class="p">:</span><span class="n">searchFor</span> <span class="p">(</span> &quot;<span class="n">zh</span><span class="o">-</span><span class="n">hans</span>&quot; &quot;<span class="n">zh</span><span class="o">-</span><span class="n">hant</span>&quot; <span class="p">)</span> <span class="p">;</span>
       <span class="n">text</span><span class="p">:</span><span class="n">auxIndex</span> <span class="p">(</span> &quot;<span class="n">zh</span><span class="o">-</span><span class="n">aux</span><span class="o">-</span><span class="n">han2pinyin</span>&quot; <span class="p">)</span> <span class="p">;</span>
       <span class="n">text</span><span class="p">:</span><span class="n">analyzer</span> <span class="p">[</span>