Posted to dev@jena.apache.org by Anonymous CMS User <an...@apache.org> on 2017/05/08 13:57:31 UTC

CMS diff: Text searches with SPARQL

Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Fquery%2Ftext-query.mdtext



Index: trunk/content/documentation/query/text-query.mdtext
===================================================================
--- trunk/content/documentation/query/text-query.mdtext	(revision 1655891)
+++ trunk/content/documentation/query/text-query.mdtext	(working copy)
@@ -9,7 +9,7 @@
 accessing the RDF graph.
 
 The text index can be either [Apache Lucene](http://lucene.apache.org/core) for a
-same-machine text index, or [Apache Solr](http://lucene.apache.org/solr/)
+same-machine text index, or [Elasticsearch](https://www.elastic.co/)
 for a large scale enterprise search application.
 
 Some example code is [available here](https://github.com/apache/jena/tree/master/jena-text/src/main/java/examples/).
@@ -54,7 +54,7 @@
 The text index uses the native query language of the index:
 [Lucene query format](http://lucene.apache.org/core/4_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description)
 or
-[Solr query format](http://wiki.apache.org/solr/SolrQuerySyntax).
+[Elasticsearch query format](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl.html).
 
 A text-supporting dataset is configured with a description of which
 properties work with.  When data is added, any properties matching the
@@ -83,7 +83,7 @@
 
 ### External applications
 
-By using Solr, in either pattern A (RDF data indexed) or pattern B
+By using Elasticsearch, in either pattern A (RDF data indexed) or pattern B
 (external content indexed), other applications can share the
 text index with SPARQL search.
 
@@ -156,11 +156,11 @@
 [Jena assembler description](../assembler/index.html).  Configurations can
 also be built with code. The assembler describes a 'text
 dataset' which has an underlying RDF dataset and a text index. The text
-index describes the text index technology (Lucene or Solr) and the details
+index describes the text index technology (Lucene or Elasticsearch) and the details
 needed for for each.
 
 A text index has an "entity map" which defines the properties to
-index, the name of the lucene/solr field and field used for storing the URI
+index, the name of the Lucene/Elasticsearch field, and the field used for storing the URI
 itself.
 
 For common RDF use, there will be one field, mapping a property to a text
@@ -193,8 +193,8 @@
     text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
     # Lucene index
     text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
-    # Solr index
-    text:TextIndexSolr    rdfs:subClassOf   text:TextIndex .
+    # Elasticsearch index
+    text:TextIndexES      rdfs:subClassOf   text:TextIndex .
 
     ## ---------------------------------------------------------------
     ## This URI must be fixed - it's used to assemble the text dataset.
@@ -241,9 +241,8 @@
 ### Configuring an Analyzer
 
 Text to be indexed is passed through a text analyzer that divides it into tokens 
-and may perform other transformations such as eliminating stop words.  If a Lucene
-text index is used then, by default a `StandardAnalyzer` is used.  If a Solr text
-index is used, the analyzer used is determined by the Solr configuration.
+and may perform other transformations such as eliminating stop words.  If a Lucene or Elasticsearch
+text index is used, then by default a `StandardAnalyzer` is used.
 
 It is possible to configure an alternative analyzer for each field indexed in a
 Lucene index.  For example:
@@ -270,6 +269,8 @@
 In addition, Jena provides `LowerCaseKeywordAnalyzer`,
 which is a case-insensitive version of `KeywordAnalyzer`.
 
+The Elasticsearch text index currently does not support analyzers other than `StandardAnalyzer`.
+
 ### Configuration by Code
 
 A text dataset can also be constructed in code as might be done for a
@@ -417,4 +418,14 @@
     </dependency>
 
 adjusting the version <code>X.Y.Z</code> as necessary.  This will automatically
-include a compatible version of Lucene and the Solr java client, but not Solr server.
\ No newline at end of file
+include a compatible version of Lucene.
+
+For the Elasticsearch implementation, include the following Maven dependency:
+
+    <dependency>
+      <groupId>org.apache.jena</groupId>
+      <artifactId>jena-text-es</artifactId>
+      <version>X.Y.Z</version>
+    </dependency>
+
+adjusting the version <code>X.Y.Z</code> as necessary.
\ No newline at end of file