Posted to oak-commits@jackrabbit.apache.org by ch...@apache.org on 2015/09/09 11:35:59 UTC

svn commit: r1701957 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md

Author: chetanm
Date: Wed Sep  9 09:35:59 2015
New Revision: 1701957

URL: http://svn.apache.org/r1701957
Log:
OAK-3367 - Boosting fields not working as expected

Update docs

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md?rev=1701957&r1=1701956&r2=1701957&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md Wed Sep  9 09:35:59 2015
@@ -292,8 +292,8 @@ isRegexp
 
 boost
 : If the property is included in `nodeScopeIndex` then it defines the boost
-  done for the index value against the given property name.
-  **Boost currently does not work as expected due to [OAK-3367][OAK-3367]**
+  done for the index value against the given property name. See
+  [Boost and Search Relevancy](#boost) for more details.
 
 index
 : Determines if this property should be indexed. Mostly useful for fulltext
@@ -604,6 +604,67 @@ the codec to `Lucene46`
 Refer to [OAK-2853][OAK-2853] for details. Enabling the `Lucene46` codec
 would lead to smaller and compact indexes.
 
+<a name="boost"></a>
+#### Boost and Search Relevancy
+
+`@since Oak 1.2.5`
+
+When fulltext indexing is enabled, Oak internally creates a fulltext field
+that consists of text extracted from various other fields, i.e. fields
+for which `nodeScopeIndex` is `true`. This allows a query like
+`//*[jcr:contains(., 'foo')]` to search across any indexable field
+containing foo (see [contains function][jcr-contains] for details).
+
+In certain cases it's desirable that nodes where the searched term is present
+in a specific property are ranked higher (come earlier in the search result)
+than nodes where the searched term is found in some other property.
+
+In such cases it should be possible to boost the text contributed by an
+individual property. This means that if a title field is boosted more than
+description, the search result would list first those nodes where the
+searched term is found in the title field.
+
+For that to work, ensure that for each such property (which needs to be preferred)
+both `nodeScopeIndex` and `analyzed` are set to true. In addition you can specify the
+`boost` property to give higher weight to values found in a specific property.
+
+Note that even without setting an explicit `boost`, just setting `nodeScopeIndex`
+and `analyzed` to true would improve the search result due to the way
+[Lucene does scoring][boost-faq]. Internally Oak would create separate Lucene
+fields for those JCR properties and would perform a search across all such fields.
+For more details refer to [OAK-3367][OAK-3367].
+
+```
+  + indexRules
+    - jcr:primaryType = "nt:unstructured"
+    + app:Asset
+      + properties
+        - jcr:primaryType = "nt:unstructured"
+        + description
+          - nodeScopeIndex = true
+          - analyzed = true
+          - name = "jcr:content/metadata/jcr:description"
+        + title
+          - analyzed = true
+          - nodeScopeIndex = true
+          - name = "jcr:content/metadata/jcr:title"
+          - boost = 2.0
+```
+
+With the above index config, a search like
+
+```
+SELECT
+  *
+FROM [app:Asset] 
+WHERE 
+  CONTAINS(., 'Batman')
+```
+
+would have those nodes (of type app:Asset) come first where _Batman_ is found in
+_jcr:title_, while those nodes where the search text is found in other fields,
+like aggregated content, would come later.
+
 <a name="osgi-config"></a>
 ### LuceneIndexProvider Configuration
 
@@ -1352,4 +1413,6 @@ such fields
 [default-config]: https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/resources/org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml
 [lucene-codec]: https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/codecs/Codec.html
 [tika-download]: https://tika.apache.org/download.html
-[oak-run-tika]: https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#tika
\ No newline at end of file
+[oak-run-tika]: https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#tika
+[jcr-contains]: http://www.day.com/specs/jcr/1.0/6.6.5.2_jcr_contains_Function.html
+[boost-faq]: https://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_make_sure_that_a_match_in_a_document_title_has_greater_weight_than_a_match_in_a_document_body.3F
\ No newline at end of file