Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2019/06/07 20:47:28 UTC

[GitHub] [couchdb-documentation] flimzy commented on a change in pull request #418: Add search index documentation

URL: https://github.com/apache/couchdb-documentation/pull/418#discussion_r291746331
 
 

 ##########
 File path: src/api/ddoc/views.rst
 ##########
 @@ -315,6 +315,1128 @@ including the update sequence of the database from which the view was
 generated. The returned value can be compared to the current update
 sequence exposed in the database information (returned by :get:`/{db}`).
 
+Search
+======
+
+Search indexes enable you to query a database by using `Lucene Query Parser Syntax <http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview>`_. A search index uses one or more fields from your documents. You can use a search index to run queries, find documents based on the content they contain, or work with groups, facets, or geographical searches.
+
+To create a search index, you add a JavaScript function to a design document in the database. An index is built after the server processes a search request or detects a document update. The ``index`` function takes the following parameters:
+
+1.  Field name - The name of the field you want to use when you query the index. If you set this parameter to ``default``, then this field is queried if no field is specified in the query syntax.
+2.  Data that you want to index, for example, ``doc.address.country``.
+3.  (Optional) An object that can contain the following fields: ``boost``, ``facet``, ``index``, and ``store``. These fields are described in more detail later.
+
+By default, a search index response returns 25 rows. You can change the number of rows that is returned by using the ``limit`` parameter, but each response returns at most 200 rows. Each response includes a ``bookmark`` field; pass its value in later queries to page through the results.
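The bookmark-based paging described above can be sketched as a loop. This is a minimal sketch, not a CouchDB client API: ``collectAllRows`` and the caller-supplied ``fetchPage`` function (which performs one search request, for example with ``fetch``, and resolves to the parsed JSON response) are hypothetical. It assumes, as described for the ``bookmark`` query parameter, that the end of the results is signalled by a response with an empty ``rows`` array.

```javascript
// Minimal sketch of paging through search results with bookmarks.
// `fetchPage` is a hypothetical caller-supplied function: it receives an
// object of query parameters and resolves to the parsed JSON response of
// one search request.
async function collectAllRows(fetchPage, query, limit = 25) {
  const rows = [];
  let bookmark; // undefined for the first request
  for (;;) {
    const params = { q: query, limit };
    if (bookmark !== undefined) {
      params.bookmark = bookmark; // resume after the previous page
    }
    const page = await fetchPage(params);
    if (page.rows.length === 0) {
      break; // an empty page marks the end of the result list
    }
    rows.push(...page.rows);
    bookmark = page.bookmark;
  }
  return rows;
}
```

Passing the bookmark back unchanged keeps the client stateless: the server encodes the position in the result list, so the loop only needs the last bookmark it received.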
+
+You can query the API by using one of the following methods: URI, CouchDB Dashboard, curl, or a browser plug-in, such as Postman or RESTClient.
+
+*Example design document that defines a search index:*
+
+.. code-block:: json
+
+    {
+        "_id": "_design/search_example",
+        "indexes": {
+            "animals": {
+                "index": "function(doc){ ... }"
+            }
+        }
+    }
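Because the ``index`` member holds the function as a string, one way to generate such a design document from real JavaScript source is to serialize a function with ``Function.prototype.toString()``. The sketch below assumes this approach; ``animalsIndex`` and the ``doc.name`` field are hypothetical, and uploading the result to the database is left out.

```javascript
// Hypothetical index function. `index` is supplied when the function runs
// inside CouchDB, so it is never called here.
const animalsIndex = function (doc) {
  if (typeof doc.name === "string") {
    index("default", doc.name);
  }
};

// Build the design document; the function is stored as its source text.
const designDoc = {
  _id: "_design/search_example",
  indexes: {
    animals: {
      index: animalsIndex.toString(),
    },
  },
};

console.log(JSON.stringify(designDoc, null, 2));
```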
+
+Search index partitioning type
+------------------------------
+
+A search index will inherit the partitioning type from the ``options.partitioned``
+field of the design document that contains it.
+
+Index functions
+---------------
+
+Attempting to index by using a data field that does not exist fails. To avoid this problem, use an appropriate :ref:`guard clause <api/ddoc/view/index_guard_clauses>`.
+
+.. note:: 
+    Your indexing functions operate in a memory-constrained environment where the 
+    document itself forms a part of the memory that is used in that environment. 
+    Your code's stack and document must fit inside this memory. Documents are limited 
+    to a maximum size of 64 MB.
+
+.. note:: 
+    Within a search index, do not index the same field name with more than one data 
+    type. If the same field name is indexed with different data types in the same search 
+    index function, you might get an error when querying the search index that says the 
+    field "was indexed without position data." For example, do not include both of these 
+    lines in the same search index function, as they index the ``myfield`` field as two 
+    different data types: a string ``"this is a string"`` and a number ``123``.
+
+.. code-block:: javascript
+
+    index("myfield", "this is a string");
+    index("myfield", 123);
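If a source field can hold values of different types, one way to keep each Lucene field to a single type is to index each type under its own field name. In this sketch the names ``myfield`` (strings) and ``myfield_num`` (numbers) are hypothetical, and the top-level ``index`` is only a stand-in that records calls so the function can run outside CouchDB.

```javascript
// Stand-in for the search service's index(); it only records calls.
const calls = [];
function index(name, value, options) {
  calls.push([name, value]);
}

// Each Lucene field name receives values of exactly one type.
const indexFn = function (doc) {
  if (typeof doc.myfield === "string") {
    index("myfield", doc.myfield);
  } else if (typeof doc.myfield === "number") {
    index("myfield_num", doc.myfield); // hypothetical numeric field name
  }
};

indexFn({ myfield: "this is a string" });
indexFn({ myfield: 123 });
console.log(calls);
```

At query time, numeric searches would then target ``myfield_num`` rather than ``myfield``.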
+
+The function that is contained in the ``index`` field of the design document is a JavaScript function
+that is called for each document in the database.
+The function takes the document as a parameter,
+extracts some data from it,
+and then calls the ``index`` function to index that data.
+
+The ``index`` function takes three parameters, where the third parameter is optional.
+
+The first parameter is the name of the field you intend to use when querying the index,
+and which is specified in the Lucene syntax portion of subsequent queries.
+An example appears in the following query:
+
+.. code-block:: text
+
+    query=color:red
+
+The Lucene field name ``color`` is the first parameter of the ``index`` function.
+
+The ``query`` parameter can be abbreviated to ``q``,
+so another way of writing the query is as follows:
+
+.. code-block:: text
+
+    q=color:red
+
+If the special value ``"default"`` is used when you define the name,
+you do not have to specify a field name at query time.
+The effect is that the query can be simplified:
+
+.. code-block:: text
+
+    query=red
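When a client builds these query strings itself, the Lucene expression should be URL-encoded. A minimal sketch using the standard ``URLSearchParams`` class (available in browsers and Node.js); the helper name ``searchQueryString`` is hypothetical. Note that ``URLSearchParams`` percent-encodes the colon as ``%3A``.

```javascript
// Build the query-string part of a search request.
// Omit `field` to search the "default" field.
function searchQueryString(term, field) {
  const lucene = field ? `${field}:${term}` : term;
  // "q" is the short form of the "query" parameter.
  return new URLSearchParams({ q: lucene }).toString();
}

console.log(searchQueryString("red", "color")); // q=color%3Ared
console.log(searchQueryString("red"));          // q=red
```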
+
+The second parameter is the data to be indexed. Keep the following information in mind when you index your data:
+
+- The data must be a string, number, or boolean. Any other type causes the ``index`` function call to throw an error.
+- If an error is thrown when your function runs, for this reason or any other, the document is not added to that search index.
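A defensive index function can check the type of a value before passing it on, so that an unexpected array or object is skipped instead of making the whole document fail to index. The helper ``indexScalar`` is hypothetical (it is not part of the search API), and the top-level ``index`` is a stand-in that imitates the error described above so the sketch can run outside CouchDB.

```javascript
// Stand-in for index(): like the real one, it rejects non-scalar data.
const indexed = [];
function index(name, value, options) {
  const t = typeof value;
  if (t !== "string" && t !== "number" && t !== "boolean") {
    throw new Error("unsupported type for " + name + ": " + t);
  }
  indexed.push(name);
}

const indexFn = function (doc) {
  // Hypothetical helper: index a value only if it is a string, number or boolean.
  function indexScalar(name, value, options) {
    const t = typeof value;
    if (t === "string" || t === "number" || t === "boolean") {
      index(name, value, options);
    }
  }
  indexScalar("name", doc.name);
  indexScalar("tags", doc.tags); // an array: skipped instead of throwing
  indexScalar("active", doc.active, { store: true });
};

indexFn({ name: "zebra", tags: ["mammal"], active: true });
console.log(indexed); // [ 'name', 'active' ]
```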
+
+The third, optional, parameter is a JavaScript object with the following fields:
+
+*Index function (optional parameter)*
+
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+| Option       | Description                                                          | Values                           | Default         |
++==============+======================================================================+==================================+=================+
+| ``boost``    | A number that specifies the relevance in search results.             | A positive floating point number | 1 (no boosting) |
+|              | Content that is indexed with a boost value greater than 1            |                                  |                 |
+|              | is more relevant than content that is indexed without a boost value. |                                  |                 |
+|              | Content with a boost value less than 1 is less relevant.             |                                  |                 |
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+| ``facet``    | Creates a faceted index. For more information, see                   | ``true``, ``false``              | ``false``       |
+|              | :ref:`faceting`.                                                     |                                  |                 |
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+| ``index``    | Whether the data is indexed, and if so, how. If set to ``false``,    | ``true``, ``false``              | ``true``        |
+|              | the data cannot be used for searches, but can still be retrieved     |                                  |                 |
+|              | from the index if ``store`` is set to ``true``.                      |                                  |                 |
+|              | For more information, see :ref:`analyzers`.                          |                                  |                 |
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+| ``store``    | If ``true``, the value is returned in the search result;             | ``true``, ``false``              | ``false``       |
+|              | otherwise, the value is not returned.                                |                                  |                 |
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+
+.. note::
+
+    If you do not set the ``store`` parameter to ``true``,
+    the indexed value for the document is not returned in response to a query.
+
+*Example search index function:*
+
+.. code-block:: javascript
+
+    function(doc) {
+        index("default", doc._id);
+        if (doc.min_length) {
+            index("min_length", doc.min_length, {"store": true});
+        }
+        if (doc.diet) {
+            index("diet", doc.diet, {"store": true});
+        }
+        if (doc.latin_name) {
+            index("latin_name", doc.latin_name, {"store": true});
+        }
+        if (doc.class) {
+            index("class", doc.class, {"store": true});
+        }
+    }
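To check which fields the example function produces for a particular document, you can run it locally against a stand-in ``index`` that records its arguments. This is a testing aid only: the sample document is invented, and inside CouchDB the real ``index`` function is supplied when the index is built.

```javascript
// Stand-in index() that records calls instead of writing to Lucene.
const entries = [];
function index(name, value, options) {
  entries.push({ name, value, store: Boolean(options && options.store) });
}

// The example search index function from above.
const searchIndex = function (doc) {
  index("default", doc._id);
  if (doc.min_length) {
    index("min_length", doc.min_length, { store: true });
  }
  if (doc.diet) {
    index("diet", doc.diet, { store: true });
  }
  if (doc.latin_name) {
    index("latin_name", doc.latin_name, { store: true });
  }
  if (doc.class) {
    index("class", doc.class, { store: true });
  }
};

// Invented sample document: it has no latin_name, so that field is skipped.
searchIndex({ _id: "zebra", min_length: 2, diet: "herbivore", class: "mammal" });
console.log(entries.map((e) => e.name)); // [ 'default', 'min_length', 'diet', 'class' ]
```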
+
+.. _api/ddoc/view/index_guard_clauses:
+
+Index guard clauses
+^^^^^^^^^^^^^^^^^^^
+
+The ``index`` function takes the value of a document field as its second parameter.
+However,
+if that field does not exist in the document,
+an error occurs.
+The solution is to use an appropriate 'guard clause' that checks whether the field exists,
+and contains the expected type of data,
+*before* any attempt to create the corresponding index.
+
+*Example of a truthiness check that does not reliably detect whether the index data field exists:*
+
+.. code-block:: javascript
+
+    if (doc.min_length) {
+        index("min_length", doc.min_length, {"store": true});
+    }
+
+You might use the JavaScript ``typeof`` operator to implement the guard clause test.
+If the field exists *and* has the expected type,
+the correct type name is returned,
+so the guard clause test succeeds and it is safe to use the index function.
+If the field does *not* exist,
+``typeof`` returns ``'undefined'`` instead of the expected type,
+so the test fails and the field is not indexed.
+
+JavaScript treats each of the following values as false when it is tested in a condition:
+
+* ``undefined``
+* ``null``
+* ``false``
+* The number ``+0``
+* The number ``-0``
+* ``NaN`` (not a number)
+* ``""`` (the empty string)
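Because ``0`` is falsy, a plain truthiness test such as ``if (doc.min_length)`` skips that legitimate value, whereas a ``typeof`` test keeps it and still rejects a missing field. The comparison below assumes a stand-in ``index`` that only collects the values it receives.

```javascript
// Stand-in for index(): collect the values that would be indexed.
let indexedValues = [];
function index(name, value) {
  indexedValues.push(value);
}

// Guard based on truthiness: 0, "" and NaN are treated like a missing field.
function truthyGuard(doc) {
  if (doc.min_length) {
    index("min_length", doc.min_length);
  }
}

// Guard based on typeof: any number is indexed, a missing field is not.
function typeofGuard(doc) {
  if (typeof doc.min_length === "number") {
    index("min_length", doc.min_length);
  }
}

const docs = [{ min_length: 0 }, { min_length: 3 }, {}];

docs.forEach(truthyGuard);
console.log(indexedValues); // [ 3 ]

indexedValues = [];
docs.forEach(typeofGuard);
console.log(indexedValues); // [ 0, 3 ]
```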
+
+*Using a guard clause to check whether the required data field exists,
+and holds a number,
+before an attempt to index:*
+
+.. code-block:: javascript
+
+    if (typeof(doc.min_length) === 'number') {
+        index("min_length", doc.min_length, {"store": true});
+    }
+
+Use a generic guard clause test to ensure that the type of the candidate data field is defined.
+
+*Example of a 'generic' guard clause:*
+
+.. code-block:: javascript
+
+    if (typeof(doc.min_length) !== 'undefined') {
+        // The field exists, and does have a type, so we can proceed to index using it.
+        ...
+    }
+
+.. _api/ddoc/view/analyzers:
+
+Analyzers
+---------
+
+Analyzers are settings that define how to recognize terms within text.
+Analyzers can be helpful if you need to index text in a specific language; see :ref:`api/ddoc/view/language-specific-analyzers`.
+
+Here's the list of generic analyzers that are supported by search:
+
++----------------+---------------------------------------------------------------------------------+
+| Analyzer       | Description                                                                     |
++================+=================================================================================+
+| ``classic``    | The standard Lucene analyzer, circa release 3.1.                                |
++----------------+---------------------------------------------------------------------------------+
+| ``email``      | Like the ``standard`` analyzer, but tries harder to match an email              |
+|                | address as a complete token.                                                    |
++----------------+---------------------------------------------------------------------------------+
+| ``keyword``    | Input is not tokenized at all.                                                  |
++----------------+---------------------------------------------------------------------------------+
+| ``simple``     | Divides text at non-letters.                                                    |
++----------------+---------------------------------------------------------------------------------+
+| ``standard``   | The default analyzer. It implements the Word Break rules from the               |
+|                | `Unicode Text Segmentation algorithm <http://www.unicode.org/reports/tr29/>`_.  |
++----------------+---------------------------------------------------------------------------------+
+| ``whitespace`` | Divides text at white space boundaries.                                         |
++----------------+---------------------------------------------------------------------------------+
+
+
+*Example analyzer document:*
+
+.. code-block:: json
+
+    {
+        "_id": "_design/analyzer_example",
+        "indexes": {
+            "INDEX_NAME": {
+                "index": "function (doc) { ... }",
+                "analyzer": "$ANALYZER_NAME"
+            }
+        }
+    }
+
+.. _api/ddoc/view/language-specific-analyzers:
+
+Language-specific analyzers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These analyzers omit common words in the specific language,
+and many also `remove prefixes and suffixes <http://en.wikipedia.org/wiki/Stemming>`_.
+The name of the language is also the name of the analyzer.
+
+*	``arabic``
+*	``armenian``
+*	``basque``
+*	``bulgarian``
+*	``brazilian``
+*	``catalan``
+*	``cjk`` (Chinese, Japanese, Korean)
+*	``chinese`` (`smartcn <http://lucene.apache.org/core/4_2_1/analyzers-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html>`_)
+*	``czech``
+*	``danish``
+*	``dutch``
+*	``english``
+*	``finnish``
+*	``french``
+*	``german``
+*	``greek``
+*	``galician``
+*	``hindi``
+*	``hungarian``
+*	``indonesian``
+*	``irish``
+*	``italian``
+*	``japanese`` (`kuromoji <http://lucene.apache.org/core/4_2_1/analyzers-kuromoji/overview-summary.html>`_)
+*	``latvian``
+*	``norwegian``
+*	``persian``
+*	``polish`` (`stempel <http://lucene.apache.org/core/4_2_1/analyzers-stempel/overview-summary.html>`_)
+*	``portuguese``
+*	``romanian``
+*	``russian``
+*	``spanish``
+*	``swedish``
+*	``thai``
+*	``turkish``
+
+.. note::
+
+    Language-specific analyzers are optimized for the specified language. You cannot combine a generic analyzer with a language-specific analyzer. Instead, you might use a :ref:`per-field analyzer <api/ddoc/view/per-field-analyzers>` to select different analyzers for different fields within the documents.
+
+.. _api/ddoc/view/per-field-analyzers:
+
+Per-field analyzers
+^^^^^^^^^^^^^^^^^^^
+
+The ``perfield`` analyzer configures multiple analyzers for different fields.
+
+*Example of defining different analyzers for different fields:*
+
+.. code-block:: json
+
+    {
+        "_id": "_design/analyzer_example",
+        "indexes": {
+            "INDEX_NAME": {
+                "analyzer": {
+                    "name": "perfield",
+                    "default": "english",
+                    "fields": {
+                        "spanish": "spanish",
+                        "german": "german"
+                    }
+                },
+                "index": "function (doc) { ... }"
+            }
+        }
+    }
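The ``perfield`` configuration can be read as a lookup: a field listed under ``fields`` uses its own analyzer, and any other field falls back to ``default``. The sketch below encodes that reading; ``analyzerForField`` is a hypothetical helper, not part of CouchDB.

```javascript
// Hypothetical helper: which analyzer would apply to a given field name?
function analyzerForField(analyzerConfig, fieldName) {
  if (typeof analyzerConfig === "string") {
    return analyzerConfig; // a single analyzer for every field
  }
  if (analyzerConfig.name === "perfield") {
    const fields = analyzerConfig.fields || {};
    return Object.prototype.hasOwnProperty.call(fields, fieldName)
      ? fields[fieldName]
      : analyzerConfig.default;
  }
  return analyzerConfig.name; // e.g. { name: "portuguese", stopwords: [...] }
}

const config = {
  name: "perfield",
  default: "english",
  fields: { spanish: "spanish", german: "german" },
};

console.log(analyzerForField(config, "german")); // german
console.log(analyzerForField(config, "title"));  // english
```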
+
+Stop words
+^^^^^^^^^^
+
+Stop words are words that do not get indexed.
+You define them within a design document by turning the analyzer string into an object.
+
+.. note:: 
+
+    The ``keyword``, ``simple``, and ``whitespace`` analyzers do not support stop words.
+
+The default stop words for the ``standard`` analyzer are included below:
+
+.. code-block:: json
+
+    ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
+     "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
+     "that", "the", "their", "then", "there", "these", "they", "this",
+     "to", "was", "will", "with"]
+
+
+*Example of defining non-indexed ('stop') words:*
+
+.. code-block:: json
+
+    {
+        "_id": "_design/stop_words_example",
+        "indexes": {
+            "INDEX_NAME": {
+                "analyzer": {
+                    "name": "portuguese",
+                    "stopwords": [
+                        "foo",
+                        "bar",
+                        "baz"
+                    ]
+                },
+                "index": "function (doc) { ... }"
+            }
+        }
+    }
+
+Testing analyzer tokenization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can test the results of analyzer tokenization by posting sample data to the ``_search_analyze`` endpoint.
+
+*Example of using HTTP to test the* ``keyword`` *analyzer:*
+
+.. code-block:: http
+
+    POST /_search_analyze HTTP/1.1
+    Content-Type: application/json
+
+    {"analyzer":"keyword", "text":"ablanks@renovations.com"}
+
+*Example of using the command line to test the* ``keyword`` *analyzer:*
+
+.. code-block:: sh
+
+    curl "https://$HOST:5984/_search_analyze" -H 'Content-Type: application/json' \
+        -d '{"analyzer":"keyword", "text":"ablanks@renovations.com"}'
+
+*Result of testing the* ``keyword`` *analyzer:*
+
+.. code-block:: json
+
+    {
+        "tokens": [
+            "ablanks@renovations.com"
+        ]
+    }
+
+*Example of using HTTP to test the* ``standard`` *analyzer:*
+
+.. code-block:: http
+
+    POST /_search_analyze HTTP/1.1
+    Content-Type: application/json
+
+    {"analyzer":"standard", "text":"ablanks@renovations.com"}
+
+*Example of using the command line to test the* ``standard`` *analyzer:*
+
+.. code-block:: sh
+
+    curl "https://$HOST:5984/_search_analyze" -H 'Content-Type: application/json' \
+        -d '{"analyzer":"standard", "text":"ablanks@renovations.com"}'
+
+*Result of testing the* ``standard`` *analyzer:*
+
+.. code-block:: json
+
+    {
+        "tokens": [
+            "ablanks",
+            "renovations.com"
+        ]
+    }
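The same request can be made from JavaScript. This minimal sketch uses the standard ``fetch`` API (built into browsers and Node.js 18 and later); ``analyzeText`` is a hypothetical helper, the base URL is a placeholder for your server, and authentication is omitted. The optional ``fetchImpl`` argument exists only so the function can be exercised without a server.

```javascript
// POST sample text to _search_analyze and return the list of tokens.
async function analyzeText(baseUrl, analyzer, text, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`${baseUrl}/_search_analyze`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ analyzer, text }),
  });
  if (!response.ok) {
    throw new Error(`_search_analyze failed with HTTP ${response.status}`);
  }
  const result = await response.json();
  return result.tokens;
}

// Usage against a running server with search enabled (placeholder URL):
// analyzeText("http://localhost:5984", "standard", "ablanks@renovations.com")
//   .then((tokens) => console.log(tokens));
```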
+
+Queries
+-------
+
+After you create a search index, you can query it.
+
+- Issue a partition query using: ``GET /$DATABASE/_partition/$PARTITION_KEY/_design/$DDOC/_search/$INDEX_NAME``
+- Issue a global query using: ``GET /$DATABASE/_design/$DDOC/_search/$INDEX_NAME``
+
+Specify your search by using the ``query`` parameter.
+
+*Example of using HTTP to query a partitioned index:*
+
+.. code-block:: http
+
+    GET /$DATABASE/_partition/$PARTITION_KEY/_design/$DDOC/_search/$INDEX_NAME?include_docs=true&query=*:*&limit=1 HTTP/1.1
+    Content-Type: application/json
+
+*Example of using HTTP to query a global index:*
+
+.. code-block:: http
+
+    GET /$DATABASE/_design/$DDOC/_search/$INDEX_NAME?include_docs=true&query=*:*&limit=1 HTTP/1.1
+    Content-Type: application/json
+
+*Example of using the command line to query a partitioned index:*
+
+.. code-block:: sh
+
+    curl "https://$HOST:5984/$DATABASE/_partition/$PARTITION_KEY/_design/$DDOC/_search/$INDEX_NAME?include_docs=true&query=*:*&limit=1"
+
+*Example of using the command line to query a global index:*
+
+.. code-block:: sh
+
+    curl "https://$HOST:5984/$DATABASE/_design/$DDOC/_search/$INDEX_NAME?include_docs=true&query=*:*&limit=1"
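The two URL forms differ only in the ``/_partition/{key}`` segment. A sketch of a URL builder under that assumption; ``searchUrl`` and the example database, design document, and index names are hypothetical. Path segments are escaped with ``encodeURIComponent`` and the query string is built with ``URLSearchParams``, so the Lucene expression ``*:*`` is sent as ``*%3A*``.

```javascript
// Build a search URL for a global or a partitioned query.
// Pass partitionKey = undefined for a global query.
function searchUrl(base, db, ddoc, indexName, params, partitionKey) {
  const enc = encodeURIComponent;
  const partition =
    partitionKey === undefined ? "" : `/_partition/${enc(partitionKey)}`;
  const qs = new URLSearchParams(params).toString();
  return `${base}/${enc(db)}${partition}/_design/${enc(ddoc)}/_search/${enc(indexName)}?${qs}`;
}

const params = { include_docs: "true", query: "*:*", limit: "1" };
console.log(searchUrl("http://localhost:5984", "animaldb", "search_example", "animals", params));
// http://localhost:5984/animaldb/_design/search_example/_search/animals?include_docs=true&query=*%3A*&limit=1
```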
+
+.. _api/ddoc/view/query_parameters:
+
+Query Parameters
+^^^^^^^^^^^^^^^^
+
+You must enable :ref:`faceting` before you can use the following parameters:
+
+-	``counts``
+-	``drilldown``
+
++------------------------+------------------------------------------------------+-------------------+------------------+-----------------------+-------------------+
+| Argument               | Description                                          | Optional          | Type             | Supported values      | Partitioned query |
++========================+======================================================+===================+==================+=======================+===================+
+| ``bookmark``           | A bookmark that was received from a previous search. | yes               | String           |                       | yes               |
+|                        | This parameter enables paging through the results.   |                   |                  |                       |                   |
+|                        | If there are no more results after the bookmark,     |                   |                  |                       |                   |
+|                        | you get a response with an empty rows array and the  |                   |                  |                       |                   | 
+|                        | same bookmark, confirming the end of the result list.|                   |                  |                       |                   |
++------------------------+------------------------------------------------------+-------------------+------------------+-----------------------+-------------------+
+| ``counts``             | This field defines an array of names of string       | yes               | JSON             | A JSON array of field | no                |
+|                        | fields, for which counts are requested. The response |                   |                  | names.                |                   |
+|                        | contains counts for each unique value of this        |                   |                  |                       |                   |
+|                        | field name among the documents that match the search |                   |                  |                       |                   | 
+|                        | query. :ref:`faceting` must be enabled for           |                   |                  |                       |                   |
+|                        | this parameter to function.                          |                   |                  |                       |                   |
++------------------------+------------------------------------------------------+-------------------+------------------+-----------------------+-------------------+
+| ``drilldown``          | This field can be used several times. Each use       | yes               | JSON             | A JSON array with two | yes               |
+|                        | defines a pair with a field name and a value.        |                   |                  | elements: the field   |                   |
+|                        | The search matches only documents containing the     |                   |                  | name and the value.   |                   | 
+|                        | value that was provided in the named field. It       |                   |                  |                       |                   |
+|                        | differs from using ``"fieldname:value"`` in          |                   |                  |                       |                   |
+|                        | the ``q`` parameter only in that the values are not  |                   |                  |                       |                   |
+|                        | analyzed. :ref:`faceting` must be enabled for this   |                   |                  |                       |                   |
+|                        | parameter to function.                               |                   |                  |                       |                   |
++------------------------+------------------------------------------------------+-------------------+------------------+-----------------------+-------------------+
+| ``group_field``        | Field that groups search matches                     | yes               |  String          | A string that         | no                |
+|                        |                                                      |                   |                  | contains the name of  |                   |
+|                        |                                                      |                   |                  | a string field.       |                   |
+|                        |                                                      |                   |                  | Fields containing     |                   |
+|                        |                                                      |                   |                  | other data such as    |                   | 
+|                        |                                                      |                   |                  | numbers,objects, or   |                   |
 
 Review comment:
   ```suggestion
   |                        |                                                      |                   |                  | numbers, objects, or  |                   |
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services