Posted to notifications@couchdb.apache.org by "mikerhodes (via GitHub)" <gi...@apache.org> on 2023/01/27 13:38:52 UTC

[GitHub] [couchdb] mikerhodes opened a new pull request, #4410: Mango covering JSON indexes RFC

mikerhodes opened a new pull request, #4410:
URL: https://github.com/apache/couchdb/pull/4410

   ## Overview
   
   
   RFC proposing covering index support for Mango JSON indexes, with an outline of the implementation steps.
   
   ## Related Issues or Pull Requests
   
   
   #4394 is a pre-req for this work, because without it we always need to read the full doc to return to the coordinator.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1090470757


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango cannot currently answer at the same rate as the
+underlying view index: it runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector` and not `fields` are used when
+choosing an index. This is because we require all keys in the index to be fields
+within the selector -- with predicates implying `$exists=true` -- due to the
+fact that only documents that include _all_ fields in the index are added to the
+index. Therefore, if the selector doesn't imply all fields in the index's keys
+exist, then using that index risks returning an incomplete result set.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2` in the Mango query because then the coordinator reads and processes
+documents. (Aside: it'd be good to remove this `r` option to simplify things).

Review Comment:
   Noted for some later pondering 😄 
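
   For readers following the thread, the Phase 1 rules quoted in the diff above can be sketched in a few lines. This is a minimal illustration in Python rather than Erlang, and every function name in it is hypothetical (none of them exist in `mango_view_cursor.erl`). It assumes a flat, top-level selector with no `$and`/`$or` nesting, and it simplifies "implies `$exists=true`" to "uses one of the five indexable operators or an explicit `$exists: true`".

```python
# Hypothetical sketch of the Phase 1 decisions; names are illustrative only.
INDEXABLE_OPS = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def uses_indexable_op(condition):
    """True if the condition can be turned into a key-range operation."""
    if not isinstance(condition, dict):
        return True  # {"age": 39} is shorthand for {"age": {"$eq": 39}}
    return any(op in INDEXABLE_OPS for op in condition)


def implies_exists(condition):
    """True if matching the condition requires the field to be present (simplified)."""
    if uses_indexable_op(condition):
        return True
    return condition.get("$exists") is True


def index_is_usable(index_fields, selector):
    """The selector must constrain the first key and imply existence of every key."""
    first = index_fields[0]
    if first not in selector or not uses_indexable_op(selector[first]):
        return False
    return all(f in selector and implies_exists(selector[f]) for f in index_fields)


def is_covering(index_fields, selector, fields):
    """Covering: every field the query needs is one of the index's keys."""
    if not fields:
        return False  # no projection, so the whole document is needed
    return (set(selector) | set(fields)) <= set(index_fields)


def include_docs_for(index_fields, selector, fields):
    """Value for the view call's include_docs argument."""
    return not is_covering(index_fields, selector, fields)


index = ["age", "name"]
selector = {"age": {"$gt": 30}, "name": {"$exists": True}}
usable = index_is_usable(index, selector)
needs_docs = include_docs_for(index, selector, ["age"])
```

   Note that `fields` plays no part in `index_is_usable`: as the RFC text says, covering is only used to prefer one already usable index over another.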





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091168233


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango cannot currently answer at the same rate as the
+underlying view index: it runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because we need the
+`selector` to imply that every indexed field exists, per [Mango JSON index
+selection](#mango-json-index-selection), whereas `fields` is only applied
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   Wonder if there is any value in applying limits, or would that be over-complicating it?  We'd have to decide what happens when the limit is reached: crash or ignore and fall back to just loading the doc. 
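
   As background for the index selection section quoted in the diff above, the `["age", "name"]` example can be replayed with a small simulation. This is a rough sketch in Python, not CouchDB code: `build_index`, `full_scan_age_gt` and `index_scan_age_gt` are hypothetical helpers, and Python's list ordering stands in for CouchDB's view collation.

```python
# Toy model of a Mango JSON index: one row per document that has ALL indexed fields.
def build_index(docs, index_fields):
    rows = []
    for doc in docs:
        if all(field in doc for field in index_fields):
            key = [doc[field] for field in index_fields]  # complex key
            rows.append((key, doc["_id"]))
    return sorted(rows)


def full_scan_age_gt(docs, threshold):
    """Correct answer for {"age": {"$gt": threshold}}: check every document."""
    return sorted(d["_id"] for d in docs if "age" in d and d["age"] > threshold)


def index_scan_age_gt(rows, threshold):
    """Answer the same selector using only the index rows."""
    return sorted(doc_id for key, doc_id in rows if key[0] > threshold)


docs = [
    {"_id": "foo", "age": 39, "name": "mike"},
    {"_id": "bar", "age": 39, "pet": "cat"},
]
rows = build_index(docs, ["age", "name"])

expected = full_scan_age_gt(docs, 30)    # both documents match the selector
from_index = index_scan_age_gt(rows, 30)  # "bar" was never indexed, so it is missed
```

   The two result lists differ, which is why the selector has to imply `$exists=true` for `name` before this index can be chosen.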



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango cannot currently answer at the same rate as the
+underlying view index: it runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because we need the
+`selector` to imply that every indexed field exists, per [Mango JSON index
+selection](#mango-json-index-selection), whereas `fields` is only applied
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.

Review Comment:
   Would the equivalent of `"include": []` be the same as not specifying an `"include"` field at all? When generating the view signature we'd probably also want to "normalize" this fact, i.e. transform the `"include": []` to a doc with a missing `"include"` field or vice-versa. Or we could just reject `"include": []`
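
   One way to picture the normalization option raised in this comment is the sketch below. It is only an illustration in Python under stated assumptions: `validate_index_def`, `normalize_index_def` and `signature` are hypothetical names, and the SHA-256 of canonical JSON is a stand-in for however CouchDB actually derives a view signature. It also includes the rule from the quoted RFC text that a field may not appear in both `fields` and `include`.

```python
import hashlib
import json


def validate_index_def(index_def):
    """Reject definitions that list a field in both `fields` and `include`."""
    overlap = set(index_def.get("fields", [])) & set(index_def.get("include") or [])
    if overlap:
        raise ValueError("fields listed in both 'fields' and 'include': %s" % sorted(overlap))


def normalize_index_def(index_def):
    """Treat a missing, null or empty `include` identically by dropping the key."""
    normalized = dict(index_def)
    if not normalized.get("include"):
        normalized.pop("include", None)
    return normalized


def signature(index_def):
    """Stand-in signature: hash of the normalized definition as canonical JSON."""
    validate_index_def(index_def)
    canonical = json.dumps(normalize_index_def(index_def), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


without_include = signature({"fields": ["age", "name"]})
empty_include = signature({"fields": ["age", "name"], "include": []})
with_include = signature({"fields": ["age", "name"], "include": ["occupation"]})
```

   With this approach `without_include` and `empty_include` are equal, so adding `"include": []` to an existing definition would not change the signature. Rejecting `"include": []` outright, the other option mentioned, would instead be an extra check in `validate_index_def`.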





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091101499


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango cannot currently answer at the same rate as the
+underlying view index: it runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, $lte`, `$eq`, $gte` and `$gt`. These can easily be

Review Comment:
   Missing '`' for `$lte`





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091152641


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used because the query predicates can only match documents containing both
+`age` and `name`, just like the index. In both cases, note the predicate
+`"age": {"$gt": 30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`,
+and not those in `fields`, are used when choosing an index. This is because
+every key in the index must be constrained by the `selector`, per [Mango JSON
+index selection](#mango-json-index-selection), whereas `fields` is only applied
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.
+- The `include` field would be rejected for `text` type indexes.
+
+Alternatives considered:
+
+- Adding `include` outside `index`. This didn't seem right as the `index`
+    object already includes `partial_filter_selector` and `include` seems a
+    peer of this. ([docs](https://docs.couchdb.org/en/stable/api/database/find.html#db-index)).
+- Alternative name `store`. We use this for Lucene indexes when dreyfus/clouseau
+    is used. I elected to use a separate name to either `value` or `store` to
+    avoid index-type specificity. I take the name from Postgres, which uses
+    `INCLUDE` in its index definition to [support covering indexes][pgcover].
+
+[pgcover]: https://www.postgresql.org/docs/current/indexes-index-only-scans.html
+
+Adding this will require changes in `mango_idx_view` to store the definition and
+in how we process documents during indexing, which looks to be in
+`get_index_entries` in `mango_native_proc`.
+
+We'll then need to update the Mango cursor methods mentioned above to take
+account of the values within the covering index code.
+
+One thing to be careful about is again index selection. We will still need all
+index keys to be present in the `selector` as above, so we need to differentiate
+between the fields in the index's keys and values when selecting an index to
+ensure we retain the correct behaviour per [Mango JSON index
+selection](#mango-json-index-selection).
+
+## Mixed versions during cluster upgrades
+
+The relevant scenarios here are an updated coordinator talking to outdated
+shards, and the opposite of an outdated coordinator talking to upgraded shards.
+A further wrinkle is that while a coordinator is either upgraded or not, the
+shards that the coordinator speaks to can be a mixture of upgraded and outdated.
+
+For the purposes of this discussion, we only need to worry about when a covering
+index is in play during a query; the code path outside that use-case should not
+change.
+
+From what I can tell, we can avoid special code paths for cluster upgrades
+specific to this work. Instead we accept that some queries will take longer
+during cluster upgrade mixed version operation. This is described below.
+
+### Updated coordinator, outdated shard
+
+In this case, the coordinator will note the covering index, and set the view
+query option `include_docs=false`. This means that the row passed to `view_cb/2`
+will not have a document included. In the function, `case ViewRow#view_row.doc
+of` will hit the `undefined` clause, meaning that the row is passed through
+unchanged, without the document. When the row reaches the coordinator and is
+passed to `doc_member_and_extract/2` from `handle_message/2`, the `case
+couch_util:get_value(doc, RowProps) of` will also hit its `undefined` clause.
+The coordinator will then perform a quorum read with `r=1` of the document and
+carry out the match and extract.
+
+This will slow down the processing of results at the coordinator for that row,
+but shouldn't alter the correctness of the result. So we shouldn't need a
+special code path to support this case. Which is nice.
+
+### Outdated coordinator, updated shard
+
+In this case, the coordinator won't be checking for covering indexes, meaning
+that `include_docs=true` will be set when `r<2` as today.
+
+I suspect we'll set an option in `viewcbargs` that contains the index field
+names and whether it's a covering index. This means that an updated shard will
+be checking for those fields. When it can't find them, it'll fall back to the
+current behaviour in `view_cb/2`, meaning that it reads the document found via
+`include_docs=true`, executes `match_and_extract_doc/3` and returns the row if
+it matches the query.
+
+The coordinator will receive the final result document as today and assume it's
+correct, and forward it to the client.More work than needed will be carried out

Review Comment:
   `client.More` - Add a space between `client.` and `More`
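
   As a reading aid for the mixed-version section quoted above, the four coordinator/shard combinations can be modelled as a small decision function. This is a hedged sketch in Python, not the Erlang in `mango_view_cursor.erl`; the function name `plan` and the returned keys are hypothetical, and it assumes a query that a covering index could answer with `r<2`.

```python
def plan(coordinator_updated: bool, shard_updated: bool) -> dict:
    """Model who ends up reading the document for one result row."""
    # Only an updated coordinator notices the covering index and asks the
    # view for include_docs=false; an outdated one behaves as it does today.
    include_docs = not coordinator_updated

    if include_docs:
        # The shard receives the document with the row and runs
        # match-and-extract on it, whether or not the shard is updated.
        doc_read_by = "shard"
    elif shard_updated:
        # Updated on both ends: the row is answered from the index alone.
        doc_read_by = None
    else:
        # An outdated shard passes the doc-less row through unchanged, so
        # the coordinator falls back to reading the document itself.
        doc_read_by = "coordinator"

    return {"include_docs": include_docs, "doc_read_by": doc_read_by}
```

   Only the combination where both ends are updated skips the document read; the other combinations do more work than strictly needed but return the same rows, which is why the RFC argues no special upgrade code path is required.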





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1092465800


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used because the query predicates can only match documents containing both
+`age` and `name`, just like the index. In both cases, note the predicate
+`"age": {"$gt": 30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`,
+and not those in `fields`, are used when choosing an index. This is because
+every key in the index must be constrained by the `selector`, per [Mango JSON
+index selection](#mango-json-index-selection), whereas `fields` is only applied
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   Looks great. Thank you!





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1090470291


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector` and not `fields` are used when
+choosing an index. This is because we require all keys in the index to be fields
+within the selector -- with predicates implying `$exists=true` -- due to the
+fact that only documents that include _all_ fields in the index are added to the
+index. Therefore, if the selector doesn't imply all fields in the index's keys
+exist, then using that index risks returning an incomplete result set.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2` in the Mango query because then the coordinator reads and processes
+documents. (Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of

Review Comment:
   It's worth considering. In [the past](https://dx13.co.uk/articles/2017/02/04/why-you-should-generally-avoid-using-include_docs-in-cloudant-and-couchdb-view-queries/) I've used a simple benchmark to show that having documents included in the index does produce faster query response than `include_docs=true`, at least in relatively extreme scenarios, which does argue for using it. In truth, I'd want to look at how the btree is laid out on disk and look at some benchmarking to understand how much storing tons of extra data in the view affects query speed. It feels like there's a potential to be pulling tons of unneeded data from disk.
   
   So, to start with, I wanted to scope this down to what I saw as the minimum useful thing. It's harder to remove things later.
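
   To make that scoped-down behaviour concrete, here is a minimal sketch of the checks the RFC's behaviour requirements imply for an index definition with `include`. It is written in Python for illustration only; `validate_index_definition` and `covers` are hypothetical names, not functions in `mango_idx_view`, and the error messages are made up.

```python
def validate_index_definition(definition):
    """Apply the RFC's proposed rules for `include`, returning (keys, values)."""
    index = definition.get("index", {})
    fields = index.get("fields", [])
    include = index.get("include", [])

    # `include` is proposed for JSON (view) indexes only, not text indexes.
    if include and definition.get("type", "json") != "json":
        raise ValueError("`include` is only supported for json indexes")

    # A field may not appear in both `fields` and `include`.
    overlap = sorted(set(fields) & set(include))
    if overlap:
        raise ValueError(f"fields listed in both `fields` and `include`: {overlap}")

    return fields, include


def covers(definition, needed_fields):
    """Assumption: a query is covered when every field it needs is stored
    either in the view key (`fields`) or in the view value (`include`)."""
    fields, include = validate_index_definition(definition)
    return set(needed_fields) <= (set(fields) | set(include))
```

   For the RFC's example definition, a query needing `age` and `occupation` would be covered, while one needing `pet` would still have to read the document. Index selection itself would continue to look only at `fields`, since documents are indexed regardless of whether their `include` fields exist.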





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091141464


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_cursor_view.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name": {"$exists":
+true}}` requires that the `name` field exist so the index can be used because
+the query predicates can only match documents containing both `age` and `name`,
+just like the index. In both cases, note the predicate `"age": {"$gt": 30}`
+implies `"age": {"$exists": true}`.
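The two selection rules above can be sketched in a few lines. This is an illustrative Python model, not Mango's Erlang code: the function names are hypothetical, and it assumes a flat selector (no `$and`/`$or` nesting) and treats a bare value such as `{"age": 39}` as an implicit `$eq`.

```python
# Operators the RFC lists as convertible to key operations on the index.
INDEXABLE_OPS = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def uses_indexable_op(condition):
    """True if a selector condition can be turned into an index key operation."""
    if not isinstance(condition, dict):
        return True  # shorthand such as {"age": 39} is an implicit $eq
    return any(op in INDEXABLE_OPS for op in condition)


def implies_exists(condition):
    """True if a condition can only match documents that contain the field."""
    if uses_indexable_op(condition):
        return True
    return condition.get("$exists") is True


def index_usable(index_fields, selector):
    """Hypothetical check: may this JSON index answer this flat selector?"""
    first = index_fields[0]
    # Rule 1: the first index key needs an operator convertible to a key range.
    if first not in selector or not uses_indexable_op(selector[first]):
        return False
    # Rule 2: the selector must imply $exists=true for every indexed field.
    return all(f in selector and implies_exists(selector[f]) for f in index_fields)
```

With the `["age", "name"]` index from the example, the first selector is rejected and the second is accepted, matching the reasoning in the text.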
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
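A minimal sketch of such a helper, written in Python for illustration rather than as the Erlang that would live in the cursor module. The names `is_covering` and `view_args` are hypothetical. It assumes a flat selector, and that a query with no `fields` list returns whole documents and so can never be covered.

```python
def is_covering(index_fields, selector, requested_fields):
    """Hypothetical helper: does the index hold every field the query needs?"""
    if not requested_fields:
        # No projection means the full document is returned, so the
        # document must be read regardless of the index contents.
        return False
    # Fields needed to evaluate the selector plus fields returned to the user.
    needed = set(requested_fields) | set(selector)
    return needed <= set(index_fields)


def view_args(index_fields, selector, requested_fields):
    """Derive the include_docs argument for the view call from coverage."""
    covering = is_covering(index_fields, selector, requested_fields)
    return {"include_docs": not covering}
```

Coverage here only decides whether documents are fetched. Whether the index is usable at all is still governed by the selection rules in the previous section.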
+
+When selecting an index, we'll need to ensure that only fields in the `selector`,
+and not those in `fields`, are used when choosing an index. This is because the
+`selector` must imply the presence of every field in the index's key, per [Mango
+JSON index selection](#mango-json-index-selection), whereas `fields` is only
+applied after we generate the result set, and none of the field names in
+`fields` need to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in Mango, for example `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.
+- The `include` field would be rejected for `text` type indexes.
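The two rejection rules in this list could be checked roughly as follows. This is a Python sketch under assumptions, not the validation code in the `_index` handler: `validate_index_definition` is a hypothetical name, errors are modelled as `ValueError`, and an index with no `type` is treated as `json`.

```python
def validate_index_definition(definition):
    """Hypothetical validation of the proposed `include` option."""
    index = definition.get("index", {})
    include = index.get("include", [])
    if not include:
        return  # nothing new to validate
    # `include` is only proposed for JSON (view) indexes.
    if definition.get("type", "json") != "json":
        raise ValueError("`include` is only supported for json indexes")
    # A field may be a key or an included value, not both.
    overlap = set(index.get("fields", [])) & set(include)
    if overlap:
        raise ValueError(f"fields in both `fields` and `include`: {sorted(overlap)}")
```

The example definition above passes this check, while listing `age` in both arrays, or using `include` with `"type": "text"`, would be rejected.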
+
+Alternatives considered:
+
+- Adding `include` outside `index`. This didn't seem right as the `index`
+    object already includes `partial_filter_selector` and `include` seems a
+    peer of this. ([docs](https://docs.couchdb.org/en/stable/api/database/find.html#db-index)).
+- Alternative name `store`. We use this for Lucene indexes when dreyfus/clouseau
+    is used. I elected to use a name distinct from both `value` and `store` to
+    avoid index-type specificity. I take the name from Postgres, which uses
+    `INCLUDE` in its index definition to [support covering indexes][pgcover].
+
+[pgcover]: https://www.postgresql.org/docs/current/indexes-index-only-scans.html
+
+Adding this will require changes in `mango_idx_view` to store the definition and
+in how we process documents during indexing, which looks to be in
+`get_index_entries` in `mango_native_proc`.
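To make the indexing change concrete, here is a rough Python model of how one document might be turned into a view row. It is only a sketch: `index_entry` and `get_path` are hypothetical names, and storing the included fields as an object keyed by field path is an assumption, since the RFC does not fix the layout of the view `value`.

```python
_MISSING = object()  # sentinel, so that a stored null is not confused with absence


def get_path(doc, path):
    """Resolve a dotted field reference such as `person.address.zip`."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def index_entry(doc, fields, include):
    """Return (key, value) for the view row, or None if the doc is not indexed."""
    key = [get_path(doc, f) for f in fields]
    if any(v is _MISSING for v in key):
        return None  # every key field must exist for the doc to be indexed
    value = {}
    for f in include:
        v = get_path(doc, f)
        if v is not _MISSING:  # included fields are optional
            value[f] = v
    return key, value
```

A document missing a key field is skipped entirely, whereas a document missing an included field is still indexed with that field left out of the value.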
+
+We'll then need to update the Mango cursor methods mentioned above to take
+account of the values within the covering index code.
+
+One thing to be careful about is, again, index selection. We will still need all
+index keys to be present in the `selector` as above, so we need to differentiate
+between the fields in the index's keys and values when selecting an index to
+ensure we retain the correct behaviour per [Mango JSON index
+selection](#mango-json-index-selection).
+
+## Mixed versions during cluster upgrades
+
+The relevant scenarios here are an updated coordinator talking to outdated
+shards, and the opposite of an outdated coordinator talking to upgraded shards.
+A further wrinkle is that while a coordinator is either upgraded or not, the
+shards that the coordinator speaks to can be a mixture of upgraded and outdated.
+
+For the purposes of this discussion, we only need to worry about when a covering
+index is in play during a query; the code path outside that use-case should not
+change.
+
+From what I can tell, we can avoid special code paths for cluster upgrades
+specific to this work. Instead we accept that some queries will take longer
+during cluster upgrade mixed version operation. This is described below.
+
+### Updated coordinator, outdated shard
+
+In this case, the coordinator will note the covering index, and set the view
+query option `include_docs=false`. This means that the row passed to `view_cb/2`
+will not have a document included. In the function, `case ViewRow#view_row.doc
+of` will hit the `undefined` clause, meaning that the row is passed through
+unchanged, without the document. When the row reaches the coordinator and is
+passed to `doc_member_and_extract/2` from `handle_message/2`, the `case
+couch_util:get_value(doc, RowProps) of` will also hit its `undefined` clause.
+The coordinator will then perform a quorum read with `r=1` of the document and
+carry out the match and extract.
+
+This will slow down the processing of results at the coordinator for that row,
+but shouldn't alter the correctness of the result. So we shouldn't need a
+special code path to support this case. Which is nice.
+
+### Outdated coordinator, updated shard
+
+In this case, the coordinator won't be checking for covering indexes, meaning
+that `include_docs=true` will be set when `r<2` as today.
+
+I suspect we'll set an option in `viewcbargs` that contains the index field
+names and whether its a covering index. This means that an updated shard will be

Review Comment:
   `its` -> `it's` ?




[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1093466980


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   > What if instead of storing plain values in rows we'd allow also storing a marker which would indicate that a value was too large, and just fallback to reading the doc?
   
   I'd like to avoid doing this if we can -- while nice in some ways, this approach makes it more difficult for a customer to understand the performance of their query. I'd like the performance of queries to be predictable. My feeling is that large fields should require a trip to primary data, and that enforcing smaller value sizes keeps it more likely that indexes can live in RAM, where, I think, they should be.




[GitHub] [couchdb] pgj commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "pgj (via GitHub)" <gi...@apache.org>.
pgj commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1115446830


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,398 @@
+This would take place within `mango_view_cursor.erl`. The key functions

Review Comment:
   That is `mango_cursor_view.erl`.



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,398 @@
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested

Review Comment:
   I guess that is `mango_cursor_view:execute/3`.  It might be worth to use fully qualified names to aid readability.




[GitHub] [couchdb] garrensmith commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "garrensmith (via GitHub)" <gi...@apache.org>.
garrensmith commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1090527150


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango currently answers queries at the same pace as
+calling the underlying view with `include_docs=true`. Preliminary modifications
+to Mango showed that, with covering index support and a query that can use it,
+Mango can stream results as quickly as the underlying view. Adding covering
+indexes therefore substantially increases the production use-cases Mango can
+support.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_cursor_view.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if

Review Comment:
   Nice example thanks. 
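
The example can be sketched in a few lines of Python. This is illustrative only and not how Mango implements index selection: the function names are hypothetical, and it handles only flat selectors with top-level field names.

```python
# Operators the RFC lists as convertible to key operations on the index.
INDEXABLE_OPS = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def is_indexed(doc, index_fields):
    """A document is only in the index if it has every indexed field."""
    return all(field in doc for field in index_fields)


def uses_indexable_op(selector, field):
    """True if the selector constrains `field` with an indexable operator."""
    if field not in selector:
        return False
    cond = selector[field]
    if not isinstance(cond, dict):
        return True  # a bare value is an implicit $eq
    return any(op in INDEXABLE_OPS for op in cond)


def implies_exists(selector, field):
    """True if matching the selector requires `field` to be present."""
    if field not in selector:
        return False
    cond = selector[field]
    if isinstance(cond, dict) and cond.get("$exists") is True:
        return True
    return uses_indexable_op(selector, field)


def index_usable(selector, index_fields):
    """First key needs an indexable operator; every key must be implied to exist."""
    return (
        bool(index_fields)
        and uses_indexable_op(selector, index_fields[0])
        and all(implies_exists(selector, f) for f in index_fields)
    )
```

Here `is_indexed` reports that `foo` is in the `["age", "name"]` index and `bar` is not, and `index_usable` rejects `{"age": {"$gt": 30}}` for that index because nothing in the selector requires `name` to exist.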





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1094391514


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango currently answers queries at the same pace as
+calling the underlying view with `include_docs=true`. Preliminary modifications
+to Mango showed that, with covering index support and a query that can use it,
+Mango can stream results as quickly as the underlying view. Adding covering
+indexes therefore substantially increases the production use-cases Mango can
+support.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_cursor_view.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used: the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only the fields in the
+`selector`, and not those in `fields`, are considered. The `selector` must
+imply that every field in the index's key exists, per [Mango JSON index
+selection](#mango-json-index-selection), whereas `fields` is only applied
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- `"include": []` is equivalent to omitting the `include` field.
+- Ordering of the fields in `include` is not important. They can be reordered
+    before storing if needed (eg, sorted).
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.
+- The `include` field would be rejected for `text` type indexes.
+
+Alternatives considered:
+
+- Adding `include` outside `index`. This didn't seem right as the `index`
+    object already includes `partial_filter_selector` and `include` seems a
+    peer of this. ([docs](https://docs.couchdb.org/en/stable/api/database/find.html#db-index)).
+- Alternative name `store`. We use this for Lucene indexes when dreyfus/clouseau
+    is used. I elected to use a separate name to either `value` or `store` to
+    avoid index-type specificity. I take the name from Postgres, which uses
+    `INCLUDE` in its index definition to [support covering indexes][pgcover].
+
+[pgcover]: https://www.postgresql.org/docs/current/indexes-index-only-scans.html
+
+Adding this will require changes in `mango_idx_view` to store the definition and
+in how we process documents during indexing, which looks to be in
+`get_index_entries` in `mango_native_proc`.
+
+We'll then need to update the Mango cursor methods mentioned above to take
+account of the values within the covering index code.
+
+One thing to be careful about is again index selection. We will still need all
+index keys to be present in the `selector` as above, so we need to differentiate
+between the fields in the index's keys and values when selecting an index to
+ensure we retain the correct behaviour per [Mango JSON index
+selection](#mango-json-index-selection).
+
+### Limits on included fields
+
+Adding "too much" data to indexes is likely to slow down index scans because
+there will be more data to process. We would also like to avoid users creating
+pathological cases just because they can. Therefore, limiting the data that can
+be stored seems wise. That said, those who are willing to profile their
+workloads should have a get-out clause from the limits.
+
+As an example, [postgres](https://www.postgresql.org/docs/current/limits.html) limits indexes to 32 columns. Its max field size is 1GB; I think we'd like something a little smaller!
+
+Therefore the feature will have the following limit enforcement settings:
+
+- `mango_json_index_include_fields_max` is the limit on the length of the
+    `include` list.
+- `mango_json_index_include_depth_max` is a limit on the depth of fields we will
+    pull out, that is, the maximum number of `.` separators in a path.
+- If the total number of bytes for values exceeds
+  `mango_json_index_include_size_bytes_max` then we will skip that document from
+  the index.
+
+I need to check whether these should be prefixed `mango_` given they would live
+in a `mango` configuration section.
+
+Defaults:
+
+- `mango_json_index_include_fields_max=16`
+- `mango_json_index_include_depth_max=8`
+- `mango_json_index_include_size_bytes_max=32768` (32kb)
+
+I have chosen power-of-two limits mostly because they feel like familiar
+numbers. Not a great reason, so these may be refined during code writing if I
+can work out suitable benchmarks.
+
+
+## Mixed versions during cluster upgrades
+
+The relevant scenarios here are an updated coordinator talking to outdated
+shards, and the opposite of an outdated coordinator talking to upgraded shards.
+A further wrinkle is that while a coordinator is either upgraded or not, the
+shards that the coordinator speaks to can be a mixture of upgraded and outdated.
+
+For the purposes of this discussion, we only need to worry about when a covering
+index is in play during a query; the code path outside that use-case should not
+change.
+
+From what I can tell, we can avoid special code paths for cluster upgrades
+specific to this work. Instead we accept that some queries will take longer
+during cluster upgrade mixed version operation. This is described below.
+
+### Updated coordinator, outdated shard
+
+In this case, the coordinator will note the covering index, and set the view
+query option `include_docs=false`. This means that the row passed to `view_cb/2`
+will not have a document included. In the function, `case ViewRow#view_row.doc
+of` will hit the `undefined` clause, meaning that the row is passed through
+unchanged, without the document. When the row reaches the coordinator and is
+passed to `doc_member_and_extract/2` from `handle_message/2`, the `case
+couch_util:get_value(doc, RowProps) of` will also hit its `undefined` clause.
+The coordinator will then perform a quorum read with `r=1` of the document and
+carry out the match and extract.
+
+This will slow down the processing of results at the coordinator for that row,
+but shouldn't alter the correctness of the result. So we shouldn't need a
+special code path to support this case. Which is nice.
+
+### Outdated coordinator, updated shard
+
+In this case, the coordinator won't be checking for covering indexes, meaning
+that `include_docs=true` will be set when `r<2` as today.
+
+I suspect we'll set an option in `viewcbargs` that contains the index field
+names and whether it's a covering index. This means that an updated shard will
+be checking for those fields. When it can't find them, it'll fall back to the
+current behaviour in `view_cb/2`, meaning that it reads the document found via
+`include_docs=true`, executes `match_and_extract_doc/3` and returns the row if
+it matches the query.
+
+The coordinator will receive the final result document as today and assume it's

Review Comment:
   6f5dd5f5d
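
On the "Limits on included fields" section: a minimal Python sketch of how the three proposed settings might be enforced, using the setting names and defaults from the RFC. Everything else is an assumption made for illustration: the function names are hypothetical, size is measured as UTF-8 JSON bytes, and escaped dots in field names are ignored.

```python
import json

# Setting names and defaults as proposed in the RFC.
DEFAULT_LIMITS = {
    "mango_json_index_include_fields_max": 16,
    "mango_json_index_include_depth_max": 8,
    "mango_json_index_include_size_bytes_max": 32768,
}


def validate_include(include, limits=DEFAULT_LIMITS):
    """Checks at index creation time; raises ValueError on violation."""
    if len(include) > limits["mango_json_index_include_fields_max"]:
        raise ValueError("too many fields in include")
    for path in include:
        # Depth is taken as the number of '.' separators in the path.
        if path.count(".") > limits["mango_json_index_include_depth_max"]:
            raise ValueError(f"include path too deep: {path}")


def extract(doc, path):
    """Follow a dotted path; return (found, value)."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


def should_index(doc, include, limits=DEFAULT_LIMITS):
    """Check at indexing time: skip documents whose included values are too big."""
    values = {}
    for path in include:
        found, value = extract(doc, path)
        if found:  # include fields do not have to exist in the document
            values[path] = value
    # Assumption: size is the UTF-8 length of the JSON-encoded values.
    size = len(json.dumps(values).encode("utf-8"))
    return size <= limits["mango_json_index_include_size_bytes_max"]
```

The first two limits are checked once when the index is defined, while the size limit can only be checked per document at indexing time.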



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- `"include": []` is equivalent to omitting the `include` field.

Review Comment:
   6f5dd5f5d
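
The behaviour requirements for `include` could be checked along these lines. This is a hedged Python sketch rather than the `_index` handler itself: `normalise_include` is a hypothetical name, and treating an empty `include` on a `text` index as acceptable is an assumption drawn from "`"include": []` is equivalent to omitting the `include` field".

```python
def normalise_include(index_def):
    """Validate an index definition and return its normalised include list."""
    index = index_def["index"]
    fields = index["fields"]
    # An empty list and a missing key are treated identically.
    include = index.get("include", [])

    if include and index_def.get("type", "json") == "text":
        raise ValueError("include is not supported for text indexes")

    overlap = set(fields) & set(include)
    if overlap:
        raise ValueError(f"fields present in both fields and include: {sorted(overlap)}")

    # Ordering is not significant, so store a sorted copy.
    return sorted(include)
```

For the definition in the RFC, this would store `["manager_id", "occupation"]`.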



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used because the query predicates can only match documents containing both
+`age` and `name`, just like the index. In both cases, note the predicate
+`"age": {"$gt": 30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because every field
+in the index's key must be required by the `selector`, per [Mango JSON index
+selection](#mango-json-index-selection), whereas `fields` is only used after we
+generate the result set, and none of the field names in `fields` need to exist
+in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in Mango, for example `person.address.zip`.

Review Comment:
   6f5dd5f5d





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1092266820


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango is capable of answering at the same rate as
+the underlying view index, but it currently runs only at the pace of calling
+the view with `include_docs=true`. Preliminary modifications to Mango showed
+that, with covering index support and a query that can use it, Mango can stream
+results as quickly as the underlying view. Adding covering indexes therefore
+increases the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across

Review Comment:
   91f99be2d



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango is capable of answering at the same rate as
+the underlying view index, but it currently runs only at the pace of calling
+the view with `include_docs=true`. Preliminary modifications to Mango showed
+that, with covering index support and a query that can use it, Mango can stream
+results as quickly as the underlying view. Adding covering indexes therefore
+increases the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used because the query predicates can only match documents containing both
+`age` and `name`, just like the index. In both cases, note the predicate
+`"age": {"$gt": 30}` implies `"age": {"$exists": true}`.
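
As an illustration of the two selection rules above, here is a minimal Python sketch. It is a simplified model rather than Mango's Erlang implementation: it assumes a flat selector (an implicit `$and` over top-level fields, with no `$or` or nested operators), and the names `INDEXABLE_OPS`, `fields_implying_exists` and `index_is_usable` are hypothetical.

```python
# Hypothetical names throughout; a simplified model, not Mango's Erlang code.

# Operators the RFC lists as convertible to operations on a key-ordered index.
INDEXABLE_OPS = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def fields_implying_exists(selector):
    """Return the top-level fields that a flat selector requires to exist."""
    required = set()
    for field, condition in selector.items():
        if not isinstance(condition, dict):
            # Shorthand {"age": 39} is treated as {"age": {"$eq": 39}}.
            required.add(field)
            continue
        for op, arg in condition.items():
            if op in INDEXABLE_OPS or (op == "$exists" and arg is True):
                required.add(field)
    return required


def index_is_usable(index_fields, selector):
    """Apply both rules: indexable first key, and $exists implied for all keys."""
    first_field = index_fields[0]
    if first_field not in selector:
        return False
    condition = selector[first_field]
    if isinstance(condition, dict):
        first_ok = any(op in INDEXABLE_OPS for op in condition)
    else:
        first_ok = True  # implicit $eq
    return first_ok and set(index_fields) <= fields_implying_exists(selector)
```

With the `["age", "name"]` index from the example, this model rejects the selector `{"age": {"$gt": 30}}` and accepts the one that adds `"name": {"$exists": True}`.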
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
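
A rough Python sketch of such a helper follows. It assumes Phase 1 semantics, where only the view key fields and the row's `_id` are available without reading the document, and it assumes a flat selector whose top-level keys are field names. `is_covering` and `choose_include_docs` are hypothetical names, not functions in `mango_view_cursor.erl`.

```python
# Hypothetical helpers; a sketch of the decision, not the Erlang implementation.

def is_covering(index_fields, selector, fields):
    """Return True when the query needs nothing beyond the index key and _id."""
    if not fields:
        # Without a projection the whole document is returned, so it must be read.
        return False
    available = set(index_fields) | {"_id"}  # assumption: _id comes with the row
    needed = set(selector) | set(fields)
    return needed <= available


def choose_include_docs(index_fields, selector, fields):
    """Value for the view call's include_docs argument."""
    return not is_covering(index_fields, selector, fields)
```

In this model a query over an `["age", "name"]` index that selects on `age` and `name` and projects only `["age"]` would run with `include_docs` set to `False`, while projecting `pet` as well would force the document read.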
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because every field
+in the index's key must be required by the `selector`, per [Mango JSON index
+selection](#mango-json-index-selection), whereas `fields` is only used after we
+generate the result set, and none of the field names in `fields` need to exist
+in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in Mango, for example `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.
+- The `include` field would be rejected for `text` type indexes.
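
The behaviour requirements above can be sketched in Python as follows. This is an assumption-laden model, not the proposed Erlang change: index `fields` are taken to be plain strings as in the example definition, the stored view `value` is assumed to be a map from include path to value (the RFC does not fix a layout), and all function names are hypothetical.

```python
# Hypothetical model of index validation and row generation with `include`.

MISSING = object()  # sentinel: distinguishes an absent field from a JSON null


def validate_index_definition(definition):
    """Raise ValueError for the cases the requirements say should be rejected."""
    index = definition["index"]
    include = index.get("include", [])
    if include and definition.get("type", "json") == "text":
        raise ValueError("include is not supported for text indexes")
    overlap = set(index["fields"]) & set(include)
    if overlap:
        raise ValueError(f"in both fields and include: {sorted(overlap)}")


def get_path(doc, path):
    """Resolve a dotted path such as person.address.zip."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return MISSING
        current = current[part]
    return current


def index_entry(doc, definition):
    """Return (key, value) for the view row, or None if the doc is not indexed."""
    index = definition["index"]
    key = [get_path(doc, field) for field in index["fields"]]
    if any(part is MISSING for part in key):
        return None  # every key field must exist for the doc to be indexed
    value = {}
    for field in index.get("include", []):
        found = get_path(doc, field)
        if found is not MISSING:  # included fields are optional
            value[field] = found
    return key, value
```

A document lacking `occupation` is still indexed in this model; only a missing key field such as `name` keeps it out of the index.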
+
+Alternatives considered:
+
+- Adding `include` outside `index`. This didn't seem right as the `index`
+    object already includes `partial_filter_selector` and `include` seems a
+    peer of this. ([docs](https://docs.couchdb.org/en/stable/api/database/find.html#db-index)).
+- Alternative name `store`. We use this for Lucene indexes when dreyfus/clouseau
+    is used. I elected to use a name distinct from both `value` and `store` to
+    avoid index-type specificity. I take the name from Postgres, which uses
+    `INCLUDE` in its index definition to [support covering indexes][pgcover].
+
+[pgcover]: https://www.postgresql.org/docs/current/indexes-index-only-scans.html
+
+Adding this will require changes in `mango_idx_view` to store the definition and
+in how we process documents during indexing, which looks to be in
+`get_index_entries` in `mango_native_proc`.
+
+We'll then need to update the Mango cursor methods mentioned above to take
+account of the values within the covering index code.
+
+One thing to be careful about is again index selection. We will still need all
+index keys to be present in the `selector` as above, so we need to differentiate
+between the fields in the index's keys and values when selecting an index to
+ensure we retain the correct behaviour per [Mango JSON index
+selection](#mango-json-index-selection).
+
+## Mixed versions during cluster upgrades
+
+The relevant scenarios here are an updated coordinator talking to outdated
+shards, and the opposite of an outdated coordinator talking to upgraded shards.
+A further wrinkle is that while a coordinator is either upgraded or not, the
+shards that the coordinator speaks to can be a mixture of upgraded and outdated.
+
+For the purposes of this discussion, we only need to worry about when a covering
+index is in play during a query; the code path outside that use-case should not
+change.
+
+From what I can tell, we can avoid special code paths for cluster upgrades
+specific to this work. Instead we accept that some queries will take longer
+during cluster upgrade mixed version operation. This is described below.
+
+### Updated coordinator, outdated shard
+
+In this case, the coordinator will note the covering index, and set the view
+query option `include_docs=false`. This means that the row passed to `view_cb/2`
+will not have a document included. In the function, `case ViewRow#view_row.doc
+of` will hit the `undefined` clause, meaning that the row is passed through
+unchanged, without the document. When the row reaches the coordinator and is
+passed to `doc_member_and_extract/2` from `handle_message/2`, the `case
+couch_util:get_value(doc, RowProps) of` will also hit its `undefined` clause.
+The coordinator will then perform a quorum read with `r=1` of the document and
+carry out the match and extract.
+
+This will slow down the processing of results at the coordinator for that row,
+but shouldn't alter the correctness of the result. So we shouldn't need a
+special code path to support this case. Which is nice.
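
The flow just described can be modelled in a few lines of Python. This is an illustrative sketch only: `outdated_shard_view_cb`, `coordinator_handle_row`, `matches` and `read_doc` are hypothetical stand-ins for `view_cb/2`, the `doc_member_and_extract/2` path, selector matching and the `r=1` document read respectively.

```python
# Hypothetical model of an updated coordinator talking to an outdated shard.

def outdated_shard_view_cb(row, matches):
    """Pre-upgrade shard callback: it knows nothing about covering indexes."""
    doc = row.get("doc")
    if doc is None:
        # Mirrors the `undefined` clause: the row passes through unchanged.
        return row
    return row if matches(doc) else None


def coordinator_handle_row(row, matches, read_doc):
    """Coordinator side: fall back to reading the document when none arrived."""
    doc = row.get("doc")
    if doc is None:
        doc = read_doc(row["id"])  # stands in for the r=1 document read
    return doc if matches(doc) else None
```

The extra read makes each such row slower, but the match is still evaluated against the full document, so the result set is unchanged.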
+
+### Outdated coordinator, updated shard
+
+In this case, the coordinator won't be checking for covering indexes, meaning
+that `include_docs=true` will be set when `r<2` as today.
+
+I suspect we'll set an option in `viewcbargs` that contains the index field
+names and whether it's a covering index. This means that an updated shard will
+be checking for those fields. When it can't find them, it'll fall back to the
+current behaviour in `view_cb/2`, meaning that it reads the document found via
+`include_docs=true`, executes `match_and_extract_doc/3` and returns the row if
+it matches the query.
+
+The coordinator will receive the final result document as today and assume it's
+correct, and forward it to the client. More work than needed will be carried out

Review Comment:
   91f99be2d





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1093448316


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango is capable of answering at the same rate as
+the underlying view index, but it currently runs only at the pace of calling
+the view with `include_docs=true`. Preliminary modifications to Mango showed
+that, with covering index support and a query that can use it, Mango can stream
+results as quickly as the underlying view. Adding covering indexes therefore
+increases the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used because the query predicates can only match documents containing both
+`age` and `name`, just like the index. In both cases, note the predicate
+`"age": {"$gt": 30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because every field
+in the index's key must be required by the `selector`, per [Mango JSON index
+selection](#mango-json-index-selection), whereas `fields` is only used after we
+generate the result set, and none of the field names in `fields` need to exist
+in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   What if, instead of storing only plain values in rows, we also allowed storing a marker indicating that a value was too large, and just fell back to reading the doc? We could still have a hard-limit (strict) mode perhaps which deals with failures, but this would allow us not to deal with indexing failures, so to speak, and punt that for later.
   
   There may be an optimal cut-off value where storing it in the index is more wasteful than reading the doc? Or there may not be, as technically I think we can write arbitrarily large values in the index; it will just spread over a lot of b-tree blocks, but we'd still duplicate the data.
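
   To make the marker idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `TOO_LARGE` marker shape, the `MAX_VALUE_BYTES` cut-off and the function names are hypothetical, and a real design would need a marker that cannot collide with a legitimate document value.

```python
import json

# Hypothetical marker and cut-off; neither is part of the RFC.
TOO_LARGE = {"$too_large": True}
MAX_VALUE_BYTES = 1024


def stored_value(value):
    """Value to write into the index row: the value itself, or the marker."""
    encoded = json.dumps(value).encode("utf-8")
    return value if len(encoded) <= MAX_VALUE_BYTES else TOO_LARGE


def resolve(row_value, doc_id, field, read_doc):
    """Query-time lookup: fall back to reading the doc when the marker is seen."""
    if row_value == TOO_LARGE:
        return read_doc(doc_id).get(field)
    return row_value
```

   Small values are served straight from the index, while oversized ones cost one document read instead of failing the indexing step.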





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1094386528


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.

Review Comment:
   I tend towards four spaces as, at least a few years ago, that used to be safest across Markdown renderers. For consistency, I've found the one item that _wasn't_ indented in that style and corrected it. Given CommonMark states that you'd use two spaces in this context, in future I'll probably drop this habit. But for now I've kept it.





[GitHub] [couchdb] janl commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "janl (via GitHub)" <gi...@apache.org>.
janl commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1089003450


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector` and not `fields` are used when
+choosing an index. This is because we require all keys in the index to be fields
+within the selector -- with predicates implying `$exists=true` -- due to the
+fact that only documents that include _all_ fields in the index are added to the
+index. Therefore, if the selector doesn't imply all fields in the index's keys
+exist, then using that index risks returning an incomplete result set.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2` in the Mango query because then the coordinator reads and processes

Review Comment:
   ```suggestion
   as it does when `include_docs` is used -- apart from when the user passes `r>=2`
   in the Mango query because then the coordinator reads and processes
   ```
   
   Missing `\n`.





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091131120


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while
+the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used: the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because the `selector`
+must imply that every field in the index's key exists, per [Mango JSON index
+selection](#mango-json-index-selection), whereas `fields` is only applied after
+we generate the result set, and none of the field names in `fields` need to
+exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]

Review Comment:
   Is the order of the fields in `include` not meaningful in any way? If so, should we add a note highlighting that in the API, just for completeness?
   
   As an implementation detail, perhaps we'd just want to normalize it by sorting when creating the design doc and the view signature. That would mean that two indexes with the same details and only the `include` in a different order would be equivalent and "point to" the same view signature.
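
A minimal sketch of the normalisation suggested above, written in Python for illustration only. The `signature` function is a stand-in that hashes a canonical JSON encoding; it is an assumption made for the example and is not how CouchDB computes view signatures. `normalise_index` is likewise a hypothetical helper.

```python
import hashlib
import json


def normalise_index(index_def):
    """Return a copy of the index definition with `include` sorted.

    `fields` is left untouched because its order defines the view key.
    """
    normalised = dict(index_def)
    normalised["include"] = sorted(index_def.get("include", []))
    return normalised


def signature(index_def):
    """Illustrative signature: a hash of the canonical JSON encoding."""
    canonical = json.dumps(normalise_index(index_def), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"fields": ["age", "name"], "include": ["occupation", "manager_id"]}
b = {"fields": ["age", "name"], "include": ["manager_id", "occupation"]}
c = {"fields": ["name", "age"], "include": ["occupation", "manager_id"]}

same_view = signature(a) == signature(b)       # only `include` order differs
different_view = signature(a) != signature(c)  # `fields` order is meaningful
```

Sorting only `include` keeps the key order of `fields` significant while letting definitions that differ solely in `include` order share one view.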





[GitHub] [couchdb] garrensmith commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "garrensmith (via GitHub)" <gi...@apache.org>.
garrensmith commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1089977986


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector` and not `fields` are used when
+choosing an index. This is because we require all keys in the index to be fields
+within the selector -- with predicates implying `$exists=true` -- due to the
+fact that only documents that include _all_ fields in the index are added to the
+index. Therefore, if the selector doesn't imply all fields in the index's keys

Review Comment:
   Just checking my understanding. The added subtlety here is because previously we would do an in-memory filter of each document to check that it completely matches the selector. Now, if we can use the index alone, we have to make sure all fields in the selector are also in the index keys. So if a selector has filters on `name`, `age` and `country`, and the `fields` section in the query is `name` and `age`, Mango would have to choose an index with `name`, `age` and `country` even though it is only returning two fields. Is that correct?
   
   
   
   What happens if no index satisfies this?
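
The revision of the RFC quoted in nickva's comment in this thread states that whether an index covers a query is only used to prefer one already-usable index over another. Under that reading, the scenario in this comment can be sketched as below. This is a simplified Python model rather than Mango's selection code: selectors are flat `field -> predicate` maps, and `usable`, `covers` and `choose` are hypothetical helpers. The `covers` rule (every selector field and every projected field must be an index key) follows the understanding stated in the comment and is an assumption.

```python
# Operators the RFC lists as convertible to operations on the index.
RANGE_OPS = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def has_range_op(pred):
    """True for shorthand equality ({"age": 39}) or a listed operator."""
    return not isinstance(pred, dict) or any(op in RANGE_OPS for op in pred)


def implies_exists(pred):
    """True when the predicate can only match documents holding the field."""
    return has_range_op(pred) or pred.get("$exists") is True


def usable(index_fields, selector):
    """An index is usable if the selector implies every key field exists
    and the first key field has a predicate convertible to a key range."""
    if not index_fields:
        return False
    if any(f not in selector or not implies_exists(selector[f])
           for f in index_fields):
        return False
    return has_range_op(selector[index_fields[0]])


def covers(index_fields, selector, fields):
    """Assumed rule: all selector and projected fields are index keys."""
    return set(selector) | set(fields) <= set(index_fields)


def choose(indexes, selector, fields):
    """Return (index, is_covering), preferring a covering usable index."""
    candidates = [i for i in indexes if usable(i, selector)]
    for index in candidates:
        if covers(index, selector, fields):
            return index, True
    return (candidates[0], False) if candidates else None
```

In this model, an index on `name` and `age` alone remains usable for a selector over `name`, `age` and `country`, but it is not covering, so documents would still be read and filtered as they are today. When no index is usable the sketch returns `None` and leaves the outcome to Mango's existing behaviour.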



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector` and not `fields` are used when
+choosing an index. This is because we require all keys in the index to be fields
+within the selector -- with predicates implying `$exists=true` -- due to the
+fact that only documents that include _all_ fields in the index are added to the
+index. Therefore, if the selector doesn't imply all fields in the index's keys
+exist, then using that index risks returning an incomplete result set.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2` in the Mango query because then the coordinator reads and processes
+documents. (Aside: it'd be good to remove this `r` option to simplify things).

Review Comment:
   +1 to removing the `r` option. It is something I have wanted to remove for a long time.



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector` and not `fields` are used when
+choosing an index. This is because we require all keys in the index to be fields
+within the selector -- with predicates implying `$exists=true` -- due to the
+fact that only documents that include _all_ fields in the index are added to the
+index. Therefore, if the selector doesn't imply all fields in the index's keys
+exist, then using that index risks returning an incomplete result set.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2` in the Mango query because then the coordinator reads and processes
+documents. (Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of

Review Comment:
   Is this something we should consider? Or, if a user wants the whole document, would they need to list all the fields in the index?





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1090525849


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector` and not `fields` are used when
+choosing an index. This is because we require all keys in the index to be fields
+within the selector -- with predicates implying `$exists=true` -- due to the
+fact that only documents that include _all_ fields in the index are added to the
+index. Therefore, if the selector doesn't imply all fields in the index's keys

Review Comment:
   Added some clarifying notes in 56ae1854f. I created a new section that better describes the relationship required between an index's keys and the `selector` for an index to be used during query processing. I then tried to cover the specifics for this work within the "Part 1" and "Part 2" sections.





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091146250


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across

Review Comment:
   Tiny nit: Capitalize `Erlang`. Not sure which one is more correct but just to keep consistent with the other RFCs' style. 





[GitHub] [couchdb] pgj commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "pgj (via GitHub)" <gi...@apache.org>.
pgj commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1092888346


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.

Review Comment:
   Nit: I think "Mongo inspired" is [a compound modifier before the word and it has to be written with a hyphen](https://www.grammarly.com/blog/hyphen/).



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.

Review Comment:
   Nit: The indentation of the line seems to be off here.



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an

Review Comment:
   Curiosity: Is there a specific reason why only these predicates can be used?
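
   One way to picture the RFC's remark that these operators "can easily be converted to key operations within a key ordered index" is the minimal Python sketch below. It is an illustration only, not CouchDB or Mango code: the sorted list stands in for a single-field view index, and `key_range` is a hypothetical helper. Each of the five operators maps to one contiguous slice of the sorted keys, whereas an operator such as `$ne` does not.

```python
from bisect import bisect_left, bisect_right


def key_range(sorted_keys, op, value):
    """Return (start, end) slice bounds of the keys matching `op value`.

    Hypothetical helper: `sorted_keys` is a plain sorted list standing in
    for a key-ordered index, not CouchDB's actual B-tree.
    """
    start, end = 0, len(sorted_keys)
    if op == "$lt":
        end = bisect_left(sorted_keys, value)
    elif op == "$lte":
        end = bisect_right(sorted_keys, value)
    elif op == "$eq":
        start = bisect_left(sorted_keys, value)
        end = bisect_right(sorted_keys, value)
    elif op == "$gte":
        start = bisect_left(sorted_keys, value)
    elif op == "$gt":
        start = bisect_right(sorted_keys, value)
    else:
        # Anything else would need several ranges or a full scan.
        raise ValueError(f"{op} is not a single contiguous key range")
    return start, end


ages = [25, 30, 30, 39, 39, 52]
start, end = key_range(ages, "$gt", 30)
matching = ages[start:end]
```

   Here `matching` holds only the keys greater than 30, found by two binary searches rather than by scanning every entry.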



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists"=true}}` requires that the `name` field exist so the index can be used

Review Comment:
   Shouldn't this be valid JSON, e.g. written as `{"$exists": true}` (or something like that) instead?
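
   To make the example concrete, here is a minimal Python sketch of the rule the RFC describes, using the selector in valid JSON form. It is a simplification under stated assumptions, not Mango's implementation: `selector_implies_fields_exist` is a hypothetical helper, and it only looks at top-level fields of the selector (no `$and`, `$or` or nested paths).

```python
import json


def selector_implies_fields_exist(index_fields, selector):
    """Return True if the selector constrains every indexed field.

    Hypothetical helper: a field counts as "implied to exist" when the
    selector has a top-level predicate on it other than `$exists: false`.
    """
    for field in index_fields:
        if field not in selector:
            return False
        predicate = selector[field]
        if isinstance(predicate, dict) and predicate.get("$exists") is False:
            return False
    return True


index_fields = ["age", "name"]

# Selectors from the RFC text, written as valid JSON.
age_only = json.loads('{"age": {"$gt": 30}}')
age_and_name = json.loads('{"age": {"$gt": 30}, "name": {"$exists": true}}')

usable_age_only = selector_implies_fields_exist(index_fields, age_only)
usable_age_and_name = selector_implies_fields_exist(index_fields, age_and_name)
```

   With the first selector the index cannot be used, because document `bar` has no `name` and so is missing from the index; with the second selector `bar` would not match anyway, so the index gives complete results.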



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists"=true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists"=true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because we need all
+fields in the `selector` to be present per [Mango JSON index
+selection](#mango-json-index-selection). This is because `fields` is only used
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- `"include": []` is equivalent to omitting the `include` field.

Review Comment:
   I guess that is the same for `"include": null` too -- or is this handled globally, i.e. for every JSON attribute?



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists"=true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists"=true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because we need all
+fields in the `selector` to be present per [Mango JSON index
+selection](#mango-json-index-selection). This is because `fields` is only used
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.

Review Comment:
   Nit: Capitalize "mango" to unify style.



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists"=true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists"=true}`.

Review Comment:
   See my previous comment above about the syntax of `$exists`.



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango currently answers at the same pace as calling
+the underlying view with `include_docs=true`. Preliminary modifications to
+Mango showed that, with covering index support and a query that can use it,
+Mango can stream results as quickly as the underlying view. Adding covering
+indexes therefore substantially increases the production use-cases Mango can
+support.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used because the query predicates can only match documents containing both
+`age` and `name`, just like the index. In both cases, note the predicate
+`"age": {"$gt": 30}` implies `"age": {"$exists": true}`.
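+
+To illustrate, a complete `_find` request body that could use the
+`["age", "name"]` index might look like the following sketch. The values are
+examples only; the point is that the `selector` implies that both `age` and
+`name` exist:
+
+```json
+{
+    "selector": {
+        "age": {"$gt": 30},
+        "name": {"$exists": true}
+    }
+}
+```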
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. The `selector` must imply
+that every field in the index's key exists, per [Mango JSON index
+selection](#mango-json-index-selection). By contrast, `fields` is only used
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
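+
+As a sketch of what "covering" would mean in this phase, assume an index on
+`["age", "name"]`. The first request body below asks only for indexed fields
+and so could be answered from the view keys. The second asks for `pet`, which
+is not in the index, and so would still need the document to be read. Both are
+illustrative examples rather than cases taken from the implementation:
+
+```json
+{"selector": {"age": {"$gt": 30}, "name": {"$exists": true}}, "fields": ["age", "name"]}
+
+{"selector": {"age": {"$gt": 30}, "name": {"$exists": true}}, "fields": ["age", "pet"]}
+```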
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- `"include": []` is equivalent to omitting the `include` field.
+- Ordering of the fields in `include` is not important. They can be reordered
+    before storing if needed (eg, sorted).
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.
+- The `include` field would be rejected for `text` type indexes.
+
+Alternatives considered:
+
+- Adding `include` outside `index`. This didn't seem right as the `index`
+    object already includes `partial_filter_selector` and `include` seems a
+    peer of this. ([docs](https://docs.couchdb.org/en/stable/api/database/find.html#db-index)).
+- Alternative name `store`. We use this for Lucene indexes when dreyfus/clouseau
+    is used. I elected to use a separate name to either `value` or `store` to
+    avoid index-type specificity. I take the name from Postgres, which uses
+    `INCLUDE` in its index definition to [support covering indexes][pgcover].
+
+[pgcover]: https://www.postgresql.org/docs/current/indexes-index-only-scans.html
+
+Adding this will require changes in `mango_idx_view` to store the definition and
+in how we process documents during indexing, which looks to be in
+`get_index_entries` in `mango_native_proc`.
+
+We'll then need to update the Mango cursor methods mentioned above to take
+account of the values within the covering index code.
+
+One thing to be careful about is again index selection. We will still need all
+index keys to be present in the `selector` as above, so we need to differentiate
+between the fields in the index's keys and values when selecting an index to ensure
+we retain the correct behaviour per [Mango JSON index
+selection](#mango-json-index-selection).
+
+### Limits on included fields
+
+Adding "too much" data to indexes is likely to slow down index scans because
+there will be more data to process. We would also like to avoid users creating
+pathological cases just because they can. Therefore, limiting the data that can
+be stored seems wise. That said, those who are willing to profile their
+workloads should have a get-out clause from the limits.
+
+As an example, [postgres](https://www.postgresql.org/docs/current/limits.html) limits indexes to 32 columns. Its max field size is 1GB; I think we'd like something a little smaller!
+
+Therefore the feature will have the following limit enforcement settings:
+
+- `mango_json_index_include_fields_max` is the limit on the length of the
+    `include` list.
+- `mango_json_index_include_depth_max` is a limit on the depth of fields we will
+    pull out. Basically the maximum number of `.` in a path.
+- If the total number of bytes for values exceeds
+  `mango_json_index_include_size_bytes_max` then we will omit that document from
+  the index.
+
+I need to check whether these should be prefixed `mango_` given they would live
+in a `mango` configuration section.
+
+Defaults:
+
+- `mango_json_index_include_fields_max=16`
+- `mango_json_index_include_depth_max=8`
+- `mango_json_index_include_size_bytes_max=32768` (32 KiB)
+
+I have chosen power-of-two limits mostly because they feel like familiar
+numbers. Not a great reason, so these may be refined during code writing if I
+can work out suitable benchmarks.
+
+
+## Mixed versions during cluster upgrades
+
+The relevant scenarios here are an updated coordinator talking to outdated
+shards, and the opposite of an outdated coordinator talking to upgraded shards.
+A further wrinkle is that while a coordinator is either upgraded or not, the
+shards that the coordinator speaks to can be a mixture of upgraded and outdated.
+
+For the purposes of this discussion, we only need to worry about when a covering
+index is in play during a query; the code path outside that use-case should not
+change.
+
+From what I can tell, we can avoid special code paths for cluster upgrades
+specific to this work. Instead we accept that some queries will take longer
+during cluster upgrade mixed version operation. This is described below.
+
+### Updated coordinator, outdated shard
+
+In this case, the coordinator will note the covering index, and set the view
+query option `include_docs=false`. This means that the row passed to `view_cb/2`
+will not have a document included. In the function, `case ViewRow#view_row.doc
+of` will hit the `undefined` clause, meaning that the row is passed through
+unchanged, without the document. When the row reaches the coordinator and is
+passed to `doc_member_and_extract/2` from `handle_message/2`, the `case
+couch_util:get_value(doc, RowProps) of` will also hit its `undefined` clause.
+The coordinator will then perform a quorum read with `r=1` of the document and
+carry out the match and extract.
+
+This will slow down the processing of results at the coordinator for that row,
+but shouldn't alter the correctness of the result. So we shouldn't need a
+special code path to support this case. Which is nice.
+
+### Outdated coordinator, updated shard
+
+In this case, the coordinator won't be checking for covering indexes, meaning
+that `include_docs=true` will be set when `r<2` as today.
+
+I suspect we'll set an option in `viewcbargs` that contains the index field
+names and whether it's a covering index. This means that an updated shard will
+be checking for those fields. When it can't find them, it'll fall back to the
+current behaviour in `view_cb/2`, meaning that it reads the document found via
+`include_docs=true`, executes `match_and_extract_doc/3` and returns the row if it
+matches the query.
+
+The coordinator will received the final result document as today and assume it's

Review Comment:
   Nit: "will receive"?
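
For readers following the mixed-version reasoning in the hunk above, here is a minimal Python sketch of the two fallbacks. It is not the Erlang in `mango_view_cursor.erl`: the function names, the row shape and the `covering_fields` option are all hypothetical stand-ins, and the selector is reduced to a plain predicate function.

```python
def shard_view_cb(row, selector, covering_fields=None):
    """Shard-side row callback (hypothetical stand-in for view_cb/2).

    covering_fields is None on an outdated shard, or when an outdated
    coordinator did not send the option; otherwise it lists the index's
    field names in key order.
    """
    doc = row.get("doc")
    if doc is not None:
        # include_docs=true path: match on the shard, as happens today.
        return row if selector(doc) else None
    if covering_fields is None:
        # No doc and no covering information: pass the row through unchanged.
        return row
    # Covering path: rebuild a partial document from the index key alone.
    partial = dict(zip(covering_fields, row["key"]))
    return {**row, "doc": partial} if selector(partial) else None


def coordinator_handle_row(row, selector, read_doc):
    """Coordinator-side handler (hypothetical stand-in for doc_member_and_extract/2)."""
    doc = row.get("doc")
    if doc is None:
        # The shard sent no document: read it and match at the coordinator.
        doc = read_doc(row["id"])
        return doc if selector(doc) else None
    # Assumption in this sketch: a row carrying a doc was matched on the shard.
    return doc
```

With an outdated shard the coordinator pays for one extra document read per row; with an upgraded shard no read happens at all. Either way the same selector decides membership, which is why no special upgrade code path seems to be needed.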



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, $gte` and `$gt`. These can easily be

Review Comment:
   Nit: The markdown of `$gte` is missing the opening backtick.





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1094388016


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an

Review Comment:
   I think they were just the easiest to convert into startkey/endkey in JSON indexes. Text indexes, ie Lucene-backed, came later. I'm not sure what the rules are in the text cursor, but these are the rules for JSON. Likely the set can be increased (`$in` comes up commonly).
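
To illustrate why these five operators are the easy case, here is a minimal Python sketch that folds range predicates on a single field into one contiguous key range, which is roughly what a `startkey`/`endkey` pair expresses. This is not Mango's Erlang implementation: the name `to_key_range` and the tuple layout are hypothetical, values are assumed to be mutually comparable numbers, and CouchDB's view collation rules are ignored.

```python
# Operators that map onto a single contiguous range of a key-ordered index.
INDEXABLE = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def to_key_range(predicates):
    """Return (low, low_inclusive, high, high_inclusive) for one field.

    None as a bound means "unbounded". Returns None when any operator
    cannot be expressed as one contiguous range.
    """
    low, low_incl = None, True
    high, high_incl = None, True
    for op, value in predicates.items():
        if op not in INDEXABLE:
            return None
        if op in ("$gt", "$gte", "$eq"):
            incl = op != "$gt"
            # Keep the tightest lower bound seen so far.
            if low is None or value > low or (value == low and not incl):
                low, low_incl = value, incl
        if op in ("$lt", "$lte", "$eq"):
            incl = op != "$lt"
            # Keep the tightest upper bound seen so far.
            if high is None or value < high or (value == high and not incl):
                high, high_incl = value, incl
    return (low, low_incl, high, high_incl)
```

In this sketch `{"$gte": 18, "$lt": 65}` collapses to one range, while an operator such as `$in` is rejected, because it would generally need several disjoint ranges rather than one.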





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1092474471


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`,
+and not those in `fields`, are used when choosing an index. This is because the
+`selector` must imply that every field in the index's key exists, per [Mango
+JSON index selection](#mango-json-index-selection), whereas `fields` is only used
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   I am not sure what the behavior for `mango_json_index_include_size_bytes_max` should be either. It's pretty tricky. Off the top of my head, I don't know what currently happens if we fail to index a document. With Mango we don't usually fail; if I had to guess, we would end up in a crash loop on that document. A user may then index for a day, hit a large field value and have their index crash; they'd remove the field, try again (getting a new view signature), index for 2 days and crash again, etc.
   
   The alternative is to skip the doc, but then there is a danger of it looking like data loss: a user has some very important document (a medical record, say allergies to medicine), the index skips it, and nobody notices until the patient is prescribed the wrong medication (sorry, being a bit dramatic here with a silly user story). Maybe that's something we solve outside of the RFC, and we just say that if the value made it past the max_document_size limit then it will be indexed or crash horribly...?
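
One way to picture the "skip the doc" option is the Python sketch below, which applies the size limit while building index entries and records every skipped document, so that skips are at least visible rather than silent. Everything here is an assumption made for illustration: the helper names are hypothetical, size is measured as compact JSON bytes, and the default of 32768 is simply the value proposed in the RFC. It does not settle which behaviour is right.

```python
import json

INCLUDE_SIZE_BYTES_MAX = 32768  # default proposed in the RFC


def get_path(doc, path):
    """Look up a dotted path such as 'person.address.zip'.

    Returns (value, True) when present, (None, False) when missing.
    """
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None, False
        cur = cur[part]
    return cur, True


def build_index_value(doc, include, max_bytes=INCLUDE_SIZE_BYTES_MAX):
    """Collect the included fields; return (value, size) or (None, size) if too big."""
    value = {}
    for path in include:
        found_value, found = get_path(doc, path)
        if found:  # fields in `include` do not have to exist in the document
            value[path] = found_value
    size = len(json.dumps(value, separators=(",", ":")).encode("utf-8"))
    if size > max_bytes:
        return None, size
    return value, size


def index_docs(docs, include, max_bytes=INCLUDE_SIZE_BYTES_MAX):
    """Return (entries, skipped); skipped lists (doc id, size) for oversized docs."""
    entries, skipped = {}, []
    for doc in docs:
        value, size = build_index_value(doc, include, max_bytes)
        if value is None:
            skipped.append((doc["_id"], size))
        else:
            entries[doc["_id"]] = value
    return entries, skipped
```

Surfacing `skipped` (for example in index info or a log) would not remove the data-loss concern, since queries using the index would still miss those documents, but it would make the omission discoverable.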





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091968959


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used: the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`,
+and not those in `fields`, are considered. This is because the `selector` must
+imply that every indexed field is present, per [Mango JSON index
+selection](#mango-json-index-selection), whereas `fields` is only used after we
+generate the result set, and none of the field names in `fields` need to exist
+in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   Goodness -- great point :tada:
   
   I think that we could validate the length of the list of fields when the ddoc is updated, rather than failing during indexing. We could also limit the depth by counting the `.` characters.
   
   Some things we can only validate at index time, like the length of included strings. I think here that we might want to place a limit on the total size of the values, say 32kb. Even that's quite a few disk pages, though hopefully they are sequential on disk so the kernel's prefilling the page cache ahead of us.
   
   Given it's easier to start with limits and increase them later, perhaps we should think about this more deeply. In a view we allow ~anything I believe, but here potentially we could be more conservative.
   
   As an example, [postgres](https://www.postgresql.org/docs/current/limits.html) limits indexes to 32 columns. Its max field size is 1GB; I think we'd like something a little smaller 😬 
   
   Are there other limits here?
   
   My thought is that we do limit, and make it configurable, and perhaps start relatively low for the defaults:
   
   - `mango_json_index_include_fields_max=16` (why 16? Powers of two always sound nice)
   - `mango_json_index_include_depth_max=8`
   - `mango_json_index_include_size_bytes_max=32768` (32kb)
   
   We can enforce `mango_json_index_include_fields_max` and `mango_json_index_include_depth_max` in `_index`. (We may have to belt-and-braces this as the user can go behind Mango's back to upload views that are the "right shape").
   
   `mango_json_index_include_size_bytes_max` would need to be checked per document at index time. I worry what the behaviour should be here -- I see three options: marking the whole index bad; having rows with "missing" value fields, meaning complexity during query; or skipping indexing the document entirely. I lean towards skipping the doc as the least likely to cause unpredictable behaviour, but what's the current behaviour for views if indexing a doc fails?
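
The definition-time checks proposed in this comment could look roughly like the sketch below. It is written in Python purely for illustration (the real check would live in Mango's Erlang code); the constants mirror the suggested defaults, and `validate_include` is a hypothetical helper, not an existing function. Counting `.` characters is only an approximation of depth, since it ignores any escaping of dots in field names.

```python
# Hypothetical limits, named after the config keys floated in the comment.
INCLUDE_FIELDS_MAX = 16
INCLUDE_DEPTH_MAX = 8


def validate_include(fields, include):
    """Return a list of error strings for an index definition's `include` list."""
    errors = []
    if len(include) > INCLUDE_FIELDS_MAX:
        errors.append(
            f"too many include fields: {len(include)} > {INCLUDE_FIELDS_MAX}"
        )
    for path in include:
        # Depth is approximated by counting "." separators.
        depth = path.count(".") + 1
        if depth > INCLUDE_DEPTH_MAX:
            errors.append(f"include field too deep: {path}")
    # The RFC also treats a field listed in both `fields` and `include` as an error.
    overlap = sorted(set(fields) & set(include))
    if overlap:
        errors.append(f"fields in both lists: {overlap}")
    return errors
```

Returning a list of errors rather than raising on the first one is just a convenience here, so that an `_index` request could report every problem at once.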





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091968959


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking suggests that Mango could answer at the same rate as the
+underlying view index. It currently runs at the same pace as calling the view
+with `include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore
+substantially increases the production use-cases Mango can support.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   Goodness -- great point :tada:
   
   I think that we could validate the length of the list of fields when the ddoc is updated, rather than failing during indexing. We could also limit the depth by counting the `.` characters.
   
   Some things we can only validate at index time, like the length of included strings. I think here that we might want to place a limit on the total size of the values, say 32kb. Even that's quite a few disk pages.
   
   Given it's easier to start with limits and increase them later, perhaps we should think about this more deeply. In a view we allow ~anything I believe, but here potentially we could be more conservative.
   
   As an example, [postgres](https://www.postgresql.org/docs/current/limits.html) limits indexes to 32 columns. Its max field size is 1GB; I think we'd like something a little smaller 😬 
   
   Are there other limits here?
   
   My thought is that we do limit, and make it configurable, and perhaps start relatively low for the defaults:
   
   - `mango_json_index_include_fields_max=16` (why 16? Powers of two always sound nice)
   - `mango_json_index_include_depth_max=8`
   - `mango_json_index_include_size_bytes_max=32768` (32kb)
   
   We can enforce `mango_json_index_include_fields_max` and `mango_json_index_include_depth_max` in `_index`. (We may have to belt-and-braces this as the user can go behind Mango's back to upload views that are the "right shape").
   
   `mango_json_index_include_size_bytes_max` would need to be checked per document at index time. I worry what the behaviour should be here -- I see three options: marking the whole index bad; having rows with "missing" value fields, meaning complexity during query; or skipping indexing the document entirely. I lean towards skipping the doc as the least likely to cause unpredictable behaviour, but what's the current behaviour for views if indexing a doc fails?
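
To make the "skip the doc" option concrete, here is a minimal Python sketch of the per-document check at index time. Everything in it is an assumption for illustration: the helper names, the choice to store included values as a JSON object in the view `value`, and measuring size as the length of the encoded JSON. It is not how the Erlang indexer works today.

```python
import json

INCLUDE_SIZE_BYTES_MAX = 32768  # hypothetical default from the comment
_MISSING = object()  # sentinel, so that a JSON null still counts as present


def get_path(doc, path):
    """Follow a dotted path such as "person.address.zip"."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def index_row(doc, fields, include):
    """Return (key, value) for the view row, or None if the doc is skipped."""
    key = [get_path(doc, f) for f in fields]
    if any(k is _MISSING for k in key):
        return None  # existing rule: every indexed field must be present
    # Included fields are optional, so absent ones are simply left out.
    value = {}
    for path in include:
        found = get_path(doc, path)
        if found is not _MISSING:
            value[path] = found
    if len(json.dumps(value).encode("utf-8")) > INCLUDE_SIZE_BYTES_MAX:
        return None  # "skip the doc" option: oversized values are not indexed
    return key, value
```

With this behaviour an oversized document silently disappears from query results that use the index, which is the trade-off the question about current view behaviour is probing.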





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1094392076


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be

Review Comment:
   6f5dd5f5d
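
As an illustration of the conversion the quoted RFC line describes, the Python sketch below maps the predicates on a single field to range bounds on a key-ordered index. `key_range` and its return shape are hypothetical and chosen only for this example; Mango's actual Erlang implementation is not shown in this thread and may differ.

```python
def key_range(predicate):
    """Map one field's predicate, e.g. {"$gt": 30}, to range bounds.

    Returns a dict with optional "start" and "end" entries, each a
    (value, inclusive) tuple.
    """
    bounds = {}
    for op, value in predicate.items():
        if op == "$eq":
            bounds["start"] = (value, True)
            bounds["end"] = (value, True)
        elif op == "$gt":
            bounds["start"] = (value, False)
        elif op == "$gte":
            bounds["start"] = (value, True)
        elif op == "$lt":
            bounds["end"] = (value, False)
        elif op == "$lte":
            bounds["end"] = (value, True)
        else:
            # Any other operator cannot be expressed as a key range.
            raise ValueError(f"operator cannot be mapped to a key range: {op}")
    return bounds
```

For example, `key_range({"$gte": 30, "$lt": 40})` gives an inclusive start at 30 and an exclusive end at 40, while an operator such as `$regex` is rejected because it has no key-range equivalent.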



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.

Review Comment:
   6f5dd5f5d





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091122584


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.

Review Comment:
   Would `"include": []` be equivalent to not specifying an `"include"` field at all? When generating the view signature we'd probably also want to normalize this, i.e. transform `"include": []` into a definition with no `"include"` field, or vice versa. Or we could just reject `"include": []`.
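
To illustrate the normalization idea, here is a minimal sketch in Python rather than Erlang. It is purely illustrative: the helper names `normalize_index_def` and `signature` are hypothetical, and a SHA-256 over canonical JSON merely stands in for however the view signature is really computed. The only point is that an empty `include` and a missing one should produce the same signature.

```python
import hashlib
import json


def normalize_index_def(index_def: dict) -> dict:
    """Return a copy of the index definition with an empty `include` dropped.

    Hypothetical helper: treats `"include": []` (or a null value) as
    equivalent to the field being absent. The input is not modified.
    """
    normalized = dict(index_def)
    if not normalized.get("include"):
        normalized.pop("include", None)
    return normalized


def signature(index_def: dict) -> str:
    """Stand-in for a view signature: a hash of the normalized definition.

    Assumption: hashing canonical (key-sorted) JSON is only a placeholder
    for the real signature calculation.
    """
    canonical = json.dumps(normalize_index_def(index_def), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


with_empty_include = {"fields": ["age", "name"], "include": []}
without_include = {"fields": ["age", "name"]}

# Both definitions describe the same index, so they should share a signature.
assert signature(with_empty_include) == signature(without_include)
```

Rejecting `"include": []` at the `_index` call would make this step unnecessary, at the cost of one more validation error for users to handle.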





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091122584


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used: the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only the fields in the
+`selector`, and not those in `fields`, are used when choosing an index. The
+`selector` must imply that every field in the index's key exists, per [Mango
+JSON index selection](#mango-json-index-selection), whereas `fields` is only
+applied after we generate the result set, and none of the field names in
+`fields` need to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.

Review Comment:
   Would `"include": []` be equivalent to not specifying an `"include"` field at all?





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091101499


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, $lte`, `$eq`, $gte` and `$gt`. These can easily be

Review Comment:
   Missing the opening backtick for `$lte`.





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1092474471


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used: the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists": true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only the fields in the
+`selector`, and not those in `fields`, are used when choosing an index. The
+`selector` must imply that every field in the index's key exists, per [Mango
+JSON index selection](#mango-json-index-selection), whereas `fields` is only
+applied after we generate the result set, and none of the field names in
+`fields` need to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   I am not sure what the behavior for `mango_json_index_include_size_bytes_max` should be either. It's pretty tricky. Off the top of my head, I don't know what currently happens if we fail to index a document. With Mango we don't usually fail; if I had to guess, we'd end up in a crash loop on that document. A user might then index for a day, hit a large field value and have their index crash; they'd remove the field, try again (getting a new view signature), index for two days, crash again, and so on.
   
   The alternative is to skip the doc, but then there is a danger of it looking like data loss: a user has some very important document (a medical record, say allergies to medicine), the index skips it, and nobody notices until the patient is prescribed the wrong medication (sorry, being a bit dramatic here with a silly user story). Maybe that's something we solve outside of the RFC, and we just say that if the value made it past the `max_document_size` limit then it will either be indexed or crash horribly...?
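
The two options can be put side by side in a small sketch. This is Python for illustration only, not a proposal for the Erlang implementation. It assumes the limit applies to the JSON-encoded size of the included values and that fields missing from the document are simply omitted; `included_values`, `IncludeTooLarge` and the `on_oversize` switch are hypothetical names.

```python
import json


class IncludeTooLarge(Exception):
    """Raised when the included values exceed the configured limit."""


def included_values(doc, include, max_bytes, on_oversize="fail"):
    """Collect the `include` fields of a document for storage in the index.

    Returns a dict of values, or None when the document is skipped.
    Only top-level field names are handled, to keep the sketch short.
    """
    # Assumption: `include` fields need not exist, so missing ones are omitted.
    values = {field: doc[field] for field in include if field in doc}
    size = len(json.dumps(values).encode("utf-8"))
    if size <= max_bytes:
        return values
    if on_oversize == "skip":
        # The document silently drops out of the index (the data-loss worry).
        return None
    # Failing loudly: an indexer that retries would hit this again each time.
    raise IncludeTooLarge(f"{doc.get('_id')}: {size} bytes > {max_bytes}")
```

Neither branch is satisfying on its own: `"skip"` hides the problem from the user, while `"fail"` reproduces the crash loop described above unless something else records and surfaces the failure.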





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091131120


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists"=true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists"=true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because we need all
+fields in the `selector` to be present per [Mango JSON index
+selection](#mango-json-index-selection). This is because `fields` is only used
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]

Review Comment:
   Is the order of the fields in `include` meaningful in any way? If it isn't, should we add a note highlighting that in the API docs, just for completeness?
   
   As an implementation detail, perhaps we'd just want to normalize it by sorting when creating the design doc and the view signature. That would mean that two indexes with the same details and only the `include` in a different order would be equivalent and "point to" the same view signature.
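
The normalization idea above can be sketched as follows. This is only an illustration in Python, not CouchDB's Erlang implementation: `normalized_index_def` and `signature` are hypothetical helpers, and the SHA-256 over canonical JSON is a stand-in for however the real view signature is computed. The point it shows is that sorting `include` (but never `fields`, whose order defines the view key) makes two otherwise identical definitions hash to the same value.

```python
import hashlib
import json


def normalized_index_def(index_def):
    """Return a copy of the index definition with `include` sorted.

    `fields` is left untouched because its order defines the view key.
    `include` is only rewritten when present, so definitions without it
    keep their existing shape.
    """
    normalized = dict(index_def)
    if "include" in normalized:
        normalized["include"] = sorted(normalized["include"])
    return normalized


def signature(index_def):
    """Hypothetical stand-in for a view signature: hash of canonical JSON."""
    canonical = json.dumps(normalized_index_def(index_def), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"fields": ["age", "name"], "include": ["occupation", "manager_id"]}
b = {"fields": ["age", "name"], "include": ["manager_id", "occupation"]}
c = {"fields": ["name", "age"], "include": ["manager_id", "occupation"]}

same_include_order_ignored = signature(a) == signature(b)
fields_order_still_matters = signature(a) != signature(c)
```

With this shape, `a` and `b` would share one signature while `c`, which reorders `fields`, would not.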





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091968959


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, $lte`, `$eq`, $gte` and `$gt`. These can easily be
+converted to key operations within a key ordered index. For an index to be
+chosen for a query, the first key within the indexes complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists"=true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists"=true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because we need all
+fields in the `selector` to be present per [Mango JSON index
+selection](#mango-json-index-selection). This is because `fields` is only used
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   Goodness -- great point :tada:
   
   I think that we could validate the length of the list of fields when the ddoc is updated, rather than failing during indexing. We could also limit the depth by counting the `.` characters.
   
   Some things, such as the length of included strings, can only be validated at index time. I think here that we might want to place a limit on the total size of the values, say 32 KiB. Even that's quite a few disk pages, though hopefully they are sequential on disk so the kernel is prefilling the page cache ahead of us.
   
   Given it's easier to start with limits and increase them later, perhaps we should think about this more deeply. In a view index we allow ~anything I believe, but here potentially we could be more conservative.
   
   As an example, [postgres](https://www.postgresql.org/docs/current/limits.html) limits indexes to 32 columns. Its max field size is 1GB; I think we'd like something a little smaller 😬 
   
   Are there other limits here?
   
   My thought is that we do limit, and make it configurable, and perhaps start relatively low for the defaults:
   
   - `mango_json_index_include_fields_max=16` (why 16? Powers of two always sound nice)
   - `mango_json_index_include_depth_max=8`
   - `mango_json_index_include_size_bytes_max=32768` (32 KiB)
   
   We can enforce `mango_json_index_include_fields_max` and `mango_json_index_include_depth_max` in `_index`. (We may have to belt-and-braces this as the user can go behind Mango's back to upload views that are the "right shape").
   
   `mango_json_index_include_size_bytes_max` would need to be checked per document at index time. I worry about what the behaviour should be here. The options I see are: marking the whole index bad; emitting rows with "missing" value fields, which adds complexity at query time; or skipping the document entirely. I lean towards skipping the doc as the least likely to cause unpredictable behaviour, but what's the current behaviour for views if indexing a doc fails?
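
To make the split between the two kinds of check concrete, here is a rough Python sketch (not the Erlang that would live in Mango). The constants mirror the defaults suggested above; the function names, the definition of depth as the number of dot-separated segments, the use of `None` for a missing included field, and measuring size as the JSON-encoded length are all assumptions for illustration. It also ignores escaped dots in field names.

```python
import json

# Defaults proposed in the comment above; the real values would come from config.
INCLUDE_FIELDS_MAX = 16
INCLUDE_DEPTH_MAX = 8
INCLUDE_SIZE_BYTES_MAX = 32768


def validate_include(include):
    """Checks that can run when the index is created (the `_index` call).

    Returns a list of human-readable errors; an empty list means valid.
    """
    errors = []
    if len(include) > INCLUDE_FIELDS_MAX:
        errors.append(f"too many included fields: {len(include)}")
    for path in include:
        # Assumption: depth is the number of dot-separated segments.
        depth = path.count(".") + 1
        if depth > INCLUDE_DEPTH_MAX:
            errors.append(f"field {path!r} is nested too deeply: {depth}")
    return errors


def lookup(doc, path):
    """Follow a dotted path into a document; None if any segment is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def included_values_or_skip(doc, include):
    """Per-document check at index time.

    Returns the list of included values, or None to signal that the
    document should be skipped because the values are too large.
    """
    values = [lookup(doc, path) for path in include]
    size = len(json.dumps(values).encode("utf-8"))
    if size > INCLUDE_SIZE_BYTES_MAX:
        return None
    return values
```

This models the "skip the doc" option; the other two options would change only what `included_values_or_skip` does when the size limit is exceeded.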





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1092266565


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, $lte`, `$eq`, $gte` and `$gt`. These can easily be

Review Comment:
   91f99be2d



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, $lte`, `$eq`, $gte` and `$gt`. These can easily be
+converted to key operations within a key ordered index. For an index to be
+chosen for a query, the first key within the indexes complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists"=true}}` requires that the `name` field exist so the index can be used
+because the query predicates can only match documents containing both `age` and
+`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
+30}` implies `"age": {"$exists"=true}`.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the `selector`
+and not `fields` are used when choosing an index. This is because we need all
+fields in the `selector` to be present per [Mango JSON index
+selection](#mango-json-index-selection). This is because `fields` is only used
+after we generate the result set, and none of the field names in `fields` need
+to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.

Review Comment:
   91f99be2d



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering indexing processing for current indexes (ie, over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
+
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector`
+`{"age": {"$gt": 30}, "name": {"$exists": true}}` requires that the `name`
+field exist, so the index can be used: the query predicates can only match
+documents containing both `age` and `name`, just like the index. In both cases,
+note the predicate `"age": {"$gt": 30}` implies `"age": {"$exists": true}`.
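
The selection rule above can be sketched in Python. This is only an illustration, not Mango's Erlang implementation: the function names are hypothetical, only flat selectors over top-level fields are handled, and treating every index-compatible operator as implying that the field exists is an assumption extrapolated from the `$gt` example in the text.

```python
# Operators the text lists as convertible to key operations on the index.
INDEXABLE_OPERATORS = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def is_indexed(doc, index_fields):
    """A document enters the index only if it contains every indexed field."""
    return all(field in doc for field in index_fields)


def uses_indexable_operator(predicate):
    """True if the predicate uses an operator that maps onto a key operation."""
    if not isinstance(predicate, dict):
        return True  # {"age": 39} is shorthand for {"age": {"$eq": 39}}
    return any(op in INDEXABLE_OPERATORS for op in predicate)


def implies_exists(predicate):
    """True if the predicate can only match documents that have the field."""
    if uses_indexable_operator(predicate):
        return True  # assumption: all indexable operators imply existence
    return predicate.get("$exists") is True


def index_is_usable(selector, index_fields):
    """Apply both rules: indexable first key, existence implied for all keys."""
    first_field = index_fields[0]
    if first_field not in selector:
        return False
    if not uses_indexable_operator(selector[first_field]):
        return False
    return all(
        field in selector and implies_exists(selector[field])
        for field in index_fields
    )


foo = {"_id": "foo", "age": 39, "name": "mike"}
bar = {"_id": "bar", "age": 39, "pet": "cat"}
index = ["age", "name"]
```

With these definitions, `is_indexed(bar, index)` is false, which is why `index_is_usable({"age": {"$gt": 30}}, index)` must also be false: using the index would silently drop `bar` from the results.
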
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to ensure that only fields in the
+`selector`, and not those in `fields`, are used when choosing an index. The
+`selector` must imply that every field in the index's key exists, per [Mango
+JSON index selection](#mango-json-index-selection), whereas `fields` is only
+used after we generate the result set, and none of the field names in `fields`
+need to exist in result documents.
+
+As an example, an index `["age", "name"]` would still require the `selector` to
+imply `$exists=true` for both `age` and `name` even if the `fields` were just
+`["age"]` in order that correct results be returned.
+
+Of note, this means that if an index is unusable pre-covering-index support, it
+will continue to be unusable after this implementation: whether an index covers
+a query is only used to prefer one already usable index over another.
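
A minimal Python sketch of the covering decision follows. The names `is_covering` and `choose_index` are hypothetical, and the sketch assumes that both the `selector` and `fields` must be answerable from the index keys, and that a query with no `fields` list wants the whole document and so can never be covered.

```python
def is_covering(index_fields, selector_fields, requested_fields):
    """An index covers a query if every field the query touches is a key."""
    if not requested_fields:
        return False  # no projection: the full document must be returned
    needed = set(selector_fields) | set(requested_fields)
    return needed <= set(index_fields)


def choose_index(usable_indexes, selector_fields, requested_fields):
    """Prefer a covering index, but only among already-usable indexes.

    Returns (index_fields, include_docs).
    """
    for index_fields in usable_indexes:
        if is_covering(index_fields, selector_fields, requested_fields):
            return index_fields, False  # the view can run with include_docs=false
    if usable_indexes:
        return usable_indexes[0], True  # usable but not covering: read documents
    return None, True  # no usable JSON index; behaviour is unchanged from today
```

Note that `choose_index` never looks outside `usable_indexes`, which mirrors the point above: covering only ranks indexes that were already usable and never makes an unusable index usable.
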
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes `r>=2`
+in the Mango query because then the coordinator reads and processes documents.
+(Aside: it'd be good to remove this `r` option to simplify things).
+
+In `handle_message/2` the main work is ensuring that we handle mixed cluster
+version states -- ie, cluster state during upgrades.
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field into a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other
+    field references in mango, `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of
+    `emit(doc.name, doc)`.
+- It will be an error to include a field in both `fields` and `include`. This
+    should be rejected by the `_index` call.
+- The `include` field would be rejected for `text` type indexes.
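
The behaviour requirements above could be modelled as in the following Python sketch. Everything here is illustrative: the function names are hypothetical, `fields` entries are assumed to be plain strings, nested paths are split naively on `.`, and storing the included fields as an object in the view `value` is an assumption about a format the RFC does not specify.

```python
MISSING = object()  # sentinel: distinguishes an absent field from a JSON null


def validate_index_definition(definition):
    """Reject definitions that break the proposed rules for `include`."""
    index = definition.get("index", {})
    include = index.get("include", [])
    if include and definition.get("type", "json") != "json":
        raise ValueError("`include` is only valid for json indexes")
    overlap = set(index.get("fields", [])) & set(include)
    if overlap:
        raise ValueError(f"in both `fields` and `include`: {sorted(overlap)}")


def get_path(doc, dotted):
    """Look up a nested field such as `person.address.zip`."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return MISSING
        current = current[part]
    return current


def index_entry(doc, fields, include):
    """Return (key, value) for the view row, or None if the doc is not indexed."""
    key = [get_path(doc, f) for f in fields]
    if any(part is MISSING for part in key):
        return None  # every field in `fields` must exist
    value = {}
    for f in include:
        found = get_path(doc, f)
        if found is not MISSING:  # fields in `include` are optional
            value[f] = found
    return key, value
```

For example, a document with `age`, `name` and `manager_id` but no `occupation` is still indexed; its row value simply omits `occupation`.
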
+
+Alternatives considered:
+
+- Adding `include` outside `index`. This didn't seem right as the `index`
+    object already includes `partial_filter_selector` and `include` seems a
+    peer of this. ([docs](https://docs.couchdb.org/en/stable/api/database/find.html#db-index)).
+- Alternative name `store`. We use this for Lucene indexes when dreyfus/clouseau
+    is used. I elected to use a separate name to either `value` or `store` to
+    avoid index-type specificity. I take the name from Postgres, which uses
+    `INCLUDE` in its index definition to [support covering indexes][pgcover].
+
+[pgcover]: https://www.postgresql.org/docs/current/indexes-index-only-scans.html
+
+Adding this will require changes in `mango_idx_view` to store the definition and
+in how we process documents during indexing, which looks to be in
+`get_index_entries` in `mango_native_proc`.
+
+We'll then need to update the Mango cursor methods mentioned above to take
+account of the values within the covering index code.
+
+One thing to be careful about is again index selection. We will still need all
+index keys to be present in the `selector` as above, so we need to differentiate
+between the fields in the index's keys and values when selecting an index to
+ensure we retain the correct behaviour per [Mango JSON index
+selection](#mango-json-index-selection).
+
+## Mixed versions during cluster upgrades
+
+The relevant scenarios here are an updated coordinator talking to outdated
+shards, and the opposite of an outdated coordinator talking to upgraded shards.
+A further wrinkle is that while a coordinator is either upgraded or not, the
+shards that the coordinator speaks to can be a mixture of upgraded and outdated.
+
+For the purposes of this discussion, we only need to worry about when a covering
+index is in play during a query; the code path outside that use-case should not
+change.
+
+From what I can tell, we can avoid special code paths for cluster upgrades
+specific to this work. Instead we accept that some queries will take longer
+during cluster upgrade mixed version operation. This is described below.
+
+### Updated coordinator, outdated shard
+
+In this case, the coordinator will note the covering index, and set the view
+query option `include_docs=false`. This means that the row passed to `view_cb/2`
+will not have a document included. In the function, `case ViewRow#view_row.doc
+of` will hit the `undefined` clause, meaning that the row is passed through
+unchanged, without the document. When the row reaches the coordinator and is
+passed to `doc_member_and_extract/2` from `handle_message/2`, the `case
+couch_util:get_value(doc, RowProps) of` will also hit its `undefined` clause.
+The coordinator will then perform a quorum read with `r=1` of the document and
+carry out the match and extract.
+
+This will slow down the processing of results at the coordinator for that row,
+but shouldn't alter the correctness of the result. So we shouldn't need a
+special code path to support this case. Which is nice.
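
The fallback described above can be modelled roughly as follows. This is a Python illustration of the control flow only, not the Erlang code: `fetch_doc`, `matches` and `extract` are hypothetical stand-ins for the coordinator's document read and for Mango's match and field-extraction steps.

```python
def handle_row_at_coordinator(row, fetch_doc, matches, extract):
    """Process one view row; return the result document or None."""
    doc = row.get("doc")
    if doc is None:
        # An outdated shard passed the row through untouched, so the
        # coordinator reads the document itself (the r=1 read in the text).
        doc = fetch_doc(row["id"])
    if doc is None or not matches(doc):
        return None
    return extract(doc)
```

The extra read happens once per row that arrives without a document, which is why this path is slower but still returns the same results.
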
+
+### Outdated coordinator, updated shard
+
+In this case, the coordinator won't be checking for covering indexes, meaning
+that `include_docs=true` will be set when `r<2` as today.
+
+I suspect we'll set an option in `viewcbargs` that contains the index field
+names and whether it's a covering index. This means that an updated shard will be

Review Comment:
   91f99be2d





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1092267796


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   I've added a section on limits for `include` in 91f99be2d. I think that the topic is really important and it's a bit facepalm that I skipped it. Too excited about writing Erlang I guess 😬 What do you think?





[GitHub] [couchdb] janl commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "janl (via GitHub)" <gi...@apache.org>.
janl commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1094375596


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document in order that the document be included in the index. This is to
+    allow the index to cover more queries.
+- Including a deeply nested field would follow the same pattern as for other

Review Comment:
   We discussed this a bit in [the developer meeting yesterday](https://lists.apache.org/thread/5x22cl09jv5f9o82v3cw3opvpz9ywlqg).
   
   First, we compared this to how this would manifest in M/R views, and the example we used was what if a doc causes a JS exception to be thrown. What we do there is skip the doc and not include any result rows. We do log this in couch.log but it is not surfaced to the HTTP API user.
   
   We went through the motions of whether to add fields inline with the result and also recording a metric for this and concluded from experience that most people do not really care about these things.
   
   However some folks take this seriously, and we want to accommodate those. Imagine this:
   
   1. we increment a metric each time a mango covering index can’t include a doc because of some limit.
   2. a user sees this number growing and finds out that this isn’t good.
   3. next they want to know which document caused this and we can point them to couch.log where this should appear (TODO: decide log level)
   4. finally, it’d be great if there was a variant to `_explain`, say: `POST /db/_explain?include_result=true` which returns the result like normal, but rows that are missing have an error object in them (or maybe we just show the error rows)
   
   So some handwaving to be defined away, but: let’s do this right this time :)





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1094391867


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking bears this out: Mango currently runs at the same pace as
+calling the view with `include_docs=true`, not at the rate of the underlying
+view index itself. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore
+substantially increases the production use-cases Mango can support.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (i.e., over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while
+the second is not (because it doesn't include `name`):
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist, so the index can be
+used because the query predicates can only match documents containing both
+`age` and `name`, just like the index. In both cases, note the predicate
+`"age": {"$gt": 30}` implies `"age": {"$exists": true}`.

Review Comment:
   6f5dd5f5d
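
The two selection rules quoted above (an indexable operator on the first key, and `$exists=true` implied for every indexed field) can be sketched as a small check. This is only an illustration in Python, not the Erlang in Mango: `index_usable`, `implies_exists` and `normalise` are hypothetical names, and the sketch assumes a flat selector with no combination operators such as `$and` or `$or`.

```python
# Operators the RFC text lists as convertible to key operations on the index.
INDEXABLE_OPS = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def normalise(predicate):
    """Treat {"age": 39} as shorthand for {"age": {"$eq": 39}}."""
    return predicate if isinstance(predicate, dict) else {"$eq": predicate}


def implies_exists(predicate):
    """True if the predicate can only match documents that contain the field."""
    if predicate.get("$exists") is True:
        return True
    return any(op in INDEXABLE_OPS for op in predicate)


def index_usable(index_fields, selector):
    """Hypothetical check: may this index answer the selector completely?"""
    preds = {field: normalise(p) for field, p in selector.items()}

    # Rule 1: the first key of the index needs an indexable operator.
    first = preds.get(index_fields[0], {})
    if not any(op in INDEXABLE_OPS for op in first):
        return False

    # Rule 2: the selector must imply existence of every indexed field,
    # because documents missing any of them are not in the index at all.
    return all(implies_exists(preds.get(f, {})) for f in index_fields)


if __name__ == "__main__":
    index = ["age", "name"]
    print(index_usable(index, {"age": {"$gt": 30}}))
    print(index_usable(index, {"age": {"$gt": 30}, "name": {"$exists": True}}))
```

With the `["age", "name"]` index from the example, the first selector is rejected (document `bar` would be missed) and the second is accepted.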



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,397 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (i.e., over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
+converted to key operations within a key-ordered index. For an index to be
+chosen for a query, the first key within the index's complex key MUST be used
+with a predicate operator that can be converted into an operation on the index.
+
+Secondly, a quirk of Mango indexes is that for a document to be included in an
+index it must contain all of the index's indexed fields. Documents without all
+the fields will not be included. This means that when we are choosing an index
+for a query, we must further choose an index where the predicates within the
+`selector` imply `$exists=true` for all fields in the index's key. Without that,
+we will have incomplete results.
+
+Why is this? Let's look at an index with these fields:
+
+```json
+["age", "name"]
+```
+
+Now we index two documents. The first document is included in the index while
+the second is not (because it doesn't include `name`):
+
+```json
+{"_id": "foo", "age": 39, "name": "mike"}
+
+{"_id": "bar", "age": 39, "pet": "cat"}
+```
+
+The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
+we use the index above, we'd miss out `bar` because it's not in the index.
+Therefore we can't use the index.
+
+On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
+{"$exists": true}}` requires that the `name` field exist so the index can be used

Review Comment:
   6f5dd5f5d





[GitHub] [couchdb] nickva commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1091101499


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,360 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (i.e., over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Mango JSON index selection
+
+A Mango JSON index is implemented as a view with a complex key. The first field
+in the index is the first entry in the complex key, the second field is the
+second key and so on. Even indexes with one field use a complex key with length
+`1`.
+
+When choosing a JSON index to use for a query, there are a couple of things that
+are important to covering indexes.
+
+Firstly, note there are certain predicate operators that can be used with an
+index, currently: `$lt`, $lte`, `$eq`, $gte` and `$gt`. These can easily be

Review Comment:
   Missing ``` ` ``` for `$lte`





[GitHub] [couchdb] mikerhodes commented on a diff in pull request #4410: Mango covering JSON indexes RFC

Posted by "mikerhodes (via GitHub)" <gi...@apache.org>.
mikerhodes commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1090461322


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango can answer at the same rate as the underlying
+view index. It currently runs at the same pace as calling the view with
+`include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes therefore increases
+the production use-cases Mango can support substantially.
+
+There are likely two phases to this:
+
+- Enable covering index processing for current indexes (i.e., over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view. Support use of this extra data within the covering
+  indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes and not text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+parity to non-reduced view execution speed (ie, when views are used to search
+the document space) to Mango.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: Mango index that uses the same index as Cloudant views.
+- Coordinator: the Erlang process that handles doing a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This would take place within `mango_view_cursor.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`) and the `execute/3` function.
+
+## Phase 1: handle keys only covering indexes
+
+Within `execute/3` we will need to decide whether the view should be requested
+to include documents. If the index is covering, this will not be required and
+so the `include_docs` argument to the view fabric call will be `false`. We'll
+need to add a helper method to return whether the index is covering.
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector` and not `fields` are used when
+choosing an index. This is because we require all keys in the index to be fields
+within the selector -- with predicates implying `$exists=true` -- due to the
+fact that only documents that include _all_ fields in the index are added to the
+index. Therefore, if the selector doesn't imply all fields in the index's keys
+exist, then using that index risks returning an incomplete result set.
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as an indicator
+that it should do nothing, while in fact we want it to fully process the result
+as it does when `include_docs` is used -- apart from when the user passes
+`r>=2` in the Mango query because then the coordinator reads and processes

Review Comment:
   Added in 240b91280.
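
For Phase 1, the decision the quoted text places in `execute/3` (request documents only when the index is not covering) might look roughly like the sketch below. It is Python rather than Erlang, and `is_covering` and `view_query_args` are hypothetical names. It assumes a flat selector, treats `_id` as available because view rows carry the document id, and ignores the `r>=2` case mentioned in the hunk.

```python
def is_covering(index_fields, selector, fields):
    """Hypothetical helper: can the query be answered from index keys alone?

    index_fields: ordered field names in the index's complex key.
    selector:     flat Mango selector (top-level keys are field names).
    fields:       the query's `fields` projection, or None if absent.
    """
    if not fields:
        # No projection means the whole document is returned,
        # so the document has to be read.
        return False

    # Assumption: the row's document id can stand in for `_id`.
    available = set(index_fields) | {"_id"}

    # Every field the selector tests and every field the caller asked for
    # must be present in the index.
    needed = set(selector) | set(fields)
    return needed <= available


def view_query_args(index_fields, selector, fields):
    """Build the one view argument this sketch cares about."""
    return {"include_docs": not is_covering(index_fields, selector, fields)}


if __name__ == "__main__":
    index = ["age", "name"]
    selector = {"age": {"$gt": 30}, "name": {"$exists": True}}
    print(view_query_args(index, selector, ["age", "name"]))
    print(view_query_args(index, selector, ["age", "pet"]))
```

Index selection still looks only at the `selector`, as the quoted paragraph requires. `fields` matters only once an index has been chosen, to decide whether documents need to be loaded.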


