Posted to notifications@couchdb.apache.org by "Jan Lehnardt (JIRA)" <ji...@apache.org> on 2017/04/09 08:41:42 UTC

[jira] [Commented] (COUCHDB-2867) Mango: should be able to index *within* arrays

    [ https://issues.apache.org/jira/browse/COUCHDB-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962070#comment-15962070 ] 

Jan Lehnardt commented on COUCHDB-2867:
---------------------------------------

I’m with [~buhrmi] here: I don’t quite follow why this is a limitation.

Let me preface this by saying that I do not have a lot of experience with Mango. If there is a way to get “all documents with a tag `green`” out of a list of tags with the `json` index type, then I’d be happy to learn about it, so we can document it :)

Then, let’s clear up $in vs. $elemMatch:
- $in is for: here’s a list of things to match against an index
- $elemMatch is for: here is one thing to match against a “list in the index” (more precisely: match against a list in the source that got exploded into the index)

This issue is only concerned with $elemMatch.
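
To make the distinction concrete, here is a minimal in-memory sketch of the two operators in Python. It is a simplified model for illustration only, not Mango's implementation: the function names `matches_in` and `matches_elem_match_eq` and the `color` field are made up for this example, and only the `$eq` form of $elemMatch is modelled.

```python
def matches_in(field_value, candidates):
    """$in: the document holds ONE value; the query supplies a LIST of candidates."""
    return field_value in candidates


def matches_elem_match_eq(field_value, target):
    """$elemMatch with $eq: the document holds a LIST; the query supplies ONE value."""
    return isinstance(field_value, list) and target in field_value


# Hypothetical document: "color" is a scalar field, "tags" is a list field.
doc = {"color": "green", "tags": ["blue", "green", "black"]}

in_result = matches_in(doc["color"], ["green", "yellow"])
elem_result = matches_elem_match_eq(doc["tags"], "green")
```

Both calls return `True` for this document; the difference is which side of the comparison carries the list.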

The one thing that confuses me: as I read the indexer code, it does not index JSON lists at all, yet the tests include a case where a list of objects is matched against: https://github.com/apache/couchdb/blob/master/src/mango/test/03-operator-test.py#L37-L64 As far as I can tell, this only works on the special _id index, which is just _all_docs with the selector applied, so it is a filter rather than an indexed request. It does _not_ work with a custom index that points to a JSON list.

If I’m correct in the last paragraph, the question is now whether we want to expand the indexer to dive into arrays.

And at this point, I get [~tonysun83]’s comment about “we didn't want to perform two index scans”, which confused me previously. The problem then is not only the indexing part (which should not be too hard), but thinking one step ahead.

Imagine a set of docs like this:

{code:json}
{"tags":["blue", "green", "black"]}
{"tags":["red", "yellow", "blue"]}
{"tags":["orange", "green", "brown"]}
{code}

Pretending we did index arrays as requested, satisfying a selector that looks like this is a straight single-k/v range lookup on the index:

{code:json}
{
  "selector": {
    "tags": {
      "$elemMatch": {
        "$eq": "green"
      }
    }
  }
}
{code}
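
As a rough illustration of what "indexing within arrays" could mean, the following Python sketch emits one index row per array element and answers the selector above with a single contiguous range scan. This is only a sketch under assumptions: the doc ids (`doc1` to `doc3`) and the helper names `build_exploded_index` and `lookup` are invented here, and a sorted in-memory list stands in for the real B-tree.

```python
from bisect import bisect_left, bisect_right

# The three example docs from above, with made-up ids.
docs = {
    "doc1": {"tags": ["blue", "green", "black"]},
    "doc2": {"tags": ["red", "yellow", "blue"]},
    "doc3": {"tags": ["orange", "green", "brown"]},
}


def build_exploded_index(docs, field):
    """Emit one (element, doc_id) row per array element, sorted by key."""
    rows = []
    for doc_id, doc in docs.items():
        value = doc.get(field)
        if isinstance(value, list):
            rows.extend((element, doc_id) for element in value)
    return sorted(rows)


def lookup(index, key):
    """Return the doc ids for one key via a single contiguous range scan."""
    keys = [k for k, _ in index]
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [doc_id for _, doc_id in index[lo:hi]]


index = build_exploded_index(docs, "tags")
green_docs = lookup(index, "green")  # rows for "green" sit next to each other
```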

But as [~tonysun83] points out, people will expect to be able to make this more complicated: not only "give me all documents with the tag "green"", but also "give me all documents with the tags "green" or "yellow"", or even "green" and "yellow":

{code:json}
{
  "selector": {
    "$[and|or]": [
      {"tags": {
        "$elemMatch": {
          "$eq": "green"
        }
      }},
      {"tags": {
        "$elemMatch": {
          "$eq": "yellow"
        }
      }}
    ]
  }
}
{code}

At this point, we’d have to do two index lookups at query time, one for each branch of the $and/$or query, and merge the results, and that’s something that isn’t supported today.
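
The "one lookup per branch, then merge" step could look roughly like this in Python. Again, this is a sketch under assumptions and not how Mango works today: the `tag_index` dict stands in for a hypothetical exploded array index, the doc ids are made up, and `query_tags` is an invented helper.

```python
# Hypothetical exploded index: tag -> set of doc ids (ids are made up).
tag_index = {
    "black": {"doc1"},
    "blue": {"doc1", "doc2"},
    "brown": {"doc3"},
    "green": {"doc1", "doc3"},
    "orange": {"doc3"},
    "red": {"doc2"},
    "yellow": {"doc2"},
}


def query_tags(index, tags, combinator):
    """Run one index lookup per branch, then merge the per-branch results."""
    if not tags:
        return set()
    results = [index.get(tag, set()) for tag in tags]  # one lookup per branch
    if combinator == "$or":
        return set().union(*results)  # a doc matching any branch qualifies
    if combinator == "$and":
        return set.intersection(*results)  # a doc must match every branch
    raise ValueError(f"unsupported combinator: {combinator}")


either = query_tags(tag_index, ["green", "yellow"], "$or")
both = query_tags(tag_index, ["green", "yellow"], "$and")
```

With the three example docs, the `$or` variant returns all three ids, while the `$and` variant returns an empty set because no doc carries both tags.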


Assuming all of the above is correct, the question now becomes: do we want to support this?

M/R views already have support for multiple startkey/endkey ranges in one request, so I don’t really see why Mango shouldn’t get this as well.

Given that this produces (at least) O(ranges * shards) requests under the hood, we might want to add a safety limit on the number of $and/$or values you can chain.
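
For a feel of the numbers, here is a small Python sketch of the fan-out estimate and of such a safety limit. Everything in it is hypothetical: `MAX_RANGES`, its value of 10, and both function names are invented for this example and do not correspond to any existing CouchDB setting.

```python
MAX_RANGES = 10  # hypothetical limit, chosen only for this example


def estimate_requests(num_ranges, num_shards):
    """Lower bound on internal requests: one per range per shard."""
    return num_ranges * num_shards


def check_range_limit(branches, max_ranges=MAX_RANGES):
    """Reject selectors that chain more $and/$or branches than allowed."""
    if len(branches) > max_ranges:
        raise ValueError(
            f"selector has {len(branches)} branches, limit is {max_ranges}"
        )
    return len(branches)


# Two branches ("green", "yellow") against a database with 8 shards.
num_ranges = check_range_limit(["green", "yellow"])
requests = estimate_requests(num_ranges, 8)
```

In this example the two-branch query on 8 shards already needs at least 16 internal requests, which is why the count grows quickly as more branches are chained.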



> Mango: should be able to index *within* arrays
> ----------------------------------------------
>
>                 Key: COUCHDB-2867
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2867
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Mango
>            Reporter: Nolan Lawson
>
> If you have a document like:
> {code:javascript}
> {
>   "_id": "foo",
>   "tags": ["a", "b", "c"]
> }
> {code}
> ...then you should be able to run queries that find e.g. all documents with "a" as a tag, and it should be *indexed*. Currently there doesn't seem to be any way to do this except as an in-memory selector, which is a real bummer, because it's a super common use case. (Tags, categories, labels, etc.)
> Originally I thought this was how {{$elemMatch}} worked, and I was surprised to learn that that's not the case.


