Posted to dev@couchdb.apache.org by Garren Smith <ga...@apache.org> on 2020/06/24 12:03:10 UTC

[DISCUSS] New Reduce design for FDB

Quick Note I have a gist markdown version of this that might be easier to
read https://gist.github.com/garrensmith/1ad1176e007af9c389301b1b6b00f180

Hi Everyone,

The team at Cloudant has been taking another look at Reduce indexes for
CouchDB on FDB. We want to simplify what we had initially planned and change
some of the reduce behaviour compared to CouchDB 3.x.

Our initial design was to use a skip list. However, this hasn’t proven to be
a particularly useful approach: it would take very long to update, and I
can’t find a good algorithm for querying the skip list efficiently.

So instead I propose a much simpler reduce implementation. I would like to
use this as the base for reduce, and we can look at adding more
functionality later if we need to.

For the new reduce design, instead of creating a skip list, we will create
group_level indexes for each key. For example, say we have the following
keys we want to add to a reduce index:

```
([2019, 6, 1] , 1)
([2019, 6, 20] , 1)
([2019, 7, 3] , 1)
```

We would then create the following group_level indexes:

```
Level 0:
(null, 3)

Level 1:
([2019], 3)

Level 2:
([2019, 6], 2)
([2019, 7], 1)

Level 3:
([2019, 6, 1], 1)
([2019, 6, 20], 1)
([2019, 7, 3], 1)
```

All of these group_level indexes form part of the reduce index and are
updated at the same time. We don’t need to know the actual `group_levels`
ahead of time: for any key we need to index, we look at its length and add
it to each group_level it belongs to.
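To make that indexing step concrete, here is a small sketch (hypothetical helper, not the actual CouchDB implementation): each emitted key contributes one entry per group_level, obtained by truncating the key.

```python
def level_entries(key, value):
    """Yield the (group_level, grouped_key, value) entries an emitted
    (key, value) pair contributes to. This sketch treats non-array keys
    as length-1 keys, which is a simplification."""
    if not isinstance(key, list):
        key = [key]
    yield (0, None, value)  # group_level=0 aggregates everything
    for level in range(1, len(key) + 1):
        yield (level, tuple(key[:level]), value)
```

For the key `[2019, 6, 1]` this yields entries for level 0, `[2019]`, `[2019, 6]` and `[2019, 6, 1]`, matching the group_level indexes shown above.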

Another nice optimization this enables is that when a user creates a view,
they can specify the group levels to index, e.g.:

```
{
  "_id": "_design/my-ddoc",
  "views": {
    "one": {
      "map": "function (doc) { emit(doc.val, 1); }",
      "reduce": "_sum"
    },
    "two": {
      "map": "function (doc) { emit(doc.age, 1); }",
      "reduce": "_count"
    }
  },
  "options": { "group_levels": [1, 3, 5] }
}
```
This gives the user the ability to trade off index build speed, storage
overhead and performance.

One caveat, for now: if a user changes the set of `group_levels` to be
indexed, the index is invalidated and has to be rebuilt from scratch. Later
we could look at removing that limitation.

This design will result in a behaviour change. Previously, a reduce query
with `group_level=2` returned all results at `group_level=2` and below.
E.g. given reduce key/values of the following:

```
# group = true
("key":1,"value":2},
{"key":2,"value":2},
{"key":3,"value":2},
{"key":[1,1],"value":1},
{"key":[1,2,6],"value":1},
{"key":[2,1],"value":1},
{"key":[2,3,6],"value":1},
{"key":[3,1],"value":1},
{"key":[3,1,5],"value":1},
{"key":[3,4,5],"value":1}
```

Then a query with `group_level=2` returns:

```
# group_level = 2
{"rows":[
{"key":1,"value":2},
{"key":2,"value":2},
{"key":3,"value":2},
{"key":[1,1],"value":1},
{"key":[1,2],"value":1},
{"key":[2,1],"value":1},
{"key":[2,3],"value":1},
{"key":[3,1],"value":2},
{"key":[3,4],"value":1}
]}
```

I want to **CHANGE** this behaviour, so that if a query specifies
`group_level=2` then **only** `group_level=2` rows are returned. E.g. from
the example above the results would be:

```
# group_level = 2
{"rows":[
{"key":[1,1],"value":1},
{"key":[1,2],"value":1},
{"key":[2,1],"value":1},
{"key":[2,3],"value":1},
{"key":[3,1],"value":2},
{"key":[3,4],"value":1}
]}
```


## Group_level=0
`group_level=0` queries would work as follows:
1. For `group_level=0` without a startkey/endkey, the group_level=0 index is
used directly.
2. For `group_level=0` with a startkey/endkey, or where `group_level=0` is
not indexed, the query will look for the smallest indexed `group_level` and
use that to calculate the `group_level=0` result.
3. `group_level=0` queries with a startkey/endkey could be slow, and could
time out in some cases, because we have to do quite a lot of aggregation
when reading keys. But I don’t think that is much different from how it is
done now.
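Rule 2 can be sketched as follows, using a hypothetical in-memory stand-in for the per-level rows: `indexes` maps a group_level to a dict of grouped-key tuples to `_sum` values.

```python
def group_level_zero(indexes, startkey=None, endkey=None):
    """Compute a group_level=0 _sum result. With no key range the
    dedicated level-0 index answers directly; otherwise (or when level 0
    is not indexed) aggregate over the smallest indexed group_level."""
    if startkey is None and endkey is None and 0 in indexes:
        return indexes[0][None]
    level = min(l for l in indexes if l > 0)
    total = 0
    for key, value in indexes[level].items():
        if (startkey is None or key >= startkey) and \
           (endkey is None or key <= endkey):
            total += value
    return total
```

The range-query case is where the aggregation cost mentioned in rule 3 shows up: every row of the chosen level inside the range has to be read and folded together.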

## Group=true
We will always build the `group=true` index.

## Querying non-indexed group_level
If a query specifies a `group_level` that is not indexed, we can do one of
two things: CouchDB could use the nearest indexed `group_level` to service
the query, or it could return an error that this `group_level` is not
available to query. I would like to make this configurable so that an admin
can choose how reduce indexes behave.
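The configurable behaviour could look something like this (the `mode` values are hypothetical, not an agreed-on config name):

```python
def pick_group_level(requested, indexed_levels, mode="nearest"):
    """Resolve a requested group_level against the set that was indexed:
    either fall back to the nearest indexed level or refuse the query."""
    if requested in indexed_levels:
        return requested
    if mode == "error":
        raise ValueError("group_level=%d is not indexed" % requested)
    return min(indexed_levels, key=lambda l: abs(l - requested))
```

With `group_levels: [1, 3, 5]` from the design document example above, a `group_level=4` query would be served from level 3 in the first mode, or rejected in the second.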

## Supported Builtin Reduces
Initially, we would support reducers that can be updated by calculating a
delta change and applying it to all the group_levels. That means we can
support `_sum` and `_count` quite easily. Initially, we won’t implement
`_max` and `_min`; however, I would like to add them soon after.
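What makes `_sum` and `_count` easy is that an update can be expressed as a delta applied to every affected group_level, without re-reading sibling keys. A sketch, using a hypothetical in-memory stand-in for the per-level rows (group_level mapped to a dict of grouped-key tuples to running values):

```python
def apply_delta(indexes, key, delta):
    """Apply a delta to every group_level a key belongs to. This only
    works for reducers where new = old + delta, which is why _sum and
    _count are the easy cases; _min and _max need more than a delta."""
    key = tuple(key)
    for level in indexes:
        # a key shorter than the level is grouped as the whole key
        grouped = None if level == 0 else key[:level]
        indexes[level][grouped] = indexes[level].get(grouped, 0) + delta
```

A document update first applies the negative of its old emitted values and then the new ones, so deletions and changes are just more deltas.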

I would also like to add support for the `_stats` reducer later. The best
option I can think of is breaking up each field of `_stats` into its own
k/v row in FDB.
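One possible shape for that (the key layout here is purely illustrative, not a decided FDB keyspace):

```python
def stats_rows(group_level, grouped_key, stats):
    """Split one _stats aggregate into one row per field, so each field
    can be read and updated independently. sum, count and sumsqr admit
    simple deltas; min and max are the harder fields, which is part of
    why _stats needs more thought."""
    for field in ("sum", "count", "min", "max", "sumsqr"):
        yield (("reduce", group_level, grouped_key, field), stats[field])
```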

I’m not sure how to handle `_approx_count_distinct`. At the moment I don’t
understand the algorithm well enough to know whether we could support it.

## Custom Reduces
Custom reduces will not be supported, because they would be tricky to
implement with this design and would not be very performant. Later, if there
is demand for this, we could look at supporting them, with the caveat that
performance would be pretty bad. My view on custom reduces is that in 99% of
cases they are a bad choice, so I would prefer we don’t support them.

## Replication
Some basic replication rules:
1. If a design document is replicated from an old database and does not have
the group_level option set, all `group_levels` will be indexed.

2. If a design document is replicated across with a custom reduce, or a
reduce we don’t support, we will return an error when that reduce index is
queried. I’m thinking we would still build the map part of the index.

## Limits
Reduce keys will have the same limit as map keys, which is 8 KB.

Hopefully that all makes sense. Let me know if I need to explain a point
further; I would love to hear your thoughts on this.

Re: [DISCUSS] New Reduce design for FDB

Posted by Glynn Bird <gl...@gmail.com>.
It's confusing that the group_level parameter would behave differently
between CouchDB versions, as Joan points out. Perhaps group_level could be
deprecated in CouchDB 4+ and replaced by a new parameter (group_depth,
depth, grouping, or some such) that behaves as Garren proposes? Or does that
make it even more confusing?

Glynn

On Wed, 24 Jun 2020 at 19:00, Joan Touzet <wo...@apache.org> wrote:

>
>
> On 2020-06-24 1:32 p.m., Garren Smith wrote:
> > On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet <wo...@apache.org> wrote:
> >
> >> Hi Garren,
> >>
> >> If the "options" field is left out, what is the default behaviour?
> >>
> >
> > All group_levels will be indexed. I imagine this is what most CouchDB
> uses
> > will want.
>
> Great!
>
> >
> >
> >> Is there no way to specify multiple group_levels to get results that
> >> match the original CouchDB behaviour? Your changed behaviour would be
> >> acceptable if I could do something like `?group_level=2,3,4,5`.
> >>
> >
> > I imagine we could, it would make the code a lot more complex. What is
> the
> > reason for that?
> > I find the fact that we return multiple group_levels for a set
> group_level
> > very confusing. To me it feels like
> > the reason we return extra group_levels is because of how b-tree's work
> > rather than it being a useful thing for a user.
>
> This is the canonical example (and the previous 2-3 slides)
>
>
> https://speakerdeck.com/wohali/10-common-misconceptions-about-apache-couchdb?slide=25
>
> There are ways to do this with your approach, but they'll require
> retooling.
>
> >
> >
> >> -Joan
> >>
> >> On 24/06/2020 08:03, Garren Smith wrote:
> >>> [original proposal quoted in full; snipped]

Re: [DISCUSS] New Reduce design for FDB

Posted by Garren Smith <ga...@apache.org>.
Thanks for the example. The new design wouldn't support that. The idea of
supporting multiple group_levels in a single query is interesting, though.

On Wed, Jun 24, 2020 at 8:00 PM Joan Touzet <wo...@apache.org> wrote:

>
>
> On 2020-06-24 1:32 p.m., Garren Smith wrote:
> > On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet <wo...@apache.org> wrote:
> >
> >> Hi Garren,
> >>
> >> If the "options" field is left out, what is the default behaviour?
> >>
> >
> > All group_levels will be indexed. I imagine this is what most CouchDB
> uses
> > will want.
>
> Great!
>
> >
> >
> >> Is there no way to specify multiple group_levels to get results that
> >> match the original CouchDB behaviour? Your changed behaviour would be
> >> acceptable if I could do something like `?group_level=2,3,4,5`.
> >>
> >
> > I imagine we could, it would make the code a lot more complex. What is
> the
> > reason for that?
> > I find the fact that we return multiple group_levels for a set
> group_level
> > very confusing. To me it feels like
> > the reason we return extra group_levels is because of how b-tree's work
> > rather than it being a useful thing for a user.
>
> This is the canonical example (and the previous 2-3 slides)
>
>
> https://speakerdeck.com/wohali/10-common-misconceptions-about-apache-couchdb?slide=25
>
> There are ways to do this with your approach, but they'll require
> retooling.
>
> >
> >
> >> -Joan
> >>
> >> On 24/06/2020 08:03, Garren Smith wrote:
> >>> [original proposal quoted in full; snipped]

Re: [DISCUSS] New Reduce design for FDB

Posted by Joan Touzet <wo...@apache.org>.

On 2020-06-24 1:32 p.m., Garren Smith wrote:
> On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet <wo...@apache.org> wrote:
> 
>> Hi Garren,
>>
>> If the "options" field is left out, what is the default behaviour?
>>
> 
> All group_levels will be indexed. I imagine this is what most CouchDB uses
> will want.

Great!

> 
> 
>> Is there no way to specify multiple group_levels to get results that
>> match the original CouchDB behaviour? Your changed behaviour would be
>> acceptable if I could do something like `?group_level=2,3,4,5`.
>>
> 
> I imagine we could, it would make the code a lot more complex. What is the
> reason for that?
> I find the fact that we return multiple group_levels for a set group_level
> very confusing. To me it feels like
> the reason we return extra group_levels is because of how b-tree's work
> rather than it being a useful thing for a user.

This is the canonical example (and the previous 2-3 slides)

https://speakerdeck.com/wohali/10-common-misconceptions-about-apache-couchdb?slide=25

There are ways to do this with your approach, but they'll require retooling.

> 
> 
>> -Joan
>>
>> On 24/06/2020 08:03, Garren Smith wrote:
>>> Quick Note I have a gist markdown version of this that might be easier to
>>> read
>> https://gist.github.com/garrensmith/1ad1176e007af9c389301b1b6b00f180
>>>
>>> Hi Everyone,
>>>
>>> The team at Cloudant have been relooking at Reduce indexes for CouchDB on
>>> FDB and we want to simply what we had initially planned and change some
>> of
>>> the reduce behaviour compared to CouchDB 3.x
>>>
>>> Our initial design was to use a skip list. However this hasn’t proven to
>> be
>>> particularly useful approach. It would take very long to update and I
>> can’t
>>> find a good algorithm to query the skip list effectively.
>>>
>>> So instead I would like to propose a much simpler reduce implementation.
>> I
>>> would like to use this as the base for reduce and we can look at adding
>>> more functionality later if we need to.
>>>
>>> For the new reduce design, instead of creating a skip list, we will
>> instead
>>> create group_level indexes for a key. For example say we have the
>> following
>>> keys we want to add to a reduce index:
>>>
>>> ```
>>> ([2019, 6, 1] , 1)
>>> ([2019, 6, 20] , 1)
>>> ([2019, 7, 3] , 1)
>>> ```
>>>
>>> We would then create the following group_level indexes:
>>>
>>> ```
>>> Level 0:
>>> (null, 3)
>>>
>>> Level=1:
>>> ([2019], 3)
>>>
>>> Level 2:
>>> ([2019,6], 2)
>>> ([2019, 7] , 1)
>>>
>>> Level3:
>>> ([2019, 6, 1,] , 1)
>>> ([2019, 6, 20,] , 1)
>>> ([2019, 7, 3,] , 1)
>>> ```
>>>
>>> All of these group_level indexes would form part of the reduce index and
>>> would be updated at the same time. We don’t need to know the actual
>>> `group_levels` ahead of time as we would take any key we need to index
>> look
>>> at its length and add it to the group_levels it would belong to.
>>>
>>> Another nice optimization we can do with this is when a user creates a
>> view
>>> they can specify the number of group levels to index e.g:
>>>
>>> ```
>>> {
>>> _id: _design/my-ddoc
>>>     views: {
>>>        one: {
>>>          map: function (doc) {emit(doc.val, 1)},
>>>          reduce: "_sum"
>>>        },
>>>
>>>        two: {
>>>          map: function (doc) {emit(doc.age, 1)},
>>>             reduce: "_count"
>>>        }
>>>      },
>>>
>>>      options: {group_levels: [1,3,5]}
>>> }
>>> ```
>>> This gives the user the ability to trade off index build speed, storage
>>> overhead and performance.
>>>
>>> One caveat of that, for now, is if a user changes the number of
>>> `group_levels` to be indexed, the index is invalidated and we would have
>> to
>>> build it from scratch again. Later we could look at doing some work
>> around
>>> that so that isn’t the case.
>>>
>>> This design will result in a behaviour change. Previously with reduce if
>>> you set `group_level=2`. It will return all results with `group_level=2`
>>> and below. E.g  reduce key/values of the following:
>>>
>>> ```
>>> # group = true
>>> ("key":1,"value":2},
>>> {"key":2,"value":2},
>>> {"key":3,"value":2},
>>> {"key":[1,1],"value":1},
>>> {"key":[1,2,6],"value":1},
>>> {"key":[2,1],"value":1},
>>> {"key":[2,3,6],"value":1},
>>> {"key":[3,1],"value":1},
>>> {"key":[3,1,5],"value":1},
>>> {"key":[3,4,5],"value":1}
>>> ```
>>>
>>> Then doing a query group_level=2 returns:
>>>
>>> ```
>>> # group_level = 2
>>> {"rows":[
>>> {"key":1,"value":2},
>>> {"key":2,"value":2},
>>> {"key":3,"value":2},
>>> {"key":[1,1],"value":1},
>>> {"key":[1,2],"value":1},
>>> {"key":[2,1],"value":1},
>>> {"key":[2,3],"value":1},
>>> {"key":[3,1],"value":2},
>>> {"key":[3,4],"value":1}
>>> ]}
>>> ```
>>>
>>> I want to **CHANGE** this behaviour, so if a query specifies
>>> `group_level=2` then **only** `group_level=2` returns would be returned.
>>> E.g from the example above the results would be:
>>>
>>> ```
>>> # group_level = 2
>>> {"rows":[
>>> {"key":[1,1],"value":1},
>>> {"key":[1,2],"value":1},
>>> {"key":[2,1],"value":1},
>>> {"key":[2,3],"value":1},
>>> {"key":[3,1],"value":2},
>>> {"key":[3,4],"value":1}
>>> ]}
>>> ```
>>>
>>>
>>> ## Group_level=0
>>> `Group_level=0` queries would work as follows:
>>> 1. `group_level=0` without startkey/endkey and then the group_level=0
>> index
>>> is used
>>> 2. For a `group_level=0` with a startkey/endkey or where `group_level=0`
>> is
>>> not indexed, the query will look for the smallest `group_level` and use
>>> that to calculate the `group_level=0` result
>>> 3. `group_level=0` indexes with a startkey/endkey could timeout and be
>> slow
>>> in some cases because we having to do quite a lot of aggregation when
>>> reading keys. But I don’t think that is much different from how it is
>> done
>>> now.
>>>
>>> ## Group=true
>>> We will always build the `group=true` index.
>>>
>>> ## Querying non-indexed group_level
>>> If a query has a `group_level` that is not indexed. We can do two things
>>> here, CouchDB could use the nearest  `group_level` to service the query
>> or
>>> it could return an error that this `group_level` is not available to
>> query.
>>> I would like to make this configurable so that an admin can choose how
>>> reduce indexes behave.
>>>
>>> ## Supported Builtin Reduces
>>> Initially, we would support reduces that can be updated by calculating a
>>> delta change and applying it to all the group_levels. That means we can
>>> support `_sum` and `_count` quite easily. Initially, we won’t implement
>>> `max` and `min`. However, I would like to add them as soon after.
>>>
>>> I would also later like to add support for `_stats` reducer. The best
>>> option I can think of is breaking up each field in `_stats` into its own
>>> k/v row in FDB.
>>>
>>> I’m not sure how to handle the `_approx_count_distinct`. At the moment I
>>> don’t understand the algorithm well enough to know if we could support
>> it.
>>>
>>> ## Custom Reduces
>>> Custom reduces will not be supported. This is because it will be tricky
>> to
>>> implement with this design and would not be very performant. Later if
>> there
>>> is a demand for this I guess we could look at doing it but with the
>> caveat
>>> that performance will be pretty bad. My view on custom reduces are that
>> in
>>> 99% of cases they are a bad choice so I would prefer we don’t support
>> them.
>>>
>>> ## Replication
>>> Some basic replication rules:
>>> 1. If a design document is replicated across from an old database and
>> does
>>> not have the group_level option set. All `group_levels` will be
>> supported.
>>>
>>> 2. If a design document is replicated across with a custom reduce or a
>>> reduce we don’t support we will return an error when that reduce index is
>>> queried. I’m thinking we would still build the map part of the index.
>>>
>>> ## limits
>>> Reduce keys will have the same limits as map keys which are 8KB.
>>>
>>> Hopefully that all makes sense, let me know if I need to explain a point
>>> further and I would love to hear your thoughts on this.
>>>

Re: [DISCUSS] New Reduce design for FDB

Posted by Garren Smith <ga...@apache.org>.
On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet <wo...@apache.org> wrote:

> Hi Garren,
>
> If the "options" field is left out, what is the default behaviour?
>

All group_levels will be indexed. I imagine this is what most CouchDB users
will want.
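To make the default concrete, here is a minimal Python sketch of how one emitted (key, value) pair fans out into every group_level row, as described in the original proposal. This is illustrative only, not the actual Erlang/FDB implementation; `group_level_updates` is a hypothetical helper, and in practice each returned pair would become an atomic add against the corresponding FDB key.

```python
from typing import Any, List, Tuple

def group_level_updates(key: List[Any], delta: int) -> List[Tuple[Tuple[Any, ...], int]]:
    """Fan one emitted (key, delta) out to every group_level row.

    Level 0 aggregates everything under a single "null" key (the empty
    tuple here); level N truncates the key to its first N elements.
    """
    updates = [((), delta)]  # level 0
    for level in range(1, len(key) + 1):
        updates.append((tuple(key[:level]), delta))
    return updates

# Applying ([2019, 6, 1], 1) touches levels 0 through 3:
# [((), 1), ((2019,), 1), ((2019, 6), 1), ((2019, 6, 1), 1)]
print(group_level_updates([2019, 6, 1], 1))
```

Because every level is a simple delta update, removing a document is the same operation with a negated delta, which is what makes `_sum` and `_count` easy to support.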


> Is there no way to specify multiple group_levels to get results that
> match the original CouchDB behaviour? Your changed behaviour would be
> acceptable if I could do something like `?group_level=2,3,4,5`.
>

I imagine we could, but it would make the code a lot more complex. What is
the reason for wanting that?
I find it very confusing that we return multiple group_levels for a set
group_level. To me it feels like
we return the extra group_levels because of how b-trees work, rather than
it being a useful thing for a user.
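To illustrate the behaviour change under discussion, here is a hedged Python sketch (hypothetical in-memory rows, not the FDB layout) contrasting the two semantics on the group=true data from the original proposal: the 3.x-style query also returns keys shorter than `group_level`, while the proposed query returns only keys with at least `group_level` elements.

```python
def query_old(rows, group_level):
    """CouchDB 3.x-style: every row is grouped, so keys shorter than
    group_level (including scalar keys) still appear in the result."""
    out = {}
    for key, value in rows:
        # Only array keys are truncated; scalar keys pass through as-is.
        k = tuple(key[:group_level]) if isinstance(key, list) else key
        out[k] = out.get(k, 0) + value
    return out

def query_new(rows, group_level):
    """Proposed: only keys with at least group_level elements contribute;
    shorter keys and scalar keys are excluded from the result."""
    out = {}
    for key, value in rows:
        if isinstance(key, list) and len(key) >= group_level:
            k = tuple(key[:group_level])
            out[k] = out.get(k, 0) + value
    return out

# group=true rows from the example in the original proposal.
rows = [
    (1, 2), (2, 2), (3, 2),
    ([1, 1], 1), ([1, 2, 6], 1),
    ([2, 1], 1), ([2, 3, 6], 1),
    ([3, 1], 1), ([3, 1, 5], 1), ([3, 4, 5], 1),
]
```

With `group_level=2`, `query_old` keeps the scalar rows 1, 2, 3 alongside the truncated pairs, while `query_new` returns only the six two-element keys, matching the two result sets shown in the proposal.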


> -Joan
>

Re: [DISCUSS] New Reduce design for FDB

Posted by Joan Touzet <wo...@apache.org>.
Hi Garren,

If the "options" field is left out, what is the default behaviour?

Is there no way to specify multiple group_levels to get results that 
match the original CouchDB behaviour? Your changed behaviour would be 
acceptable if I could do something like `?group_level=2,3,4,5`.

-Joan
