Posted to dev@couchdb.apache.org by Garren Smith <ga...@apache.org> on 2020/03/26 07:11:56 UTC

_all_docs collation

Hi Everyone,

While working on the Mango implementation for FDB, I've noticed that
_all_docs has some weird collation ordering. If you do something like
GET /db/_all_docs?startkey={}, it returns all the documents, even though
in view collation an object is ordered after strings. I noticed this
because the pouchdb-find tests include a few tests that check that
{selector: {_id: {$gt: {}}}} returns all the docs in the database [0].

This ordering feels wrong to me, but I'm guessing it's been around for a
while. Currently, _all_docs on FDB follows the view collation ordering, so
the startkey query above would not return any documents.

I want to know whether we should keep the old _all_docs ordering or
standardize on view collation ordering everywhere.

I would prefer we change it, but I'm not sure of the implications for
client libraries and users.
Changing it would be a breaking change, but since 4.0 is going to include a
lot of breaking changes, I think this would be our best chance to do it.

Cheers
Garren



[0]
https://github.com/nolanlawson/pouchdb-find/commit/e1ca2e2d18041f05a3d19bce4254f4d7b349ad20

Re: _all_docs collation

Posted by Garren Smith <ga...@apache.org>.
Awesome. Thanks for explaining that. I imagined it had good historical
reasoning. I've changed _all_docs in FDB to follow raw collation:
https://github.com/apache/couchdb/commit/9b325b75814418b85ffb3642a5115635416f56a8

On Tue, Mar 31, 2020 at 11:07 AM Jan Lehnardt <ja...@apache.org> wrote:

>
>
> > On 26. Mar 2020, at 11:18, Garren Smith <ga...@apache.org> wrote:
> >
> > Oh interesting, reading the documentation more carefully I see we have
> > raw collation
> > https://docs.couchdb.org/en/stable/ddocs/views/collation.html#raw-collation
> > So _all_docs is using that and that explains why an object comes before a
> > string.
> > So do we want to keep raw collation for _all_docs?
>
>
> The reason for this is a simplified codepath and maybe even performance
> for regular database operations. _all_docs internally is the by-id index
> that performs any and all document reads and writes, so the original design
> tried to make this as lean as possible generally. Since we do Unicode
> collation in a NIF, that’s an extra step we did not want to take at the
> time.
>
> I can’t judge the impact of this for FDB. Since we already have to do
> key-mangling, is another NIF call there that much of a problem? Has it ever
> been? NIFs have vastly improved since the original design, so I don’t
> really know. If it doesn’t make a performance difference, I would not
> object to changing the behaviour if it would simplify our _all_docs code.
> That said, since we have the raw option and want to keep it, we’ll have two
> paths anyway, and switching the default for one route doesn’t sound like a
> hard problem.
>
> That leaves compatibility. I’d wager that there are few cases which rely
> on raw collation in _all_docs, and for those, it’d be easy enough to adjust
> to the new world. That said, if there is no overwhelming reason to change
> the current behaviour, I’d say we keep things as-is.
>
> Best
> Jan
> —

Re: _all_docs collation

Posted by Jan Lehnardt <ja...@apache.org>.

> On 26. Mar 2020, at 11:18, Garren Smith <ga...@apache.org> wrote:
> 
> Oh interesting, reading the documentation more carefully I see we have raw
> collation
> https://docs.couchdb.org/en/stable/ddocs/views/collation.html#raw-collation
> So _all_docs is using that and that explains why an object comes before a
> string.
> So do we want to keep raw collation for _all_docs?


The reason for this is a simplified codepath and maybe even performance for regular database operations. _all_docs internally is the by-id index that performs any and all document reads and writes, so the original design tried to make this as lean as possible generally. Since we do Unicode collation in a NIF, that’s an extra step we did not want to take at the time.

I can’t judge the impact of this for FDB. Since we already have to do key-mangling, is another NIF call there that much of a problem? Has it ever been? NIFs have vastly improved since the original design, so I don’t really know. If it doesn’t make a performance difference, I would not object to changing the behaviour if it would simplify our _all_docs code. That said, since we have the raw option and want to keep it, we’ll have two paths anyway, and switching the default for one route doesn’t sound like a hard problem.

That leaves compatibility. I’d wager that there are few cases which rely on raw collation in _all_docs, and for those, it’d be easy enough to adjust to the new world. That said, if there is no overwhelming reason to change the current behaviour, I’d say we keep things as-is.

Best
Jan
—




Re: _all_docs collation

Posted by Garren Smith <ga...@apache.org>.
Honestly, I'm not sure. I'm fine with keeping raw collation. But it was
definitely confusing when thinking about _all_docs.


On Thu, Mar 26, 2020 at 2:43 PM Jonathan Hall <fl...@flimzy.com> wrote:

> Would changing _all_docs, but keeping raw collation as an option, really
> simplify anything anyway?
>
> Or are you proposing removing the raw collation option entirely?
>
> Or maybe you're proposing this more as a UX change, than a technical
> simplification?
>
> Jonathan

Re: _all_docs collation

Posted by Jonathan Hall <fl...@flimzy.com>.
Would changing _all_docs, but keeping raw collation as an option, really 
simplify anything anyway?

Or are you proposing removing the raw collation option entirely?

Or maybe you're proposing this more as a UX change than a technical
simplification?

Jonathan


On 3/26/20 11:18 AM, Garren Smith wrote:
> Oh interesting, reading the documentation more carefully I see we have raw
> collation
> https://docs.couchdb.org/en/stable/ddocs/views/collation.html#raw-collation
> So _all_docs is using that and that explains why an object comes before a
> string.
> So do we want to keep raw collation for _all_docs?

Re: _all_docs collation

Posted by Garren Smith <ga...@apache.org>.
Oh interesting. Reading the documentation more carefully, I see we have raw
collation:
https://docs.couchdb.org/en/stable/ddocs/views/collation.html#raw-collation
So _all_docs is using that, which explains why an object comes before a
string.
So do we want to keep raw collation for _all_docs?
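
To make the difference concrete, here is a rough Python sketch of how the two
collations rank JSON types against each other. It is only an illustration,
not CouchDB code: the two type orders are my reading of the collation docs
linked above, and the helper names (json_type, sorts_before,
startkey_matches_string_ids) are made up for this example. It models ordering
between types only, not ordering within a type.

```python
# Assumed type orders, taken from the CouchDB collation documentation.
# View collation: null < false < true < numbers < strings < arrays < objects.
VIEW_TYPE_ORDER = ["null", "false", "true", "number", "string", "array", "object"]
# Raw collation: numbers < false < null < true < objects < arrays < strings.
RAW_TYPE_ORDER = ["number", "false", "null", "true", "object", "array", "string"]


def json_type(value):
    """Return a type label for a decoded JSON value (hypothetical helper)."""
    if value is None:
        return "null"
    # Check booleans before numbers, because bool is a subclass of int.
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def sorts_before(a, b, type_order):
    """True if a's type sorts strictly before b's type in the given order."""
    return type_order.index(json_type(a)) < type_order.index(json_type(b))


def startkey_matches_string_ids(startkey, type_order):
    """True if a non-string startkey sorts before every string doc id."""
    return sorts_before(startkey, "any-doc-id", type_order)


if __name__ == "__main__":
    # startkey={} under raw collation: the object sorts before all strings.
    print(startkey_matches_string_ids({}, RAW_TYPE_ORDER))
    # startkey={} under view collation: the object sorts after all strings.
    print(startkey_matches_string_ids({}, VIEW_TYPE_ORDER))
```

With these assumed orders, startkey={} sits below every string id under raw
collation (so everything is returned) and above every string id under view
collation (so nothing is returned), which matches what I was seeing.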

On Thu, Mar 26, 2020 at 11:45 AM Glynn Bird <gl...@gmail.com> wrote:

> It's not something I was aware of, but it's certainly a known "feature",
> documented here:
> https://docs.couchdb.org/en/stable/ddocs/views/collation.html#all-docs
>
> (probably because all keys are strings in all_docs, whereas they can be all
> sorts of mixed types with a view, and ascii collation would be faster with
> that assumption)

Re: _all_docs collation

Posted by Glynn Bird <gl...@gmail.com>.
It's not something I was aware of, but it's certainly a known "feature",
documented here:
https://docs.couchdb.org/en/stable/ddocs/views/collation.html#all-docs

(probably because all keys are strings in _all_docs, whereas they can be all
sorts of mixed types with a view, and ASCII collation would be faster with
that assumption)
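
As a rough illustration of what ASCII ordering means for string ids, here is
a small Python sketch. It assumes that ASCII collation can be modelled as a
plain comparison of the UTF-8 bytes of each id; the function name and the
sample ids are made up for the example and are not taken from CouchDB.

```python
def bytewise_sorted(doc_ids):
    """Sort string ids by their UTF-8 bytes (assumed model of ASCII collation)."""
    return sorted(doc_ids, key=lambda doc_id: doc_id.encode("utf-8"))


if __name__ == "__main__":
    ids = ["b", "A", "a", "B", "_design/x"]
    # Uppercase letters (65-90) come before "_" (95), which comes before
    # lowercase letters (97-122).
    print(bytewise_sorted(ids))  # ['A', 'B', '_design/x', 'a', 'b']
```

A byte comparison like this needs no locale data or collation library, which
is presumably where the speed advantage would come from. Unicode collation
would instead keep "a" and "A" next to each other.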

On Thu, 26 Mar 2020 at 07:12, Garren Smith <ga...@apache.org> wrote:

> Hi Everyone,
>
> While working on the Mango implementation for FDB, I've noticed that
> _all_docs has some weird
> ordering collation. If you do something like GET /db/_all_docs?startkey={}
> it will return all the documents even though in view collation an object is
> ordered after strings. The reason I've noticed this is that in the
> pouchdb-find tests we have a few tests that check that {selector: {_id:
> {$gt: {}}}} returns all the docs in the database [0].
>
> This ordering feels wrong to me, but I'm guessing it's been around for a
> while. Currently for _all_docs on FDB, we have it that if you did the above
> startkey query, it would not return any documents as we are following the
> view collation ordering.
>
> I want to know whether we should keep the old _all_docs ordering or rather
> standardize on view collation ordering everywhere?
>
> I would prefer we change it, but I'm not sure of the implications of that
> for client libraries and users.
> Changing it would be a breaking change, but since 4.0 is going to be a lot
> of breaking changes, I think this would be our best chance to do this.
>
> Cheers
> Garren
>
>
>
> [0]
>
> https://github.com/nolanlawson/pouchdb-find/commit/e1ca2e2d18041f05a3d19bce4254f4d7b349ad20
>