Posted to user@couchdb.apache.org by Daniel Wertheim <da...@wertheim.se> on 2014/04/01 16:12:03 UTC

Regarding "Skip as fast as startkey"

Was looking at this: https://issues.apache.org/jira/browse/COUCHDB-1076

Which in the final comment states: "As far as I'm aware, skip is
equivalently fast to a startkey search"

Glanced at the latest documentation, which distinguishes pre-v1.2 from
post-v1.2 behaviour:
http://couchdb.readthedocs.org/en/latest/couchapp/views/pagination.html

Did a quick test with 1,700,543 docs on CouchDB v1.5 on Windows. The key is
an integer from 1 to 1,700,543.

13000ms for limit=10&skip=500000
48ms with startkey, startkey_docid, skip, limit

Seems to me that the documentation should still say "Don't use this", or
what am I missing?
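
For illustration, the two request shapes compared above could be built
roughly like this. This is only a sketch: the view path and the document id
are made up for the example, and only the query parameter names come from
CouchDB itself.

```python
from urllib.parse import urlencode

# Hypothetical view path, not taken from the original test.
VIEW = "/db/_design/app/_view/by_number"


def skip_only_url(limit, skip):
    """Offset pagination: the server has to walk past `skip` rows first."""
    return VIEW + "?" + urlencode({"limit": limit, "skip": skip})


def startkey_url(limit, startkey, startkey_docid, skip=1):
    """Key-based pagination: start at the last row seen, then skip past it.

    Integer keys are already valid JSON, so no extra encoding is needed here.
    """
    params = {
        "startkey": startkey,
        "startkey_docid": startkey_docid,
        "skip": skip,
        "limit": limit,
    }
    return VIEW + "?" + urlencode(params)


print(skip_only_url(10, 500000))
print(startkey_url(10, 500000, "doc-500000"))
```

The second form still uses skip, but only with a small value to step over
the row the previous page ended on.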

//Daniel

Re: Regarding "Skip as fast as startkey"

Posted by Garren Smith <ga...@apache.org>.
Hi Daniel,

skip is definitely the quick fix and the easy way out if performance isn’t a major concern or you are not skipping with particularly high values.

cheers
Garren

On 01 Apr 2014, at 7:22 PM, Daniel Wertheim <da...@wertheim.se> wrote:

> Pagination of data sets for me is not only for humans sitting and clicking
> next, but also e.g. to batch process a certain range as well. But yes, for
> a GUI solution where someone probably will not "go that high" I guess I
> agree.
> 
> Do I get it right, only doing skip and limit and not combining it with
> startkey & startkey_docid is more for taking the "easy way" out?
> 
> Thanks,
> 
> //Dan


Re: Regarding "Skip as fast as startkey"

Posted by Daniel Wertheim <da...@wertheim.se>.
For me, pagination of data sets is not only for humans sitting and clicking
next, but also for e.g. batch processing a certain range. But yes, for
a GUI solution where someone probably will not "go that high", I guess I
agree.

Do I get it right that doing only skip and limit, without combining it with
startkey & startkey_docid, is more a way of taking the "easy way" out?

Thanks,

//Dan


On 1 April 2014 17:36, Alexander Shorin <kx...@gmail.com> wrote:

> Thanks for additions, Garren
>
> You remind me the important edge case: using just startkey_docid
> pagination isn't always enough. For instance, both Futon and Fauxton
> with older pagination are affected to the issue when the last row in
> fetched view result contains the same docid and key as were used for
> current request.
>
> https://issues.apache.org/jira/browse/COUCHDB-2192
>
> Ironically, it could be only solved by using mixed approach: using
> startkey_docid as primary pagination method for performance and
> fallback to skip one for the COUCHDB-2192 edge case. Hopefully, for
> small numbers of skip there is no any performance degradation could be
> noticed.
>
> --
> ,,,^..^,,,

Re: Regarding "Skip as fast as startkey"

Posted by Alexander Shorin <kx...@gmail.com>.
Thanks for the additions, Garren.

You remind me of an important edge case: using just startkey_docid
pagination isn't always enough. For instance, both Futon and Fauxton
with the older pagination are affected by an issue where the last row in
the fetched view result has the same docid and key as were used for the
current request.

https://issues.apache.org/jira/browse/COUCHDB-2192

Ironically, it can only be solved by a mixed approach: using
startkey_docid as the primary pagination method for performance and
falling back to skip for the COUCHDB-2192 edge case. Hopefully, for
small skip values no performance degradation will be
noticeable.
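
A minimal sketch of how such a mixed approach might compute the parameters
for the next page. This is one possible reading of the idea, not the Futon
or Fauxton code; the function name and the shape of `rows` and `current`
are assumptions made for the example, and the first page is assumed to be
requested without skip.

```python
import json


def next_page_params(rows, current, limit):
    """Return query parameters for the page that follows `rows`.

    rows    -- view rows of the current page, dicts with "key" and "id"
    current -- parameters used to fetch `rows` (empty dict for page 1)
    """
    if not rows:
        return None  # nothing fetched, so there is no next page

    last = rows[-1]
    last_key = json.dumps(last["key"])  # view keys are sent JSON-encoded

    # Count the rows at the end of the page that share the last row's
    # key and doc id; all of them must be skipped on the next request.
    trailing = 0
    for row in reversed(rows):
        if row["key"] == last["key"] and row["id"] == last["id"]:
            trailing += 1
        else:
            break

    skip = trailing
    same_start = (current.get("startkey") == last_key
                  and current.get("startkey_docid") == last["id"])
    if trailing == len(rows) and same_start:
        # Edge case: the whole page repeats the key/doc id we started
        # from, so keep counting from the previous skip value.
        skip = current.get("skip", 0) + len(rows)

    return {"startkey": last_key, "startkey_docid": last["id"],
            "skip": skip, "limit": limit}
```

In the common case this yields skip=1; skip only grows while a run of rows
with an identical key and doc id spans one or more pages.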

--
,,,^..^,,,


On Tue, Apr 1, 2014 at 7:18 PM, Garren Smith <ga...@apache.org> wrote:
> Alexander raises some good points. I’ve been working a lot on implementing pagination with Couchdb and I find there are tradeoffs for using either skip or startkey.
> The new version of pagination that will land in Fauxton soon (https://github.com/apache/couchdb/pull/194) uses skip.
> This is not ideal in all cases as mentioned in the reasons in this mail. However its not that bad for low values for skip. I found for skip values less than 100000, the response is not that bad.
> Depending on your user interface or problem you are trying to solve, how many people are paginating from page 1 to a massive page where your skip value would be > 100 000? So I think its a fair tradeoff.
> I know Alexander would disagree.
>
> Using startkey and startkey_docid is fine. You have to do a lot more checking of the values and make sure you do the correct json encoding. You also have to use startkey_docid for views and then startkey for _all_docs.
> You also get an edge case where you are forced to use skip if the keys for the view are all the same.
>
> Going forward with Fauxton we will be implementing a new pagination algorithm that uses startkey and startkey_docid but then drops down to skip for the above mentioned edge case.
>
> Cheers
> Garren

Re: Regarding "Skip as fast as startkey"

Posted by Garren Smith <ga...@apache.org>.
Alexander raises some good points. I’ve been working a lot on implementing pagination with CouchDB and I find there are tradeoffs in using either skip or startkey.
The new version of pagination that will land in Fauxton soon (https://github.com/apache/couchdb/pull/194) uses skip.
This is not ideal in all cases, for the reasons mentioned in this mail. However, it’s not that bad for low skip values. I found that for skip values of less than 100000, the response is not that bad.
Depending on your user interface or the problem you are trying to solve, how many people are paginating from page 1 to a massive page where your skip value would be > 100 000? So I think it’s a fair tradeoff.
I know Alexander would disagree.

Using startkey and startkey_docid is fine. You have to do a lot more checking of the values and make sure you do the correct JSON encoding. You also have to use startkey_docid for views, but only startkey for _all_docs.
You also get an edge case where you are forced to use skip if the keys for the view are all the same.
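
To illustrate the JSON encoding point, here is a small sketch of a query
string builder. The helper name is made up for the example; the assumption
is that key, startkey and endkey carry view keys and so must be valid JSON,
while startkey_docid is a plain document id.

```python
import json
from urllib.parse import urlencode


def view_query(**params):
    """Build a query string, JSON-encoding the parameters that hold keys."""
    key_params = {"key", "startkey", "endkey"}
    encoded = {}
    for name, value in params.items():
        # Doc ids and numeric options such as limit are passed through as-is.
        encoded[name] = json.dumps(value) if name in key_params else value
    return urlencode(encoded)


# A string key has to keep its JSON quotes: startkey=%22apple%22
print(view_query(startkey="apple", startkey_docid="doc-1", limit=10))
# An array key is encoded as a JSON array before URL-escaping.
print(view_query(startkey=["2014", 4], limit=10))
```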

Going forward with Fauxton, we will be implementing a new pagination algorithm that uses startkey and startkey_docid but then drops down to skip for the above-mentioned edge case.

Cheers
Garren


On 01 Apr 2014, at 5:04 PM, Alexander Shorin <kx...@gmail.com> wrote:

> There are two issues with limit&skip pagination:
> 
> 1. Performance, as you already noticed with your benchmark. However,
> this performance issue is only actual when you made first request with
> high skip value. If you'll increase it slowly, request time wouldn't
> be too much (but still request time would be too high).
> 
> 2. Design: http://www.reddit.com/comments/1ae0tl
> 
> I have a draft of updates for this article telling why you should stay
> with startkey method. Will submit it soon.
> 
> 
> --
> ,,,^..^,,,


Re: Regarding "Skip as fast as startkey"

Posted by Alexander Shorin <kx...@gmail.com>.
There are two issues with limit&skip pagination:

1. Performance, as you already noticed with your benchmark. However,
this performance issue only really matters when your first request uses a
high skip value. If you increase skip gradually, the request time won't
grow as sharply (but it will still be too high).

2. Design: http://www.reddit.com/comments/1ae0tl

I have a draft of updates for this article explaining why you should stay
with the startkey method. Will submit it soon.


--
,,,^..^,,,


On Tue, Apr 1, 2014 at 6:12 PM, Daniel Wertheim <da...@wertheim.se> wrote:
> Was looking at this: https://issues.apache.org/jira/browse/COUCHDB-1076
>
> Which in the final comment states: "As far as I'm aware, skip is
> equivalently fast to a startkey search"
>
> Glanced at the latest documentation, which has this pre v.1.2 vs after
> v1.2:
> http://couchdb.readthedocs.org/en/latest/couchapp/views/pagination.html
>
> Did a quick test with 1.700.543  docs. CouchDb v1.5 Windows. Key is an
> integer from 1 to 1.700.543
>
> 13000ms for limit=10&skip=500000
> 48ms with startkey, startkeydocid, skip, limit
>
> Seems to me that the documents should still say "Don't use this" or what am
> I missing?
>
> //Daniel

Re: Regarding "Skip as fast as startkey"

Posted by Robert Samuel Newson <rn...@apache.org>.
Ah, good reminder. Yes, it’s not true: several people have performed the timings and confirmed that skip (while faster than it was) is not as fast as using the right startkey.

B.

On 1 Apr 2014, at 15:12, Daniel Wertheim <da...@wertheim.se> wrote:

> Was looking at this: https://issues.apache.org/jira/browse/COUCHDB-1076
> 
> Which in the final comment states: "As far as I'm aware, skip is
> equivalently fast to a startkey search"
> 
> Glanced at the latest documentation, which has this pre v.1.2 vs after
> v1.2:
> http://couchdb.readthedocs.org/en/latest/couchapp/views/pagination.html
> 
> Did a quick test with 1.700.543  docs. CouchDb v1.5 Windows. Key is an
> integer from 1 to 1.700.543
> 
> 13000ms for limit=10&skip=500000
> 48ms with startkey, startkeydocid, skip, limit
> 
> Seems to me that the documents should still say "Don't use this" or what am
> I missing?
> 
> //Daniel