Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2022/02/02 17:03:46 UTC

[GitHub] [lucenenet] eladmarg opened a new issue #614: deep paging is slow

eladmarg opened a new issue #614:
URL: https://github.com/apache/lucenenet/issues/614


   Hi,
   
   I don't know if this is the right place for this, but generally this is a design aspect of Lucene.
   
   I have an index of ~3M documents. When paginating with a sort out to page ~25,000, the query takes about 3-4 seconds. This is because the index can't skip the first ~1M records; it has to do a full scan and sort all of the items.
   
   The issue occurs when nothing is filtered, so the candidate collection is very large. Now, you might ask who would ever want page 25,000, and the answer is: search engine crawlers. They don't care about filtering, they request pages in parallel, and this causes very high CPU usage.
   
   Yes, I know there is SearchAfter, but I don't have the last document from the previous page.
   Yes, I know I can use a custom collector (which is a bit faster and only uses a priority queue), but then the items won't be sorted.
   
   Currently, I solved this by holding an array of the relevant docIds, already sorted, so I can just slice out the range I need: O(1), because it's pre-sorted.
   
   The downside, of course, is the need to keep this array consistent with the index.
   I wonder if there is a better way to do fast deep paging.
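   The workaround above can be sketched roughly like this (plain Java, not Lucene API; the class and method names are illustrative, and the pre-sorted array stands in for the materialized, sorted result set that would have to be rebuilt when the index changes):

   ```java
   import java.util.Arrays;

   // Hypothetical sketch of the workaround described above: hold the docIds
   // of the unfiltered, fully sorted result set in one pre-sorted array,
   // rebuilt whenever the index changes. Serving any page is then just an
   // array slice, with no per-query scan-and-sort over millions of hits.
   public class SortedDocIdCache {
       private final int[] sortedDocIds; // all docIds, already in sort order

       public SortedDocIdCache(int[] sortedDocIds) {
           this.sortedDocIds = sortedDocIds;
       }

       // Return the docIds for a 0-based page: an O(pageSize) copy,
       // independent of how deep the page is.
       public int[] page(int pageNumber, int pageSize) {
           int from = pageNumber * pageSize;
           if (from >= sortedDocIds.length) {
               return new int[0]; // past the end of the result set
           }
           int to = Math.min(from + pageSize, sortedDocIds.length);
           return Arrays.copyOfRange(sortedDocIds, from, to);
       }
   }
   ```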
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] rclabo commented on issue #614: deep paging is slow

Posted by GitBox <gi...@apache.org>.
rclabo commented on issue #614:
URL: https://github.com/apache/lucenenet/issues/614#issuecomment-1028320228


   >yes, I know there is SearchAfter, but I don't have the last document from previous page
   
   If `SearchAfter` is a possible solution for you, then I'd be really tempted to find a way to hang onto the last document and cache it in memory, keyed to a session or cookie, so that you have it for the next query. Although if the search engine is requesting several "pages of results" in parallel, that won't necessarily work. But there may be ways around that. One is to only offer a "get more results" option rather than providing links to many future pages of results. But if you are set on doing the latter, you could perhaps pre-cache the last result of each page that a link is shown for. Just thinking out loud here.
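   A minimal sketch of that pre-caching idea (plain Java, not the Lucene.NET API; `sortedKeys` stands in for the index's sorted results and the cached key plays the role of the `ScoreDoc` you would hand to `SearchAfter` — all names here are illustrative assumptions):

   ```java
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical sketch: remember the last sort key of each served page so
   // a later request for the next page can resume SearchAfter-style (seek,
   // then read pageSize items) instead of re-scanning from offset 0.
   public class PageCursorCache {
       private final long[] sortedKeys;                      // results in sort order
       private final Map<Integer, Long> lastKeyOfPage = new HashMap<>();

       public PageCursorCache(long[] sortedKeys) {
           this.sortedKeys = sortedKeys;
       }

       public long[] page(int pageNumber, int pageSize) {
           int from;
           Long after = lastKeyOfPage.get(pageNumber - 1);
           if (after != null) {
               // Resume just past the cached cursor via binary search.
               from = Arrays.binarySearch(sortedKeys, after) + 1;
           } else {
               from = pageNumber * pageSize; // fallback: plain offset paging
           }
           if (from >= sortedKeys.length) {
               return new long[0];
           }
           int to = Math.min(from + pageSize, sortedKeys.length);
           long[] out = Arrays.copyOfRange(sortedKeys, from, to);
           // Cache this page's last key as the cursor for the next page.
           lastKeyOfPage.put(pageNumber, out[out.length - 1]);
           return out;
       }
   }
   ```

   The same shape applies to real `SearchAfter` usage: the cached value per page would be the page's last `ScoreDoc` rather than a plain key.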





[GitHub] [lucenenet] NightOwl888 commented on issue #614: deep paging is slow

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #614:
URL: https://github.com/apache/lucenenet/issues/614#issuecomment-1028746624


   @eladmarg 
   
   This sounds like a good question to either post on [StackOverflow](https://stackoverflow.com/questions/tagged/lucene.net) (including the `lucene` tag) or raise with the [Lucene team](https://lucene.apache.org/core/discussion.html), as it doesn't sound like a technology-specific question.

