Posted to solr-user@lucene.apache.org by root23 <s....@gmail.com> on 2018/05/18 15:09:25 UTC

Getting more documents from resultsSet

Hi all,
I am working on Solr 6. Our business requirement is that we need to return
2000 docs for every query we execute.
Normally, if I execute the query with start=0 and rows=10, it returns very
fast (even for our most complex queries, in less than 3 seconds).
However, the moment I change it to start=0 and rows=2000, the response time
is around 30 seconds.

I understand that Solr probably has to do disk seeks to fetch the documents,
which might be the bottleneck in this case.

Is there a way I can optimize around this, knowing that I might have to get
2000 results in one go and might also have to paginate further, showing 2000
results on each page? We could go as far as page 50.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Getting more documents from resultsSet

Posted by Pratik Patel <pr...@semandex.net>.
Using a cursorMark might help, as explained in this documentation:
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html
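The cursorMark flow from that guide can be sketched as a loop: sort on something that includes the uniqueKey as a tiebreak, send cursorMark=* on the first request, then feed each response's nextCursorMark into the next request until Solr echoes back the cursor you sent. The fake_solr_search function below is a stand-in for the real HTTP call, so the sketch runs without a server; everything about it except the request/response contract is made up.

```python
# Sketch of cursorMark pagination. fake_solr_search mimics Solr's contract:
# it returns a page of docs plus a nextCursorMark, and returns the same
# cursorMark you sent once the results are exhausted.

FAKE_INDEX = [{"id": str(i)} for i in range(25)]  # pretend collection

def fake_solr_search(params):
    """Stand-in for GET /solr/<core>/select with a cursorMark param."""
    rows = params["rows"]
    cursor = params["cursorMark"]
    pos = 0 if cursor == "*" else int(cursor)
    docs = FAKE_INDEX[pos:pos + rows]
    next_cursor = str(pos + len(docs)) if docs else cursor
    return {"response": {"docs": docs}, "nextCursorMark": next_cursor}

def fetch_all(rows=10):
    # start stays 0; the cursor, not start, advances through the result set.
    params = {"q": "*:*", "sort": "id asc", "rows": rows, "cursorMark": "*"}
    seen = []
    while True:
        resp = fake_solr_search(params)
        seen.extend(resp["response"]["docs"])
        # Solr signals the end by returning the cursorMark you sent.
        if resp["nextCursorMark"] == params["cursorMark"]:
            break
        params["cursorMark"] = resp["nextCursorMark"]
    return seen

print(len(fetch_all(rows=10)))  # 25
```

Unlike start/rows, each cursor request does a constant amount of work per page, so page 50 costs about the same as page 1.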


Re: Getting more documents from resultsSet

Posted by Deepak Goel <de...@gmail.com>.
I wonder if an in-memory filesystem would help...
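For what that suggestion could look like in practice, here is a hypothetical sketch of putting an index on a RAM-backed tmpfs on Linux. The paths, size, and core name are made up; note that with enough free RAM the OS page cache usually achieves much the same effect without any of this.

```shell
# Illustrative only: mount a tmpfs and copy an index onto it.
sudo mkdir -p /mnt/solr-ram
sudo mount -t tmpfs -o size=8g tmpfs /mnt/solr-ram
cp -r /var/solr/data/mycore /mnt/solr-ram/mycore
# Then point the core's dataDir at /mnt/solr-ram/mycore and reload the core.
# Anything written here is lost on reboot, so treat it as a read-mostly cache.
```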


Re: Getting more documents from resultsSet

Posted by Erick Erickson <er...@gmail.com>.
If you only return fields that have docValues=true, that'll largely
eliminate the disk seeks. 30 seconds does seem kind of excessive even
with disk seeks, though.

Here's a reference: https://lucene.apache.org/solr/guide/6_6/docvalues.html

Whenever I see anything like "...our business requirement is...", I
cringe. _Why_ is that a requirement? What is being done _for the user_
that requires 2000 documents? There may be legitimate reasons, but
there also may be better ways to get what you need. This may very well
be an XY problem.

For instance, if you want to take the top 2,000 docs from query X and
score just those, see:
https://lucene.apache.org/solr/guide/6_6/query-re-ranking.html,
specifically: ReRankQParserPlugin.
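The re-ranking setup mentioned above can be sketched as request parameters: a cheap main query does the initial match, and the rq local-params query re-scores only the top reRankDocs hits. The field names and query values below are invented for illustration; only the parameter names come from the 6.6 Query Re-Ranking guide.

```python
# Sketch of ReRankQParserPlugin request parameters (rq/rqq syntax from the
# Solr 6.6 Query Re-Ranking guide). Query strings here are made up.

def rerank_params(main_q, rerank_q, rerank_docs=2000, rerank_weight=3):
    """Build a Solr request dict that re-scores the top rerank_docs hits
    of main_q using rerank_q."""
    return {
        "q": main_q,
        "rq": "{{!rerank reRankQuery=$rqq reRankDocs={d} reRankWeight={w}}}".format(
            d=rerank_docs, w=rerank_weight),
        "rqq": rerank_q,
        "fl": "id,score",  # return only the fields you need (ideally docValues)
    }

params = rerank_params("cat:electronics", "name:ipod")
print(params["rq"])  # {!rerank reRankQuery=$rqq reRankDocs=2000 reRankWeight=3}
```

This keeps the expensive scoring confined to the 2000 candidates rather than the whole result set.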

Best,
Erick
