Posted to solr-user@lucene.apache.org by Sandy Ding <sa...@gmail.com> on 2015/01/06 06:30:58 UTC

How to limit the number of result sets of the 'export' handler

Using rows=xxx doesn't seem to work.
Is there a way to do this?

Re: How to limit the number of result sets of the 'export' handler

Posted by Joel Bernstein <jo...@gmail.com>.
Sandy,

Export takes a very different approach than the normal select approach.
Export uses an incremental stream-sorting approach that won't run out of
memory when sorting very large result sets. Export also does not use stored
fields to return results; it reads them from docValues caches instead.
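
To make the request shape concrete, here is a minimal sketch of building an
export URL. It assumes the handler is registered at /export and that every
field named in sort and fl has docValues enabled. The host, the collection
name and the field names are placeholders, and build_export_url is a
hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def build_export_url(base_url, collection, query, sort, fields):
    """Build a URL for Solr's /export handler (hypothetical helper).

    Export expects both a sort and a field list, and those fields are
    assumed to be docValues fields.
    """
    if not sort or not fields:
        raise ValueError("export needs both a sort and a field list")
    params = urlencode({"q": query, "sort": sort, "fl": ",".join(fields)})
    return f"{base_url.rstrip('/')}/{collection}/export?{params}"


# Placeholder host, collection and field names.
url = build_export_url(
    "http://localhost:8983/solr", "collection1", "*:*", "id asc", ["id", "price"]
)
print(url)
```

Note that there is no rows parameter here: export streams the entire sorted
result set.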

The main limitation that you'll run into with export is that it's not
designed to export large text fields. You'll notice that it exports
multi-value string fields, but not text fields. So if your use-case doesn't
require you to export large blocks of text, then the export feature should
work for you.

You'll want to use the 4.10.3 version of export, which has an important
bug fix in it.



Joel Bernstein
Search Engineer at Heliosearch

On Wed, Jan 7, 2015 at 9:49 AM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> I believe export is streaming and it avoids building various caches,
> so it will not blow up Solr's memory on large datasets.
>
> You can read a lot more details in the JIRA that introduced it:
> https://issues.apache.org/jira/browse/SOLR-5244
>
> I am not sure how it compares with deep-paging though.
>
> Regards,
>    Alex.
> ----
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 7 January 2015 at 01:26, Sandy Ding <sa...@gmail.com> wrote:
> > Thanks Alexandre.
> > I actually need the whole result set. But it is large(perhaps 10m-100m) and
> > I find select is slow.
> > How does export differ from select except that select will make distributed
> > requests and do the merge?
> > Will select with ‘distrib=false’ have comparable performance with export?
> >
> >
> > 2015-01-06 20:55 GMT+08:00 Alexandre Rafalovitch <ar...@gmail.com>:
> >
> >> Export was specifically designed to get everything which is very
> >> expensive otherwise.
> >>
> >> If you just want the subset, you might be better off with normal
> >> queries and/or with deep paging (cursor).
> >>
> >> Regards,
> >>    Alex.
> >> ----
> >> Sign up for my Solr resources newsletter at http://www.solr-start.com/
> >>
> >>
> >> On 6 January 2015 at 00:30, Sandy Ding <sa...@gmail.com> wrote:
> >> > Using rows=xxx doesn't seem to work.
> >> > Is there a way to do this?
> >>
>

Re: How to limit the number of result sets of the 'export' handler

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I believe export is streaming and it avoids building various caches,
so it will not blow up Solr's memory on large datasets.

You can read a lot more details in the JIRA that introduced it:
https://issues.apache.org/jira/browse/SOLR-5244

I am not sure how it compares with deep-paging though.

Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 7 January 2015 at 01:26, Sandy Ding <sa...@gmail.com> wrote:
> Thanks Alexandre.
> I actually need the whole result set. But it is large(perhaps 10m-100m) and
> I find select is slow.
> How does export differ from select except that select will make distributed
> requests and do the merge?
> Will select with ‘distrib=false’ have comparable performance with export?
>
>
> 2015-01-06 20:55 GMT+08:00 Alexandre Rafalovitch <ar...@gmail.com>:
>
>> Export was specifically designed to get everything which is very
>> expensive otherwise.
>>
>> If you just want the subset, you might be better off with normal
>> queries and/or with deep paging (cursor).
>>
>> Regards,
>>    Alex.
>> ----
>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>
>>
>> On 6 January 2015 at 00:30, Sandy Ding <sa...@gmail.com> wrote:
>> > Using rows=xxx doesn't seem to work.
>> > Is there a way to do this?
>>

Re: How to limit the number of result sets of the 'export' handler

Posted by Sandy Ding <sa...@gmail.com>.
Thanks Alexandre.
I actually need the whole result set, but it is large (perhaps 10m-100m) and
I find select is slow.
How does export differ from select, other than that select makes distributed
requests and does the merge?
Will select with 'distrib=false' have performance comparable to export?


2015-01-06 20:55 GMT+08:00 Alexandre Rafalovitch <ar...@gmail.com>:

> Export was specifically designed to get everything which is very
> expensive otherwise.
>
> If you just want the subset, you might be better off with normal
> queries and/or with deep paging (cursor).
>
> Regards,
>    Alex.
> ----
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 6 January 2015 at 00:30, Sandy Ding <sa...@gmail.com> wrote:
> > Using rows=xxx doesn't seem to work.
> > Is there a way to do this?
>

Re: How to limit the number of result sets of the 'export' handler

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Export was specifically designed to get everything, which is very
expensive otherwise.

If you just want a subset, you might be better off with normal
queries and/or with deep paging (cursor).
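
As a rough sketch of the cursor option with a cap on the number of results:
the loop below starts with cursorMark=* and keeps passing back the
nextCursorMark from each response until the limit is reached or the cursor
stops changing. fetch_page is a hypothetical callable standing in for one
select request; it is assumed to send the given cursorMark and rows, use a
sort that ends on the uniqueKey field, and return the decoded JSON response.

```python
def fetch_with_cursor(fetch_page, limit, rows=1000):
    """Collect up to `limit` documents using Solr cursor paging.

    fetch_page(cursor_mark, rows) is a hypothetical callable that runs one
    select request and returns the decoded JSON response as a dict.
    """
    docs = []
    cursor = "*"  # "*" asks Solr to start a new cursor
    while len(docs) < limit:
        # Never ask for more rows than are still needed.
        page = fetch_page(cursor, min(rows, limit - len(docs)))
        docs.extend(page["response"]["docs"])
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
    return docs[:limit]
```

Unlike export, this goes through the normal select handler, so rows is
honored on every page.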

Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 6 January 2015 at 00:30, Sandy Ding <sa...@gmail.com> wrote:
> Using rows=xxx doesn't seem to work.
> Is there a way to do this?