Posted to solr-user@lucene.apache.org by Salman Ansari <sa...@gmail.com> on 2016/01/20 12:55:14 UTC

Returning all documents in a collection

Hi,

I am looking for a way to return all documents from a collection.
Currently, I am restricted to specifying the number of rows using Solr.NET
but I am looking for a better approach to actually return all documents. If
I specify a huge number such as 1M, the processing takes a long time.

Any feedback/comment will be appreciated.

Regards,
Salman

Re: Returning all documents in a collection

Posted by Joel Bernstein <jo...@gmail.com>.
The limitations of the /export handler should already be documented.

Lots of documentation is still to do for Solr 6 around Streaming Expressions,
and some is left to do on SQL. The SQL interface in Solr 6 can also select and
sort entire result sets, as it's built on top of the Streaming API.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jan 20, 2016 at 10:37 AM, Jack Krupansky <ja...@gmail.com>
wrote:

> It would be nice to have an explicit section in the doc on the topic of
> "Dealing with Large Result Sets" to point people to the various approaches
> (paging, caching, export, streaming expressions) and how to select the best
> one for a given use case.
>
> (And Joel is going to promise to update the doc for this stored field
> restriction, right?!)
>
> -- Jack Krupansky

Re: Returning all documents in a collection

Posted by Jack Krupansky <ja...@gmail.com>.
It would be nice to have an explicit section in the doc on the topic of
"Dealing with Large Result Sets" to point people to the various approaches
(paging, caching, export, streaming expressions) and how to select the best
one for a given use case.

(And Joel is going to promise to update the doc for this stored field
restriction, right?!)

-- Jack Krupansky

On Wed, Jan 20, 2016 at 9:38 AM, Joel Bernstein <jo...@gmail.com> wrote:

> CloudSolrStream is available in Solr 5. The "search" streaming expression
> can be used, or CloudSolrStream can be used directly.
>
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> The export handler does not export stored fields though. It only exports
> fields using DocValues caches. So you may need to re-index your data to use
> this feature.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

Re: Returning all documents in a collection

Posted by Joel Bernstein <jo...@gmail.com>.
CloudSolrStream is available in Solr 5. The "search" streaming expression
can be used, or CloudSolrStream can be used directly.

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

The export handler does not export stored fields though. It only exports
fields using DocValues caches. So you may need to re-index your data to use
this feature.
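
As a rough sketch of the "search" expression mentioned above, written in
Python with only the standard library. The collection name, field names and
base URL are placeholders, and the response layout assumed here (a
"result-set" object whose "docs" list ends with an EOF marker tuple) should
be checked against the Streaming Expressions page for your Solr version.
qt="/export" routes the search through the export handler, so every field in
fl and sort needs DocValues.

```python
import json
import urllib.parse
import urllib.request


def search_expr(collection, fl, sort, q="*:*"):
    """Build a "search" streaming expression that reads from /export."""
    return (
        f'search({collection}, q="{q}", fl="{fl}", '
        f'sort="{sort}", qt="/export")'
    )


def tuples(body):
    """Yield result tuples from a decoded /stream response, stopping at EOF."""
    for doc in body["result-set"]["docs"]:
        if doc.get("EOF"):
            return
        yield doc


def run_stream(base_url, expr):
    """POST the expression to the /stream handler (base_url is a placeholder)."""
    data = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    with urllib.request.urlopen(f"{base_url}/stream", data=data) as resp:
        return tuples(json.load(resp))
```

A call might look like
run_stream("http://localhost:8983/solr/collection1",
search_expr("collection1", "id", "id asc")), assuming a local node and an
"id" field with DocValues enabled.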

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jan 20, 2016 at 9:29 AM, Salman Ansari <sa...@gmail.com>
wrote:

> Thanks Emir, Susheel and Jack for your responses. Just to update, I am
> using Solr Cloud plus I want to get the data completely without pagination
> or cursor (I mean in one shot). Is there a way to do this in Solr?
>
> Regards,
> Salman

Re: Returning all documents in a collection

Posted by Salman Ansari <sa...@gmail.com>.
Thanks Emir, Susheel and Jack for your responses. Just to update, I am
using Solr Cloud plus I want to get the data completely without pagination
or cursor (I mean in one shot). Is there a way to do this in Solr?

Regards,
Salman

On Wed, Jan 20, 2016 at 4:49 PM, Jack Krupansky <ja...@gmail.com>
wrote:

> Yes, Exporting Result Sets is the preferred and recommended technique for
> returning all documents in a collection, or even simply for queries that
> select a large number of documents, all of which are to be returned. It
> uses efficient streaming rather than paging.
>
> But... this great feature currently does not have support for
> distributed/SolrCloud mode:
> "The initial release treats all queries as non-distributed requests. So the
> client is responsible for making the calls to each Solr instance and
> merging the results.
> Using SolrJ’s CloudSolrClient as a model, developers could build clients
> that automatically send requests to all the shards in a collection (or
> multiple collections) and then merge the sorted sets any way they wish."
>
> -- Jack Krupansky

Re: Returning all documents in a collection

Posted by Jack Krupansky <ja...@gmail.com>.
Yes, Exporting Result Sets is the preferred and recommended technique for
returning all documents in a collection, or even simply for queries that
select a large number of documents, all of which are to be returned. It
uses efficient streaming rather than paging.

But... this great feature currently does not have support for
distributed/SolrCloud mode:
"The initial release treats all queries as non-distributed requests. So the
client is responsible for making the calls to each Solr instance and
merging the results.
Using SolrJ’s CloudSolrClient as a model, developers could build clients
that automatically send requests to all the shards in a collection (or
multiple collections) and then merge the sorted sets any way they wish."
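
One way a client might do the merging described in that quote, sketched in
Python: if every shard's export is already sorted on the same field, the
per-shard streams can be merged lazily with the standard library's
heapq.merge. The function name and the sample documents are hypothetical,
and how each shard is queried is left out.

```python
import heapq


def merge_sorted_shards(shard_streams, sort_field, descending=False):
    """Lazily merge per-shard document streams into one sorted stream.

    Each stream must already be sorted on sort_field in the same direction.
    """
    return heapq.merge(
        *shard_streams,
        key=lambda doc: doc[sort_field],
        reverse=descending,
    )


# Hypothetical export results from two shards, each sorted by "id".
shard1 = [{"id": 1}, {"id": 4}]
shard2 = [{"id": 2}, {"id": 3}, {"id": 5}]
merged = list(merge_sorted_shards([shard1, shard2], "id"))
```

Because heapq.merge pulls one document at a time from each stream, the
client never has to hold a whole shard's result set in memory when the
streams are generators.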

-- Jack Krupansky

On Wed, Jan 20, 2016 at 8:41 AM, Susheel Kumar <su...@gmail.com>
wrote:

> Hello Salman,
>
> Please check out the export functionality
> https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
>
> Thanks,
> Susheel

Re: Returning all documents in a collection

Posted by Susheel Kumar <su...@gmail.com>.
Hello Salman,

Please check out the export functionality
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
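
For illustration, a request to the export handler could be built roughly
like this in Python. The base URL, collection name and field names are
placeholders; the sketch assumes that /export needs both a sort and an fl
parameter, as the Exporting Result Sets page describes.

```python
import urllib.parse


def export_url(base_url, fl, sort, q="*:*"):
    """Build an /export request URL for one collection or core.

    fl and sort are both passed explicitly because the export handler
    expects them on every request.
    """
    params = {"q": q, "sort": sort, "fl": fl}
    return f"{base_url}/export?{urllib.parse.urlencode(params)}"


# Placeholder host and collection name.
url = export_url("http://localhost:8983/solr/collection1", "id", "id asc")
```

The resulting URL can be fetched with any HTTP client, including from .NET,
since the handler is plain HTTP.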

Thanks,
Susheel

On Wed, Jan 20, 2016 at 6:57 AM, Emir Arnautovic <
emir.arnautovic@sematext.com> wrote:

> Hi Salman,
> You should use cursors in order to avoid "deep paging issues". Take a look
> at https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>
> Regards,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/

Re: Returning all documents in a collection

Posted by Emir Arnautovic <em...@sematext.com>.
Hi Salman,
You should use cursors in order to avoid "deep paging issues". Take a 
look at 
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
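
A minimal sketch of the cursor approach, in Python rather than Solr.NET and
using only the standard library. The base URL, the page size and the "id"
uniqueKey field are assumptions; cursorMark and nextCursorMark are the
parameters described on the Pagination of Results page. The fetch argument
is a hypothetical hook so the loop can be exercised without a running Solr
instance.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Send one /select request and return the decoded JSON body."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{base_url}/select?{query}") as resp:
        return json.load(resp)


def iterate_all(base_url, rows=1000, unique_key="id", fetch=fetch_page):
    """Yield every document in the collection, one cursor page at a time."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": "*:*",
            "rows": rows,
            # Cursor requests need a sort that includes the uniqueKey field.
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
            "wt": "json",
        }
        body = fetch(base_url, params)
        yield from body["response"]["docs"]
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

Each request only asks for rows documents, so no single response has to
carry the whole collection the way rows=1000000 would.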

Regards,
Emir

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 20.01.2016 12:55, Salman Ansari wrote:
> Hi,
>
> I am looking for a way to return all documents from a collection.
> Currently, I am restricted to specifying the number of rows using Solr.NET
> but I am looking for a better approach to actually return all documents. If
> I specify a huge number such as 1M, the processing takes a long time.
>
> Any feedback/comment will be appreciated.
>
> Regards,
> Salman
>