Posted to solr-user@lucene.apache.org by Kevin Osborn <ke...@cbsi.com> on 2013/05/06 18:48:49 UTC

how to quickly export data from SolrCloud

I am looking to export a large amount of data from Solr. The export will
be done by a Java application and then written to a file. Initially, I was
thinking of making direct HTTP calls and using the CSV response writer, so
that my Java application can quickly parse each line from a stream.

But with SolrCloud, I prefer to use SolrJ because of its communication with
ZooKeeper. Is there any way to use the CSV response writer with SolrJ?

Would using SolrJ's binary "javabin" format be much slower than the CSV
response writer?

-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677      SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614

Re: how to quickly export data from SolrCloud

Posted by Kevin Osborn <ke...@cbsi.com>.
This is actually something I will do quite frequently. I basically export
from Solr into a CSV file as part of a workflow sequence.

The CSV response writer is nice and fast, but plain HTTP calls do not have
the ZooKeeper integration that I like in SolrJ.


On Mon, May 6, 2013 at 10:11 AM, Shawn Heisey <so...@elyograg.org> wrote:

> On 5/6/2013 10:48 AM, Kevin Osborn wrote:
>
>> I am looking to export a large amount of data from Solr. The export will
>> be done by a Java application and then written to a file. Initially, I was
>> thinking of making direct HTTP calls and using the CSV response writer, so
>> that my Java application can quickly parse each line from a stream.
>>
>> But with SolrCloud, I prefer to use SolrJ because of its communication with
>> ZooKeeper. Is there any way to use the CSV response writer with SolrJ?
>>
>> Would using SolrJ's binary "javabin" format be much slower than the CSV
>> response writer?
>>
>
> What do you intend to do with the exported data?  If you're going to use
> it to import into a new Solr index, you might be better off using the
> dataimport handler with SolrEntityProcessor.  Just point it at one of your
> servers and include the collection name in the URL.
>
> If the export will have other uses and CSV format will work for you, that
> would probably be more efficient than something you could whip together
> quickly with SolrJ.  If you've got really excellent Java skills and have a
> lot of time to work on it, you might be able to write something efficient,
> but Solr can already do it.
>
> If you plan to page through your data rather than grab it all with one
> query, it is MUCH more efficient to use a range query on a field with
> sequential data than to use the start and rows parameters.  This is
> *especially* true if you're using a sharded index, which is typically the
> case with SolrCloud.
>
> By the way, I am assuming that this process will be a one-time (or very
> rare) thing for migration purposes, or possibly something that you
> occasionally do for some kind of index verification.  If this is something
> that you'll be doing all the time, then you probably want to develop a
> SolrJ application.
>
> Thanks,
> Shawn
>
>


Re: how to quickly export data from SolrCloud

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/6/2013 10:48 AM, Kevin Osborn wrote:
> I am looking to export a large amount of data from Solr. The export will
> be done by a Java application and then written to a file. Initially, I was
> thinking of making direct HTTP calls and using the CSV response writer, so
> that my Java application can quickly parse each line from a stream.
>
> But with SolrCloud, I prefer to use SolrJ because of its communication with
> ZooKeeper. Is there any way to use the CSV response writer with SolrJ?
>
> Would using SolrJ's binary "javabin" format be much slower than the CSV
> response writer?

What do you intend to do with the exported data?  If you're going to use 
it to import into a new Solr index, you might be better off using the 
dataimport handler with SolrEntityProcessor.  Just point it at one of 
your servers and include the collection name in the URL.

If the export will have other uses and CSV format will work for you, 
that would probably be more efficient than something you could whip 
together quickly with SolrJ.  If you've got really excellent Java skills 
and have a lot of time to work on it, you might be able to write 
something efficient, but Solr can already do it.
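
To illustrate the CSV route, here is a minimal sketch of what the Java side
might look like. It only builds a select URL with wt=csv and copies a
response stream line by line. The base URL, collection name, and field list
are placeholders, and the class and method names (CsvExportSketch,
buildCsvUrl, copyLines) are made up for this example. It assumes the default
behavior where the CSV writer emits a header line first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.net.URLEncoder;

class CsvExportSketch {

    // Builds a /select URL that asks Solr for CSV output (wt=csv).
    // baseUrl is a placeholder such as http://localhost:8983/solr/collection1.
    static String buildCsvUrl(String baseUrl, String query, String fields, int rows) {
        try {
            return baseUrl + "/select"
                    + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&fl=" + URLEncoder.encode(fields, "UTF-8")
                    + "&rows=" + rows
                    + "&wt=csv";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Copies the response to the output one line at a time and returns the
    // number of lines after the header. A quoted value containing a newline
    // spans several lines, so this is a line count, not a strict row count.
    static long copyLines(Reader in, Writer out) {
        try {
            BufferedReader reader = new BufferedReader(in);
            long lines = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                out.write(line);
                out.write('\n');
                lines++;
            }
            out.flush();
            return Math.max(lines - 1, 0); // subtract the header line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String url = buildCsvUrl("http://localhost:8983/solr/collection1", "*:*", "id,name", 1000);
        System.out.println(url);

        // Stand-in for the HTTP response body, so the sketch runs offline.
        // Against a live server the Reader would wrap the stream opened on url.
        StringWriter out = new StringWriter();
        long n = copyLines(new StringReader("id,name\n1,foo\n2,bar\n"), out);
        System.out.println(n + " data lines copied");
    }
}
```

In a real export the Writer would be a buffered file writer, so nothing
beyond the current line is held in memory.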

If you plan to page through your data rather than grab it all with one 
query, it is MUCH more efficient to use a range query on a field with 
sequential data than to use the start and rows parameters.  This is 
*especially* true if you're using a sharded index, which is typically 
the case with SolrCloud.
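
The range-query paging idea can be sketched roughly as follows. This is an
illustration only: it assumes a numeric field (called "seq" here, a made-up
name) holding unique, increasing long values, and that each query is sorted
ascending on that field. PageFetcher is a hypothetical callback standing in
for whatever actually runs the query (SolrJ or plain HTTP); the in-memory
version exists only so the loop can be run without a server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RangePagingSketch {

    // Hypothetical callback: runs one query, sorted ascending on the paging
    // field, and returns that field's values for at most `rows` documents.
    interface PageFetcher {
        List<Long> fetch(String rangeQuery, int rows);
    }

    // Builds an inclusive lower-bounded range query, e.g. seq:[42 TO *].
    static String rangeQuery(String field, long fromInclusive) {
        return field + ":[" + fromInclusive + " TO *]";
    }

    // Pages through everything by moving the lower bound past the last value
    // seen, instead of increasing a start offset. Assumes unique values.
    static List<Long> exportAll(String field, long startInclusive, int rows, PageFetcher fetcher) {
        List<Long> seen = new ArrayList<Long>();
        long next = startInclusive;
        while (true) {
            List<Long> page = fetcher.fetch(rangeQuery(field, next), rows);
            seen.addAll(page);
            if (page.size() < rows) {
                break; // short (or empty) page: nothing left to fetch
            }
            next = page.get(page.size() - 1) + 1;
        }
        return seen;
    }

    // In-memory stand-in for a server, for demonstration only.
    static PageFetcher inMemoryFetcher(final List<Long> sortedValues) {
        return new PageFetcher() {
            public List<Long> fetch(String rangeQuery, int rows) {
                int start = rangeQuery.indexOf(":[") + 2;
                long from = Long.parseLong(rangeQuery.substring(start, rangeQuery.indexOf(" TO ")));
                List<Long> page = new ArrayList<Long>();
                for (Long v : sortedValues) {
                    if (v >= from && page.size() < rows) {
                        page.add(v);
                    }
                }
                return page;
            }
        };
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L, 5L, 8L, 13L, 21L);
        System.out.println(exportAll("seq", 0L, 3, inMemoryFetcher(values)));
    }
}
```

Each request carries only a lower bound and a row limit, so no request asks
the server to skip over documents that were already returned.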

By the way, I am assuming that this process will be a one-time (or very 
rare) thing for migration purposes, or possibly something that you 
occasionally do for some kind of index verification.  If this is 
something that you'll be doing all the time, then you probably want to 
develop a SolrJ application.

Thanks,
Shawn