Posted to solr-user@lucene.apache.org by Vadim Kisselmann <v....@googlemail.com> on 2012/02/08 12:26:54 UTC

How to reindex about 10Mio. docs

Hello folks,

I want to reindex about 10 million docs from one Solr (1.4.1) instance to
another Solr (1.4.1) instance.
I changed my schema.xml (field types sint to slong), so standard
replication would fail.
What is the fastest and smartest way to manage this?
This sounds great (EntityProcessor):
http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr
But would it work with Solr 1.4.1?

Best Regards
Vadim

Re: How to reindex about 10Mio. docs

Posted by Vadim Kisselmann <v....@googlemail.com>.
Hi Otis,
thanks for your response :)
We found a solution yesterday. It works with a Ruby script, curl, and Saxon/XSLT.
The performance is great. We moved all the docs in batches of 50,000 to
avoid overloading our machines.
Best regards
Vadim
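
A minimal sketch of the batching idea described above, written in Python rather than the Ruby script used in the thread (which was not posted). It only builds one select URL per batch using Solr's standard start/rows paging parameters. The host name "source-host", the function name batch_urls, and the total of 10,000,000 docs are assumptions for illustration.

```python
from urllib.parse import urlencode


def batch_urls(base_url, total_docs, batch_size=50000, query="*:*"):
    """Yield one select URL per batch, paging with start/rows."""
    for start in range(0, total_docs, batch_size):
        params = {"q": query, "start": start, "rows": batch_size}
        yield f"{base_url}/select?{urlencode(params)}"


# Hypothetical source instance; 10,000,000 docs in batches of 50,000.
urls = list(batch_urls("http://source-host:8983/solr", 10_000_000))
print(len(urls))
print(urls[0])
```

Each URL could then be fetched with curl or any HTTP client, one batch at a time, so that neither machine has to hold the whole doc set at once. Only stored fields come back from a query, so this approach presumably requires that every field needed in the new index is stored.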



2012/2/8 Otis Gospodnetic <ot...@yahoo.com>:
> Vadim,
>
> Would using xslt output help?
>
> Otis
> ----
> Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html
>
>
>
>>________________________________
>> From: Vadim Kisselmann <v....@googlemail.com>
>>To: solr-user@lucene.apache.org
>>Sent: Wednesday, February 8, 2012 7:09 AM
>>Subject: Re: How to reindex about 10Mio. docs
>>
>>Another problem appeared ;)
>>how can i export my docs in csv-format?
>>In Solr 3.1+ i can use the query-param &wt=csv, but in Solr 1.4.1?
>>Best Regards
>>Vadim

Re: How to reindex about 10Mio. docs

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Vadim,

Would using xslt output help?

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html
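
To illustrate what an XSLT step would have to do here: Solr can apply a stylesheet to its XML output (requested with wt=xslt plus a tr parameter naming the stylesheet), and the goal would be to turn each query response into an <add> update message for the new instance. The sketch below performs the same transformation in Python instead of XSLT. The function name response_to_add and the sample document are assumptions, not the stylesheet anyone in this thread used.

```python
import xml.etree.ElementTree as ET


def response_to_add(response_xml):
    """Convert a Solr XML query response into an <add> update message."""
    root = ET.fromstring(response_xml)
    add = ET.Element("add")
    for doc in root.iter("doc"):
        out_doc = ET.SubElement(add, "doc")
        for child in doc:
            name = child.get("name")
            # Multi-valued fields arrive as <arr name="..."> wrapping typed values.
            values = list(child) if child.tag == "arr" else [child]
            for value in values:
                field = ET.SubElement(out_doc, "field", name=name)
                field.text = value.text or ""
    return ET.tostring(add, encoding="unicode")


# Hypothetical response with one single-valued and one multi-valued field.
sample = (
    '<response><result name="response" numFound="1" start="0">'
    '<doc><str name="id">42</str><long name="views">7</long>'
    '<arr name="tag"><str>a</str><str>b</str></arr></doc>'
    '</result></response>'
)
update_xml = response_to_add(sample)
print(update_xml)
```

The type tags (str, long, arr) are dropped on purpose: the target schema.xml decides the field types, which is what makes a type change such as sint to slong possible during the copy.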




Re: How to reindex about 10Mio. docs

Posted by Vadim Kisselmann <v....@googlemail.com>.
Another problem appeared ;)
How can I export my docs in CSV format?
In Solr 3.1+ I can use the query param &wt=csv, but what about Solr 1.4.1?
Best Regards
Vadim
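
One possible workaround, sketched here under assumptions: fetch the default XML response and convert it to CSV on the client side. The function name response_to_csv, the field list, and the sample document are hypothetical, and multi-valued (<arr>) fields are deliberately skipped to keep the sketch short.

```python
import csv
import io
import xml.etree.ElementTree as ET


def response_to_csv(response_xml, fieldnames):
    """Write the single-valued stored fields of each <doc> as one CSV row."""
    root = ET.fromstring(response_xml)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fieldnames)  # header row with the field names
    for doc in root.iter("doc"):
        # Skip <arr> elements: multi-valued fields are not handled here.
        values = {
            child.get("name"): (child.text or "")
            for child in doc
            if child.tag != "arr"
        }
        writer.writerow([values.get(name, "") for name in fieldnames])
    return buf.getvalue()


# Hypothetical response containing a value with an embedded comma.
sample_response = (
    '<response><result name="response" numFound="1" start="0">'
    '<doc><str name="id">42</str><str name="title">Hello, world</str></doc>'
    '</result></response>'
)
csv_text = response_to_csv(sample_response, ["id", "title"])
print(csv_text)
```

The csv module quotes values that contain the separator, so titles with commas survive the round trip.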



Re: How to reindex about 10Mio. docs

Posted by Vadim Kisselmann <v....@googlemail.com>.
Hi Ahmet,
thanks for the quick response :)
I had already thought of the same thing...
But it will be a pain to export and import this huge doc set as CSV.
Is there another solution?
Regards
Vadim



Re: How to reindex about 10Mio. docs

Posted by Ahmet Arslan <io...@yahoo.com>.
> I want to reindex about 10 million docs from one Solr (1.4.1) instance to
> another Solr (1.4.1) instance.
> I changed my schema.xml (field types sint to slong), so standard
> replication would fail.
> What is the fastest and smartest way to manage this?
> This sounds great (EntityProcessor):
> http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr
> But would it work with Solr 1.4.1?

SolrEntityProcessor is not available in 1.4.1. I would dump the stored fields into a comma-separated file and use http://wiki.apache.org/solr/UpdateCSV to feed it into the new Solr instance.
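
A rough sketch of the feeding step, assuming the CSV handler is registered at /update/csv as in the example solrconfig.xml. The code only builds the POST request and does not send it. The host name "target-host", the function name build_csv_update_request, and the sample rows are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_csv_update_request(base_url, csv_text, commit=True):
    """Build (but do not send) a POST of CSV data to the CSV update handler."""
    params = urlencode({"commit": "true" if commit else "false"})
    return Request(
        f"{base_url}/update/csv?{params}",
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


# Hypothetical target instance and a two-line CSV body (header plus one doc).
request = build_csv_update_request(
    "http://target-host:8983/solr", "id,title\n42,Hello\n"
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it. With many batches it is probably better to set commit to false for all but the last one, so the new instance does not commit after every upload.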