Posted to solr-user@lucene.apache.org by Sushant Vengurlekar <sv...@curvolabs.com> on 2018/06/19 17:50:30 UTC

Import data from standalone solr into a solrcloud collection

I created a SolrCloud collection with 2 shards and a replication factor of
2. How can I load data into this collection that is currently stored
in a core on a standalone Solr? I used the conf from that core on the
standalone Solr to create the collection on SolrCloud.

Thank you

Re: Import data from standalone solr into a solrcloud collection

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/19/2018 11:50 AM, Sushant Vengurlekar wrote:
> I created a SolrCloud collection with 2 shards and a replication factor of
> 2. How can I load data into this collection that is currently stored
> in a core on a standalone Solr? I used the conf from that core on the
> standalone Solr to create the collection on SolrCloud.

Erick's suggestion of creating a collection with one shard and one
replica, then splitting the shard and adding replicas is one solution. 
If properly executed, it can work very well.

Another possibility is to create the collection with the number of
shards and replicas that you want right up front and then use the
dataimport handler to import documents from the standalone Solr.  One of
the sources you can use with DIH is another Solr install.

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor

If you're using a new enough version of SolrCloud (6.4 or later), you
should definitely be using cursorMark in the DIH config and a sort
parameter that includes a sort on the uniqueKey field.
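
As a rough illustration of the DIH approach above, here is a minimal Python
sketch that generates a data-config.xml using SolrEntityProcessor with
cursorMark enabled and a sort on the uniqueKey. The source URL, the entity
name "sourceSolr", the page size, and the assumption that the uniqueKey
field is called "id" are all placeholders, not values from this thread.
Note that SolrEntityProcessor can only copy fields that are stored in the
source index.

```python
import xml.etree.ElementTree as ET


def build_dih_config(source_url: str, unique_key: str = "id", rows: int = 500) -> str:
    """Return a DIH data-config.xml string that reads from another Solr."""
    root = ET.Element("dataConfig")
    document = ET.SubElement(root, "document")
    ET.SubElement(
        document,
        "entity",
        {
            "name": "sourceSolr",           # hypothetical entity name
            "processor": "SolrEntityProcessor",
            "url": source_url,              # base URL of the standalone core (assumed)
            "query": "*:*",                 # import every document
            "rows": str(rows),              # page size per request (assumed value)
            "fl": "*",                      # request all stored fields
            "cursorMark": "true",           # cursor-based paging, per Shawn's advice
            "sort": f"{unique_key} asc",    # sort on the uniqueKey field
        },
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical host and core name, for illustration only.
config_xml = build_dih_config("http://standalone-host:8983/solr/mycore")
print(config_xml)
```

The generated file would go into the conf of the target collection, with a
/dataimport request handler pointing at it.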

Thanks,
Shawn


Re: Import data from standalone solr into a solrcloud collection

Posted by Erick Erickson <er...@gmail.com>.
Personally I'd start with a 1-shard, 1-replica collection (i.e. leader-only).

From there, split the shard.

Once all that has been done satisfactorily, just use the Collections
API ADDREPLICA command to build out your collection to whatever degree
of redundancy you need.
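
The sequence above could be sketched as a series of Collections API calls.
This Python snippet only builds the request URLs; the host, the collection
name "mycoll" and the configset name "myconf" are hypothetical, and the
sub-shard names assume Solr's usual shard1_0 / shard1_1 naming after a
split.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, params: dict) -> str:
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


SOLR = "http://localhost:8983/solr"  # hypothetical SolrCloud node

steps = [
    # 1. Leader-only collection: one shard, one replica.
    collections_api_url(SOLR, "CREATE", {
        "name": "mycoll",
        "numShards": 1,
        "replicationFactor": 1,
        "collection.configName": "myconf",
    }),
    # 2. Split the single shard in two (after the data is in and verified).
    collections_api_url(SOLR, "SPLITSHARD", {"collection": "mycoll", "shard": "shard1"}),
    # 3. Add a replica to each sub-shard for redundancy.
    collections_api_url(SOLR, "ADDREPLICA", {"collection": "mycoll", "shard": "shard1_0"}),
    collections_api_url(SOLR, "ADDREPLICA", {"collection": "mycoll", "shard": "shard1_1"}),
]

for url in steps:
    print(url)
```

Each URL would be requested in order, checking the response of one step
before moving on to the next.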

Best,
Erick

On Tue, Jun 19, 2018 at 1:04 PM, Aroop Ganguly <ar...@icloud.com> wrote:
> I see.
> By definition of splitting, the new shards will have the same number of replicas as the original shard.
> You could use replicationFactor>=2 to ensure that both of your Solr nodes are used.
> You could also use the maxShardsPerNode parameter, alone or in conjunction with replicationFactor, to achieve your target state.

Re: Import data from standalone solr into a solrcloud collection

Posted by Aroop Ganguly <ar...@icloud.com>.
I see. 
By definition of splitting, the new shards will have the same number of replicas as the original shard.
You could use replicationFactor>=2 to ensure that both of your Solr nodes are used.
You could also use the maxShardsPerNode parameter, alone or in conjunction with replicationFactor, to achieve your target state.
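
To make the interplay of these parameters concrete, here is a small Python
sketch of the arithmetic. It assumes (as I understand the Solr versions of
this era) that a CREATE is rejected when numShards * replicationFactor
exceeds maxShardsPerNode * the number of live nodes; the function names are
made up for illustration.

```python
def fits(num_shards: int, replication_factor: int,
         live_nodes: int, max_shards_per_node: int) -> bool:
    """True if every replica can be placed under the per-node limit."""
    return num_shards * replication_factor <= live_nodes * max_shards_per_node


def min_max_shards_per_node(num_shards: int, replication_factor: int,
                            live_nodes: int) -> int:
    """Smallest maxShardsPerNode that lets all replicas be placed."""
    total_replicas = num_shards * replication_factor
    return -(-total_replicas // live_nodes)  # ceiling division


# Target state from this thread: 2 shards, replicationFactor 2, 2 nodes.
print(fits(2, 2, live_nodes=2, max_shards_per_node=1))
print(min_max_shards_per_node(2, 2, live_nodes=2))
```

For 2 shards with 2 replicas each on 2 nodes there are 4 cores to place, so
each node has to be allowed to host 2 of them.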



> On Jun 19, 2018, at 12:51 PM, Sushant Vengurlekar <sv...@curvolabs.com> wrote:
> 
> Thank you Aroop
> 
> After I import the data into the collection from the standalone Solr core, I
> want to split it into 2 shards across the 2 nodes that I have. So will I have
> to set replicationFactor=2 and numShards=2?


Re: Import data from standalone solr into a solrcloud collection

Posted by Sushant Vengurlekar <sv...@curvolabs.com>.
Thank you Aroop

After I import the data into the collection from the standalone Solr core, I
want to split it into 2 shards across the 2 nodes that I have. So will I have
to set replicationFactor=2 and numShards=2?

On Tue, Jun 19, 2018 at 12:46 PM Aroop Ganguly <ar...@icloud.com>
wrote:

> Hi Sushant
>
> replicationFactor defaults to 1 and is not mandatory.
> numShards is mandatory; in this case you’d set it to 1.
>
> Aroop

Re: Import data from standalone solr into a solrcloud collection

Posted by Aroop Ganguly <ar...@icloud.com>.
Hi Sushant

replicationFactor defaults to 1 and is not mandatory.
numShards is mandatory; in this case you’d set it to 1.

Aroop

> On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar <sv...@curvolabs.com> wrote:
> 
> Thank you Erick.
> 
> In the create collection command I need to set the replication factor,
> though, correct?


Re: Import data from standalone solr into a solrcloud collection

Posted by Sushant Vengurlekar <sv...@curvolabs.com>.
Thank you Erick.

In the create collection command I need to set the replication factor,
though, correct?

On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson <er...@gmail.com>
wrote:

> Probably the easiest way would be to recreate your collection with 1
> shard. Then copy the index from your standalone setup.
>
> After verifying your setup, use the Collections SPLITSHARD command.
>
> Best,
> Erick
>
> On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
> <sv...@curvolabs.com> wrote:
> > I created a SolrCloud collection with 2 shards and a replication factor
> > of 2. How can I load data into this collection that is currently stored
> > in a core on a standalone Solr? I used the conf from that core on the
> > standalone Solr to create the collection on SolrCloud.
> >
> > Thank you
>

Re: Import data from standalone solr into a solrcloud collection

Posted by Erick Erickson <er...@gmail.com>.
Probably the easiest way would be to recreate your collection with 1
shard. Then copy the index from your standalone setup.

After verifying your setup, use the Collections SPLITSHARD command.
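
One possible way to do the "verifying" step is to compare document counts
between the standalone core and the new single-shard collection before
splitting. The Python sketch below assumes that a matching numFound for a
match-all query is a good enough check; the URLs in the usage comment are
hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_or_collection_url: str) -> str:
    """URL for a match-all query that returns only the hit count."""
    return f"{core_or_collection_url}/select?" + urlencode(
        {"q": "*:*", "rows": 0, "wt": "json"}
    )


def num_found(response_text: str) -> int:
    """Extract numFound from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_count(core_or_collection_url: str) -> int:
    """Query Solr and return the number of documents it reports."""
    with urlopen(count_url(core_or_collection_url)) as resp:
        return num_found(resp.read().decode("utf-8"))


# Usage (hypothetical hosts and names; requires both Solr instances running):
#   source = fetch_count("http://standalone-host:8983/solr/mycore")
#   target = fetch_count("http://cloud-host:8983/solr/mycoll")
#   if source != target: do not run SPLITSHARD yet
```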

Best,
Erick

On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
<sv...@curvolabs.com> wrote:
> I created a SolrCloud collection with 2 shards and a replication factor of
> 2. How can I load data into this collection that is currently stored
> in a core on a standalone Solr? I used the conf from that core on the
> standalone Solr to create the collection on SolrCloud.
>
> Thank you