Posted to solr-user@lucene.apache.org by "Beale, Jim (US-KOP)" <Ji...@hibu.com> on 2013/11/15 17:47:57 UTC

SolrCloud question

Hello all,

I am trying to set up a SolrCloud deployment consisting of 5 boxes each of which is running Solr under jetty.  A zookeeper ensemble is running separately on 3 of the boxes.

Each Solr instance has 2 cores, one of which is sharded across the five boxes and the other not sharded at all because it is a much smaller index.  numShards is set to 5 in the command to start jetty, -DnumShards=5.
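For reference, each box is started roughly like this (the ZooKeeper host list below is a placeholder; the command is captured in a variable rather than executed):

```shell
# Placeholder ZooKeeper ensemble; substitute the real three boxes.
ZK_HOSTS="zk1:2181,zk2:2181,zk3:2181"

# Each of the five Solr/jetty instances is started with the same flags.
START_CMD="java -DzkHost=$ZK_HOSTS -DnumShards=5 -jar start.jar"
echo "$START_CMD"
```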

It turns out that getting this configuration to work is not as easy as I had hoped.  According to JIRA SOLR-3186, "If you are bootstrapping a multi-core setup, you currently have to settle for the same
numShards for every core."  Unfortunately that JIRA was closed without any implementation.

Is this limitation still in effect?  Does the new core discovery mode offer anything in this regard?

Is there any way at all to deploy two cores with different numShards?

How hard would it be to implement this?  Is it compatible with the architecture of Solr 5?

Thanks,
Jim Beale


The information contained in this email message, including any attachments, is intended solely for use by the individual or entity named above and may be confidential. If the reader of this message is not the intended recipient, you are hereby notified that you must not read, use, disclose, distribute or copy any part of this communication. If you have received this communication in error, please immediately notify me by email and destroy the original message, including any attachments. Thank you.

Re: SolrCloud question

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/18/2013 3:53 PM, Beale, Jim (US-KOP) wrote:
> I shouldn't be configuring the replication handler?  I didn't know that!
> 
> The documentation describes how to do it, e.g., for Solr 4.6
> 
> https://cwiki.apache.org/confluence/display/solr/Index+Replication
> 
> Now I'm more confused than ever.  If a replication handler isn't defined, then I get "replication handler isn't defined" errors in the logs, and the added core fails to do anything.

The replication handler must be *defined* -- but not actually
configured.  SolrCloud handles all replication details itself:
assigning master and slave roles and initiating any required
replication happen dynamically at the moment they are needed,
not in solrconfig.xml.

The following is what I have for replication config on my SolrCloud
setup.  This is all you need.

<requestHandler name="/replication" class="solr.ReplicationHandler">
</requestHandler>

XML generally allows you to use "/>" to end the opening tag and remove
the closing tag, but I haven't done that with my config.

> It seems like such a simple task: create 1 sharded and 1 unsharded core.
> 
> But nothing I've tried so far works.  Why can't numShards be a property of the core???

The numShards parameter is a property of the *collection*.  It's not
something that gets defined at the core level.  When you create your
collection using the collections API, you can include the numShards
parameter.  For an unsharded collection, use numShards=1.
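Sketched concretely (host, port, and collection names below are made up), the two indexes become two collections created with different numShards:

```shell
SOLR=http://localhost:8983/solr   # placeholder host:port

# Compose a Collections API CREATE request; pass the result to curl in real use.
create_collection() {
  echo "$SOLR/admin/collections?action=CREATE&name=$1&numShards=$2&replicationFactor=$3"
}

create_collection big 5 1     # the large index: 5 shards, one replica each
create_collection small 1 5   # the small index: 1 shard, a copy on all 5 boxes
```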

Thanks,
Shawn


RE: SolrCloud question

Posted by "Beale, Jim (US-KOP)" <Ji...@hibu.com>.
I shouldn't be configuring the replication handler?  I didn't know that!

The documentation describes how to do it, e.g., for Solr 4.6

https://cwiki.apache.org/confluence/display/solr/Index+Replication

Now I'm more confused than ever.  If a replication handler isn't defined, then I get "replication handler isn't defined" errors in the logs, and the added core fails to do anything.

It seems like such a simple task: create 1 sharded and 1 unsharded core.

But nothing I've tried so far works.  Why can't numShards be a property of the core???

Thanks,
Jim



-----Original Message-----
From: Mark Miller [mailto:markrmiller@gmail.com]
Sent: Monday, November 18, 2013 5:00 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud question

You shouldn't be configuring the replication handler if you are using SolrCloud.

- Mark

On Nov 18, 2013, at 3:51 PM, Beale, Jim (US-KOP) <Ji...@hibu.com> wrote:

> Thanks Michael,
>
> I am having a terrible time getting this non-sharded index up.  Everything I try leads to a dead-end.
>
> http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5
>
> When I run that, it uses the solrconfig.xml from another core.  That solrconfig.xml is deployed in conjunction with a solrcore.properties, and the replication handler is configured with properties from that core's solrcore.properties file.  The CREATE action uses the solrconfig.xml but not the properties, so it fails.
>
> I tried to upload a different solrconfig.xml to zookeeper using the zkcli script -cmd upconfig and then to specify that config in the creation of the TP core like so
>
> http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5&collection.configName=solrconfigTP.xml
>
> However, how can replication masters and slaves be configured with a single solrconfig.xml file unless each node is allowed to have its own config?
>
> This is a royal PITA. I may be wrong, but I think it is broken.  Without a way to specify numShards per core in solr.xml, it seems impossible to have one sharded core and one non-sharded core.
>
> To be honest, I don't even care about replication.  Why can't I specify a core that is non-sharded, non-replicated and have the exact same core on all five of my boxes?
>
>
>
> Thanks,
> Jim
>
>
> -----Original Message-----
> From: michael.boom [mailto:my_sky_mc@yahoo.com]
> Sent: Monday, November 18, 2013 7:14 AM
> To: solr-user@lucene.apache.org
> Subject: RE: SolrCloud question
>
> Hi,
>
> The Collections API provides some more options that will prove very
> useful to you:
> /admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname
>
> Have a look at:
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> Regarding your observations:
> 1. Completely normal, that's standard naming
> 2. When you created the collection you did not specify a configuration, so
> the new collection will use the conf already stored in ZK. If you have more
> than one, I'm not sure which one will be picked as the default.
> 3. You should be able to create replicas by adding new cores on the other
> machines and specifying the collection name and shard id. The data will
> then be replicated automatically to the new node. If you already tried that
> and ran into errors or problems, please provide some more details.
>
> As far as I know you should be able to move/replace the index data, as long
> as the source collection has the same config as the target collection.
> Afterwards you'll have to reload your core or restart the Solr instance - I'm
> not sure which one will do it; most likely the latter.
> But it will be easier if you use the method described at point 3 above.
> Someone please correct me if I'm wrong.
>
>
>
> -----
> Thanks,
> Michael
> --
> View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-question-tp4101266p4101675.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud question

Posted by Mark Miller <ma...@gmail.com>.
You shouldn't be configuring the replication handler if you are using SolrCloud.

- Mark

On Nov 18, 2013, at 3:51 PM, Beale, Jim (US-KOP) <Ji...@hibu.com> wrote:

> Thanks Michael,
> 
> I am having a terrible time getting this non-sharded index up.  Everything I try leads to a dead-end.
> 
> http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5
> 
> When I run that, it uses the solrconfig.xml from another core.  That solrconfig.xml is deployed in conjunction with a solrcore.properties, and the replication handler is configured with properties from that core's solrcore.properties file.  The CREATE action uses the solrconfig.xml but not the properties, so it fails.
> 
> I tried to upload a different solrconfig.xml to zookeeper using the zkcli script -cmd upconfig and then to specify that config in the creation of the TP core like so
> 
> http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5&collection.configName=solrconfigTP.xml
> 
> However, how can replication masters and slaves be configured with a single solrconfig.xml file unless each node is allowed to have its own config?
> 
> This is a royal PITA. I may be wrong, but I think it is broken.  Without a way to specify numShards per core in solr.xml, it seems impossible to have one sharded core and one non-sharded core.
> 
> To be honest, I don't even care about replication.  Why can't I specify a core that is non-sharded, non-replicated and have the exact same core on all five of my boxes?
> 
> 
> 
> Thanks,
> Jim
> 
> 
> -----Original Message-----
> From: michael.boom [mailto:my_sky_mc@yahoo.com]
> Sent: Monday, November 18, 2013 7:14 AM
> To: solr-user@lucene.apache.org
> Subject: RE: SolrCloud question
> 
> Hi,
> 
> The Collections API provides some more options that will prove very
> useful to you:
> /admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname
> 
> Have a look at:
> https://cwiki.apache.org/confluence/display/solr/Collections+API
> 
> Regarding your observations:
> 1. Completely normal, that's standard naming
> 2. When you created the collection you did not specify a configuration, so
> the new collection will use the conf already stored in ZK. If you have more
> than one, I'm not sure which one will be picked as the default.
> 3. You should be able to create replicas by adding new cores on the other
> machines and specifying the collection name and shard id. The data will
> then be replicated automatically to the new node. If you already tried that
> and ran into errors or problems, please provide some more details.
> 
> As far as I know you should be able to move/replace the index data, as long
> as the source collection has the same config as the target collection.
> Afterwards you'll have to reload your core or restart the Solr instance - I'm
> not sure which one will do it; most likely the latter.
> But it will be easier if you use the method described at point 3 above.
> Someone please correct me if I'm wrong.
> 
> 
> 
> -----
> Thanks,
> Michael
> --


RE: SolrCloud question

Posted by "Beale, Jim (US-KOP)" <Ji...@hibu.com>.
Thanks Michael,

I am having a terrible time getting this non-sharded index up.  Everything I try leads to a dead-end.

http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5

When I run that, it uses the solrconfig.xml from another core.  That solrconfig.xml is deployed in conjunction with a solrcore.properties, and the replication handler is configured with properties from that core's solrcore.properties file.  The CREATE action uses the solrconfig.xml but not the properties, so it fails.

I tried to upload a different solrconfig.xml to zookeeper using the zkcli script -cmd upconfig and then to specify that config in the creation of the TP core like so

http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5&collection.configName=solrconfigTP.xml
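For what it's worth, collection.configName expects the *name* of a config set previously uploaded to ZooKeeper with upconfig -confname, not a solrconfig.xml filename. A sketch with placeholder hosts, paths, and names, composing the two steps as strings:

```shell
ZK="zk1:2181,zk2:2181,zk3:2181"   # placeholder ensemble
CONF_NAME=tpconf                   # a config *set* name, not a filename

# Step 1: upload the core's conf directory to ZooKeeper under that name.
UPLOAD_CMD="zkcli.sh -zkhost $ZK -cmd upconfig -confdir ./tp/conf -confname $CONF_NAME"

# Step 2: reference the config set by name when creating the collection.
CREATE_URL="/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5&collection.configName=$CONF_NAME"

echo "$UPLOAD_CMD"
echo "$CREATE_URL"
```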

However, how can replication masters and slaves be configured with a single solrconfig.xml file unless each node is allowed to have its own config?

This is a royal PITA. I may be wrong, but I think it is broken.  Without a way to specify numShards per core in solr.xml, it seems impossible to have one sharded core and one non-sharded core.

To be honest, I don't even care about replication.  Why can't I specify a core that is non-sharded, non-replicated and have the exact same core on all five of my boxes?



Thanks,
Jim


-----Original Message-----
From: michael.boom [mailto:my_sky_mc@yahoo.com]
Sent: Monday, November 18, 2013 7:14 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud question

Hi,

The Collections API provides some more options that will prove very
useful to you:
/admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname

Have a look at:
https://cwiki.apache.org/confluence/display/solr/Collections+API

Regarding your observations:
1. Completely normal, that's standard naming
2. When you created the collection you did not specify a configuration, so
the new collection will use the conf already stored in ZK. If you have more
than one, I'm not sure which one will be picked as the default.
3. You should be able to create replicas by adding new cores on the other
machines and specifying the collection name and shard id. The data will
then be replicated automatically to the new node. If you already tried that
and ran into errors or problems, please provide some more details.

As far as I know you should be able to move/replace the index data, as long
as the source collection has the same config as the target collection.
Afterwards you'll have to reload your core or restart the Solr instance - I'm
not sure which one will do it; most likely the latter.
But it will be easier if you use the method described at point 3 above.
Someone please correct me if I'm wrong.



-----
Thanks,
Michael
--

RE: SolrCloud question

Posted by "michael.boom" <my...@yahoo.com>.
Hi,

The Collections API provides some more options that will prove very
useful to you:
/admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname

Have a look at:
https://cwiki.apache.org/confluence/display/solr/Collections+API

Regarding your observations:
1. Completely normal, that's standard naming
2. When you created the collection you did not specify a configuration, so
the new collection will use the conf already stored in ZK. If you have more
than one, I'm not sure which one will be picked as the default.
3. You should be able to create replicas by adding new cores on the other
machines and specifying the collection name and shard id. The data will
then be replicated automatically to the new node. If you already tried that
and ran into errors or problems, please provide some more details.
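Point 3, sketched as a CoreAdmin request (host, port, and core name below are placeholders):

```shell
SOLR=http://otherbox:8983/solr   # a node that should receive the replica

# Compose a CoreAdmin CREATE request that attaches a new core to an
# existing collection/shard; SolrCloud then replicates the data to it.
add_replica_core() {
  echo "$SOLR/admin/cores?action=CREATE&name=$1&collection=$2&shard=$3"
}

add_replica_core tp_shard1_replica2 tp shard1
```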

As far as I know you should be able to move/replace the index data, as long
as the source collection has the same config as the target collection.
Afterwards you'll have to reload your core or restart the Solr instance - I'm
not sure which one will do it; most likely the latter.
But it will be easier if you use the method described at point 3 above.
Someone please correct me if I'm wrong.



-----
Thanks,
Michael
--

RE: SolrCloud question

Posted by "Beale, Jim (US-KOP)" <Ji...@hibu.com>.
Hi Mark,

Thanks for the reply.

I am struggling a bit here. Sorry if these are basic questions!  I can't find the answers anywhere.

I modified my solr.xml on all boxes to comment out the core definition for 'tp'.
Then, I used /admin/collections?action=CREATE&name=tp&numShards=1 against one of the boxes.  That created 'shard1' for the tp index.

(1) It named the dir 'tp_shard1_replica1'
(2) The core seems to be using the same config as the bn core
(3) I am unable to create a similar core on the other boxes.

When I use replicationFactor=5, it creates replicas of the index on the other boxes.

Can I then copy a pre-existing LCN index into the data/index directory and have it replicate to the other boxes?

Thanks!

Jim



-----Original Message-----
From: Mark Miller [mailto:markrmiller@gmail.com]
Sent: Friday, November 15, 2013 11:55 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud question

We are moving away from predefining SolrCores for SolrCloud. The correct approach is to use the Collections API - then it is quite simple to choose the number of shards for each collection you create.

Hopefully our examples will move to doing this before long.

- Mark

On Nov 15, 2013, at 11:47 AM, Beale, Jim (US-KOP) <Ji...@hibu.com> wrote:

> Hello all,
>
> I am trying to set up a SolrCloud deployment consisting of 5 boxes each of which is running Solr under jetty.  A zookeeper ensemble is running separately on 3 of the boxes.
>
> Each Solr instance has 2 cores, one of which is sharded across the five boxes and the other not sharded at all because it is a much smaller index.  numShards is set to 5 in the command to start jetty, -DnumShards=5.
>
> It turns out that getting this configuration to work is not as easy as I had hoped.  According to JIRA SOLR-3186, "If you are bootstrapping a multi-core setup, you currently have to settle for the same
> numShards for every core."  Unfortunately that JIRA was closed without any implementation.
>
> Is this limitation still in effect?  Does the new core discovery mode offer anything in this regard?
>
> Is there any way at all to deploy two cores with different numShards?
>
> How hard would it be to implement this?  Is it compatible with the architecture of Solr 5?
>
> Thanks,
> Jim Beale
>
>


Re: SolrCloud question

Posted by Mark Miller <ma...@gmail.com>.
We are moving away from predefining SolrCores for SolrCloud. The correct approach is to use the Collections API - then it is quite simple to choose the number of shards for each collection you create.

Hopefully our examples will move to doing this before long.

- Mark

On Nov 15, 2013, at 11:47 AM, Beale, Jim (US-KOP) <Ji...@hibu.com> wrote:

> Hello all,
> 
> I am trying to set up a SolrCloud deployment consisting of 5 boxes each of which is running Solr under jetty.  A zookeeper ensemble is running separately on 3 of the boxes.
> 
> Each Solr instance has 2 cores, one of which is sharded across the five boxes and the other not sharded at all because it is a much smaller index.  numShards is set to 5 in the command to start jetty, -DnumShards=5.
> 
> It turns out that getting this configuration to work is not as easy as I had hoped.  According to JIRA SOLR-3186, "If you are bootstrapping a multi-core setup, you currently have to settle for the same
> numShards for every core."  Unfortunately that JIRA was closed without any implementation.
> 
> Is this limitation still in effect?  Does the new core discovery mode offer anything in this regard?
> 
> Is there any way at all to deploy two cores with different numShards?
> 
> How hard would it be to implement this?  Is it compatible with the architecture of Solr 5?
> 
> Thanks,
> Jim Beale
> 
> 