Posted to solr-user@lucene.apache.org by Chuck Reynolds <cr...@ancestry.com> on 2018/09/21 20:07:28 UTC

Rule-based replication or sharding

I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
factor of three.  I'm using AWS EC2 instances and I have a requirement to
replicate the data into 3 AWS availability zones.  

So 30 servers in each zone and I don't see a create collection rule that
will put one replica in each of the three zones.

What am I missing?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Rule-based replication or sharding

Posted by Chuck Reynolds <cr...@ancestry.com>.
Thanks Steve,

I saw this option, but it will mean a bit of reworking of our automation to implement.

The documentation talks about an EC2Snitch. I wish it worked like Cassandra, where you just say that's the one you're using and it figures out how to replicate the data.

On 9/21/18, 2:40 PM, "Steve Rowe" <sa...@gmail.com> wrote:

    Hi Chuck,
    
    One way to do it is to set a system property on the JVM running each Solr node, corresponding to the AWS availability zone on which the node is hosted.
    
    For example, you could use sysprop “AWSAZ”, then use rules like:
    
       replica:<2,sysprop.AWSAZ:us-east-1
       replica:<2,sysprop.AWSAZ:us-west-1
       replica:<2,sysprop.AWSAZ:ca-central-1
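    
    For concreteness, a sketch of how that sysprop might be set on each node; solr.in.sh is one common place, and the zone value and ZooKeeper address below are placeholders for your environment:
    
       # e.g. in solr.in.sh on each node, using that node's zone
       SOLR_OPTS="$SOLR_OPTS -DAWSAZ=us-east-1"
    
       # or passed straight through on the command line at startup
       bin/solr start -cloud -z zk1:2181 -DAWSAZ=us-east-1
    
    The rules themselves are then passed as rule= parameters on the Collections API CREATE call.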
    
    --
    Steve
    http://www.lucidworks.com
    
    > On Sep 21, 2018, at 4:07 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
    > 
    > I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
    > factor of three.  I'm using AWS EC2 instances and I have a requirement to
    > replicate the data into 3 AWS availability zones.  
    > 
    > So 30 servers in each zone and I don't see a create collection rule that
    > will put one replica in each of the three zones.
    > 
    > What am I missing?
    > 
    > 
    > 
    > --
    > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    
    


Re: Rule-based replication or sharding

Posted by Noble Paul <no...@gmail.com>.
Yes, it uses the autoscaling policies to achieve the same. Please refer
to the documentation here:
https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html

On Thu, Sep 27, 2018, 02:11 Chuck Reynolds <cr...@ancestry.com> wrote:

> Noble,
>
> Are you saying in the latest version of Solr that this would work with
> three instances of Solr running on each server?
>
> If so how?
>
> Thanks again for your help.
>
> On 9/26/18, 9:11 AM, "Noble Paul" <no...@gmail.com> wrote:
>
>     I'm not sure if it is pertinent to ask you to move to the latest Solr
>     which has the policy based replica placement. Unfortunately, I don't
>     have any other solution I can think of
>
>     On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds <
> creynolds@ancestry.com> wrote:
>     >
>     > Noble,
>     >
>     > So other than manually moving replicas of shard do you have a
> suggestion of how one might accomplish the multiple availability zone with
> multiple instances of Solr running on each server?
>     >
>     > Thanks
>     >
>     > On 9/26/18, 12:56 AM, "Noble Paul" <no...@gmail.com> wrote:
>     >
>     >     The rules suggested by Steve is correct. I tested it locally and
> I got
>     >     the same errors. That means a bug exists probably.
>     >     All the new development efforts are invested in the new policy
> feature
>     >     .
> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
>     >
>     >     The old one is going to be deprecated pretty soon. So, I'm not
> sure if
>     >     we should be investing our resources here
>     >     On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <
> creynolds@ancestry.com> wrote:
>     >     >
>     >     > Shawn,
>     >     >
>     >     > Thanks for the info. We’ve been running this way for the past
> 4 years.
>     >     >
>     >     > We were running on very large hardware, 20 physical cores with
> 256 gigs of ram with 3 billion document and it was the only way we could
> take advantage of the hardware.
>     >     >
>     >     > Running 1 Solr instance per server never gave us the
> throughput we needed.
>     >     >
>     >     > So I somewhat disagree with your statement because our test
> proved otherwise.
>     >     >
>     >     > Thanks for the info.
>     >     >
>     >     > Sent from my iPhone
>     >     >
>     >     > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <
> apache@elyograg.org> wrote:
>     >     > >
>     >     > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
>     >     > >> Each server has three instances of Solr running on it so
> every instance on the server has to be in the same replica set.
>     >     > >
>     >     > > You should be running exactly one Solr instance per server.
> When evaluating rules for replica placement, SolrCloud will treat each
> instance as completely separate from all others, including others on the
> same machine.  It will not know that those three instances are on the same
> machine.  One Solr instance can handle MANY indexes.
>     >     > >
>     >     > > There is only ONE situation where it makes sense to run
> multiple instances per machine, and in my strong opinion, even that
> situation should not be handled with multiple instances. That situation is
> this:  When running one instance would require a REALLY large heap.
> Garbage collection pauses can become extreme in that situation, so some
> people will run multiple instances that each have a smaller heap, and
> divide their indexes between them. In my opinion, when you have enough
> index data on an instance that it requires a huge heap, instead of running
> two or more instances on one server, it's time to add more servers.
>     >     > >
>     >     > > Thanks,
>     >     > > Shawn
>     >     > >
>     >
>     >
>     >
>     >     --
>     >     -----------------------------------------------------
>     >     Noble Paul
>     >
>     >
>
>
>     --
>     -----------------------------------------------------
>     Noble Paul
>
>
>

Re: Rule-based replication or sharding

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/2/2018 9:11 AM, Chuck Reynolds wrote:
> Until we move to Solr 7.5 is there a way that we can control sharding with the core.properties file?
>
> It seems to me that you used to be able to put a core.properties file in the Solr home path with something like the following.
>
> coreNodeName=bts_shard3_01
> shard=shard3
> collection=BTS
>
> Then start Solr and it would create the sharding based on the information in the core.properties file.
>
> When I try it with Solr 6.6 it seems to ignore the core.properties file.

When running SolrCloud, don't try to manually add cores, mess with the 
core.properties file, or use the CoreAdmin API unless you understand 
****EXACTLY**** how SolrCloud works internally.  And even if you do have 
that level of understanding, I strongly recommend not doing it.  It's 
easy to get wrong.  Use the Collections API to make changes to your 
indexes.  Virtually any action that people need to do to their indexes 
is supported by the Collections API, and if there's something important 
missing, then we can talk about adding it.  If the Collections API is 
bypassed, there's a good chance that something will be missing/incorrect 
in either zookeeper or core.properties, maybe both.

If you're trying to create a new shard on a collection with the implicit 
router, this is probably what you're looking for:

https://lucene.apache.org/solr/guide/7_5/collections-api.html#createshard
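
For illustration, with the collection and shard names from the core.properties example above (host and port here are placeholders), such a call would look roughly like:

   curl "http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=BTS&shard=shard3"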

Thanks,
Shawn


Re: Rule-based replication or sharding

Posted by Chuck Reynolds <cr...@ancestry.com>.
Thanks Varun,

Until we move to Solr 7.5 is there a way that we can control sharding with the core.properties file?

It seems to me that you used to be able to put a core.properties file in the Solr home path with something like the following.

coreNodeName=bts_shard3_01
shard=shard3
collection=BTS

Then start Solr and it would create the sharding based on the information in the core.properties file.

When I try it with Solr 6.6 it seems to ignore the core.properties file.


Thanks again for your help

On 10/1/18, 11:21 PM, "Varun Thacker" <va...@vthacker.in> wrote:

    Hi Chuck,
    
    I was chatting with Noble offline and he suggested we could use this
    starting 7.5
    
    {replica:'#EQUAL', shard: '#EACH', sysprop.az: '#EACH'}
    
    where "az" is a sysprop while starting each solr instance ( -Daz=us-east-1 )
    
    It's documented
    https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html
    
    Let me know if this works for you.
    
    ( Looks like my previous email had some formatting issues )
    
    On Mon, Oct 1, 2018 at 10:17 PM Varun Thacker <va...@vthacker.in> wrote:
    
    > Hi Chuck,
    >
    > I was chatting with Noble offline and he suggested we could use this
    > starting 7.5
    >
    > {replica:'#EQUAL', shard: '#EACH', sysprop.az: '#EACH'}
    >
    > where "az" is a sysprop while starting each solr instance ( -Daz=us-east-1
    > )
    >
    > It's documented
    > https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html
    >
    > Let me know if this works for you.
    >
    >
    > On Wed, Sep 26, 2018 at 9:11 AM Chuck Reynolds <cr...@ancestry.com>
    > wrote:
    >
    >> Noble,
    >>
    >> Are you saying in the latest version of Solr that this would work with
    >> three instances of Solr running on each server?
    >>
    >> If so how?
    >>
    >> Thanks again for your help.
    >>
    >> On 9/26/18, 9:11 AM, "Noble Paul" <no...@gmail.com> wrote:
    >>
    >>     I'm not sure if it is pertinent to ask you to move to the latest Solr
    >>     which has the policy based replica placement. Unfortunately, I don't
    >>     have any other solution I can think of
    >>
    >>     On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds <
    >> creynolds@ancestry.com> wrote:
    >>     >
    >>     > Noble,
    >>     >
    >>     > So other than manually moving replicas of shard do you have a
    >> suggestion of how one might accomplish the multiple availability zone with
    >> multiple instances of Solr running on each server?
    >>     >
    >>     > Thanks
    >>     >
    >>     > On 9/26/18, 12:56 AM, "Noble Paul" <no...@gmail.com> wrote:
    >>     >
    >>     >     The rules suggested by Steve is correct. I tested it locally
    >> and I got
    >>     >     the same errors. That means a bug exists probably.
    >>     >     All the new development efforts are invested in the new policy
    >> feature
    >>     >     .
>> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
    >>     >
    >>     >     The old one is going to be deprecated pretty soon. So, I'm not
    >> sure if
    >>     >     we should be investing our resources here
    >>     >     On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <
    >> creynolds@ancestry.com> wrote:
    >>     >     >
    >>     >     > Shawn,
    >>     >     >
    >>     >     > Thanks for the info. We’ve been running this way for the past
    >> 4 years.
    >>     >     >
    >>     >     > We were running on very large hardware, 20 physical cores
    >> with 256 gigs of ram with 3 billion document and it was the only way we
    >> could take advantage of the hardware.
    >>     >     >
    >>     >     > Running 1 Solr instance per server never gave us the
    >> throughput we needed.
    >>     >     >
    >>     >     > So I somewhat disagree with your statement because our test
    >> proved otherwise.
    >>     >     >
    >>     >     > Thanks for the info.
    >>     >     >
    >>     >     > Sent from my iPhone
    >>     >     >
    >>     >     > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <
    >> apache@elyograg.org> wrote:
    >>     >     > >
    >>     >     > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
    >>     >     > >> Each server has three instances of Solr running on it so
    >> every instance on the server has to be in the same replica set.
    >>     >     > >
    >>     >     > > You should be running exactly one Solr instance per
    >> server.  When evaluating rules for replica placement, SolrCloud will treat
    >> each instance as completely separate from all others, including others on
    >> the same machine.  It will not know that those three instances are on the
    >> same machine.  One Solr instance can handle MANY indexes.
    >>     >     > >
    >>     >     > > There is only ONE situation where it makes sense to run
    >> multiple instances per machine, and in my strong opinion, even that
    >> situation should not be handled with multiple instances. That situation is
    >> this:  When running one instance would require a REALLY large heap.
    >> Garbage collection pauses can become extreme in that situation, so some
    >> people will run multiple instances that each have a smaller heap, and
    >> divide their indexes between them. In my opinion, when you have enough
    >> index data on an instance that it requires a huge heap, instead of running
    >> two or more instances on one server, it's time to add more servers.
    >>     >     > >
    >>     >     > > Thanks,
    >>     >     > > Shawn
    >>     >     > >
    >>     >
    >>     >
    >>     >
    >>     >     --
    >>     >     -----------------------------------------------------
    >>     >     Noble Paul
    >>     >
    >>     >
    >>
    >>
    >>     --
    >>     -----------------------------------------------------
    >>     Noble Paul
    >>
    >>
    >>
    


Re: Rule-based replication or sharding

Posted by Varun Thacker <va...@vthacker.in>.
Hi Chuck,

I was chatting with Noble offline and he suggested we could use this
starting 7.5

{replica:'#EQUAL', shard: '#EACH', sysprop.az: '#EACH'}

where "az" is a sysprop while starting each solr instance ( -Daz=us-east-1 )

It's documented
https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html
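
As a rough sketch, that could be installed as a cluster-wide policy through the autoscaling API along these lines (host, port, and collection details are placeholders):

   curl -X POST -H 'Content-Type: application/json' \
     'http://localhost:8983/api/cluster/autoscaling' -d '{
       "set-cluster-policy": [
         {"replica": "#EQUAL", "shard": "#EACH", "sysprop.az": "#EACH"}
       ]
     }'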

Let me know if this works for you.

( Looks like my previous email had some formatting issues )

On Mon, Oct 1, 2018 at 10:17 PM Varun Thacker <va...@vthacker.in> wrote:

> Hi Chuck,
>
> I was chatting with Noble offline and he suggested we could use this
> starting 7.5
>
> {replica:'#EQUAL', shard: '#EACH', sysprop.az: '#EACH'}
>
> where "az" is a sysprop while starting each solr instance ( -Daz=us-east-1
> )
>
> It's documented
> https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html
>
> Let me know if this works for you.
>
>
> On Wed, Sep 26, 2018 at 9:11 AM Chuck Reynolds <cr...@ancestry.com>
> wrote:
>
>> Noble,
>>
>> Are you saying in the latest version of Solr that this would work with
>> three instances of Solr running on each server?
>>
>> If so how?
>>
>> Thanks again for your help.
>>
>> On 9/26/18, 9:11 AM, "Noble Paul" <no...@gmail.com> wrote:
>>
>>     I'm not sure if it is pertinent to ask you to move to the latest Solr
>>     which has the policy based replica placement. Unfortunately, I don't
>>     have any other solution I can think of
>>
>>     On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds <
>> creynolds@ancestry.com> wrote:
>>     >
>>     > Noble,
>>     >
>>     > So other than manually moving replicas of shard do you have a
>> suggestion of how one might accomplish the multiple availability zone with
>> multiple instances of Solr running on each server?
>>     >
>>     > Thanks
>>     >
>>     > On 9/26/18, 12:56 AM, "Noble Paul" <no...@gmail.com> wrote:
>>     >
>>     >     The rules suggested by Steve is correct. I tested it locally
>> and I got
>>     >     the same errors. That means a bug exists probably.
>>     >     All the new development efforts are invested in the new policy
>> feature
>>     >     .
>> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
>>     >
>>     >     The old one is going to be deprecated pretty soon. So, I'm not
>> sure if
>>     >     we should be investing our resources here
>>     >     On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <
>> creynolds@ancestry.com> wrote:
>>     >     >
>>     >     > Shawn,
>>     >     >
>>     >     > Thanks for the info. We’ve been running this way for the past
>> 4 years.
>>     >     >
>>     >     > We were running on very large hardware, 20 physical cores
>> with 256 gigs of ram with 3 billion document and it was the only way we
>> could take advantage of the hardware.
>>     >     >
>>     >     > Running 1 Solr instance per server never gave us the
>> throughput we needed.
>>     >     >
>>     >     > So I somewhat disagree with your statement because our test
>> proved otherwise.
>>     >     >
>>     >     > Thanks for the info.
>>     >     >
>>     >     > Sent from my iPhone
>>     >     >
>>     >     > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <
>> apache@elyograg.org> wrote:
>>     >     > >
>>     >     > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
>>     >     > >> Each server has three instances of Solr running on it so
>> every instance on the server has to be in the same replica set.
>>     >     > >
>>     >     > > You should be running exactly one Solr instance per
>> server.  When evaluating rules for replica placement, SolrCloud will treat
>> each instance as completely separate from all others, including others on
>> the same machine.  It will not know that those three instances are on the
>> same machine.  One Solr instance can handle MANY indexes.
>>     >     > >
>>     >     > > There is only ONE situation where it makes sense to run
>> multiple instances per machine, and in my strong opinion, even that
>> situation should not be handled with multiple instances. That situation is
>> this:  When running one instance would require a REALLY large heap.
>> Garbage collection pauses can become extreme in that situation, so some
>> people will run multiple instances that each have a smaller heap, and
>> divide their indexes between them. In my opinion, when you have enough
>> index data on an instance that it requires a huge heap, instead of running
>> two or more instances on one server, it's time to add more servers.
>>     >     > >
>>     >     > > Thanks,
>>     >     > > Shawn
>>     >     > >
>>     >
>>     >
>>     >
>>     >     --
>>     >     -----------------------------------------------------
>>     >     Noble Paul
>>     >
>>     >
>>
>>
>>     --
>>     -----------------------------------------------------
>>     Noble Paul
>>
>>
>>

Re: Rule-based replication or sharding

Posted by Varun Thacker <va...@vthacker.in>.
Hi Chuck,

I was chatting with Noble offline and he suggested we could use this
starting 7.5

{replica:'#EQUAL', shard: '#EACH', sysprop.az: '#EACH'}

where "az" is a sysprop while starting each solr instance ( -Daz=us-east-1 )

It's documented
https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html

Let me know if this works for you.


On Wed, Sep 26, 2018 at 9:11 AM Chuck Reynolds <cr...@ancestry.com>
wrote:

> Noble,
>
> Are you saying in the latest version of Solr that this would work with
> three instances of Solr running on each server?
>
> If so how?
>
> Thanks again for your help.
>
> On 9/26/18, 9:11 AM, "Noble Paul" <no...@gmail.com> wrote:
>
>     I'm not sure if it is pertinent to ask you to move to the latest Solr
>     which has the policy based replica placement. Unfortunately, I don't
>     have any other solution I can think of
>
>     On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds <
> creynolds@ancestry.com> wrote:
>     >
>     > Noble,
>     >
>     > So other than manually moving replicas of shard do you have a
> suggestion of how one might accomplish the multiple availability zone with
> multiple instances of Solr running on each server?
>     >
>     > Thanks
>     >
>     > On 9/26/18, 12:56 AM, "Noble Paul" <no...@gmail.com> wrote:
>     >
>     >     The rules suggested by Steve is correct. I tested it locally and
> I got
>     >     the same errors. That means a bug exists probably.
>     >     All the new development efforts are invested in the new policy
> feature
>     >     .
> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
>     >
>     >     The old one is going to be deprecated pretty soon. So, I'm not
> sure if
>     >     we should be investing our resources here
>     >     On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <
> creynolds@ancestry.com> wrote:
>     >     >
>     >     > Shawn,
>     >     >
>     >     > Thanks for the info. We’ve been running this way for the past
> 4 years.
>     >     >
>     >     > We were running on very large hardware, 20 physical cores with
> 256 gigs of ram with 3 billion document and it was the only way we could
> take advantage of the hardware.
>     >     >
>     >     > Running 1 Solr instance per server never gave us the
> throughput we needed.
>     >     >
>     >     > So I somewhat disagree with your statement because our test
> proved otherwise.
>     >     >
>     >     > Thanks for the info.
>     >     >
>     >     > Sent from my iPhone
>     >     >
>     >     > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <
> apache@elyograg.org> wrote:
>     >     > >
>     >     > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
>     >     > >> Each server has three instances of Solr running on it so
> every instance on the server has to be in the same replica set.
>     >     > >
>     >     > > You should be running exactly one Solr instance per server.
> When evaluating rules for replica placement, SolrCloud will treat each
> instance as completely separate from all others, including others on the
> same machine.  It will not know that those three instances are on the same
> machine.  One Solr instance can handle MANY indexes.
>     >     > >
>     >     > > There is only ONE situation where it makes sense to run
> multiple instances per machine, and in my strong opinion, even that
> situation should not be handled with multiple instances. That situation is
> this:  When running one instance would require a REALLY large heap.
> Garbage collection pauses can become extreme in that situation, so some
> people will run multiple instances that each have a smaller heap, and
> divide their indexes between them. In my opinion, when you have enough
> index data on an instance that it requires a huge heap, instead of running
> two or more instances on one server, it's time to add more servers.
>     >     > >
>     >     > > Thanks,
>     >     > > Shawn
>     >     > >
>     >
>     >
>     >
>     >     --
>     >     -----------------------------------------------------
>     >     Noble Paul
>     >
>     >
>
>
>     --
>     -----------------------------------------------------
>     Noble Paul
>
>
>

Re: Rule-based replication or sharding

Posted by Chuck Reynolds <cr...@ancestry.com>.
Noble,

Are you saying in the latest version of Solr that this would work with three instances of Solr running on each server?

If so how?

Thanks again for your help.

On 9/26/18, 9:11 AM, "Noble Paul" <no...@gmail.com> wrote:

    I'm not sure if it is pertinent to ask you to move to the latest Solr
    which has the policy based replica placement. Unfortunately, I don't
    have any other solution I can think of
    
    On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds <cr...@ancestry.com> wrote:
    >
    > Noble,
    >
    > So other than manually moving replicas of shard do you have a suggestion of how one might accomplish the multiple availability zone with multiple instances of Solr running on each server?
    >
    > Thanks
    >
    > On 9/26/18, 12:56 AM, "Noble Paul" <no...@gmail.com> wrote:
    >
    >     The rules suggested by Steve is correct. I tested it locally and I got
    >     the same errors. That means a bug exists probably.
    >     All the new development efforts are invested in the new policy feature
    >     .https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
    >
    >     The old one is going to be deprecated pretty soon. So, I'm not sure if
    >     we should be investing our resources here
    >     On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <cr...@ancestry.com> wrote:
    >     >
    >     > Shawn,
    >     >
    >     > Thanks for the info. We’ve been running this way for the past 4 years.
    >     >
    >     > We were running on very large hardware, 20 physical cores with 256 gigs of ram with 3 billion document and it was the only way we could take advantage of the hardware.
    >     >
    >     > Running 1 Solr instance per server never gave us the throughput we needed.
    >     >
    >     > So I somewhat disagree with your statement because our test proved otherwise.
    >     >
    >     > Thanks for the info.
    >     >
    >     > Sent from my iPhone
    >     >
    >     > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <ap...@elyograg.org> wrote:
    >     > >
    >     > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
    >     > >> Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.
    >     > >
    >     > > You should be running exactly one Solr instance per server.  When evaluating rules for replica placement, SolrCloud will treat each instance as completely separate from all others, including others on the same machine.  It will not know that those three instances are on the same machine.  One Solr instance can handle MANY indexes.
    >     > >
    >     > > There is only ONE situation where it makes sense to run multiple instances per machine, and in my strong opinion, even that situation should not be handled with multiple instances. That situation is this:  When running one instance would require a REALLY large heap.  Garbage collection pauses can become extreme in that situation, so some people will run multiple instances that each have a smaller heap, and divide their indexes between them. In my opinion, when you have enough index data on an instance that it requires a huge heap, instead of running two or more instances on one server, it's time to add more servers.
    >     > >
    >     > > Thanks,
    >     > > Shawn
    >     > >
    >
    >
    >
    >     --
    >     -----------------------------------------------------
    >     Noble Paul
    >
    >
    
    
    -- 
    -----------------------------------------------------
    Noble Paul
    


Re: Rule-based replication or sharding

Posted by Noble Paul <no...@gmail.com>.
I'm not sure if it is pertinent to ask you to move to the latest Solr,
which has the policy-based replica placement. Unfortunately, I don't
have any other solution I can think of.

On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds <cr...@ancestry.com> wrote:
>
> Noble,
>
> So other than manually moving replicas of shard do you have a suggestion of how one might accomplish the multiple availability zone with multiple instances of Solr running on each server?
>
> Thanks
>
> On 9/26/18, 12:56 AM, "Noble Paul" <no...@gmail.com> wrote:
>
>     The rules suggested by Steve is correct. I tested it locally and I got
>     the same errors. That means a bug exists probably.
>     All the new development efforts are invested in the new policy feature
>     .https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
>
>     The old one is going to be deprecated pretty soon. So, I'm not sure if
>     we should be investing our resources here
>     On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <cr...@ancestry.com> wrote:
>     >
>     > Shawn,
>     >
>     > Thanks for the info. We’ve been running this way for the past 4 years.
>     >
>     > We were running on very large hardware, 20 physical cores with 256 gigs of ram with 3 billion document and it was the only way we could take advantage of the hardware.
>     >
>     > Running 1 Solr instance per server never gave us the throughput we needed.
>     >
>     > So I somewhat disagree with your statement because our test proved otherwise.
>     >
>     > Thanks for the info.
>     >
>     > Sent from my iPhone
>     >
>     > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <ap...@elyograg.org> wrote:
>     > >
>     > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
>     > >> Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.
>     > >
>     > > You should be running exactly one Solr instance per server.  When evaluating rules for replica placement, SolrCloud will treat each instance as completely separate from all others, including others on the same machine.  It will not know that those three instances are on the same machine.  One Solr instance can handle MANY indexes.
>     > >
>     > > There is only ONE situation where it makes sense to run multiple instances per machine, and in my strong opinion, even that situation should not be handled with multiple instances. That situation is this:  When running one instance would require a REALLY large heap.  Garbage collection pauses can become extreme in that situation, so some people will run multiple instances that each have a smaller heap, and divide their indexes between them. In my opinion, when you have enough index data on an instance that it requires a huge heap, instead of running two or more instances on one server, it's time to add more servers.
>     > >
>     > > Thanks,
>     > > Shawn
>     > >
>
>
>
>     --
>     -----------------------------------------------------
>     Noble Paul
>
>


-- 
-----------------------------------------------------
Noble Paul

Re: Rule-based replication or sharding

Posted by Chuck Reynolds <cr...@ancestry.com>.
Noble,

So other than manually moving replicas of shards, do you have a suggestion of how one might accomplish the multiple-availability-zone setup with multiple instances of Solr running on each server?

Thanks

On 9/26/18, 12:56 AM, "Noble Paul" <no...@gmail.com> wrote:

    The rules suggested by Steve is correct. I tested it locally and I got
    the same errors. That means a bug exists probably.
    All the new development efforts are invested in the new policy feature
    .https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
    
    The old one is going to be deprecated pretty soon. So, I'm not sure if
    we should be investing our resources here
    On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <cr...@ancestry.com> wrote:
    >
    > Shawn,
    >
    > Thanks for the info. We’ve been running this way for the past 4 years.
    >
    > We were running on very large hardware, 20 physical cores with 256 gigs of ram with 3 billion document and it was the only way we could take advantage of the hardware.
    >
    > Running 1 Solr instance per server never gave us the throughput we needed.
    >
    > So I somewhat disagree with your statement because our test proved otherwise.
    >
    > Thanks for the info.
    >
    > Sent from my iPhone
    >
    > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <ap...@elyograg.org> wrote:
    > >
    > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
    > >> Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.
    > >
    > > You should be running exactly one Solr instance per server.  When evaluating rules for replica placement, SolrCloud will treat each instance as completely separate from all others, including others on the same machine.  It will not know that those three instances are on the same machine.  One Solr instance can handle MANY indexes.
    > >
    > > There is only ONE situation where it makes sense to run multiple instances per machine, and in my strong opinion, even that situation should not be handled with multiple instances. That situation is this:  When running one instance would require a REALLY large heap.  Garbage collection pauses can become extreme in that situation, so some people will run multiple instances that each have a smaller heap, and divide their indexes between them. In my opinion, when you have enough index data on an instance that it requires a huge heap, instead of running two or more instances on one server, it's time to add more servers.
    > >
    > > Thanks,
    > > Shawn
    > >
    
    
    
    -- 
    -----------------------------------------------------
    Noble Paul
    


Re: Rule-based replication or sharding

Posted by Noble Paul <no...@gmail.com>.
The rules suggested by Steve are correct. I tested it locally and I got
the same errors. That means a bug probably exists.
All the new development efforts are invested in the new policy feature:
https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html

The old one is going to be deprecated pretty soon. So, I'm not sure if
we should be investing our resources here
On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <cr...@ancestry.com> wrote:
>
> Shawn,
>
> Thanks for the info. We’ve been running this way for the past 4 years.
>
> We were running on very large hardware, 20 physical cores with 256 gigs of ram with 3 billion document and it was the only way we could take advantage of the hardware.
>
> Running 1 Solr instance per server never gave us the throughput we needed.
>
> So I somewhat disagree with your statement because our test proved otherwise.
>
> Thanks for the info.
>
> Sent from my iPhone
>
> > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> >
> >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
> >> Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.
> >
> > You should be running exactly one Solr instance per server.  When evaluating rules for replica placement, SolrCloud will treat each instance as completely separate from all others, including others on the same machine.  It will not know that those three instances are on the same machine.  One Solr instance can handle MANY indexes.
> >
> > There is only ONE situation where it makes sense to run multiple instances per machine, and in my strong opinion, even that situation should not be handled with multiple instances. That situation is this:  When running one instance would require a REALLY large heap.  Garbage collection pauses can become extreme in that situation, so some people will run multiple instances that each have a smaller heap, and divide their indexes between them. In my opinion, when you have enough index data on an instance that it requires a huge heap, instead of running two or more instances on one server, it's time to add more servers.
> >
> > Thanks,
> > Shawn
> >



-- 
-----------------------------------------------------
Noble Paul

Re: Rule-based replication or sharding

Posted by Chuck Reynolds <cr...@ancestry.com>.
Shawn,

Thanks for the info. We’ve been running this way for the past 4 years. 

We were running on very large hardware, 20 physical cores with 256 gigs of RAM and 3 billion documents, and it was the only way we could take advantage of the hardware.

Running 1 Solr instance per server never gave us the throughput we needed. 

So I somewhat disagree with your statement because our test proved otherwise. 

Thanks for the info. 

Sent from my iPhone

> On Sep 25, 2018, at 4:19 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> 
>> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
>> Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.
> 
> You should be running exactly one Solr instance per server.  When evaluating rules for replica placement, SolrCloud will treat each instance as completely separate from all others, including others on the same machine.  It will not know that those three instances are on the same machine.  One Solr instance can handle MANY indexes.
> 
> There is only ONE situation where it makes sense to run multiple instances per machine, and in my strong opinion, even that situation should not be handled with multiple instances. That situation is this:  When running one instance would require a REALLY large heap.  Garbage collection pauses can become extreme in that situation, so some people will run multiple instances that each have a smaller heap, and divide their indexes between them. In my opinion, when you have enough index data on an instance that it requires a huge heap, instead of running two or more instances on one server, it's time to add more servers.
> 
> Thanks,
> Shawn
> 

Re: Rule-based replication or sharding

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
> Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.

You should be running exactly one Solr instance per server.  When 
evaluating rules for replica placement, SolrCloud will treat each 
instance as completely separate from all others, including others on the 
same machine.  It will not know that those three instances are on the 
same machine.  One Solr instance can handle MANY indexes.

There is only ONE situation where it makes sense to run multiple 
instances per machine, and in my strong opinion, even that situation 
should not be handled with multiple instances. That situation is this:  
When running one instance would require a REALLY large heap.  Garbage 
collection pauses can become extreme in that situation, so some people 
will run multiple instances that each have a smaller heap, and divide 
their indexes between them. In my opinion, when you have enough index 
data on an instance that it requires a huge heap, instead of running two 
or more instances on one server, it's time to add more servers.

Thanks,
Shawn


Re: Rule-based replication or sharding

Posted by Chuck Reynolds <cr...@ancestry.com>.
Steve,

Sorry, I must have omitted it from a past response.

Here is what came back in the response.


<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">91</int></lst><str name="Operation create caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not identify nodes matching the rules [{
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ1"}, {
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ2"}, {
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ3"}]
 tag values{
  "10.157.112.223:10002_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.120.207:10003_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.121.165:10002_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.190:10002_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.121.165:10003_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.190:10001_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.115.30:10003_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.121.165:10001_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.201:10002_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.120.207:10001_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.112.223:10003_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.115.30:10001_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.116.190:10003_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.112.223:10001_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.120.207:10002_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.201:10003_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.116.201:10001_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.115.30:10002_solr":{"sysprop.AWSAZ":"AZ1"}}</str><lst name="exception"><str name="msg">Could not identify nodes matching the rules [{
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ1"}, {
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ2"}, {
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ3"}]
 tag values{
  "10.157.112.223:10002_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.120.207:10003_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.121.165:10002_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.190:10002_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.121.165:10003_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.190:10001_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.115.30:10003_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.121.165:10001_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.201:10002_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.120.207:10001_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.112.223:10003_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.115.30:10001_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.116.190:10003_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.112.223:10001_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.120.207:10002_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.201:10003_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.116.201:10001_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.115.30:10002_solr":{"sysprop.AWSAZ":"AZ1"}}</str><int name="rspCode">400</int></lst><lst name="error"><lst name="metadata"><str name="error-class">org.apache.solr.common.SolrException</str><str name="root-error-class">org.apache.solr.common.SolrException</str></lst><str name="msg">Could not identify nodes matching the rules [{
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ1"}, {
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ2"}, {
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ3"}]
 tag values{
  "10.157.112.223:10002_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.120.207:10003_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.121.165:10002_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.190:10002_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.121.165:10003_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.190:10001_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.115.30:10003_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.121.165:10001_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.201:10002_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.120.207:10001_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.112.223:10003_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.115.30:10001_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.116.190:10003_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.112.223:10001_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.120.207:10002_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.201:10003_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.116.201:10001_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.115.30:10002_solr":{"sysprop.AWSAZ":"AZ1"}}</str><int name="code">400</int></lst>
</response>





On 9/25/18, 11:33 AM, "Steve Rowe" <sa...@gmail.com> wrote:

    Chuck, see my responses inline below:
    
    > On Sep 25, 2018, at 12:50 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
    > The bottom line is I guess I'm confused by the documentation and the reference to replicas. Normally when referring to replicas in the documentation it is referring to the number of times you want the data replicated. As in replication factor.  That's where the confusion was for me.
    
    We can always use help improving Solr’s documentation, and your perspective is valuable.  Please see https://wiki.apache.org/solr/HowToContribute and open JIRA issues with the problems you find, and ideally with patches against the ref guide sources.
    
    From the Solr 6.4 ref guide’s “Solr Glossary”:
    
      Replica: A Core that acts as a physical copy of a Shard in a SolrCloud Collection.
    
    As ^^ indicates, a “replica” is not a replication factor.  (Though “replica:1” in a rule-based replica placement rule is a condition on replica *count*, so I can see where that could be confusing.)
    
    > If I want to create a rule that insures that my replication factor of three correctly shards the data across three AZ so if I was to lose one or even two AZ's in AWS Solr would still have 1 - 2 copies of the data.   How would that rule work?
    
    I thought I already answered that exact question:
    
    > If you mean “exactly one Solr instance in an AZ must host exactly one replica of each shard of the collection”, then yes, that makes sense :).
    > 
    > Okay, one more try :) - here are the rules that should do the trick for you (i.e., what I wrote in the previous sentence):
    > 
    > -----
    > rule=shard:*,replica:1,sysprop.AWSAZ:AZ1
    > &rule=shard:*,replica:1,sysprop.AWSAZ:AZ2
    > &rule=shard:*,replica:1,sysprop.AWSAZ:AZ3
    > -----
    
    Have you tried ^^ ?
    
    --
    Steve
    http://www.lucidworks.com
    
    


Re: Rule-based replication or sharding

Posted by Steve Rowe <sa...@gmail.com>.
Chuck, see my responses inline below:

> On Sep 25, 2018, at 12:50 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
> The bottom line is I guess I'm confused by the documentation and the reference to replicas. Normally when referring to replicas in the documentation it is referring to the number of times you want the data replicated. As in replication factor.  That's where the confusion was for me.

We can always use help improving Solr’s documentation, and your perspective is valuable.  Please see https://wiki.apache.org/solr/HowToContribute and open JIRA issues with the problems you find, and ideally with patches against the ref guide sources.

From the Solr 6.4 ref guide’s “Solr Glossary”:

  Replica: A Core that acts as a physical copy of a Shard in a SolrCloud Collection.

As ^^ indicates, a “replica” is not a replication factor.  (Though “replica:1” in a rule-based replica placement rule is a condition on replica *count*, so I can see where that could be confusing.)

> If I want to create a rule that insures that my replication factor of three correctly shards the data across three AZ so if I was to lose one or even two AZ's in AWS Solr would still have 1 - 2 copies of the data.   How would that rule work?

I thought I already answered that exact question:

> If you mean “exactly one Solr instance in an AZ must host exactly one replica of each shard of the collection”, then yes, that makes sense :).
> 
> Okay, one more try :) - here are the rules that should do the trick for you (i.e., what I wrote in the previous sentence):
> 
> -----
> rule=shard:*,replica:1,sysprop.AWSAZ:AZ1
> &rule=shard:*,replica:1,sysprop.AWSAZ:AZ2
> &rule=shard:*,replica:1,sysprop.AWSAZ:AZ3
> -----

Have you tried ^^ ?

--
Steve
www.lucidworks.com


Re: Rule-based replication or sharding

Posted by Chuck Reynolds <cr...@ancestry.com>.
Steve,

No doubt I confused you.  I'm confused myself.

When I said replica set what I was referring to was one of the three replicas of the data.  Each replica needing to be in a different AZ.

What is a "replica set”?  And why does each instance of Solr (referred to in the reference guide as a “node”, BTW) running on a server need to be in the same “replica set”?
	What I should have said is that each node on the server (there are three per server) needs to be in the same AZ.

The bottom line is I guess I'm confused by the documentation and the reference to replicas. Normally when referring to replicas in the documentation it is referring to the number of times you want the data replicated. As in replication factor.  That's where the confusion was for me.

So let me ask this simple question.

If I want to create a rule that ensures that my replication factor of three correctly spreads the data across three AZs, so that if I were to lose one or even two AZs in AWS, Solr would still have 1-2 copies of the data, how would that rule work?



On 9/25/18, 10:17 AM, "Steve Rowe" <sa...@gmail.com> wrote:

    Hi Chuck, see my replies inline below:
    
    > On Sep 25, 2018, at 11:21 AM, Chuck Reynolds <cr...@ancestry.com> wrote:
    > 
    > So we have 90 server in AWS, 30 servers per AZ's.
    > 90 shards for the cluster.
    > Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.
    
    You lost me here.  What is a "replica set”?  And why does each instance of Solr (referred to in the reference guide as a “node”, BTW) running on a server need to be in the same “replica set”?
    
    (I’m guessing you have theorized that “replica:3” is a way of referring to "replica set #3”, but that’s incorrect; “replica:3” means that exactly 3 replicas must be placed on the bucket of nodes you specify in the rule; more info below.)
    
    > So for example shard 1 will have three replicas and each replica needs to be in a separate AZ.
    
    Okay, I understand this part, but I fail to see how this is an example of your “replica set” assertion above.
    
    > So does the rule of replica:>2 work?
    
    I assume you did not mean ^^ literally, since you wrote “>” where I wrote “<“ in my previous response. 
    
    I checked offline with Noble Paul, who wrote the rule-based replica placement feature, and he corrected a misunderstanding of mine:
    
    > On 9/25/18, 9:08 AM, "Steve Rowe" <sa...@gmail.com> wrote:
    
    > So you could specify “replica:<2”, which means that no node can host more than one replica, but it's acceptable for a node to host zero replicas.
    
    But ^^ is incorrect. 
    
    “replica:<2” means that either zero or one replica of each shard of the collection to be created may be hosted on the bucket of *all* of the nodes that have the specified AWSAZ sysprop value.  That is, when placing replicas, Solr will put either zero or one replica on one of the nodes in the bucket.  And AFAICT that’s not exatly what you want, since zero replicas of a shard on an AZ is not acceptable. 
    
    > I just need all of the servers in an AZ to be in the same replica.  Does that make sense?
    
    I’m not sure?  This sounds like something different from your above example: "shard 1 will have three replicas and each replica needs to be in a separate AZ.”
    
    If you mean “exactly one Solr instance in an AZ must host exactly one replica of each shard of the collection”, then yes, that makes sense :).
    
    Okay, one more try :) - here are the rules that should do the trick for you (i.e., what I wrote in the previous sentence):
    
    -----
     rule=shard:*,replica:1,sysprop.AWSAZ:AZ1
    &rule=shard:*,replica:1,sysprop.AWSAZ:AZ2
    &rule=shard:*,replica:1,sysprop.AWSAZ:AZ3
    -----
    
    --
    Steve
    http://www.lucidworks.com
    
    > On 9/25/18, 9:08 AM, "Steve Rowe" <sa...@gmail.com> wrote:
    > 
    >    Chuck,
    > 
    >    The default Snitch is the one that’s used if you don’t specify one in a rule.  The sysprop.* tag is provided by the default Snitch.
    > 
    >    The only thing that seems wrong to me in your rules is “replica:1”, “replica:2”, and “replica:3” - these say that exactly one, two, and three replicas of each shard, respectively, must be on each of the nodes that has the respective sysprop value.
    > 
    >    Since these rules will apply to all nodes that match the sysprop value, you have to allow for the possibility that some nodes will have *zero* replicas of a shard.  So you could specify “replica:<2”, which means that no node can host more than one replica, but it's acceptable for a node to host zero replicas.
    > 
    >    Did you set system property AWSAZ on each Solr node with an appropriate value?
    > 
    >    --
    >    Steve
    >    http://www.lucidworks.com
    > 
    >> On Sep 25, 2018, at 10:39 AM, Chuck Reynolds <cr...@ancestry.com> wrote:
    >> 
    >> Steve,
    >> 
    >> I wasn't able to get the sysprop to work.  I think maybe there is a disconnect on my part.
    >> 
    >> From the documentation it looks like I can only use the sysprop tag if I'm using a Snitch.  Is that correct.
    >> 
    >> I can't find any example of anyone using the default Snitch.
    >> 
    >> Here is what I have for my rule:
    >> rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3
    >> 
    >> I'm not specifying a snitch.  Is that my problem or is there a problem with my rule?
    >> 
    >> Thanks for your help.
    >> On 9/21/18, 2:40 PM, "Steve Rowe" <sa...@gmail.com> wrote:
    >> 
    >>   Hi Chuck,
    >> 
    >>   One way to do it is to set a system property on the JVM running each Solr node, corresponding to the the AWS availability zone on which the node is hosted.
    >> 
    >>   For example, you could use sysprop “AWSAZ”, then use rules like:
    >> 
    >>      replica:<2,sysprop.AWSAZ:us-east-1
    >>      replica:<2,sysprop.AWSAZ:us-west-1
    >>      replica:<2,sysprop.AWSAZ:ca-central-1
    >> 
    >>   --
    >>   Steve
    >>   http://www.lucidworks.com
    >> 
    >>> On Sep 21, 2018, at 4:07 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
    >>> 
    >>> I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
    >>> factor of three.  I'm using AWS EC2 instances and I have a requirement to
    >>> replicate the data into 3 AWS availability zones.  
    >>> 
    >>> So 30 servers in each zone and I don't see a create collection rule that
    >>> will put one replica in each of the three zones.
    >>> 
    >>> What am I missing?
    >>> 
    >>> 
    >>> 
    >>> --
    >>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    
    


Re: Rule-based replication or sharing

Posted by Steve Rowe <sa...@gmail.com>.
Hi Chuck, see my replies inline below:

> On Sep 25, 2018, at 11:21 AM, Chuck Reynolds <cr...@ancestry.com> wrote:
> 
> So we have 90 server in AWS, 30 servers per AZ's.
> 90 shards for the cluster.
> Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.

You lost me here.  What is a "replica set”?  And why does each instance of Solr (referred to in the reference guide as a “node”, BTW) running on a server need to be in the same “replica set”?

(I’m guessing you have theorized that “replica:3” is a way of referring to "replica set #3”, but that’s incorrect; “replica:3” means that exactly 3 replicas must be placed on the bucket of nodes you specify in the rule; more info below.)

> So for example shard 1 will have three replicas and each replica needs to be in a separate AZ.

Okay, I understand this part, but I fail to see how this is an example of your “replica set” assertion above.

> So does the rule of replica:>2 work?

I assume you did not mean ^^ literally, since you wrote “>” where I wrote “<“ in my previous response. 

I checked offline with Noble Paul, who wrote the rule-based replica placement feature, and he corrected a misunderstanding of mine:

> On 9/25/18, 9:08 AM, "Steve Rowe" <sa...@gmail.com> wrote:

> So you could specify “replica:<2”, which means that no node can host more than one replica, but it's acceptable for a node to host zero replicas.

But ^^ is incorrect. 

“replica:<2” means that either zero or one replica of each shard of the collection to be created may be hosted on the bucket of *all* of the nodes that have the specified AWSAZ sysprop value.  That is, when placing replicas, Solr will put either zero or one replica of each shard somewhere in that bucket of nodes.  And AFAICT that’s not exactly what you want, since zero replicas of a shard in an AZ is not acceptable.

> I just need all of the servers in an AZ to be in the same replica.  Does that make sense?

I’m not sure?  This sounds like something different from your above example: "shard 1 will have three replicas and each replica needs to be in a separate AZ.”

If you mean “exactly one Solr instance in an AZ must host exactly one replica of each shard of the collection”, then yes, that makes sense :).

Okay, one more try :) - here are the rules that should do the trick for you (i.e., what I wrote in the previous sentence):

-----
 rule=shard:*,replica:1,sysprop.AWSAZ:AZ1
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ2
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ3
-----
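
For completeness, here's what a full CREATE call with those rules might look like.  It's one curl command split across lines; the collection name, config name, and host:port are placeholders, and the rule values are left un-encoded for readability.  Treat it as an untested sketch and adjust to your setup:

-----
curl "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=mycollection\
&collection.configName=myconfig\
&numShards=90\
&replicationFactor=3\
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ1\
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ2\
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ3"
-----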

--
Steve
www.lucidworks.com

> On 9/25/18, 9:08 AM, "Steve Rowe" <sa...@gmail.com> wrote:
> 
>    Chuck,
> 
>    The default Snitch is the one that’s used if you don’t specify one in a rule.  The sysprop.* tag is provided by the default Snitch.
> 
>    The only thing that seems wrong to me in your rules is “replica:1”, “replica:2”, and “replica:3” - these say that exactly one, two, and three replicas of each shard, respectively, must be on each of the nodes that has the respective sysprop value.
> 
>    Since these rules will apply to all nodes that match the sysprop value, you have to allow for the possibility that some nodes will have *zero* replicas of a shard.  So you could specify “replica:<2”, which means that no node can host more than one replica, but it's acceptable for a node to host zero replicas.
> 
>    Did you set system property AWSAZ on each Solr node with an appropriate value?
> 
>    --
>    Steve
>    www.lucidworks.com
> 
>> On Sep 25, 2018, at 10:39 AM, Chuck Reynolds <cr...@ancestry.com> wrote:
>> 
>> Steve,
>> 
>> I wasn't able to get the sysprop to work.  I think maybe there is a disconnect on my part.
>> 
>> From the documentation it looks like I can only use the sysprop tag if I'm using a Snitch.  Is that correct.
>> 
>> I can't find any example of anyone using the default Snitch.
>> 
>> Here is what I have for my rule:
>> rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3
>> 
>> I'm not specifying a snitch.  Is that my problem or is there a problem with my rule?
>> 
>> Thanks for your help.
>> On 9/21/18, 2:40 PM, "Steve Rowe" <sa...@gmail.com> wrote:
>> 
>>   Hi Chuck,
>> 
>>   One way to do it is to set a system property on the JVM running each Solr node, corresponding to the the AWS availability zone on which the node is hosted.
>> 
>>   For example, you could use sysprop “AWSAZ”, then use rules like:
>> 
>>      replica:<2,sysprop.AWSAZ:us-east-1
>>      replica:<2,sysprop.AWSAZ:us-west-1
>>      replica:<2,sysprop.AWSAZ:ca-central-1
>> 
>>   --
>>   Steve
>>   www.lucidworks.com
>> 
>>> On Sep 21, 2018, at 4:07 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
>>> 
>>> I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
>>> factor of three.  I'm using AWS EC2 instances and I have a requirement to
>>> replicate the data into 3 AWS availability zones.  
>>> 
>>> So 30 servers in each zone and I don't see a create collection rule that
>>> will put one replica in each of the three zones.
>>> 
>>> What am I missing?
>>> 
>>> 
>>> 
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Rule-based replication or sharing

Posted by Chuck Reynolds <cr...@ancestry.com>.
Steve,

Yes, I set the system property AWSAZ, and I've checked the Java properties in Solr and I can see them.

It may be the way we are configuring Solr, so let me explain that first.

So we have 90 servers in AWS, 30 servers per AZ.
90 shards for the cluster.
Each server has three instances of Solr running on it so every instance on the server has to be in the same replica set.
So for example shard 1 will have three replicas and each replica needs to be in a separate AZ.

So does the rule of replica:>2 work?

I just need all of the servers in an AZ to be in the same replica.  Does that make sense?
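
For reference, each instance gets the property at startup, along these lines (the ports, directories, and ZooKeeper string below are illustrative, not our exact commands):

-----
# one server in AZ1 runs three Solr nodes, all tagged with the same zone
bin/solr start -cloud -p 8983 -s /var/solr/node1 -z zk1:2181,zk2:2181,zk3:2181 -DAWSAZ=AZ1
bin/solr start -cloud -p 8984 -s /var/solr/node2 -z zk1:2181,zk2:2181,zk3:2181 -DAWSAZ=AZ1
bin/solr start -cloud -p 8985 -s /var/solr/node3 -z zk1:2181,zk2:2181,zk3:2181 -DAWSAZ=AZ1
-----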

On 9/25/18, 9:08 AM, "Steve Rowe" <sa...@gmail.com> wrote:

    Chuck,
    
    The default Snitch is the one that’s used if you don’t specify one in a rule.  The sysprop.* tag is provided by the default Snitch.
    
    The only thing that seems wrong to me in your rules is “replica:1”, “replica:2”, and “replica:3” - these say that exactly one, two, and three replicas of each shard, respectively, must be on each of the nodes that has the respective sysprop value.
    
    Since these rules will apply to all nodes that match the sysprop value, you have to allow for the possibility that some nodes will have *zero* replicas of a shard.  So you could specify “replica:<2”, which means that no node can host more than one replica, but it's acceptable for a node to host zero replicas.
    
    Did you set system property AWSAZ on each Solr node with an appropriate value?
    
    --
    Steve
    www.lucidworks.com
    
    > On Sep 25, 2018, at 10:39 AM, Chuck Reynolds <cr...@ancestry.com> wrote:
    > 
    > Steve,
    > 
    > I wasn't able to get the sysprop to work.  I think maybe there is a disconnect on my part.
    > 
    > From the documentation it looks like I can only use the sysprop tag if I'm using a Snitch.  Is that correct.
    > 
    > I can't find any example of anyone using the default Snitch.
    > 
    > Here is what I have for my rule:
    > rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3
    > 
    > I'm not specifying a snitch.  Is that my problem or is there a problem with my rule?
    > 
    > Thanks for your help.
    > On 9/21/18, 2:40 PM, "Steve Rowe" <sa...@gmail.com> wrote:
    > 
    >    Hi Chuck,
    > 
    >    One way to do it is to set a system property on the JVM running each Solr node, corresponding to the the AWS availability zone on which the node is hosted.
    > 
    >    For example, you could use sysprop “AWSAZ”, then use rules like:
    > 
    >       replica:<2,sysprop.AWSAZ:us-east-1
    >       replica:<2,sysprop.AWSAZ:us-west-1
    >       replica:<2,sysprop.AWSAZ:ca-central-1
    > 
    >    --
    >    Steve
    >    www.lucidworks.com
    > 
    >> On Sep 21, 2018, at 4:07 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
    >> 
    >> I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
    >> factor of three.  I'm using AWS EC2 instances and I have a requirement to
    >> replicate the data into 3 AWS availability zones.  
    >> 
    >> So 30 servers in each zone and I don't see a create collection rule that
    >> will put one replica in each of the three zones.
    >> 
    >> What am I missing?
    >> 
    >> 
    >> 
    >> --
    >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    > 
    > 
    > 
    
    


Re: Rule-based replication or sharing

Posted by Steve Rowe <sa...@gmail.com>.
Chuck,

The default Snitch is the one that’s used if you don’t specify one in a rule.  The sysprop.* tag is provided by the default Snitch.

The only thing that seems wrong to me in your rules is “replica:1”, “replica:2”, and “replica:3” - these say that exactly one, two, and three replicas of each shard, respectively, must be on each of the nodes that has the respective sysprop value.

Since these rules will apply to all nodes that match the sysprop value, you have to allow for the possibility that some nodes will have *zero* replicas of a shard.  So you could specify “replica:<2”, which means that no node can host more than one replica, but it's acceptable for a node to host zero replicas.

Did you set system property AWSAZ on each Solr node with an appropriate value?
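
(A quick way to double-check is the properties endpoint; the host and port here are placeholders:)

-----
curl "http://localhost:8983/solr/admin/info/properties?wt=json" | grep AWSAZ
-----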

--
Steve
www.lucidworks.com

> On Sep 25, 2018, at 10:39 AM, Chuck Reynolds <cr...@ancestry.com> wrote:
> 
> Steve,
> 
> I wasn't able to get the sysprop to work.  I think maybe there is a disconnect on my part.
> 
> From the documentation it looks like I can only use the sysprop tag if I'm using a Snitch.  Is that correct.
> 
> I can't find any example of anyone using the default Snitch.
> 
> Here is what I have for my rule:
> rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3
> 
> I'm not specifying a snitch.  Is that my problem or is there a problem with my rule?
> 
> Thanks for your help.
> On 9/21/18, 2:40 PM, "Steve Rowe" <sa...@gmail.com> wrote:
> 
>    Hi Chuck,
> 
>    One way to do it is to set a system property on the JVM running each Solr node, corresponding to the the AWS availability zone on which the node is hosted.
> 
>    For example, you could use sysprop “AWSAZ”, then use rules like:
> 
>       replica:<2,sysprop.AWSAZ:us-east-1
>       replica:<2,sysprop.AWSAZ:us-west-1
>       replica:<2,sysprop.AWSAZ:ca-central-1
> 
>    --
>    Steve
>    www.lucidworks.com
> 
>> On Sep 21, 2018, at 4:07 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
>> 
>> I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
>> factor of three.  I'm using AWS EC2 instances and I have a requirement to
>> replicate the data into 3 AWS availability zones.  
>> 
>> So 30 servers in each zone and I don't see a create collection rule that
>> will put one replica in each of the three zones.
>> 
>> What am I missing?
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 
> 
> 


Re: Rule-based replication or sharing

Posted by Chuck Reynolds <cr...@ancestry.com>.
Steve,

I wasn't able to get the sysprop to work.  I think maybe there is a disconnect on my part.

From the documentation it looks like I can only use the sysprop tag if I'm using a Snitch.  Is that correct?

I can't find any example of anyone using the default Snitch.

Here is what I have for my rule:
rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3

I'm not specifying a snitch.  Is that my problem or is there a problem with my rule?

Thanks for your help.
On 9/21/18, 2:40 PM, "Steve Rowe" <sa...@gmail.com> wrote:

    Hi Chuck,
    
    One way to do it is to set a system property on the JVM running each Solr node, corresponding to the the AWS availability zone on which the node is hosted.
    
    For example, you could use sysprop “AWSAZ”, then use rules like:
    
       replica:<2,sysprop.AWSAZ:us-east-1
       replica:<2,sysprop.AWSAZ:us-west-1
       replica:<2,sysprop.AWSAZ:ca-central-1
    
    --
    Steve
    www.lucidworks.com
    
    > On Sep 21, 2018, at 4:07 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
    > 
    > I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
    > factor of three.  I'm using AWS EC2 instances and I have a requirement to
    > replicate the data into 3 AWS availability zones.  
    > 
    > So 30 servers in each zone and I don't see a create collection rule that
    > will put one replica in each of the three zones.
    > 
    > What am I missing?
    > 
    > 
    > 
    > --
    > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    
    


Re: Rule-based replication or sharing

Posted by Steve Rowe <sa...@gmail.com>.
Hi Chuck,

One way to do it is to set a system property on the JVM running each Solr node, corresponding to the AWS availability zone on which the node is hosted.

For example, you could use sysprop “AWSAZ”, then use rules like:

   replica:<2,sysprop.AWSAZ:us-east-1
   replica:<2,sysprop.AWSAZ:us-west-1
   replica:<2,sysprop.AWSAZ:ca-central-1
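
If it's more convenient than changing startup commands, the property can also go into each node's solr.in.sh, with the value varying per zone.  A sketch, assuming the usual include-file location:

-----
# e.g. in /etc/default/solr.in.sh on a node in us-east-1
SOLR_OPTS="$SOLR_OPTS -DAWSAZ=us-east-1"
-----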

--
Steve
www.lucidworks.com

> On Sep 21, 2018, at 4:07 PM, Chuck Reynolds <cr...@ancestry.com> wrote:
> 
> I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
> factor of three.  I'm using AWS EC2 instances and I have a requirement to
> replicate the data into 3 AWS availability zones.  
> 
> So 30 servers in each zone and I don't see a create collection rule that
> will put one replica in each of the three zones.
> 
> What am I missing?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Rule-based replication or sharing

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/21/2018 2:07 PM, Chuck Reynolds wrote:
> I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
> factor of three.  I'm using AWS EC2 instances and I have a requirement to
> replicate the data into 3 AWS availability zones.
>
> So 30 servers in each zone and I don't see a create collection rule that
> will put one replica in each of the three zones.
>
> What am I missing?

The documentation says that one of the uses is to do things like only 
allowing one replica per rack.

http://lucene.apache.org/solr/guide/7_4/rule-based-replica-placement.html

But it doesn't say how to do this.  Reading the documentation, I cannot 
figure it out.

There's a little more data on the issue that implemented the feature:

https://jira.apache.org/jira/browse/SOLR-6220

But I still can't tell from reading both of these how to actually DO it.

Unanswered questions:

- How do I assign arbitrary tags to each node that are then used in a snitch?
- How are these rules added to Solr's configuration?
- The MODIFYCOLLECTION action is mentioned.  But what if I want to create
  rules for collections that don't exist yet?  I can't modify a collection
  if it doesn't exist yet.
- Exactly what text is required, and where?  SOLR-6220 shows JSON-type
  syntax, but that type of syntax is not mentioned in the documentation AT ALL.

I remember seeing something come through somewhere (jira? dev list? not 
sure where!) about this feature being deprecated in favor of autoscale 
settings, but I can't find it now.  If autoscaling is the preferred way 
to handle this, how is it done? I can't find anything useful in that 
documentation section either.
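
If the policy framework really is the replacement, my best guess, pieced together from SOLR-6220 and the autoscaling pages and NOT verified, is that the equivalent would be a cluster policy keyed on a sysprop, set through the autoscaling API, something like:

-----
curl -X POST -H 'Content-type: application/json' \
  "http://localhost:8983/api/cluster/autoscaling" -d '{
  "set-cluster-policy": [
    {"replica": "<2", "shard": "#EACH", "sysprop.AWSAZ": "us-east-1"},
    {"replica": "<2", "shard": "#EACH", "sysprop.AWSAZ": "us-west-1"},
    {"replica": "<2", "shard": "#EACH", "sysprop.AWSAZ": "ca-central-1"}
  ]
}'
-----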

Thanks,
Shawn