Posted to solr-user@lucene.apache.org by Torsten Albrecht <to...@soahc.eu> on 2013/08/07 19:15:12 UTC

Internal shard communication - performance?

Hi,

I run a system with Solr 3 and 20 shards (3 million docs per shard).

On a test system with one shard (60 million docs) I get 750 requests per second. On my live system (20 shards) I get 200 requests per second.

Is the internal communication between the 20 shards a performance killer?
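
One rough way to frame the question is to count shard-level sub-requests. The sketch below is only an illustrative model, not a measurement: it assumes that each distributed request sends at least one sub-request to every shard, and the function name is hypothetical. It uses only the figures quoted above.

```python
# Illustrative fan-out arithmetic for a distributed query (a model, not a benchmark).
# Assumption: every top-level request triggers at least one sub-request per shard.

def shard_subrequests_per_second(requests_per_second: int, shard_count: int) -> int:
    """Lower bound on shard-level sub-requests generated per second."""
    return requests_per_second * shard_count


# Figures from the post: 750 req/s on one shard, 200 req/s across 20 shards.
single_shard_load = shard_subrequests_per_second(750, 1)
sharded_load = shard_subrequests_per_second(200, 20)

print(f"single shard: {single_shard_load} sub-requests/s")
print(f"20 shards:    {sharded_load} sub-requests/s")
```

Under that assumption the 20-shard system handles at least 4000 shard-level sub-requests per second in total, so the lower top-level request rate does not by itself show that the network is the limiting factor.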

Another question: is a Solr 4 system with SolrCloud and ZooKeeper a high-availability system?


Regards,

Torsten

RE: Internal shard communication - performance?

Posted by Alexey Kozhemiakin <Al...@epam.com>.
Hi Tim, Torsten,

Please review the following threads, which cover chatty shard-to-shard and shard-to-replica conversations. Since you index large volumes of data, this can be a potential bottleneck in your case.

http://lucene.472066.n3.nabble.com/Sharding-and-Replication-td4071614.html

http://lucene.472066.n3.nabble.com/Performance-vs-maxBufferedAddsPerServer-10-td4080283.html 

https://issues.apache.org/jira/browse/SOLR-4956 


-----Original Message-----
From: Tim Vaillancourt [mailto:tim@elementspace.com] 
Sent: Monday, August 12, 2013 08:19
To: solr-user@lucene.apache.org
Subject: Re: Internal shard communication - performance?

For me the biggest deal with increased chatter between SolrCloud nodes is object creation and GCs.

The resulting CPU load from the increased GCing seems to affect performance for me in some load tests, but I'm still trying to gather hard numbers on it.

Cheers,

Tim


Re: Internal shard communication - performance?

Posted by Tim Vaillancourt <ti...@elementspace.com>.
For me the biggest deal with increased chatter between SolrCloud nodes is
object creation and GCs.

The resulting CPU load from the increased GCing seems to affect
performance for me in some load tests, but I'm still trying to gather
hard numbers on it.

Cheers,

Tim

On 07/08/13 04:05 PM, Shawn Heisey wrote:
> On 8/7/2013 2:45 PM, Torsten Albrecht wrote:
>> I would like to run ZooKeeper externally on my old master server.
>>
>> So I have two ZooKeeper instances to control my cloud. The third and
>> fourth ZooKeeper instances will be virtual machines.
>
> For true HA with ZooKeeper, you need at least three instances on
> separate physical hardware.  If you want to use VMs, that would be 
> fine, but you must ensure that you aren't running more than one 
> instance on the same physical server.
>
> For best results, use an odd number of ZK instances.  With three ZK 
> instances, one can go down and everything still works.  With five, two 
> can go down and everything still works.
>
> If you've got a fully switched network that's at least gigabit speed, 
> then the network latency involved in internal communication shouldn't 
> really matter.
>
> Thanks,
> Shawn
>

Re: Internal shard communication - performance?

Posted by Shawn Heisey <so...@elyograg.org>.
On 8/7/2013 2:45 PM, Torsten Albrecht wrote:
> I would like to run ZooKeeper externally on my old master server.
>
> So I have two ZooKeeper instances to control my cloud. The third and fourth ZooKeeper instances will be virtual machines.

For true HA with ZooKeeper, you need at least three instances on
separate physical hardware.  If you want to use VMs, that would be fine, 
but you must ensure that you aren't running more than one instance on 
the same physical server.

For best results, use an odd number of ZK instances.  With three ZK 
instances, one can go down and everything still works.  With five, two 
can go down and everything still works.
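
The arithmetic behind those numbers can be sketched as follows. This is a minimal illustration, assuming the usual majority-quorum rule (more than half of the configured instances must be up); the function names are made up for the example and are not part of any ZooKeeper API.

```python
# Majority-quorum arithmetic for a ZooKeeper ensemble.
# Assumption: the ensemble stays usable only while a strict majority
# of the configured instances is running.

def quorum_size(ensemble_size: int) -> int:
    """Smallest number of running instances that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many instances can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for size in (3, 4, 5):
    print(f"{size} instances: quorum {quorum_size(size)}, "
          f"can lose {tolerated_failures(size)}")
```

Four instances tolerate only one failure, the same as three, which is why an even count adds hardware without adding fault tolerance.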

If you've got a fully switched network that's at least gigabit speed, 
then the network latency involved in internal communication shouldn't 
really matter.

Thanks,
Shawn


Re: Internal shard communication - performance?

Posted by Torsten Albrecht <to...@soahc.eu>.
Hi Jack,

I would like to run ZooKeeper externally on my old master server.

So I have two ZooKeeper instances to control my cloud. The third and fourth ZooKeeper instances will be virtual machines.


Torsten


From: Jack Krupansky
Sent: Wednesday, August 7, 2013 20:05
To: solr-user@lucene.apache.org

Three zookeepers give you bare minimum high availability - one can go down.

But... I would personally assert that running embedded zookeeper is
inherently not "high availability", just by definition (okay, by MY
definition.)

You didn't say whether you were running embedded zookeeper or not.

But if you were, to be HA, your cluster should be able to have all but one
node per shard go down and your cluster should still service both queries
and updates. But with embedded zookeeper on a four-node cluster, taking down
two of the nodes running embedded zookeeper would make zookeeper no longer
usable, and hence your cluster would not be HA.

-- Jack Krupansky



Re: Internal shard communication - performance?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Three zookeepers give you bare minimum high availability - one can go down.

But... I would personally assert that running embedded zookeeper is 
inherently not "high availability", just by definition (okay, by MY 
definition.)

You didn't say whether you were running embedded zookeeper or not.

But if you were, to be HA, your cluster should be able to have all but one 
node per shard go down and your cluster should still service both queries 
and updates. But with embedded zookeeper on a four-node cluster, taking down 
two of the nodes running embedded zookeeper would make zookeeper no longer 
usable, and hence your cluster would not be HA.
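
The four-node case can be checked by enumerating the failure combinations. This is only a sketch: it assumes all four nodes run an embedded ZooKeeper instance and that ZooKeeper needs a strict majority of them, and the node names and helper functions are hypothetical.

```python
from itertools import combinations

# Assumption: every node in the four-node cluster runs embedded ZooKeeper.
ZK_NODES = ["node1", "node2", "node3", "node4"]


def has_quorum(ensemble_size: int, down_count: int) -> bool:
    """True if the instances still running form a strict majority."""
    return (ensemble_size - down_count) > ensemble_size // 2


def surviving_scenarios(nodes: list[str], failures: int) -> list[tuple[str, ...]]:
    """All ways to lose `failures` nodes that still leave ZooKeeper usable."""
    return [
        down
        for down in combinations(nodes, failures)
        if has_quorum(len(nodes), len(down))
    ]


one_down = surviving_scenarios(ZK_NODES, 1)
two_down = surviving_scenarios(ZK_NODES, 2)

print(f"one node down:  {len(one_down)} of 4 scenarios keep quorum")
print(f"two nodes down: {len(two_down)} of 6 scenarios keep quorum")
```

With two of the four instances down, only two remain, which is not a majority of four, so no combination of two failures leaves ZooKeeper usable.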

-- Jack Krupansky

-----Original Message----- 
From: Torsten Albrecht
Sent: Wednesday, August 07, 2013 1:15 PM
To: solr-user
Subject: Internal shard communication - performance?

Hi,

I run a system with Solr 3 and 20 shards (3 million docs per shard).

On a test system with one shard (60 million docs) I get 750 requests per
second. On my live system (20 shards) I get 200 requests per second.

Is the internal communication between the 20 shards a performance killer?

Another question: is a Solr 4 system with SolrCloud and ZooKeeper a
high-availability system?


Regards,

Torsten