You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Alexey Kozhemiakin <Al...@epam.com> on 2013/08/09 22:17:24 UTC

RE: Sharding and Replication

+1 I'd like to vote for this issue https://issues.apache.org/jira/browse/SOLR-4956

It would be useful to have this parameters configurable.

When we index hundreds of millions of documents to 4 shard  SolrCloud in batches of 20K -  overhead of this chatty conversation with replicas and other shards is significant, we didn't perform detailed measurements, but increase of this hardcoded value improved our indexing throughput from 1.2mln up to 3mln docs per minute.

It agree that in general case it is more correct to reduce the value, but it would be nice to control it for specific cases and environments.

Alex
-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Sunday, June 23, 2013 20:41
To: solr-user@lucene.apache.org
Subject: Re: Sharding and Replication

Asif:

Thanks, this is great info and may add to the priority of making this configurable.
I raised a JIRA, see: https://issues.apache.org/jira/browse/SOLR-4956
and feel free to add anything you'd like or correct anything I didn't get right.

Best
Erick

On Sat, Jun 22, 2013 at 10:16 PM, Asif <ta...@gmail.com> wrote:
> Erick,
>
> Its a completely practical problem - we are exploring Solr to build a 
> real time analytics/data solution for a system handling about 1000 
> qps. We have various metrics that are stored as different collections 
> on the cloud, which means very high amount of writes. The cloud also 
> needs to support about 300-400 qps.
>
> We initially tested with a single Solr node on a 16 core / 24 GB box  
> for a single metric. We saw that writes were not a issue at all - Solr 
> was handling it extremely well. We were also able to achieve about 200 
> qps from a single node.
>
> When we set up the cloud ( a ensemble on 6 boxes), we saw very high 
> CPU usage on the replicas. Up to 10 cores were getting used for writes 
> on the replicas. Hence my concern with respect to batch updates for the replicas.
>
> BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU 
> usage is very similar to single node installation.
>
> - Asif
>
>
>
>
>
>
> On Sat, Jun 22, 2013 at 9:53 PM, Erick Erickson <er...@gmail.com>wrote:
>
>> Yeah, there's been talk of making this configurable, but there are 
>> more pressing priorities so far.
>>
>> So just to be clear, is this theoretical or practical? I know of 
>> several very high-performance situations where 1,000 updates/sec (and 
>> I'm assuming that it's 1,000 docs/sec not 1,000 batches of 1,000 
>> docs) hasn't caused problems here. So unless you're actually seeing 
>> performance problems as opposed to fearing that there _might_ be, I'd 
>> just go on the to the next urgent problem.
>>
>> Best
>> Erick
>>
>> On Fri, Jun 21, 2013 at 8:34 PM, Asif <ta...@gmail.com> wrote:
>> > Erick,
>> >
>> > Thanks for your reply.
>> >
>> > You are right about 10 updates being batch up - It was hard to 
>> > figure out due to large number of updates/logging that happens in our system.
>> >
>> > We are batching 1000 updates every time.
>> >
>> > Here is my observation from leader and replica -
>> >
>> > 1. Leader logs are clearly indicating that 1000 updates arrived - [ 
>> > (1000 adds)],commit=] 2. On replica - for each 1000 document adds 
>> > on leader - I see a lot of requests on replica - with no indication 
>> > of how many updates in each request.
>> >
>> > Digging a little bit into Solr code  I figured this variable I am 
>> > interested in - maxBufferedAddsPerServer is set to 10 -
>> >
>> >
>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/
>> apache/solr/update/SolrCmdDistributor.java?view=markup
>> >
>> > This means for a batch update of 1000 documents - we will be seeing 
>> > 100 requests for replica - which translates into 100 writes per 
>> > collection
>> per
>> > second in our system.
>> >
>> > Should this variable be made configurable via solrconfig.xml (or 
>> > any
>> other
>> > appropriate place)?
>> >
>> > A little background about a system we are trying to build - real 
>> > time analytics solution using the Solr Cloud + Atomic updates - we 
>> > have very high amount of writes - going as high as 1000 updates a 
>> > second (possibly more in long run).
>> >
>> > - Asif
>> >
>> >
>> >
>> >
>> >
>> > On Sat, Jun 22, 2013 at 4:21 AM, Erick Erickson 
>> ><erickerickson@gmail.com
>> >wrote:
>> >
>> >> Update are batched, but it's on a per-request basis. So, if you're 
>> >> sending one document at a time you'll won't get any batching. If 
>> >> you send 10 docs at a time and they happen to go to 10 different 
>> >> shards, you'll get 10 different update requests.
>> >>
>> >> If you're sending 1,000 docs per update you' should be seeing some 
>> >> batching going on.
>> >>
>> >> bq:  but why not batch them up or give a option to batch N updates 
>> >> in either of the above case
>> >>
>> >> I suspect what you're seeing is that you're not sending very many 
>> >> docs per update request and so are being mislead.
>> >>
>> >> But that's a guess since you haven't provided much in the way of 
>> >> data on _how_ you're updating.
>> >>
>> >> bq: the cloud eventually starts to fail How? Details matter.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Jun 19, 2013 at 4:23 AM, Asif <ta...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I had questions on implementation of Sharding and Replication
>> features of
>> >> > Solr/Cloud.
>> >> >
>> >> > 1. I noticed that when sharding is enabled for a collection -
>> individual
>> >> > requests are sent to each node serving as a shard.
>> >> >
>> >> > 2. Replication too follows above strategy of sending individual
>> documents
>> >> > to the nodes serving as a replica.
>> >> >
>> >> > I am working with a system that requires massive number of 
>> >> > writes - I
>> >> have
>> >> > noticed that due to above reason - the cloud eventually starts 
>> >> > to fail (Even though I am using a ensemble).
>> >> >
>> >> > I do understand the reason behind individual updates - but why 
>> >> > not
>> batch
>> >> > them up or give a option to batch N updates in either of the 
>> >> > above
>> case
>> >> - I
>> >> > did come across a presentation that talked about batching 10 
>> >> > updates
>> for
>> >> > replication at least, but I do not think this is the case.
>> >> > - Asif
>> >>
>>