Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2013/08/11 16:46:48 UTC

[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable

    [ https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736291#comment-13736291 ] 

Erick Erickson commented on SOLR-4956:
--------------------------------------

So it sounds like we have competing needs here. On the one hand,
we have several anecdotal statements that upping the buffer size
had a significant impact on throughput.

On the other, just upping the buffer size has the potential for Bad
Outcomes.

So it seems we have three options here:
1> Make it configurable, with a warning that changing it
   may lead to Bad Stuff.
2> Leave it as-is and forget about it.
3> Do the harder thing and see if we can figure out why changing
   the batch size makes such a difference and fix the underlying
   cause (if there is one).

I'm totally unfamiliar with the code, but the 20,000 ft. smell is
that there's something about the intra-node routing code that's
very inefficient and making the buffers bigger is masking that. On
the surface, just sending the packets around doesn't seem like it
should spike the CPU that much... But like I said, I haven't looked
at the code at all.
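To make the discussion concrete, here is a minimal sketch (not Solr's actual code, just an illustration of the idea behind maxBufferedAddsPerServer) of buffering document adds per destination server and flushing a batch once the buffer hits a maximum size. The class and callback names are hypothetical; only the default of 10 comes from the issue.

```python
# Hypothetical illustration of per-server add buffering, in the spirit of
# Solr's maxBufferedAddsPerServer. Not taken from the Solr code base.
from collections import defaultdict

MAX_BUFFERED_ADDS_PER_SERVER = 10  # the hard-coded default under discussion


class AddBuffer:
    """Buffers 'add' commands per target server, flushing in batches."""

    def __init__(self, max_buffered, send):
        self.max_buffered = max_buffered
        self.send = send                   # callable(server, batch) -> None
        self.buffers = defaultdict(list)   # server -> list of pending docs

    def add(self, server, doc):
        buf = self.buffers[server]
        buf.append(doc)
        if len(buf) >= self.max_buffered:
            self.flush(server)

    def flush(self, server):
        batch = self.buffers.pop(server, [])
        if batch:
            self.send(server, batch)

    def flush_all(self):
        for server in list(self.buffers):
            self.flush(server)
```

With a batch size of 10, routing 1000 docs to a replica costs 100 separate requests; raising the limit to 1000 collapses that to one. That request-count difference is one plausible reason a bigger buffer masks the per-request overhead in the routing path.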
                
> make maxBufferedAddsPerServer configurable
> ------------------------------------------
>
>                 Key: SOLR-4956
>                 URL: https://issues.apache.org/jira/browse/SOLR-4956
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.3, 5.0
>            Reporter: Erick Erickson
>
> Anecdotal evidence from the user's list indicates that in high-throughput situations, the default of 10 docs/batch for inter-shard batching can generate significant CPU load. See the thread titled "Sharding and Replication" on June 19th; the gist is quoted below.
> I haven't poked around, but it's a little surprising on the surface that Asif is seeing this kind of difference. So I'm wondering if this change indicates some other underlying issue. Regardless, this seems like it would be good to investigate.
> Here's the gist of Asif's experience from the thread:
> It's a completely practical problem - we are exploring Solr to build a
> real-time analytics/data solution for a system handling about 1000 qps. We
> have various metrics that are stored as different collections on the cloud,
> which means a very high volume of writes. The cloud also needs to support
> about 300-400 qps.
> We initially tested with a single Solr node on a 16 core / 24 GB box for a
> single metric. We saw that writes were not an issue at all - Solr was
> handling it extremely well. We were also able to achieve about 200 qps from
> a single node.
> When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU
> usage on the replicas. Up to 10 cores were getting used for writes on the
> replicas. Hence my concern with respect to batch updates for the replicas.
> BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is
> very similar to the single-node installation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org