
Posted to solr-user@lucene.apache.org by Nicola Gordon <Ni...@D2L.com> on 2018/04/28 22:26:25 UTC

solr resource usage patterns

Hello,

Hoping someone has some insight on this. I need to understand the resource usage patterns I'm seeing in my Solr cluster.
Any insight/any info on what solr is doing would be much appreciated!  Here's what I see:

This pattern of CPU usage is seen throughout indexing; each line is one of the 3 Solr instances in my cluster.
[Graph 1: CPU usage per Solr instance during indexing (image not preserved)]

The same pattern is seen for other resources, e.g.:
[Graph 2: the same pattern for another resource, per instance (image not preserved)]

Also, why is one of the Solr instances receiving (and sending) consistently less network data than the others?
[Graph 3: network data received and sent per Solr instance (image not preserved)]

Thanks for any insight!
- Nicola

Re: solr resource usage patterns

Posted by Erick Erickson <er...@gmail.com>.

Apache's e-mail server is pretty aggressive about stripping
attachments; none of your images came through. You can put them
somewhere else and provide a link.

I happened to see the original e-mail, so here are a couple of possibilities:

CPU usage: does the start of each spike correlate with a commit,
either hard or soft? Hard commits will start a background merge which
can be CPU intensive (as well as I/O). Hard commits with
openSearcher=true or soft commits will trigger autowarming as they
both open a new searcher. The length of your CPU spikes hints that, if
it is autowarming, you may have excessive autowarm counts configured in
solrconfig.xml.
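A minimal sketch of how one might check that correlation, assuming you can export CPU samples as (timestamp, percent) pairs and collect commit timestamps yourself (for example, from the Solr log). The function names, the spike threshold and the matching window are all hypothetical and would need tuning to your data:

```python
from bisect import bisect_right


def spike_onsets(samples, threshold):
    """Return timestamps where CPU crosses from below `threshold` to at/above it.

    samples: list of (timestamp_seconds, cpu_percent), sorted by time.
    `threshold` is a hypothetical tuning value, not a Solr setting.
    """
    onsets = []
    prev_high = False
    for ts, cpu in samples:
        high = cpu >= threshold
        if high and not prev_high:
            onsets.append(ts)
        prev_high = high
    return onsets


def onsets_following_commits(onsets, commit_times, window):
    """Pair each onset with True if a commit happened at most `window` seconds before it."""
    commits = sorted(commit_times)
    result = []
    for onset in onsets:
        # Index of the last commit at or before this onset (-1 if none).
        i = bisect_right(commits, onset) - 1
        matched = i >= 0 and onset - commits[i] <= window
        result.append((onset, matched))
    return result
```

For example, with samples `[(0, 5), (10, 4), (20, 85), (30, 90), (40, 6), (50, 5), (60, 80), (70, 7)]`, a threshold of 50 finds onsets at 20 and 60; with commits at 18 and 45 and a 5-second window, only the spike at 20 is flagged as following a commit, so the spike at 60 would point to some other cause.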

Looking again, though, you have almost zero usage between spikes,
which contrariwise suggests that your indexing process is bursting
docs to Solr. Assuming you have a SolrJ program that fires docs in
batches (and it should), my guess is that you send a batch, then
your SolrJ program spends time assembling the next batch, during which
Solr just idles. Ditto if you're using some other process to
send docs to Solr.
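If bursting is the cause, one common remedy is to overlap batch assembly with sending, so Solr is already receiving the next batch while the following one is being built. A minimal sketch in Python, assuming a hypothetical `send_batch` callable that stands in for your real client call (in SolrJ that would be adding a list of documents); `batch_size` and `max_pending` are illustrative values:

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the batch stream


def index_pipelined(docs, batch_size, send_batch, max_pending=2):
    """Build batches on the calling thread while a background thread sends them.

    send_batch: hypothetical callable standing in for the real indexing call.
    max_pending: how many assembled batches may wait in memory at once.
    """
    pending = queue.Queue(maxsize=max_pending)
    errors = []

    def sender():
        while True:
            batch = pending.get()
            if batch is _SENTINEL:
                return
            try:
                send_batch(batch)
            except Exception as exc:
                # Keep draining the queue so the producer never blocks forever.
                errors.append(exc)

    worker = threading.Thread(target=sender, daemon=True)
    worker.start()

    batch = []
    for doc in docs:
        batch.append(doc)  # real document construction would happen here
        if len(batch) == batch_size:
            pending.put(batch)  # blocks only when max_pending batches are queued
            batch = []
    if batch:
        pending.put(batch)  # final partial batch
    pending.put(_SENTINEL)
    worker.join()
    if errors:
        raise errors[0]
```

The bounded queue keeps memory in check: if Solr falls behind, the producer blocks instead of piling up batches. A single sender thread also preserves batch order; using several senders would increase throughput at the cost of that ordering.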

The fact that the first and third graphs track each other so closely
(assuming that they are the exact same time interval) really looks
like your indexing process is bursting docs to Solr.
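To put a number on "track each other closely", you could compute the Pearson correlation of the two series, assuming both are sampled at the same timestamps over the same interval. A self-contained sketch:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series sampled at the same timestamps."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 for, say, CPU percent against bytes received means the two rise and fall together, which fits the bursting explanation, though correlation alone doesn't prove the cause.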

I don't quite know what to say about the second graph.

The third graph could be related to what roles the replicas on each
node have, how you're indexing and the like. Updates for a shard
are forwarded to the leader. From there they're sent to the followers,
so the machines hosting leaders can have considerably more traffic:
they get the original input and then send it on, whereas the
followers don't redistribute the input.
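A rough model of that forwarding, assuming NRT replicas and updates that already arrive at the correct leader. The layout dict and byte counts are hypothetical inputs, and request overhead, responses and merging are ignored; it only illustrates why leaders see more outbound traffic:

```python
from collections import defaultdict


def replication_traffic(shards, update_bytes):
    """Estimate per-node (bytes_in, bytes_out) for update traffic.

    shards: hypothetical layout, shard name -> list of node names, leader first.
    update_bytes: shard name -> bytes of updates destined for that shard.
    Models NRT replicas only: the leader receives the client's update and
    forwards a copy to every follower; followers do not forward further.
    """
    inbound = defaultdict(int)
    outbound = defaultdict(int)
    for shard, nodes in shards.items():
        leader, followers = nodes[0], nodes[1:]
        size = update_bytes[shard]
        inbound[leader] += size  # original input from the client
        for follower in followers:
            outbound[leader] += size  # leader sends a copy to each follower
            inbound[follower] += size  # follower receives but doesn't redistribute
    all_nodes = {n for ns in shards.values() for n in ns}
    return {n: (inbound[n], outbound[n]) for n in sorted(all_nodes)}
```

With two shards whose leaders sit on node1 and node2, and node3 hosting only followers (`{"shard1": ["node1", "node2", "node3"], "shard2": ["node2", "node1", "node3"]}`, 100 bytes per shard), node1 and node2 each come out at (200, 200) while node3 comes out at (200, 0): it receives as much but sends nothing, so its total traffic is lower.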

If you're not using SolrJ (and specifically CloudSolrClient), then the
documents just land on some node. From there they're forwarded to the
appropriate leader and the above is repeated from that point. So
another possibility is that the third machine isn't the target for,
say, HTTP updates. Are you updating by sending docs to a specific
node?
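A sketch of that extra hop, under stated assumptions: `leaders` is a hypothetical list mapping shard index to the node hosting its leader, and shard selection uses crc32 modulo the shard count only as a stand-in (Solr's default compositeId router hashes ids differently). It counts the documents each node receives before leader-to-follower replication:

```python
import zlib
from collections import Counter


def docs_received(doc_ids, leaders, entry_node=None):
    """Count documents each node receives before replication to followers.

    leaders: hypothetical list, leaders[i] is the node hosting shard i's leader.
    entry_node: node a plain HTTP client posts everything to; None models a
    cloud-aware client that sends each doc straight to its shard leader.
    """
    received = Counter()
    for doc_id in doc_ids:
        # Stand-in routing only; NOT Solr's compositeId hashing.
        shard = zlib.crc32(doc_id.encode("utf-8")) % len(leaders)
        leader = leaders[shard]
        if entry_node is None:
            received[leader] += 1
        else:
            received[entry_node] += 1  # every doc lands on the entry node first
            if leader != entry_node:
                received[leader] += 1  # forwarded hop to the shard leader
    return dict(received)
```

With `entry_node="node1"`, node1 receives every document, while node2 and node3 receive only their own shards' share, exactly as they would with direct routing; the entry node carries the extra load.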

A third possibility is a skew in the kinds of replicas on each node. Say
all your leaders are on two nodes and all your followers are on the
third node. Then there'll likely be more traffic on the first two.
This all assumes NRT replicas (the default). If you specify TLOG or
PULL replicas, where they live could also be part of the reason.

All this at a guess of course since I don't know much about your
indexing process.

Best,
Erick
