Posted to solr-user@lucene.apache.org by Jakov Sosic <js...@gmail.com> on 2017/04/26 23:00:05 UTC

Shard CPU usage?

Hi guys,

I was wondering whether introducing shards actually increases CPU usage.

I have a 30GB index split into two shards (15GB each), and by analyzing
the logs, I figured out that ~80% of the queries carry the parameter
"&shard.url=http://10.3.4.12:8080/solr/mycore/|http://10.3.4.14:8080/solr/mycore/".

I basically don't need sharding, and am now starting to wonder whether
the shards are actually increasing the CPU usage of my nodes, given the
huge percentage of queries with the "shard.url=" flag.

I'm fighting high CPU usage, and if turning sharding off and just
keeping the replicas in my collection would lower the CPU usage by more
than 10%, I would choose that path.


Any insights?

Thanks.

Re: Shard CPU usage?

Posted by Erick Erickson <er...@gmail.com>.

Sharding should, in general, _not_ be used as long as the response
time for individual queries is acceptable. It imposes a certain amount
of overhead, because the typical distributed query is two-pass.
Pass 1: get the candidate top N docs from one replica of each shard.
Pass 2: have each shard return its portion of the top N docs found in
pass 1.
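
As a rough illustration of the two passes described above, here is a toy
scatter-gather over in-memory "shards". This is only a sketch under
stated assumptions: the SHARDS data, the function names, and the scoring
are all hypothetical, and this is not how Solr implements it internally.
It only shows why one user query turns into several sub-requests.

```python
import heapq

# Hypothetical toy data: shard name -> {doc id: (score, stored fields)}.
SHARDS = {
    "shard1": {"a": (0.9, {"title": "A"}),
               "b": (0.4, {"title": "B"}),
               "c": (0.7, {"title": "C"})},
    "shard2": {"d": (0.8, {"title": "D"}),
               "e": (0.3, {"title": "E"})},
}


def pass1_top_ids(shard, n):
    """Pass 1 on one shard: top-n (score, doc_id) candidates, no stored fields."""
    scored = ((score, doc_id) for doc_id, (score, _) in SHARDS[shard].items())
    return heapq.nlargest(n, scored)


def pass2_fetch(shard, doc_ids):
    """Pass 2 on one shard: stored fields for the requested ids only."""
    return {doc_id: SHARDS[shard][doc_id][1] for doc_id in doc_ids}


def distributed_query(n):
    """Run both passes and count how many sub-requests were needed."""
    sub_requests = 0

    # Pass 1: ask every shard for its local top-n candidates.
    candidates = []
    for shard in SHARDS:
        sub_requests += 1
        for score, doc_id in pass1_top_ids(shard, n):
            candidates.append((score, doc_id, shard))

    # Merge the per-shard candidates into the global top-n.
    top = heapq.nlargest(n, candidates)

    # Pass 2: ask each shard that owns a winner for the stored fields.
    ids_by_shard = {}
    for _, doc_id, shard in top:
        ids_by_shard.setdefault(shard, []).append(doc_id)
    fields = {}
    for shard, doc_ids in ids_by_shard.items():
        sub_requests += 1
        fields.update(pass2_fetch(shard, doc_ids))

    results = [(doc_id, score, fields[doc_id]) for score, doc_id, _ in top]
    return results, sub_requests
```

With the toy data, distributed_query(3) needs four sub-requests (two per
pass) to answer a single query, in addition to the merge work on the
node that received it.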

There's an option for one-pass processing, but I don't think that's
really what you're looking for here.

Each query fans out into M sub-queries, one to a replica of each of
your M shards, and the partial results then have to be merged.
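
A back-of-the-envelope sketch of what that fan-out could mean for the
logs. It assumes (not confirmed in this thread) that every top-level
query is logged once, that each of the two passes logs one sub-request
per shard, and that all of those sub-requests carry "shard.url=". The
function name is hypothetical.

```python
def sub_request_share(num_shards, passes=2):
    """Fraction of log entries that are shard sub-requests.

    Assumption: one log entry per top-level query, plus one entry per
    shard per pass for the sub-requests it generates.
    """
    sub_requests = num_shards * passes
    return sub_requests / (sub_requests + 1)


print(sub_request_share(2))  # 0.8
```

Under those assumptions, two shards would give 4 sub-requests for every
top-level query, i.e. 4 out of 5 log entries. If that model holds for
your setup, a large share of "shard.url=" entries would be what any
fully sharded query load looks like rather than a sign that something
is wrong.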

So if everything fits in one shard with adequate response times, I'd
recommend you have only one. Add _replicas_ to get more QPS, possibly
on different machines.

You still get all the goodness of HA/DR with SolrCloud, so it's
perfectly reasonable to have a 1-shard collection with N replicas
handled by SolrCloud.
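
For what it's worth, a 1-shard, N-replica collection can be requested
through the Collections API with action=CREATE, numShards=1 and
replicationFactor=N. The sketch below only builds the request URL; the
host, port, and collection name are placeholders, not values from this
thread, and depending on your setup you may need further parameters
(for example a config set name).

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, replicas):
    """Build a Collections API CREATE URL for a single-shard collection.

    base_url and name are placeholders chosen for illustration.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,                  # one shard: no distributed fan-out
        "replicationFactor": replicas,   # N copies for QPS and HA
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


print(create_collection_url("http://localhost:8983/solr", "mycollection", 3))
```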

Best,
Erick
