Posted to solr-user@lucene.apache.org by Nitin Sharma <ni...@bloomreach.com> on 2014/02/14 19:52:23 UTC

Solr Hot Cpu and high load

Hello folks

  We are currently running SolrCloud 4.3.1 on an 8-node cluster with 32
cores, 60 GB of RAM, and SSDs per node. We are using ZooKeeper to manage the
solrconfig used by our collections.

We have many collections, and some of them are much larger than the others.
The shards of these big collections are gigabytes in size. We decided to
split the bigger collections evenly across all nodes (8 shards and 2
replicas) with maxShardsPerNode > 1.
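
For concreteness, the layout described above (8 shards, 2 replicas, more
than one shard per node) corresponds to a Collections API call along these
lines; the host and collection name below are placeholders, not values from
the thread:

```shell
# A minimal sketch of the Collections API CREATE call implied above.
# 8 shards x 2 replicas = 16 cores on 8 nodes, so each node must host
# two shards, hence maxShardsPerNode=2.
SOLR_HOST="localhost:8983"        # assumption: any node in the cluster
COLLECTION="big_collection"       # assumption: your collection name
URL="http://${SOLR_HOST}/solr/admin/collections?action=CREATE"
URL="${URL}&name=${COLLECTION}&numShards=8&replicationFactor=2&maxShardsPerNode=2"
echo "${URL}"
# Against a live cluster you would then run:
#   curl "${URL}"
```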

We ran a read-only load test against one big collection, and we still see
only 2 nodes running at 100% CPU while the rest blaze through the queries
far faster (under 30% CPU), despite the collection being sharded across all
nodes.

I checked the JVM and found that none of the memory pools have high
utilization (except the survivor space, which is at 100%). GC cycles are on
the order of milliseconds and are mostly scavenges; a mark-and-sweep occurs
only once every 30 minutes.

A few questions:

   1. Sharding all collections (small and large) evenly across all nodes
   distributes the load and makes the system characteristics of all machines
   similar. Is this a recommended way to do it?
   2. SolrCloud does a distributed query by default. So if one node is at
   100% CPU, does it slow down the response time for the other nodes waiting
   on that query? (Or is there a timeout if a node does not respond within x
   seconds?)
   3. Our collections use MMapDirectory, but I haven't specifically enabled
   anything mmap-related (locked pages under ulimit). Does that adversely
   affect performance? Or can Solr lock pages even without this?
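
Regarding question 3, a quick way to see what the OS currently allows (the
ulimit setting mentioned above) is:

```shell
# Sketch: inspect the locked-memory limit relevant to question 3.
# If this prints a small number (often 64, in kB) rather than "unlimited",
# nothing is configured to let the JVM lock pages; MMapDirectory still
# works, its pages are just ordinary, evictable page cache.
LOCK_LIMIT=$(ulimit -l)
echo "max locked memory: ${LOCK_LIMIT}"
```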

Thanks a lot in advance.
Nitin

Re: Solr Hot Cpu and high load

Posted by Nitin Sharma <ni...@bloomreach.com>.
Thanks, Erick. I will try that.




On Sun, Feb 16, 2014 at 5:07 PM, Erick Erickson <er...@gmail.com> wrote:

> Stored fields are what the Solr documentCache in solrconfig.xml
> is all about.
>
> My general feeling is that stored fields are mostly irrelevant for
> search speed, especially if lazy loading is enabled. The only time
> stored fields come into play is when assembling the final result
> list, i.e. the 10 or 20 documents that you return. That does imply
> disk I/O, and if you have massive fields there's also decompression
> to add to the CPU load.
>
> So, as usual, "it depends". Try measuring with the returned fields
> restricted to whatever your <uniqueKey> field is for one set of tests,
> then try returning _everything_ for another.
>
> Best,
> Erick



-- 
- N

Re: Solr Hot Cpu and high load

Posted by Erick Erickson <er...@gmail.com>.
Stored fields are what the Solr documentCache in solrconfig.xml
is all about.
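
For reference, that cache is configured in the <query> section of
solrconfig.xml; the sizes below are purely illustrative, not recommendations
from the thread:

```xml
<!-- Illustrative documentCache settings; tune size to your result-page
     working set. autowarmCount stays 0 because this cache is keyed by
     internal Lucene doc IDs, which change between searchers. -->
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
```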

My general feeling is that stored fields are mostly irrelevant for
search speed, especially if lazy loading is enabled. The only time
stored fields come into play is when assembling the final result
list, i.e. the 10 or 20 documents that you return. That does imply
disk I/O, and if you have massive fields there's also decompression
to add to the CPU load.

So, as usual, "it depends". Try measuring with the returned fields
restricted to whatever your <uniqueKey> field is for one set of tests,
then try returning _everything_ for another.
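
That A/B test can be sketched as two otherwise-identical queries; the host,
collection, and uniqueKey name below are assumptions:

```shell
# Sketch of the suggested A/B test: the same query, once returning only
# the uniqueKey field and once returning every stored field.
BASE="http://localhost:8983/solr/big_collection/select"  # assumed endpoint
Q="q=*:*&rows=20"
KEY_ONLY="${BASE}?${Q}&fl=id"    # assumption: <uniqueKey> is "id"
ALL_FIELDS="${BASE}?${Q}&fl=*"
echo "${KEY_ONLY}"
echo "${ALL_FIELDS}"
# Against a live cluster, compare wall time and the reported QTime:
#   time curl -s "${KEY_ONLY}"   > /dev/null
#   time curl -s "${ALL_FIELDS}" > /dev/null
```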

Best,
Erick


On Sun, Feb 16, 2014 at 12:18 PM, Nitin Sharma
<ni...@bloomreach.com> wrote:

> Thanks Tri
>
>
> *a. Are your docs distributed evenly across shards: number of docs and
> size of the shards*
> >> Yes, the size of all the shards is equal (a negligible delta on the
> order of KB), and so is the number of docs
>
> *b. Is your test client querying all nodes, or do all the queries go to
> those 2 busy nodes?*
> *>> *Yes, all nodes receive exactly the same number of queries
>
>
> I have one more question. Do stored fields have a significant impact on
> the performance of Solr queries? Is having 50% of the fields stored (out
> of 100 fields) significantly worse than having 20% of the fields stored?
> (significantly == on the order of 100s of milliseconds, assuming all
> fields are of the same size and type)
>
> How are stored fields retrieved in general (always from disk, or loaded
> into memory on the first query and then read from memory afterwards)?
>
> Thanks
> Nitin

Re: Solr Hot Cpu and high load

Posted by Nitin Sharma <ni...@bloomreach.com>.
Thanks Tri


*a. Are your docs distributed evenly across shards: number of docs and size
of the shards*
>> Yes, the size of all the shards is equal (a negligible delta on the
order of KB), and so is the number of docs

*b. Is your test client querying all nodes, or do all the queries go to
those 2 busy nodes?*
*>> *Yes, all nodes receive exactly the same number of queries


I have one more question. Do stored fields have a significant impact on
the performance of Solr queries? Is having 50% of the fields stored (out of
100 fields) significantly worse than having 20% of the fields stored?
(significantly == on the order of 100s of milliseconds, assuming all fields
are of the same size and type)

How are stored fields retrieved in general (always from disk, or loaded
into memory on the first query and then read from memory afterwards)?

Thanks
Nitin



On Fri, Feb 14, 2014 at 11:45 AM, Tri Cao <tm...@me.com> wrote:

> 1. Yes, that's the right way to go, well, in theory at least :)
> 2. Yes, queries are always fanned out to all shards, and a query will be
> as slow as the slowest shard. When I looked into Solr's distributed
> querying implementation a few months back, support for graceful
> degradation for things like network failures and slow shards was not
> there yet.
> 3. I doubt mmap settings would impact your read-only load, and it seems
> you can easily fit your index in RAM. You could try to warm the file
> cache to make sure, with "cat $solr_dir > /dev/null".
>
> It's odd that only 2 nodes are at 100% in your setup. I would check a
> couple of things:
> a. Are your docs distributed evenly across shards: number of docs and
> size of the shards
> b. Is your test client querying all nodes, or do all the queries go to
> those 2 busy nodes?
>
> Regards,
> Tri


-- 
- N
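
Tri's tip above about warming the file cache needs one correction to be
runnable: cat(1) cannot read a directory, so walk the index files instead.
The default data-directory path below is a placeholder:

```shell
# Warm the OS page cache by reading every file under the index directory.
# SOLR_DATA_DIR is an assumption; point it at your core's data/index dir.
SOLR_DATA_DIR="${SOLR_DATA_DIR:-/var/solr/data}"
find "${SOLR_DATA_DIR}" -type f -exec cat {} + > /dev/null 2>&1 || true
echo "warmed files under ${SOLR_DATA_DIR}"
```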