Posted to solr-user@lucene.apache.org by Tom Mortimer <to...@gmail.com> on 2013/10/21 17:48:14 UTC

SolrCloud performance in VM environment

Hi everyone,

I've been working on an installation recently which uses SolrCloud to index
45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2
identical VMs set up for replicas). The reason we're using so many shards
for a relatively small index is that there are complex filtering
requirements at search time, to restrict users to items they are licensed
to view. Initial tests demonstrated that multiple shards would be required.
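
(For anyone curious what "filtering at search time" means here, broadly the kind of thing involved is a per-user filter query on every request - the field name, licence groups and core name below are invented purely for illustration, not the real schema:)

  # Illustrative only: field name, licence groups and core name are made up.
  # Each user request carries an fq restricting results to licensed items.
  curl -s 'http://localhost:8983/solr/collection1/select' \
    --data-urlencode 'q=some user query' \
    --data-urlencode 'fq=licence_group:(GRP_A OR GRP_B OR GRP_C)' \
    --data-urlencode 'rows=10' \
    --data-urlencode 'wt=json'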

The total size of the index is about 140GB, and each VM has 16GB RAM (32GB
total) and 4 CPU units. I know this is far under what would normally be
recommended for an index of this size, and I'm working on persuading the
customer to increase the RAM (basically, telling them it won't work
otherwise.) Performance is currently pretty poor and I would expect more
RAM to improve things. However, there are a couple of other oddities which
concern me.

The first is that I've been reindexing a fixed set of 500 docs to test
indexing and commit performance (with soft commits within 60s). The time
taken to complete a hard commit after this is longer than I'd expect, and
highly variable - from 10s to 70s. This makes me wonder whether the SAN
(which provides all the storage for these VMs and the customer's several
other VMs) is being saturated periodically. I grabbed some iostat output on
different occasions to (possibly) show the variability:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              64.50         0.00      2476.00          0       4952
...
sdb               8.90         0.00       348.00          0       6960
...
sdb               1.15         0.00        43.20          0        864
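
(For concreteness, a sketch of the sort of measurement I mean - host and core names are placeholders, and iostat's extended stats give a clearer view of device latency than the default counters:)

  # Placeholder host/core name: time an explicit hard commit.
  time curl -s 'http://localhost:8983/solr/collection1/update?commit=true'

  # -x adds per-device latency and utilisation columns (await, %util),
  # sampled here every 5 seconds while the commit runs.
  iostat -x 5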

The other thing that confuses me is that after a Solr restart or hard
commit, search times average about 1.2s under light load. After searching
the same set of queries for 5-6 iterations this improves to 0.1s. However,
in either case - cold or warm - iostat reports no device reads at all:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb               0.40         0.00         8.00          0        160
...
sdb               0.30         0.00        10.40          0        104

(the writes are due to logging). This implies to me that the 'hot' blocks
are being completely cached in RAM - so why the variation in search time
and the number of iterations required to speed it up?
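
(Again for concreteness, the warm-up runs are essentially a replay of a fixed query file, along these lines - host/core are placeholders and queries.txt holds URL-encoded query strings:)

  # Sketch only: replay the same query set for several passes against a
  # placeholder core and print the total time per request to watch it settle.
  for pass in 1 2 3 4 5 6; do
    while read -r q; do
      curl -s -o /dev/null -w '%{time_total}\n' \
        "http://localhost:8983/solr/collection1/select?q=${q}&rows=10"
    done < queries.txt
  done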

The Solr caches are only being used lightly by these tests and there are no
evictions. GC is not a significant overhead. Each Solr shard runs in a
separate JVM with 1GB heap.

I don't have a great deal of experience in low-level performance tuning, so
please forgive any naivety. Any ideas of what to do next would be greatly
appreciated. I don't currently have details of the VM implementation but
can get hold of this if it's relevant.

thanks,
Tom

Re: SolrCloud performance in VM environment

Posted by Erick Erickson <er...@gmail.com>.
Be a bit careful here. 128G is a lot of memory; you may encounter very long
garbage collection pauses. Just be aware that this may start happening later.

Best,
Erick


On Tue, Oct 22, 2013 at 5:04 PM, Tom Mortimer <to...@gmail.com> wrote:

> Just tried it with no other changes than upping the RAM to 128GB total, and
> it's flying. I think that proves that RAM is good. =)  Will implement
> suggested changes later, though.
>
> cheers,
> Tom
>
>
> On 22 October 2013 09:04, Tom Mortimer <to...@gmail.com> wrote:
>
> > Boogie, Shawn,
> >
> > Thanks for the replies. I'm going to try out some of your suggestions
> > today, although without more RAM I'm not that optimistic...
> >
> > Tom
> >
> >
> >
> > On 21 October 2013 18:40, Shawn Heisey <so...@elyograg.org> wrote:
> >
> >> On 10/21/2013 9:48 AM, Tom Mortimer wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> I've been working on an installation recently which uses SolrCloud to
> >>> index
> >>> 45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with
> another
> >>> 2
> >>> identical VMs set up for replicas). The reason we're using so many
> shards
> >>> for a relatively small index is that there are complex filtering
> >>> requirements at search time, to restrict users to items they are
> licensed
> >>> to view. Initial tests demonstrated that multiple shards would be
> >>> required.
> >>>
> >>> The total size of the index is about 140GB, and each VM has 16GB RAM
> >>> (32GB
> >>> total) and 4 CPU units. I know this is far under what would normally be
> >>> recommended for an index of this size, and I'm working on persuading
> the
> >>> customer to increase the RAM (basically, telling them it won't work
> >>> otherwise.) Performance is currently pretty poor and I would expect
> more
> >>> RAM to improve things. However, there are a couple of other oddities
> >>> which
> >>> concern me.
> >>>
> >>
> >> Running multiple shards like you are, where each operating system is
> >> handling more than one shard, is only going to perform better if your
> query
> >> volume is low and you have lots of CPU cores.  If your query volume is
> high
> >> or you only have 2-4 CPU cores on each VM, you might be better off with
> >> fewer shards or not sharded at all.
> >>
> >> The way that I read this is that you've got two physical machines with
> >> 32GB RAM, each running two VMs that have 16GB.  Each VM houses 4
> shards, or
> >> 70GB of index.
> >>
> >> There's a scenario that might be better if all of the following are
> true:
> >> 1) I'm right about how your hardware is provisioned.  2) You or the
> client
> >> owns the hardware.  3) You have an extremely low-end third machine
> >> available - single CPU with 1GB of RAM would probably be enough.  In
> this
> >> scenario, you run one Solr instance and one zookeeper instance on each
> of
> >> your two "big" machines, and use the third wimpy machine as a third
> >> zookeeper node.  No virtualization.  For the rest of my reply, I'm
> assuming
> >> that you haven't taken this step, but it will probably apply either way.
> >>
> >>
> >>  The first is that I've been reindexing a fixed set of 500 docs to test
> >>> indexing and commit performance (with soft commits within 60s). The
> time
> >>> taken to complete a hard commit after this is longer than I'd expect,
> and
> >>> highly variable - from 10s to 70s. This makes me wonder whether the SAN
> >>> (which provides all the storage for these VMs and the customer's several
> >>> other VMs) is being saturated periodically. I grabbed some iostat
> output
> >>> on
> >>> different occasions to (possibly) show the variability:
> >>>
> >>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> >>> sdb              64.50         0.00      2476.00          0       4952
> >>> ...
> >>> sdb               8.90         0.00       348.00          0       6960
> >>> ...
> >>> sdb               1.15         0.00        43.20          0        864
> >>>
> >>
> >> There are two likely possibilities for this.  One or both of them might
> >> be in play.  1) Because the OS disk cache is small, not much of the
> index
> >> can be cached.  This can result in a lot of disk I/O for a commit,
> slowing
> >> things way down.  Increasing the size of the OS disk cache is really the
> >> only solution for that. 2) Cache autowarming, particularly the filter
> >> cache.  In the cache statistics, you can see how long each cache took to
> >> warm up after the last searcher was opened.  The solution for that is to
> >> reduce the autowarmCount values.
> >>
> >>
> >>  The other thing that confuses me is that after a Solr restart or hard
> >>> commit, search times average about 1.2s under light load. After
> searching
> >>> the same set of queries for 5-6 iterations this improves to 0.1s.
> >>> However,
> >>> in either case - cold or warm - iostat reports no device reads at all:
> >>>
> >>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> >>> sdb               0.40         0.00         8.00          0        160
> >>> ...
> >>> sdb               0.30         0.00        10.40          0        104
> >>>
> >>> (the writes are due to logging). This implies to me that the 'hot'
> blocks
> >>> are being completely cached in RAM - so why the variation in search
> time
> >>> and the number of iterations required to speed it up?
> >>>
> >>
> >> Linux is pretty good about making limited OS disk cache resources work.
> >>  Sounds like the caching is working reasonably well for queries.  It
> might
> >> not be working so well for updates or commits, though.
> >>
> >> Running multiple Solr JVMs per machine, virtual or not, causes more
> >> problems than it solves.  Solr has no limits on the number of cores (shard
> >> replicas) per instance, assuming there are enough system resources.  There
> >> should be exactly one Solr JVM per operating system.  Running more than
> one
> >> results in quite a lot of overhead, and your memory is precious.  When
> you
> >> create a collection, you can give the collections API the
> >> "maxShardsPerNode" parameter to create more than one shard per instance.
> >>
> >>
> >>  I don't have a great deal of experience in low-level performance
> tuning,
> >>> so
> >>> please forgive any naivety. Any ideas of what to do next would be
> greatly
> >>> appreciated. I don't currently have details of the VM implementation
> but
> >>> can get hold of this if it's relevant.
> >>>
> >>
> >> I don't think the virtualization details matter all that much.  Please
> >> feel free to ask questions or supply more info based on what I've told
> you.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >
>

Re: SolrCloud performance in VM environment

Posted by Tom Mortimer <to...@gmail.com>.
Just tried it with no other changes than upping the RAM to 128GB total, and
it's flying. I think that proves that RAM is good. =)  Will implement
suggested changes later, though.

cheers,
Tom


On 22 October 2013 09:04, Tom Mortimer <to...@gmail.com> wrote:

> Boogie, Shawn,
>
> Thanks for the replies. I'm going to try out some of your suggestions
> today, although without more RAM I'm not that optimistic...
>
> Tom
>
>
>
> On 21 October 2013 18:40, Shawn Heisey <so...@elyograg.org> wrote:
>
>> On 10/21/2013 9:48 AM, Tom Mortimer wrote:
>>
>>> Hi everyone,
>>>
>>> I've been working on an installation recently which uses SolrCloud to
>>> index
>>> 45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another
>>> 2
>>> identical VMs set up for replicas). The reason we're using so many shards
>>> for a relatively small index is that there are complex filtering
>>> requirements at search time, to restrict users to items they are licensed
>>> to view. Initial tests demonstrated that multiple shards would be
>>> required.
>>>
>>> The total size of the index is about 140GB, and each VM has 16GB RAM
>>> (32GB
>>> total) and 4 CPU units. I know this is far under what would normally be
>>> recommended for an index of this size, and I'm working on persuading the
>>> customer to increase the RAM (basically, telling them it won't work
>>> otherwise.) Performance is currently pretty poor and I would expect more
>>> RAM to improve things. However, there are a couple of other oddities
>>> which
>>> concern me.
>>>
>>
>> Running multiple shards like you are, where each operating system is
>> handling more than one shard, is only going to perform better if your query
>> volume is low and you have lots of CPU cores.  If your query volume is high
>> or you only have 2-4 CPU cores on each VM, you might be better off with
>> fewer shards or not sharded at all.
>>
>> The way that I read this is that you've got two physical machines with
>> 32GB RAM, each running two VMs that have 16GB.  Each VM houses 4 shards, or
>> 70GB of index.
>>
>> There's a scenario that might be better if all of the following are true:
>> 1) I'm right about how your hardware is provisioned.  2) You or the client
>> owns the hardware.  3) You have an extremely low-end third machine
>> available - single CPU with 1GB of RAM would probably be enough.  In this
>> scenario, you run one Solr instance and one zookeeper instance on each of
>> your two "big" machines, and use the third wimpy machine as a third
>> zookeeper node.  No virtualization.  For the rest of my reply, I'm assuming
>> that you haven't taken this step, but it will probably apply either way.
>>
>>
>>  The first is that I've been reindexing a fixed set of 500 docs to test
>>> indexing and commit performance (with soft commits within 60s). The time
>>> taken to complete a hard commit after this is longer than I'd expect, and
>>> highly variable - from 10s to 70s. This makes me wonder whether the SAN
>>> (which provides all the storage for these VMs and the customer's several
>>> other VMs) is being saturated periodically. I grabbed some iostat output
>>> on
>>> different occasions to (possibly) show the variability:
>>>
>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>> sdb              64.50         0.00      2476.00          0       4952
>>> ...
>>> sdb               8.90         0.00       348.00          0       6960
>>> ...
>>> sdb               1.15         0.00        43.20          0        864
>>>
>>
>> There are two likely possibilities for this.  One or both of them might
>> be in play.  1) Because the OS disk cache is small, not much of the index
>> can be cached.  This can result in a lot of disk I/O for a commit, slowing
>> things way down.  Increasing the size of the OS disk cache is really the
>> only solution for that. 2) Cache autowarming, particularly the filter
>> cache.  In the cache statistics, you can see how long each cache took to
>> warm up after the last searcher was opened.  The solution for that is to
>> reduce the autowarmCount values.
>>
>>
>>  The other thing that confuses me is that after a Solr restart or hard
>>> commit, search times average about 1.2s under light load. After searching
>>> the same set of queries for 5-6 iterations this improves to 0.1s.
>>> However,
>>> in either case - cold or warm - iostat reports no device reads at all:
>>>
>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>> sdb               0.40         0.00         8.00          0        160
>>> ...
>>> sdb               0.30         0.00        10.40          0        104
>>>
>>> (the writes are due to logging). This implies to me that the 'hot' blocks
>>> are being completely cached in RAM - so why the variation in search time
>>> and the number of iterations required to speed it up?
>>>
>>
>> Linux is pretty good about making limited OS disk cache resources work.
>>  Sounds like the caching is working reasonably well for queries.  It might
>> not be working so well for updates or commits, though.
>>
>> Running multiple Solr JVMs per machine, virtual or not, causes more
>> problems than it solves.  Solr has no limits on the number of cores (shard
>> replicas) per instance, assuming there are enough system resources.  There
>> should be exactly one Solr JVM per operating system.  Running more than one
>> results in quite a lot of overhead, and your memory is precious.  When you
>> create a collection, you can give the collections API the
>> "maxShardsPerNode" parameter to create more than one shard per instance.
>>
>>
>>  I don't have a great deal of experience in low-level performance tuning,
>>> so
>>> please forgive any naivety. Any ideas of what to do next would be greatly
>>> appreciated. I don't currently have details of the VM implementation but
>>> can get hold of this if it's relevant.
>>>
>>
>> I don't think the virtualization details matter all that much.  Please
>> feel free to ask questions or supply more info based on what I've told you.
>>
>> Thanks,
>> Shawn
>>
>>
>

Re: SolrCloud performance in VM environment

Posted by Tom Mortimer <to...@gmail.com>.
Boogie, Shawn,

Thanks for the replies. I'm going to try out some of your suggestions
today, although without more RAM I'm not that optimistic...

Tom



On 21 October 2013 18:40, Shawn Heisey <so...@elyograg.org> wrote:

> On 10/21/2013 9:48 AM, Tom Mortimer wrote:
>
>> Hi everyone,
>>
>> I've been working on an installation recently which uses SolrCloud to
>> index
>> 45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2
>> identical VMs set up for replicas). The reason we're using so many shards
>> for a relatively small index is that there are complex filtering
>> requirements at search time, to restrict users to items they are licensed
>> to view. Initial tests demonstrated that multiple shards would be
>> required.
>>
>> The total size of the index is about 140GB, and each VM has 16GB RAM (32GB
>> total) and 4 CPU units. I know this is far under what would normally be
>> recommended for an index of this size, and I'm working on persuading the
>> customer to increase the RAM (basically, telling them it won't work
>> otherwise.) Performance is currently pretty poor and I would expect more
>> RAM to improve things. However, there are a couple of other oddities which
>> concern me.
>>
>
> Running multiple shards like you are, where each operating system is
> handling more than one shard, is only going to perform better if your query
> volume is low and you have lots of CPU cores.  If your query volume is high
> or you only have 2-4 CPU cores on each VM, you might be better off with
> fewer shards or not sharded at all.
>
> The way that I read this is that you've got two physical machines with
> 32GB RAM, each running two VMs that have 16GB.  Each VM houses 4 shards, or
> 70GB of index.
>
> There's a scenario that might be better if all of the following are true:
> 1) I'm right about how your hardware is provisioned.  2) You or the client
> owns the hardware.  3) You have an extremely low-end third machine
> available - single CPU with 1GB of RAM would probably be enough.  In this
> scenario, you run one Solr instance and one zookeeper instance on each of
> your two "big" machines, and use the third wimpy machine as a third
> zookeeper node.  No virtualization.  For the rest of my reply, I'm assuming
> that you haven't taken this step, but it will probably apply either way.
>
>
>  The first is that I've been reindexing a fixed set of 500 docs to test
>> indexing and commit performance (with soft commits within 60s). The time
>> taken to complete a hard commit after this is longer than I'd expect, and
>> highly variable - from 10s to 70s. This makes me wonder whether the SAN
>> (which provides all the storage for these VMs and the customer's several
>> other VMs) is being saturated periodically. I grabbed some iostat output
>> on
>> different occasions to (possibly) show the variability:
>>
>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>> sdb              64.50         0.00      2476.00          0       4952
>> ...
>> sdb               8.90         0.00       348.00          0       6960
>> ...
>> sdb               1.15         0.00        43.20          0        864
>>
>
> There are two likely possibilities for this.  One or both of them might be
> in play.  1) Because the OS disk cache is small, not much of the index can
> be cached.  This can result in a lot of disk I/O for a commit, slowing
> things way down.  Increasing the size of the OS disk cache is really the
> only solution for that. 2) Cache autowarming, particularly the filter
> cache.  In the cache statistics, you can see how long each cache took to
> warm up after the last searcher was opened.  The solution for that is to
> reduce the autowarmCount values.
>
>
>  The other thing that confuses me is that after a Solr restart or hard
>> commit, search times average about 1.2s under light load. After searching
>> the same set of queries for 5-6 iterations this improves to 0.1s. However,
>> in either case - cold or warm - iostat reports no device reads at all:
>>
>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>> sdb               0.40         0.00         8.00          0        160
>> ...
>> sdb               0.30         0.00        10.40          0        104
>>
>> (the writes are due to logging). This implies to me that the 'hot' blocks
>> are being completely cached in RAM - so why the variation in search time
>> and the number of iterations required to speed it up?
>>
>
> Linux is pretty good about making limited OS disk cache resources work.
>  Sounds like the caching is working reasonably well for queries.  It might
> not be working so well for updates or commits, though.
>
> Running multiple Solr JVMs per machine, virtual or not, causes more
> problems than it solves.  Solr has no limits on the number of cores (shard
> replicas) per instance, assuming there are enough system resources.  There
> should be exactly one Solr JVM per operating system.  Running more than one
> results in quite a lot of overhead, and your memory is precious.  When you
> create a collection, you can give the collections API the
> "maxShardsPerNode" parameter to create more than one shard per instance.
>
>
>  I don't have a great deal of experience in low-level performance tuning,
>> so
>> please forgive any naivety. Any ideas of what to do next would be greatly
>> appreciated. I don't currently have details of the VM implementation but
>> can get hold of this if it's relevant.
>>
>
> I don't think the virtualization details matter all that much.  Please
> feel free to ask questions or supply more info based on what I've told you.
>
> Thanks,
> Shawn
>
>

Re: SolrCloud performance in VM environment

Posted by Shawn Heisey <so...@elyograg.org>.
On 10/21/2013 9:48 AM, Tom Mortimer wrote:
> Hi everyone,
>
> I've been working on an installation recently which uses SolrCloud to index
> 45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2
> identical VMs set up for replicas). The reason we're using so many shards
> for a relatively small index is that there are complex filtering
> requirements at search time, to restrict users to items they are licensed
> to view. Initial tests demonstrated that multiple shards would be required.
>
> The total size of the index is about 140GB, and each VM has 16GB RAM (32GB
> total) and 4 CPU units. I know this is far under what would normally be
> recommended for an index of this size, and I'm working on persuading the
> customer to increase the RAM (basically, telling them it won't work
> otherwise.) Performance is currently pretty poor and I would expect more
> RAM to improve things. However, there are a couple of other oddities which
> concern me.

Running multiple shards like you are, where each operating system is 
handling more than one shard, is only going to perform better if your 
query volume is low and you have lots of CPU cores.  If your query 
volume is high or you only have 2-4 CPU cores on each VM, you might be 
better off with fewer shards or not sharded at all.

The way that I read this is that you've got two physical machines with 
32GB RAM, each running two VMs that have 16GB.  Each VM houses 4 shards, 
or 70GB of index.

There's a scenario that might be better if all of the following are 
true: 1) I'm right about how your hardware is provisioned.  2) You or 
the client owns the hardware.  3) You have an extremely low-end third 
machine available - single CPU with 1GB of RAM would probably be 
enough.  In this scenario, you run one Solr instance and one zookeeper 
instance on each of your two "big" machines, and use the third wimpy 
machine as a third zookeeper node.  No virtualization.  For the rest of 
my reply, I'm assuming that you haven't taken this step, but it will 
probably apply either way.
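
For example, each "big" machine would then run a standalone zookeeper plus one 
Solr instance pointed at the three-node ensemble - something like this, with 
placeholder hostnames and assuming the stock Solr 4.x Jetty example layout:

  # Hostnames are placeholders; run on each of the two big machines.
  cd /opt/solr/example
  java -DzkHost=big1:2181,big2:2181,wimpy:2181 -jar start.jar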

> The first is that I've been reindexing a fixed set of 500 docs to test
> indexing and commit performance (with soft commits within 60s). The time
> taken to complete a hard commit after this is longer than I'd expect, and
> highly variable - from 10s to 70s. This makes me wonder whether the SAN
> (which provides all the storage for these VMs and the customer's several
> other VMs) is being saturated periodically. I grabbed some iostat output on
> different occasions to (possibly) show the variability:
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sdb              64.50         0.00      2476.00          0       4952
> ...
> sdb               8.90         0.00       348.00          0       6960
> ...
> sdb               1.15         0.00        43.20          0        864

There are two likely possibilities for this.  One or both of them might 
be in play.  1) Because the OS disk cache is small, not much of the 
index can be cached.  This can result in a lot of disk I/O for a commit, 
slowing things way down.  Increasing the size of the OS disk cache is 
really the only solution for that. 2) Cache autowarming, particularly 
the filter cache.  In the cache statistics, you can see how long each 
cache took to warm up after the last searcher was opened.  The solution 
for that is to reduce the autowarmCount values.
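
You can also pull those warmup times without going through the admin UI, with 
something like this (the core name is a placeholder):

  # Reports each cache's stats, including warmupTime, for a placeholder core.
  curl -s 'http://localhost:8983/solr/collection1/admin/mbeans?cat=CACHE&stats=true&wt=json'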

> The other thing that confuses me is that after a Solr restart or hard
> commit, search times average about 1.2s under light load. After searching
> the same set of queries for 5-6 iterations this improves to 0.1s. However,
> in either case - cold or warm - iostat reports no device reads at all:
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sdb               0.40         0.00         8.00          0        160
> ...
> sdb               0.30         0.00        10.40          0        104
>
> (the writes are due to logging). This implies to me that the 'hot' blocks
> are being completely cached in RAM - so why the variation in search time
> and the number of iterations required to speed it up?

Linux is pretty good about making limited OS disk cache resources work.  
Sounds like the caching is working reasonably well for queries.  It 
might not be working so well for updates or commits, though.
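
A quick way to see how much RAM is actually left for the OS disk cache on each 
VM is plain Linux tooling, nothing Solr-specific:

  # The "cached" figure (or the -/+ buffers/cache line on older versions)
  # shows what the kernel page cache has to work with after the JVM heaps.
  free -m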

Running multiple Solr JVMs per machine, virtual or not, causes more 
problems than it solves.  Solr has no limits on the number of cores 
(shard replicas) per instance, assuming there are enough system 
resources.  There should be exactly one Solr JVM per operating system.  
Running more than one results in quite a lot of overhead, and your 
memory is precious.  When you create a collection, you can give the 
collections API the "maxShardsPerNode" parameter to create more than one 
shard per instance.
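
For example, a hypothetical 8-shard collection spread across two Solr instances 
could be created along these lines (collection and config names are 
placeholders):

  # Placeholder names; with one Solr JVM per machine this puts 4 shards on each.
  curl -s 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=1&maxShardsPerNode=4&collection.configName=myconfig'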

> I don't have a great deal of experience in low-level performance tuning, so
> please forgive any naivety. Any ideas of what to do next would be greatly
> appreciated. I don't currently have details of the VM implementation but
> can get hold of this if it's relevant.

I don't think the virtualization details matter all that much.  Please 
feel free to ask questions or supply more info based on what I've told you.

Thanks,
Shawn


RE: SolrCloud performance in VM environment

Posted by Boogie Shafer <bo...@ebrary.com>.
Some basic tips:

-try to create enough shards that the index portion on each shard gets closer to the amount of RAM you have on each node (e.g. if you have a ~140GB index on 16GB nodes, try 12-16 shards)

-start with just the initial shards; add replicas later once you have dialed things in a bit more

-try to leave some memory for the OS as well as the JVM

-try starting with 1/2 of the total RAM on each VM allocated to the JVM as the Xmx value

-try setting Xms in the range of 0.75 to 1.0 of Xmx

-do all the normal JVM tuning, especially capturing the GC events in a log so that you can see what is going on with Java itself; this will probably lead you to adjust your GC type, etc. (a sketch of typical startup flags follows below)

-make sure you aren't hammering your storage devices (or the interconnects between your servers and your storage)... the OS tools inside the guest are helpful, but you probably want to look at the hypervisor and storage device layers directly as well. On VMware the built-in perf graphs for datastore latency and network throughput are easily observed; esxtop is the CLI tool which provides the same.

-if you are using a SAN, you probably want to make sure you have some sort of MPIO in place (especially if you are using 1Gb iSCSI)
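
as a starting point for the JVM settings above, something like the following - heap sizes, log path and zookeeper hosts are placeholders, and it assumes the stock jetty start.jar:

  # Placeholder sizes/paths. Xmx ~ half of a 16GB VM, Xms at 0.75x of Xmx;
  # CMS was the usual low-pause collector choice on Java 7-era JVMs.
  # Adjust after reading the GC log.
  java -Xms6g -Xmx8g \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
       -Xloggc:/var/log/solr/gc.log \
       -DzkHost=zk1:2181,zk2:2181,zk3:2181 \
       -jar start.jar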




________________________________________
From: Tom Mortimer <to...@gmail.com>
Sent: Monday, October 21, 2013 08:48
To: solr-user@lucene.apache.org
Subject: SolrCloud performance in VM environment

Hi everyone,

I've been working on an installation recently which uses SolrCloud to index
45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2
identical VMs set up for replicas). The reason we're using so many shards
for a relatively small index is that there are complex filtering
requirements at search time, to restrict users to items they are licensed
to view. Initial tests demonstrated that multiple shards would be required.

The total size of the index is about 140GB, and each VM has 16GB RAM (32GB
total) and 4 CPU units. I know this is far under what would normally be
recommended for an index of this size, and I'm working on persuading the
customer to increase the RAM (basically, telling them it won't work
otherwise.) Performance is currently pretty poor and I would expect more
RAM to improve things. However, there are a couple of other oddities which
concern me.

The first is that I've been reindexing a fixed set of 500 docs to test
indexing and commit performance (with soft commits within 60s). The time
taken to complete a hard commit after this is longer than I'd expect, and
highly variable - from 10s to 70s. This makes me wonder whether the SAN
(which provides all the storage for these VMs and the customer's several
other VMs) is being saturated periodically. I grabbed some iostat output on
different occasions to (possibly) show the variability:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              64.50         0.00      2476.00          0       4952
...
sdb               8.90         0.00       348.00          0       6960
...
sdb               1.15         0.00        43.20          0        864

The other thing that confuses me is that after a Solr restart or hard
commit, search times average about 1.2s under light load. After searching
the same set of queries for 5-6 iterations this improves to 0.1s. However,
in either case - cold or warm - iostat reports no device reads at all:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb               0.40         0.00         8.00          0        160
...
sdb               0.30         0.00        10.40          0        104

(the writes are due to logging). This implies to me that the 'hot' blocks
are being completely cached in RAM - so why the variation in search time
and the number of iterations required to speed it up?

The Solr caches are only being used lightly by these tests and there are no
evictions. GC is not a significant overhead. Each Solr shard runs in a
separate JVM with 1GB heap.

I don't have a great deal of experience in low-level performance tuning, so
please forgive any naivety. Any ideas of what to do next would be greatly
appreciated. I don't currently have details of the VM implementation but
can get hold of this if it's relevant.

thanks,
Tom