Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2015/05/06 09:58:44 UTC

severe problems with soft and hard commits in a large index

Hello
I have a cluster of 16 shards, 3 replicas. The cluster indexes nested
documents.
It currently has 3 billion documents overall (parents and children).
Each shard has around 200 million docs; the size of each shard is 250GB.
This runs on 12 machines. Each machine has 4 SSD disks and 4 Solr processes.
Each process has a 28GB heap. Each machine has 196GB RAM.

I perform periodic indexing throughout the day. Each indexing cycle adds
around 1.5 million docs. I keep the indexing load light - 2 processes sending
bulks of 20 docs each.

My use case demands that the documents from each indexing cycle become visible
only when the whole cycle finishes.

I tried various methods of using soft and hard commits:

1. Using auto hard commit with time=10secs (openSearcher=false) and an
explicit soft commit when the indexing finishes.
2. Using auto soft commit with time=10/30/60secs during the indexing.
3. Not using soft commit at all, just using auto hard commit with
time=10secs during the indexing (openSearcher=false) and an explicit hard
commit with openSearcher=true when the cycle finishes.
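
For reference, here is a minimal SolrJ sketch of what method 1 looks like from
the client side (assuming SolrJ 5.x; the ZooKeeper address, collection name and
field names are placeholders, not our real ones, and the 10-sec auto hard
commit with openSearcher=false is assumed to be configured in solrconfig.xml):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexingCycle {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
                client.setDefaultCollection("mycollection");

                // One bulk of 20 docs; in practice two processes send such bulks in a loop.
                List<SolrInputDocument> bulk = new ArrayList<>();
                for (int i = 0; i < 20; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "doc-" + i);
                    bulk.add(doc);
                }
                // No commit here; the auto hard commit (openSearcher=false) only makes
                // the docs durable, it does not make them visible.
                client.add(bulk);

                // End of the whole indexing cycle: one explicit soft commit makes the
                // entire cycle visible at once.  Arguments: waitFlush, waitSearcher, softCommit.
                client.commit(true, true, true);
            }
        }
    }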


With all methods I encounter pretty much the same problem:
1. Heavy GCs when a soft commit is performed (methods 1, 2) or when a hard
commit with openSearcher=true is performed. These GCs cause heavy latency
(average latency is 3 secs; latency during the problem is 80 secs).
2. If indexing cycles come too often, so that soft commits or hard commits
(openSearcher=true) occur at short intervals one after another
(around 5-10 minutes apart), I start getting many OOM exceptions.


Thank you.




Re: severe problems with soft and hard commits in a large index

Posted by adfel70 <ad...@gmail.com>.
I don't see any of these.
I've seen them before in other clusters and uses of Solr, but I don't see any
of these messages here.



Dmitry Kan-2 wrote
> Do you see any (a lot?) of the "warming searchers on deck" warnings, i.e.
> what value do you get for N:
> 
> PERFORMANCE WARNING: Overlapping onDeckSearchers=N
> 
> On Wed, May 6, 2015 at 10:58 AM, adfel70 <ad...@gmail.com> wrote:
> 
>> [...]






Re: severe problems with soft and hard commits in a large index

Posted by Dmitry Kan <so...@gmail.com>.
Do you see any (a lot?) of the "warming searchers on deck" warnings, i.e. what
value do you get for N:

PERFORMANCE WARNING: Overlapping onDeckSearchers=N

On Wed, May 6, 2015 at 10:58 AM, adfel70 <ad...@gmail.com> wrote:

> [...]



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info

Re: severe problems with soft and hard commits in a large index

Posted by adfel70 <ad...@gmail.com>.
1. Yes, I'm sure the pauses are due to GCs. I monitor the cluster and
continuously receive metrics from the system and from the Java processes.
I can see clearly that when a soft commit is triggered, major GCs start
occurring (sometimes recurring on the same process) and latency rises (see the
sketch below). I use the CMS collector and JDK 1.7.0_75.

2. My previous post was about another use case, but nevertheless I have
configured docValues on the faceted fields.
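
For point 1, one simple way to sample GC activity from a Java process is the
standard management API -- this is only a sketch of the kind of metrics I mean,
not our actual monitoring code (the same MXBeans are also reachable remotely
over JMX):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcSampler {
        public static void main(String[] args) {
            // Prints cumulative collection counts and times for each collector in this
            // JVM (for example "ParNew" and "ConcurrentMarkSweep" when running with CMS).
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: count=%d, time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }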


Toke Eskildsen wrote
> [...]






Re: severe problems with soft and hard commits in a large index

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2015-05-06 at 00:58 -0700, adfel70 wrote:
> each shard has around 200 million docs. size of each shard is 250GB.
> this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
> each process has 28GB heap.  each machine has 196GB RAM.

[...]

> 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
> opensearcher=true is performed. these GCs cause heavy latency (average
> latency is 3 secs. latency during the problem is 80secs)

Sanity check: Are you sure the pauses are due to garbage collection?

You have a fairly large heap and judging from your previous post
"problem with facets  - out of memory exception", you are doing
non-trivial faceting. Are you using DocValues, as Marc suggested?


- Toke Eskildsen, State and University Library, Denmark



Re: severe problems with soft and hard commits in a large index

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/6/2015 8:55 AM, adfel70 wrote:
> Thank you for the detailed answer.
> How can I decrease the impact of opening a searcher in such a large index?
> especially the impact of heap usage that causes OOM.

See the wiki link I sent.  It talks about some of the things that
require a lot of heap and ways you can reduce those requirements.  The
lists are nowhere near complete.

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

> regarding GC tuning - I am doint that.
> here are the params I use:
> AggresiveOpts
> UseLargePages
> ParallelRefProcEnabled
> CMSParallelRemarkEnabled
> CMSMaxAbortablePrecleanTime=6000
> CMDTriggerPermRatio=80
> CMSInitiatingOccupancyFraction=70
> UseCMSInitiatinOccupancyOnly
> CMSFullGCsBeforeCompaction=1
> PretenureSizeThreshold=64m
> CMSScavengeBeforeRemark
> UseConcMarkSweepGC
> MaxTenuringThreshold=8
> TargetSurvivorRatio=90
> SurviorRatio=4
> NewRatio=2
> Xms16gb
> Xmn28gb

This list seems to have come from re-typing the GC options.  If this is
a cut/paste, I would not expect it to work -- there are typos, and each
option is missing characters (for example, the leading -XX: or -X prefixes).
Assuming that this is not a cut/paste, it is mostly similar to the CMS
options that I once used for my own index:

http://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector

> How many documents per shard are recommended?
> Note that I use nested documents. total collection size is 3 billion docs,
> number of parent docs is 600 million. the rest are children.

For the G1 collector, you'd want to limit each shard to about 100
million docs.  I have no idea about the limitations and capabilities of
the CMS collector where very large memory allocations are concerned.
Running the latest Java 8 is *strongly* recommended, no matter what
collector you're using, because recent versions have incorporated GC
improvements for large memory allocations.  With Java 8u40 and later,
the problem of allocations larger than 16MB becoming humongous on the G1
collector might not even apply.

Thanks,
Shawn


Re: severe problems with soft and hard commits in a large index

Posted by adfel70 <ad...@gmail.com>.
Thank you for the detailed answer.
How can I decrease the impact of opening a searcher in such a large index,
especially the impact of the heap usage that causes OOM?

Regarding GC tuning - I am doing that.
Here are the params I use:
AggresiveOpts
UseLargePages
ParallelRefProcEnabled
CMSParallelRemarkEnabled
CMSMaxAbortablePrecleanTime=6000
CMDTriggerPermRatio=80
CMSInitiatingOccupancyFraction=70
UseCMSInitiatinOccupancyOnly
CMSFullGCsBeforeCompaction=1
PretenureSizeThreshold=64m
CMSScavengeBeforeRemark
UseConcMarkSweepGC
MaxTenuringThreshold=8
TargetSurvivorRatio=90
SurviorRatio=4
NewRatio=2
Xms16gb
Xmn28gb

Any input on this?

How many documents per shard are recommended?
Note that I use nested documents. The total collection size is 3 billion docs;
the number of parent docs is 600 million, the rest are children.
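
For clarity, this is roughly what I mean by nested: each parent is sent
together with its children and they are indexed as one block. A minimal SolrJ
sketch (the ZooKeeper address, collection name and field names are made up for
the example):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class NestedDocExample {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient("zk1:2181")) {
                client.setDefaultCollection("mycollection");

                SolrInputDocument parent = new SolrInputDocument();
                parent.addField("id", "parent-1");
                parent.addField("doc_type", "parent");   // made-up discriminator field

                SolrInputDocument child = new SolrInputDocument();
                child.addField("id", "parent-1-child-1");
                child.addField("doc_type", "child");

                // The child is carried inside the parent and indexed in the same
                // contiguous block, which is what block-join queries rely on.
                parent.addChildDocument(child);

                client.add(parent);
            }
        }
    }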



Shawn Heisey-2 wrote
> [...]






Re: severe problems with soft and hard commits in a large index

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/6/2015 1:58 AM, adfel70 wrote:
> I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
> documents.
> it currently has 3 billion documents overall (parent and children).
> each shard has around 200 million docs. size of each shard is 250GB.
> this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
> each process has 28GB heap.  each machine has 196GB RAM.
> 
> I perform periodic indexing throughout the day. each indexing cycle adds
> around 1.5 million docs. I keep the indexing load light - 2 processes with
> bulks of 20 docs.
> 
> My use case demands that each indexing cycle will be visible only when the
> whole cycle finishes.
> 
> I tried various methods of using soft and hard commits:

I personally would configure autoCommit on a five minute (maxTime of
300000) interval with openSearcher=false.  The use case you have
outlined (not seeing changes until the indexing is done) demands that
you do NOT turn on autoSoftCommit, and that you do one manual commit at
the end of indexing, which could be either a soft commit or a hard commit.
I would recommend a soft commit.
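
In SolrJ terms, that end-of-indexing commit is a one-liner; here is a minimal
sketch (construction of the client is assumed to happen elsewhere, so this is
not a complete program):

    import java.io.IOException;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrServerException;

    public class EndOfIndexingCommit {
        // Issue the single commit that makes the finished indexing cycle visible.
        static void makeVisible(SolrClient client, boolean soft)
                throws SolrServerException, IOException {
            // Arguments: waitFlush, waitSearcher, softCommit.
            // soft=true  -> soft commit (recommended above)
            // soft=false -> hard commit, which opens a searcher by default
            client.commit(true, true, soft);
        }
    }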

Because it is the openSearcher part of a commit that's very expensive,
you can successfully do autoCommit with openSearcher=false on an
interval like 10 or 15 seconds and not see much in the way of immediate
performance loss.  That commit is still not free, both in terms of
resources and in terms of the Java heap garbage generated.

The general advice with commits is to do them as infrequently as you
can, which applies to ANY commit, not just those that make changes visible.

> with all methods I encounter pretty much the same problem:
> 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
> opensearcher=true is performed. these GCs cause heavy latency (average
> latency is 3 secs. latency during the problem is 80secs)
> 2. if indexing cycles come too often, which causes softcommits or
> hardcommits(opensearcher=true) occur with a small interval one after another
> (around 5-10minutes), I start getting many OOM exceptions.

If you're getting OOM, then either you need to change things so Solr
requires less heap memory, or you need to increase the heap size.
Changing things might be either the config or how you use Solr.

Are you tuning your garbage collection?  With a 28GB heap, tuning is not
optional.  It's so important that the startup scripts in 5.0 and 5.1
include it, even though the default max heap is 512MB.

Let's do some quick math on your memory.  You have four instances of
Solr on each machine, each with a 28GB heap.  That's 112GB of memory
allocated to Java.  With 196GB total, you have approximately 84GB of RAM
left over for caching your index.

A 16-shard index with three replicas means 48 cores.  Divide that by 12
machines and that's 4 replicas on each server, presumably one in each
Solr instance.  You say that the size of each shard is 250GB, so you've
got about a terabyte of index on each server, but only 84GB of RAM for
caching.

Even with SSD, that's not going to be anywhere near enough cache memory
for good Solr performance.

All these memory issues, including GC tuning, are discussed on this wiki
page:

http://wiki.apache.org/solr/SolrPerformanceProblems

One additional note: By my calculations, each filterCache entry will be
at least 23MB in size.  This means that if you are using the filterCache
and the G1 collector, you will not be able to avoid humongous
allocations, which is any allocation larger than half the G1 region
size.  The max configurable G1 region size is 32MB.  You should use the
CMS collector for your GC tuning, not G1.  If you can reduce the number
of documents in each shard, G1 might work well.
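
For anyone who wants to check that arithmetic: a filterCache entry for a
non-sparse filter is essentially a bitset with one bit per document in the
shard, so roughly (numbers taken from earlier in this thread):

    public class FilterCacheMath {
        public static void main(String[] args) {
            long docsPerShard = 200_000_000L;       // docs per shard, from the thread
            long bytesPerEntry = docsPerShard / 8;  // one bit per doc in the bitset
            System.out.printf("filterCache entry ~= %.1f MiB%n",
                    bytesPerEntry / (1024.0 * 1024.0));        // ~23.8 MiB
            long halfMaxG1Region = (32 * 1024 * 1024) / 2;     // half of the 32MB max region
            System.out.println("humongous under G1? " + (bytesPerEntry > halfMaxG1Region));  // true
        }
    }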

Thanks,
Shawn