Posted to solr-user@lucene.apache.org by Jason <hi...@gmail.com> on 2016/07/12 02:19:10 UTC

High cpu and gc time when performing optimization.

Hi all,

I'm running a Solr instance with two cores, and the JVM max heap is 32G.
The two cores' indexes are 68G and 61G, respectively.
I always run an optimize after updating the index.
Last week the document update completed, but CPU usage was very high during the optimize phase.
I think that is because of long GC times.
How should I solve this problem?
Any ideas are welcome.
Thanks,



--
View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: High cpu and gc time when performing optimization.

Posted by Jason <hi...@gmail.com>.
Could you send me the link to the guide that says a reasonable index
size is around 15G?





Re: High cpu and gc time when performing optimization.

Posted by Kent Mu <so...@gmail.com>.
As I said before, we have also run into this issue, and I can only guess at
the possible reason. Let's wait for an expert to explain it.
On the other hand, I see that your index is 68G, which is too large. I
recommend using SolrCloud; according to the guide, a reasonable size
is around 15G.
Our project now uses standalone Solr and SolrCloud together, so that if one of
them goes down or has another problem, we can switch to the one that is still running well.

2016-07-12 17:02 GMT+08:00 Jason <hi...@gmail.com>:


Re: High cpu and gc time when performing optimization.

Posted by Jason <hi...@gmail.com>.
Hi Kent,
Thanks for your reply.

I think I need to explain more about my server status.
I'm using Solr 4.2.1 with the master-slave replication model.
Several Solr (Tomcat) instances run on the master server.
(The server has 64 cores and 128G of RAM.)
Currently 4 Solr (Tomcat) instances are running, with max heaps of 32, 16, 16 and 8G
respectively.
When CPU is high during the optimize phase, the load average is over 100.
The high CPU usage lasts a very long time (over 5 hours).
Besides, the other Solr (Tomcat) instances also use a lot of CPU,
even though I was not running any operations on them.
So I tried stopping the other instances and running just one instance,
but CPU was still high.
I don't know what I should do.




Re: High cpu and gc time when performing optimization.

Posted by Kent Mu <so...@gmail.com>.
We also came across this issue. I don't think it is caused by GC time, but by
the optimize action itself. Though I have not read the source code, I think that
when the index is optimized on the master, it internally produces a replication
log file, and the slaves synchronize that log file, just like master-slave
replication in a database. This consumes a lot of CPU, and I/O will be very high.
But it is OK; it will just take some time.

2016-07-12 10:19 GMT+08:00 Jason <hi...@gmail.com>:


Re: High cpu and gc time when performing optimization.

Posted by Otis Gospodnetic <ot...@gmail.com>.
Heap: start small and increase as necessary. Leave as much RAM as possible for the FS cache; don't give it to the JVM until it starts crying. SPM for Solr will help you see when Solr and the JVM are starting to hurt.

Otis
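To put this advice in numbers for the master server Jason described (128G RAM; heaps of 32, 16, 16 and 8G), here is a rough back-of-the-envelope sketch. It assumes every heap is fully used and ignores OS and Tomcat overhead, and it only counts the 68G + 61G index of the 32G instance because the other instances' index sizes were not given, so the cacheable fraction is an optimistic upper bound.

```python
# Rough FS-cache budget for the master server described in this thread.
# Assumption: all heaps fully committed; OS/Tomcat overhead ignored.
TOTAL_RAM_GB = 128
HEAPS_GB = [32, 16, 16, 8]        # max heap of each Solr (Tomcat) instance
INDEX_GB_32G_INSTANCE = [68, 61]  # the two cores of the 32G instance only


def page_cache_budget(total_ram_gb, heaps_gb):
    """RAM left for the OS filesystem cache after reserving every JVM heap."""
    return total_ram_gb - sum(heaps_gb)


def cacheable_fraction(cache_gb, index_gb):
    """Upper bound on the share of the given index that fits in the FS cache."""
    return min(1.0, cache_gb / sum(index_gb))


if __name__ == "__main__":
    cache = page_cache_budget(TOTAL_RAM_GB, HEAPS_GB)
    frac = cacheable_fraction(cache, INDEX_GB_32G_INSTANCE)
    print(f"FS cache budget: {cache} GB")
    print(f"Share of the 32G instance's index that could be cached: {frac:.0%}")
```

Run as a script, it reports a 56 GB cache budget, enough for at most about 43% of that one instance's index, before the other three instances' indexes compete for the same cache.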

> On Jul 12, 2016, at 11:45, Jason <hi...@gmail.com> wrote:

Re: High cpu and gc time when performing optimization.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/12/2016 9:45 AM, Jason wrote:
> I'm using optimize because it's an option for faster searching. Our index
> is updated one or more times a week. If I don't optimize, many index files
> will accumulate. Will there be any performance issues in that case? I'm also
> wondering about the relationship between index file size and heap size. For
> a master server that only updates the index, is there any guidance on heap
> size settings such as Xmx, NewSize, MaxNewSize, etc.?

In older (2.x and 3.x) versions of Lucene, optimizing an index would
make a huge difference in performance.  In modern versions, the
performance increase from an optimize is much less dramatic.  Lucene
(and by extension, Solr) has gotten very good at dealing with an index
comprised of many segments.  The recommendation for the last few years
has been to AVOID doing an optimize unless it can be done during times
of very low query traffic, when the I/O load will not cause issues.
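If optimizes are kept, one way to follow this advice is to gate them on a low-traffic window. A minimal sketch; the 01:00-05:00 window below is a hypothetical placeholder, not a recommendation, and should be chosen from your own query-rate data.

```python
from datetime import datetime, time

# Hypothetical low-traffic window; derive the real one from query logs.
WINDOW_START = time(1, 0)  # 01:00
WINDOW_END = time(5, 0)    # 05:00


def in_low_traffic_window(now: datetime, start: time = WINDOW_START,
                          end: time = WINDOW_END) -> bool:
    """Return True if `now` falls inside [start, end).

    Handles windows that wrap past midnight (e.g. 23:00-03:00).
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    # Window crosses midnight: late evening OR early morning qualifies.
    return t >= start or t < end
```

A scheduled job could call `in_low_traffic_window(datetime.now())` and skip the optimize when it returns False; the wrap-around branch covers windows that span midnight.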

About the only good reason left for frequent optimizes is when the index
has many updates to existing documents, resulting in a very large
percentage of deleted documents in the index.  In that case, the
optimize will shrink the overall index size, which will make it faster
and make relevancy more accurate.
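The deleted-document criterion can be checked from the `numDocs` and `maxDoc` counts Solr reports for each core (for example on the core's overview page or via the Luke request handler): `maxDoc` still counts deleted documents, while `numDocs` does not. A minimal sketch, with a hypothetical 25% cutoff chosen only for illustration:

```python
# Hypothetical cutoff: only optimize when at least this share of docs is deleted.
DELETED_THRESHOLD = 0.25


def deleted_fraction(num_docs: int, max_doc: int) -> float:
    """Share of the index occupied by deleted documents.

    max_doc includes deleted documents; num_docs counts only live ones.
    """
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def worth_optimizing(num_docs: int, max_doc: int,
                     threshold: float = DELETED_THRESHOLD) -> bool:
    """True when deleted documents make up at least `threshold` of the index."""
    return deleted_fraction(num_docs, max_doc) >= threshold
```

For a core reporting numDocs=750 and maxDoc=1000, a quarter of the index is deleted documents, so `worth_optimizing(750, 1000)` is True under this cutoff.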

There is no general information available for setting the heap size. 
There is also no general information available on "acceptable" index
size.  The following wiki page touches a little bit on the heap size topic:

https://wiki.apache.org/solr/SolrPerformanceProblems

The reason that there is no generic information available is covered here:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn


Re: High cpu and gc time when performing optimization.

Posted by Erick Erickson <er...@gmail.com>.
It's more a matter of: is unoptimized fast enough? If so, why bother?
The background merging will keep segment counts relatively
reasonable.

If you're updating your index only once a week, it's reasonable to
optimize. Anecdotal reports are on the order of a 10% speedup
_at best_.

As Yonik says, optimizing is expensive. You'll have to evaluate whether
that expense is worth it in your case; there's no universal answer.

Best,
Erick
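One way to run that evaluation is to replay the same set of queries against the index before and after an optimize and compare median latencies. A minimal sketch; the inputs are assumed to be latencies you have measured yourself, in milliseconds, for the same queries in both runs.

```python
from statistics import median


def speedup_percent(before_ms, after_ms):
    """Percent reduction in median query latency after an optimize.

    before_ms / after_ms: measured latencies (ms) for the same query set.
    A positive result means the optimized index answered faster.
    """
    before, after = median(before_ms), median(after_ms)
    if before <= 0:
        raise ValueError("median 'before' latency must be positive")
    return (before - after) / before * 100.0
```

The median is used rather than the mean so that a few slow, cold-cache queries do not dominate the comparison; caches should be warmed comparably in both runs. A result around 10 would match the "10% at best" anecdotes mentioned above.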

On Tue, Jul 12, 2016 at 8:45 AM, Jason <hi...@gmail.com> wrote:

Re: High cpu and gc time when performing optimization.

Posted by Jason <hi...@gmail.com>.
I'm using optimize because it's an option for faster searching.
Our index is updated one or more times a week.
If I don't optimize, many index files will accumulate.
Will there be any performance issues in that case?

I'm also wondering about the relationship between index file size and heap size.
For a master server that only updates the index,
is there any guidance on heap size settings such as Xmx, NewSize, MaxNewSize, etc.?




Re: High cpu and gc time when performing optimization.

Posted by Yonik Seeley <ys...@gmail.com>.
Optimize is a very expensive operation. It involves reading the
entire index and merging and rewriting it as a single segment.
If you find it too expensive, do it less often, or don't do it at all.
It's an optional operation.

-Yonik
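If a full optimize is too expensive, one middle ground is a partial optimize: Solr's update handler accepts `optimize=true` together with `maxSegments=N`, which merges down to at most N segments instead of one. The sketch below only builds the request URL (send it with any HTTP client, ideally during a low-traffic period). The base URL and core name are hypothetical, and the exact URL layout and supported parameters should be checked against the documentation for your Solr version.

```python
from typing import Optional
from urllib.parse import urlencode


def optimize_url(base_url: str, core: str, max_segments: Optional[int] = None,
                 wait_searcher: bool = True) -> str:
    """Build a Solr update-handler URL that requests an optimize.

    max_segments: if given, merge down to at most this many segments
    (a partial optimize) instead of a single segment.
    """
    params = {"optimize": "true", "waitSearcher": str(wait_searcher).lower()}
    if max_segments is not None:
        if max_segments < 1:
            raise ValueError("max_segments must be >= 1")
        params["maxSegments"] = str(max_segments)
    # Dicts keep insertion order, so the query string order is stable.
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"
```

For example, `optimize_url("http://localhost:8983/solr", "core1", max_segments=4)` yields `http://localhost:8983/solr/core1/update?optimize=true&waitSearcher=true&maxSegments=4`.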


On Mon, Jul 11, 2016 at 10:19 PM, Jason <hi...@gmail.com> wrote: