Posted to user@cassandra.apache.org by Peng Xiao <25...@qq.com> on 2017/11/23 20:51:45 UTC

Re: gc causes C* node hang

Thanks, Chris. We don't have GC logs for this node. We will try adding -XX:G1ReservePercent=25.




------------------ Original Message ------------------
From: "Chris Lohfink";<cl...@gmail.com>;
Sent: Friday, November 24, 2017, 4:46 AM
To: "user"<us...@cassandra.apache.org>;

Subject: Re: gc causes C* node hang



Can you include the output from the GC logs for the 30ms pause? If you don't have GC logs, enable them and collect one. G1 logs provide good detail and can reveal edge cases specific to your use case.

I would guess that, since the pause is so long, you didn't have enough to-space. You can try adding -XX:G1ReservePercent=25 (or -XX:-G1UseAdaptiveIHOP and -XX:InitiatingHeapOccupancyPercent) and increasing the heap size if you can.

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=5"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=5"

How many CPU cores do you have? Make sure you're not setting these lower than the defaults (check with `java -XX:+PrintFlagsFinal 2>&1 | grep Threads`).

Looks like a 16 GB heap? How much memory is available on the host (how big can you set it)? Is swap disabled?

If it's not a to-space exhaustion issue, the GC logs will help.

Chris
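
A minimal sketch of what the GC logging and reserve setting suggested above might look like in cassandra-env.sh, assuming a Java 7 or 8 HotSpot JVM (newer JVMs use -Xlog instead of these flags). The log path and the GC_LOG_FILE variable are placeholders and do not come from the thread.

```shell
# Sketch for cassandra-env.sh on a Java 7/8 HotSpot JVM.
# GC_LOG_FILE and its default path are placeholders, not from the thread.
JVM_OPTS="${JVM_OPTS:-}"
GC_LOG_FILE="${GC_LOG_FILE:-/var/log/cassandra/gc.log}"

JVM_OPTS="$JVM_OPTS -Xloggc:${GC_LOG_FILE}"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
# Logs every stop-the-world pause, including safepoints unrelated to GC.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# Keep more of the heap free as to-space headroom (the G1 default is 10).
JVM_OPTS="$JVM_OPTS -XX:G1ReservePercent=25"
```

Each option starts with a plain ASCII hyphen and is appended to JVM_OPTS on its own line, matching the style of the configuration quoted in this thread.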






On Thu, Nov 23, 2017 at 12:49 AM, Peng Xiao <25...@qq.com> wrote:
Hi there,


We have a cluster with two DCs running 2.1.13. Sometimes GC causes one node to hang, and the application response time (rt) jumps to 15 s. By contrast, even when one node is down, the response time does not fluctuate this violently.
We are running Cassandra with G1, using the following configuration:


JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=5"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=5"
JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"


JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"



Could anyone please advise?





Thanks,
Peng Xiao

Re: gc causes C* node hang

Posted by Chris Lohfink <cl...@gmail.com>.

Your mail client may be changing the character if you're copying and pasting. It's
- (an ASCII hyphen), not the Unicode en dash –. I would recommend adding it to the JVM
options as Oleksandr pointed out.

Chris
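
The java launcher treats the first argument that does not start with an ASCII hyphen as the main class name, which is why an option pasted with an en dash produces "Could not find or load main class". A small sketch of a check for this follows; the helper name check_jvm_opts is made up for illustration and is not part of Cassandra.

```shell
# Hypothetical helper: report any JVM option that does not begin with an
# ASCII hyphen. An option pasted with an en dash fails this check; the java
# launcher would treat such an argument as the main class name.
check_jvm_opts() {
    status=0
    for opt in "$@"; do
        case "$opt" in
            -*) ;;                                   # starts with ASCII '-'
            *)  echo "suspicious option: $opt" >&2
                status=1 ;;
        esac
    done
    return $status
}
```

It could be called near the end of cassandra-env.sh as `check_jvm_opts $JVM_OPTS` (unquoted, so that the string is split into individual options).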


Re: gc causes C* node hang

Posted by Oleksandr Shulgin <ol...@zalando.de>.

On Thu, Nov 30, 2017 at 1:38 AM, Peng Xiao <25...@qq.com> wrote:

> It looks like we are not able to enable –XX:PrintSafepointStatisticsCount=1
> in cassandra-env.sh.
> Could anyone please advise?
>
> ...

> Error: Could not find or load main class –XX:PrintSafepointStatisticsCount=1
>

Hm, I'm not sure how you are doing it, but it boils down to adding a line
like this one somewhere in cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"

Or, if you're using a newer version (3.0 or later), the following line in the
jvm.options file:

-XX:PrintSafepointStatisticsCount=1

Cheers,
--
Alex

Re: gc causes C* node hang

Posted by Peng Xiao <25...@qq.com>.

It looks like we are not able to enable –XX:PrintSafepointStatisticsCount=1
in cassandra-env.sh.
Could anyone please advise?


CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
Error: Could not find or load main class –XX:PrintSafepointStatisticsCount=1



Thanks
 ---------------------------------------------------------------------
 To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
 For additional commands, e-mail: user-help@cassandra.apache.org

Re: gc causes C* node hang

Posted by Peng Xiao <25...@qq.com>.

Thanks, Chris, for the thorough explanation. We are actually using SSDs, and we will check the hardware. All 7 VM nodes are on the same physical machine, but we did not find any errors in the VMware logs.





Re: gc causes C* node hang

Posted by Chris Lohfink <cl...@gmail.com>.

Sorry, also add:
-XX:PrintSafepointStatisticsCount=1


Re: gc causes C* node hang

Posted by Chris Lohfink <cl...@gmail.com>.

The only pause over 1 s you had was:

2017-11-23T17:50:14.573+0800: 1378060.385: Total time for which application threads were stopped: 37.0282783 seconds, Stopping threads took: 36.9420759 seconds

This is not actually a GC pause. It was likely revoking biased locks or
something else completely unrelated to GC. "Stopping threads took:
36.9420759 seconds" means it took about 37 seconds for the threads to reach a
safepoint once the JVM wanted to stop the world. My knee-jerk reaction to this
is "hardware", as I've mostly seen it when the JVM is fsyncing the hprof
statistics or something similar and blocking on a slow disk. If you want more
specifics you can enable some safepoint logging, but I would recommend checking
your disks or just replacing the host if you are able to. You can do the
analysis once it is no longer impacting you.

For safepoint logging (it might not be super helpful, but it is an option if you
really need this hardware and need to dig into what is causing the JVM to hang):

-XX:+UnlockDiagnosticVMOptions
-XX:+PrintSafepointStatistics
-XX:+LogVMOutput
-XX:LogFile=somelocation.log

Chris
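
To spot pauses like the one above in a larger log, something along these lines may help. It is only a sketch: it assumes the log was written with -XX:+PrintGCApplicationStoppedTime in the Java 8 line format shown above, and the function name long_pauses is made up for illustration.

```shell
# Hypothetical helper: print stopped-time entries longer than a threshold
# (in seconds, default 1) from a log file in the format quoted above.
# Usage: long_pauses <logfile> [threshold_seconds]
long_pauses() {
    threshold="${2:-1}"
    awk -v limit="$threshold" '
        /Total time for which application threads were stopped/ {
            total = ""; stopping = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "stopped:") total = $(i + 1)     # whole pause
                if ($i == "took:")    stopping = $(i + 1)  # time to reach safepoint
            }
            if (total + 0 > limit + 0)
                printf "%s total=%ss stopping=%ss\n", $1, total, stopping
        }
    ' "$1"
}
```

When the "stopping" value is almost as large as the total, as in the 37-second entry, nearly all of the pause was spent waiting for threads to reach the safepoint rather than doing GC work.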



Re: gc causes C* node hang

Posted by Peng Xiao <25...@qq.com>.

Hi Chris,


I found a GC log on another node where we have GC logging enabled.
Could you please take a look?


Thanks,
Peng Xiao







Re: gc causes C* node hang

Posted by Chris Lohfink <cl...@gmail.com>.

If the JVM sets it to 8 by default, you shouldn't override it to 5.

For what it's worth, you should enable GC logging. It's very cheap and
provides a lot of useful information when you need it.

Chris
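
For reference, a sketch of how the defaults are derived from the core count, as far as I understand the JDK 8 HotSpot heuristics. Treat the formulas as an assumption and confirm the real values on the host with `java -XX:+PrintFlagsFinal -version | grep GCThreads`. The function names are made up for illustration.

```shell
# Assumed JDK 8 HotSpot heuristic for ParallelGCThreads: one thread per CPU
# up to 8 CPUs, then 5/8 of a thread for each additional CPU.
default_parallel_gc_threads() {
    ncpus="$1"
    if [ "$ncpus" -le 8 ]; then
        echo "$ncpus"
    else
        echo $(( 8 + (ncpus - 8) * 5 / 8 ))   # integer arithmetic
    fi
}

# Assumed G1 heuristic for ConcGCThreads: roughly a quarter of the
# parallel threads, with a minimum of 1.
default_conc_gc_threads() {
    parallel="$1"
    conc=$(( (parallel + 2) / 4 ))
    [ "$conc" -lt 1 ] && conc=1
    echo "$conc"
}
```

Under these assumptions, `default_parallel_gc_threads 7` gives 7 and `default_conc_gc_threads 7` gives 2, so on a 7-core node -XX:ParallelGCThreads=5 would sit below the default while -XX:ConcGCThreads=5 would sit above it.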


Re: gc causes C* node hang

Posted by Peng Xiao <25...@qq.com>.

We only have 7 cores per node.
As for -XX:ParallelGCThreads, it looks like HotSpot caps GC threads at 8 by default,
so maybe we need to remove this setting?



