Posted to common-user@hadoop.apache.org by "Gokulakannan M (Engineering - Data Platform)" <go...@flipkart.com> on 2016/02/25 06:39:05 UTC

Namenode shutdown due to long GC Pauses

Hi,

It is known that the NameNode shuts down when a long GC pause occurs while
it is writing edits to the JournalNodes: the NameNode concludes that the
JournalNodes did not respond, when the delay was actually caused by its own
GC pause. Any pointers on solving this issue?
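
For readers following the thread, the failure mode described above can be modelled in a few lines. This is only a toy sketch, not NameNode code: the function name and the latency figures are made up for illustration, and the 20-second default is an assumption based on the `dfs.qjournal.write-txns.timeout.ms` setting (check the value for your Hadoop version).

```python
# Toy model of one edit-log write as seen from the NameNode side.
# Not Hadoop code; names and numbers are illustrative assumptions.

def quorum_write_outcome(jn_latencies_ms, gc_pause_ms, timeout_ms=20000):
    """Return 'ok' or 'timeout' for one quorum write.

    jn_latencies_ms: time each JournalNode takes to acknowledge the write.
    gc_pause_ms: stop-the-world pause in the NameNode JVM during the wait.
    """
    quorum = len(jn_latencies_ms) // 2 + 1
    # Time at which a majority of JournalNodes has actually answered.
    quorum_ack_ms = sorted(jn_latencies_ms)[quorum - 1]
    # The NameNode cannot process the acks while it is paused, so the wait it
    # observes is at least as long as the pause itself.
    observed_ms = max(quorum_ack_ms, gc_pause_ms)
    return "ok" if observed_ms <= timeout_ms else "timeout"


if __name__ == "__main__":
    fast_jns = [5, 8, 12]  # every JournalNode answers within milliseconds
    print(quorum_write_outcome(fast_jns, gc_pause_ms=300))
    print(quorum_write_outcome(fast_jns, gc_pause_ms=25000))
```

In the second call the JournalNodes are just as fast as in the first, yet the write is reported as timed out because the pause alone exceeds the timeout. That is the situation the question describes.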

Re: Namenode shutdown due to long GC Pauses

Posted by bappa kon <or...@gmail.com>.

Can you also share the GC details and jmap histogram output?
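
A class histogram can be captured with the JDK's `jmap -histo <pid>`. As a rough sketch of how to summarise that output, the snippet below picks the classes that hold the most bytes. The sample text is hand-written in the usual `jmap -histo` column layout, and `org.example.BlockRecord` is a hypothetical class name, not something taken from this cluster.

```python
import re

# Matches data rows of `jmap -histo` output, e.g. "   1:   1234   56789  [B".
ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S.*)$")


def top_classes(histo_text, n=3):
    """Return the n classes holding the most bytes as (name, instances, bytes)."""
    rows = []
    for line in histo_text.splitlines():
        m = ROW.match(line)
        if m:  # header, separator and "Total" lines do not match
            rows.append((m.group(4).strip(), int(m.group(2)), int(m.group(3))))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:n]


# Illustrative sample only; real output has thousands of rows.
SAMPLE = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:        900000      120000000  [B
   2:        400000       64000000  org.example.BlockRecord
   3:        250000       20000000  java.lang.String
Total       1550000      204000000
"""

if __name__ == "__main__":
    for name, instances, size in top_classes(SAMPLE, n=2):
        print(f"{name}: {instances} instances, {size} bytes")
```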

Thanks

On Thu, Feb 25, 2016 at 4:21 PM, Gokulakannan M (Engineering - Data
Platform) <go...@flipkart.com> wrote:

> Hi Jitendra,
>
> Trying to find the pattern but one thing observed is that the metrics *RpcDetailedActivity.GetServerDefaultsNumOps
> *is pretty high(around 14 million) when long pause happened.
>
> G1 garbage collector is used already. These are the main JVM parameters.
>
> -XX:+UseG1GC
> -XX:ParallelGCThreads=8 -XX:ConcGCThreads=8 -XX:+UseNUMA
> -XX:MaxGCPauseMillis=500 -XX:GCPauseIntervalMillis=1000
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xss256k
> -XX:StringTableSize=1000003 -XX:+UseTLAB -XX:+UseCondCardMark
> -XX:+UseFastAccessorMethods -XX:+AggressiveOpts -XX:+UseCompressedOops
> -server -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps
> -Xms75776m -Xmx75776m
>
> On Thu, Feb 25, 2016 at 3:46 PM, bappa kon <or...@gmail.com> wrote:
>
>> Which garbage collector you are using currently in your env? Can you
>> share the jvm parameters?.  If you are using CMS and already optimized your
>> parameter then probably you can look at to G1 garbage collector.
>>
>> First you should look at the GC stats and pattern to find out the cause
>> of long GC.
>>
>> Regards
>> Jitendra
>>
>>
>>
>> On Thu, Feb 25, 2016 at 3:24 PM, Sandeep Nemuri <nh...@gmail.com>
>> wrote:
>>
>>> You may need to tune your GC settings.
>>>
>>> On Thu, Feb 25, 2016 at 3:04 PM, Namikaze Minato <ll...@gmail.com>
>>> wrote:
>>>
>>>> This happened to us. Our namenodes are on a virtual machine, and
>>>> reducing the number of replication locations of the journal node to
>>>> 1 (it's backed by a safe RAID array anyway) solved the problem.
>>>>
>>>> Regards,
>>>> LLoyd
>>>>
>>>> On 25 February 2016 at 06:39, Gokulakannan M (Engineering - Data
>>>> Platform) <go...@flipkart.com> wrote:
>>>> > Hi,
>>>> >
>>>> > It is known that namenode shuts down when a long GC pause happens
>>>> when NN
>>>> > writes edits to journal nodes - Namenode thinks that journal nodes
>>>> didn't
>>>> > respond but actually it was due to the long GC pause. Any pointers on
>>>> > solving this issue?
>>>> >
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>>>> For additional commands, e-mail: user-help@hadoop.apache.org
>>>>
>>>>
>>>
>>>
>>> --
>>> *  Regards*
>>> *  Sandeep Nemuri*
>>>
>>
>>
>
>
> --
>
>

Re: Namenode shutdown due to long GC Pauses

Posted by "Gokulakannan M (Engineering - Data Platform)" <go...@flipkart.com>.

Hi Jitendra,

I am still trying to find the pattern, but one thing I observed is that the
metric RpcDetailedActivity.GetServerDefaultsNumOps was quite high (around
14 million) when the long pause happened.

The G1 garbage collector is already in use. These are the main JVM parameters:

-XX:+UseG1GC
-XX:ParallelGCThreads=8 -XX:ConcGCThreads=8 -XX:+UseNUMA
-XX:MaxGCPauseMillis=500 -XX:GCPauseIntervalMillis=1000
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xss256k
-XX:StringTableSize=1000003 -XX:+UseTLAB -XX:+UseCondCardMark
-XX:+UseFastAccessorMethods -XX:+AggressiveOpts -XX:+UseCompressedOops
-server -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-Xms75776m -Xmx75776m

On Thu, Feb 25, 2016 at 3:46 PM, bappa kon <or...@gmail.com> wrote:

> Which garbage collector you are using currently in your env? Can you share
> the jvm parameters?.  If you are using CMS and already optimized your
> parameter then probably you can look at to G1 garbage collector.
>
> First you should look at the GC stats and pattern to find out the cause of
> long GC.
>
> Regards
> Jitendra
>
>
>
> On Thu, Feb 25, 2016 at 3:24 PM, Sandeep Nemuri <nh...@gmail.com>
> wrote:
>
>> You may need to tune your GC settings.
>>
>> On Thu, Feb 25, 2016 at 3:04 PM, Namikaze Minato <ll...@gmail.com>
>> wrote:
>>
>>> This happened to us. Our namenodes are on a virtual machine, and
>>> reducing the number of replication locations of the journal node to
>>> 1 (it's backed by a safe RAID array anyway) solved the problem.
>>>
>>> Regards,
>>> LLoyd
>>>
>>> On 25 February 2016 at 06:39, Gokulakannan M (Engineering - Data
>>> Platform) <go...@flipkart.com> wrote:
>>> > Hi,
>>> >
>>> > It is known that namenode shuts down when a long GC pause happens when
>>> NN
>>> > writes edits to journal nodes - Namenode thinks that journal nodes
>>> didn't
>>> > respond but actually it was due to the long GC pause. Any pointers on
>>> > solving this issue?
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>>> For additional commands, e-mail: user-help@hadoop.apache.org
>>>
>>>
>>
>>
>> --
>> *  Regards*
>> *  Sandeep Nemuri*
>>
>
>


--

Re: Namenode shutdown due to long GC Pauses

Posted by bappa kon <or...@gmail.com>.

Which garbage collector are you currently using in your environment? Can you
share the JVM parameters? If you are using CMS and have already tuned its
parameters, then you could look at the G1 garbage collector.

First, you should look at the GC stats and the pause pattern to find the
cause of the long GC.
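
One way to look at the pause pattern is to scan the GC log for pauses above the configured target. The following is a minimal sketch under stated assumptions: it expects pause lines in the style produced by `-XX:+PrintGCDetails -XX:+PrintGCDateStamps` on a Java 8 JVM, the sample lines are hand-written rather than taken from the affected NameNode, and the 0.5 second threshold simply mirrors a `-XX:MaxGCPauseMillis=500` target. The exact log format differs between JVM versions, so the regular expression may need adjusting.

```python
import re

# Date stamp, uptime, then "[GC pause ..." or "[Full GC ...", ending in "N secs]".
PAUSE = re.compile(
    r"^(?P<stamp>\S+): [\d.]+: \[(?P<kind>Full GC|GC pause).*?"
    r"(?P<secs>\d+\.\d+) secs\]"
)


def long_pauses(log_text, threshold_secs):
    """Return (timestamp, kind, seconds) for every pause >= threshold_secs."""
    hits = []
    for line in log_text.splitlines():
        m = PAUSE.match(line)
        if m and float(m.group("secs")) >= threshold_secs:
            hits.append((m.group("stamp"), m.group("kind"), float(m.group("secs"))))
    return hits


# Illustrative sample lines only.
SAMPLE_LOG = """\
2016-02-25T06:38:01.100+0530: 1200.100: [GC pause (G1 Evacuation Pause) (young), 0.2103450 secs]
2016-02-25T06:38:05.900+0530: 1204.900: [GC pause (G1 Evacuation Pause) (mixed), 0.4521000 secs]
2016-02-25T06:38:40.500+0530: 1239.500: [Full GC (Allocation Failure)  70G->52G(74G), 27.4312000 secs]
"""

if __name__ == "__main__":
    for stamp, kind, secs in long_pauses(SAMPLE_LOG, threshold_secs=0.5):
        print(f"{stamp} {kind} {secs:.1f}s")
```

Lining up the timestamps of the flagged pauses with the NameNode log entries around the shutdown would show whether the two actually coincide.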

Regards
Jitendra



On Thu, Feb 25, 2016 at 3:24 PM, Sandeep Nemuri <nh...@gmail.com>
wrote:

> You may need to tune your GC settings.
>
> On Thu, Feb 25, 2016 at 3:04 PM, Namikaze Minato <ll...@gmail.com>
> wrote:
>
>> This happened to us. Our namenodes are on a virtual machine, and
>> reducing the number of replication locations of the journal node to
>> 1 (it's backed by a safe RAID array anyway) solved the problem.
>>
>> Regards,
>> LLoyd
>>
>> On 25 February 2016 at 06:39, Gokulakannan M (Engineering - Data
>> Platform) <go...@flipkart.com> wrote:
>> > Hi,
>> >
>> > It is known that namenode shuts down when a long GC pause happens when
>> NN
>> > writes edits to journal nodes - Namenode thinks that journal nodes
>> didn't
>> > respond but actually it was due to the long GC pause. Any pointers on
>> > solving this issue?
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>> For additional commands, e-mail: user-help@hadoop.apache.org
>>
>>
>
>
> --
> *  Regards*
> *  Sandeep Nemuri*
>

Re: Namenode shutdown due to long GC Pauses

Posted by Sandeep Nemuri <nh...@gmail.com>.

You may need to tune your GC settings.

On Thu, Feb 25, 2016 at 3:04 PM, Namikaze Minato <ll...@gmail.com>
wrote:

> This happened to us. Our namenodes are on a virtual machine, and
> reducing the number of replication locations of the journal node to
> 1 (it's backed by a safe RAID array anyway) solved the problem.
>
> Regards,
> LLoyd
>
> On 25 February 2016 at 06:39, Gokulakannan M (Engineering - Data
> Platform) <go...@flipkart.com> wrote:
> > Hi,
> >
> > It is known that namenode shuts down when a long GC pause happens when NN
> > writes edits to journal nodes - Namenode thinks that journal nodes didn't
> > respond but actually it was due to the long GC pause. Any pointers on
> > solving this issue?
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: user-help@hadoop.apache.org
>
>


-- 
*  Regards*
*  Sandeep Nemuri*

Re: Namenode shutdown due to long GC Pauses

Posted by Namikaze Minato <ll...@gmail.com>.

This happened to us. Our NameNodes are on a virtual machine, and
reducing the number of replication locations of the journal node to
1 (it's backed by a safe RAID array anyway) solved the problem.

Regards,
LLoyd

On 25 February 2016 at 06:39, Gokulakannan M (Engineering - Data
Platform) <go...@flipkart.com> wrote:
> Hi,
>
> It is known that namenode shuts down when a long GC pause happens when NN
> writes edits to journal nodes - Namenode thinks that journal nodes didn't
> respond but actually it was due to the long GC pause. Any pointers on
> solving this issue?
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org
