Posted to user@hbase.apache.org by Jack Levin <ma...@gmail.com> on 2011/05/02 02:47:18 UTC
one of our datanodes stops working after few hours
I took a jstack (http://pastebin.com/5v6mHg3t). After a few hours, it
literally staggers to a halt and gets very, very slow... Any ideas
what it's blocking on?
(The main issue is that fs reads for the RS get really slow when that happens.)
-Jack
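One way to answer "what is it blocking on?" from a jstack dump is to group the BLOCKED threads by the monitor address in their "- waiting to lock <...>" line, then find the thread that shows "- locked <...>" for the same address. Below is a minimal Python sketch of that idea. It assumes the usual HotSpot jstack text format. The names parse_threads, summarize_blocked and SAMPLE are illustrative, and the sample excerpt is made up rather than taken from the pastebin dumps. It only looks at synchronized monitors, not java.util.concurrent locks.

```python
import re
import sys

# Patterns for the HotSpot jstack text format (assumed, see lead-in).
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]*)"')
STATE = re.compile(r"java\.lang\.Thread\.State: (?P<state>\w+)")
WAITING = re.compile(r"- waiting to lock <(?P<addr>0x[0-9a-f]+)> \(a (?P<cls>[^)]+)\)")
LOCKED = re.compile(r"- locked <(?P<addr>0x[0-9a-f]+)>")


def parse_threads(dump):
    """Split a jstack dump into one record per thread."""
    threads = []
    current = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = {"name": header.group("name"), "state": None,
                       "waiting_on": None, "lock_class": None, "holds": []}
            threads.append(current)
            continue
        if current is None:
            continue  # preamble such as "Full thread dump ..."
        state = STATE.search(line)
        if state:
            current["state"] = state.group("state")
        waiting = WAITING.search(line)
        if waiting:
            current["waiting_on"] = waiting.group("addr")
            current["lock_class"] = waiting.group("cls")
        locked = LOCKED.search(line)
        if locked:
            current["holds"].append(locked.group("addr"))
    return threads


def summarize_blocked(dump):
    """Group BLOCKED threads by the monitor they are waiting to lock.

    The "owner" is the first thread that shows "- locked <addr>" for that
    monitor. Threads parked in Object.wait() also print "- locked" lines,
    so treat the owner as a hint rather than proof.
    """
    threads = parse_threads(dump)
    owners = {}
    for t in threads:
        for addr in t["holds"]:
            owners.setdefault(addr, t["name"])
    summary = {}
    for t in threads:
        if t["state"] == "BLOCKED" and t["waiting_on"]:
            entry = summary.setdefault(t["waiting_on"], {
                "class": t["lock_class"],
                "owner": owners.get(t["waiting_on"]),
                "blocked": [],
            })
            entry["blocked"].append(t["name"])
    return summary


# Hypothetical, hand-written excerpt (not from the dumps in this thread).
SAMPLE = """Full thread dump Java HotSpot(TM) 64-Bit Server VM:

"xceiver-1" daemon prio=10 tid=0x1 nid=0x1 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java)
    - waiting to lock <0x00000000c0a1b2c8> (a org.apache.log4j.DailyRollingFileAppender)

"xceiver-2" daemon prio=10 tid=0x2 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
    at java.io.FileOutputStream.writeBytes(Native Method)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java)
    - locked <0x00000000c0a1b2c8> (a org.apache.log4j.DailyRollingFileAppender)
"""

if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    for addr, e in summarize_blocked(text).items():
        print(f"{len(e['blocked'])} BLOCKED on {addr} ({e['class']}), held by {e['owner']}")
```

Run it as `python blocked.py dump.txt`. Without an argument it falls back to SAMPLE. Many threads BLOCKED on one address with the same owner usually points at a single hot lock. That is the pattern Todd points out in his reply, where the threads are all waiting on appends to log4j.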
Re: one of our datanodes stops working after few hours
Posted by Jack Levin <ma...@gmail.com>.
I will try, thanks. I have not run NFS since 1998 :).
-Jack
On Mon, May 2, 2011 at 10:10 PM, Todd Lipcon <to...@cloudera.com> wrote:
> Hi Jack,
>
> Try turning off your clienttrace logs in the DN log4j.properties, perhaps?
>
> By any chance do you log to NFS?
>
> Your blocked threads all seem to be waiting on appends to log4j.
>
> -Todd
Re: one of our datanodes stops working after few hours
Posted by Todd Lipcon <to...@cloudera.com>.
Hi Jack,
Try turning off your clienttrace logs in the DN log4j.properties, perhaps?
By any chance do you log to NFS?
Your blocked threads all seem to be waiting on appends to log4j.
-Todd
On Mon, May 2, 2011 at 7:29 PM, Jack Levin <ma...@gmail.com> wrote:
> As requested:
>
> http://pastebin.com/aySaTADp
>
> Note, blocked threads.
>
> -Jack
--
Todd Lipcon
Software Engineer, Cloudera
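Todd's reading of the dump is that the blocked threads are all waiting on log4j appends. In log4j 1.x, AppenderSkeleton.doAppend is a synchronized method, so all threads logging through one appender take turns on a single monitor. If each write is slow, for example because the log directory sits on NFS, they pile up in BLOCKED state. Below is a minimal Python sketch of that effect under stated assumptions. It is a toy model, not log4j: SynchronizedAppender, write_delay and run are made-up names, and the sleep stands in for a slow disk or network write.

```python
import threading
import time


class SynchronizedAppender:
    """Toy model of an appender whose append() holds one lock for the whole
    write, similar in spirit to a synchronized doAppend(). Not log4j code."""

    def __init__(self, write_delay):
        self._lock = threading.Lock()
        self._write_delay = write_delay  # hypothetical per-write latency of the log storage
        self._active = 0                 # callers currently inside the critical section
        self.max_concurrent = 0
        self.lines = []

    def append(self, msg):
        with self._lock:  # every other logging thread waits here
            self._active += 1
            self.max_concurrent = max(self.max_concurrent, self._active)
            time.sleep(self._write_delay)  # stand-in for a slow write (e.g. NFS)
            self.lines.append(msg)
            self._active -= 1


def run(n_threads, write_delay):
    """Have n_threads each log one line; return (elapsed seconds, appender)."""
    appender = SynchronizedAppender(write_delay)
    workers = [threading.Thread(target=appender.append, args=(f"block op {i}",))
               for i in range(n_threads)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start, appender


if __name__ == "__main__":
    elapsed, app = run(n_threads=8, write_delay=0.05)
    # Writes are serialized, so elapsed is at least 8 * 0.05 s.
    print(f"{len(app.lines)} lines in {elapsed:.2f}s, "
          f"max concurrent writers = {app.max_concurrent}")
```

In this model the total time grows roughly as (number of log lines) x (per-write latency). Either logging fewer lines, which is what turning off clienttrace logging aims at, or making each write faster shrinks it.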
Re: one of our datanodes stops working after few hours
Posted by Jack Levin <ma...@gmail.com>.
As requested:
http://pastebin.com/aySaTADp
Note the blocked threads.
-Jack
On Mon, May 2, 2011 at 2:39 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> I think Todd was asking to have a jstack without yourkit, so it
> shouldn't be an issue for you :)
>
> J-D
Re: one of our datanodes stops working after few hours
Posted by Jean-Daniel Cryans <jd...@apache.org>.
I think Todd was asking to have a jstack without yourkit, so it
shouldn't be an issue for you :)
J-D
On Mon, May 2, 2011 at 1:56 PM, Jack Levin <ma...@gmail.com> wrote:
> my yourkit version expired :)... but here is the jstack when it
> happens: http://pastebin.com/5v6mHg3t
Re: one of our datanodes stops working after few hours
Posted by Jack Levin <ma...@gmail.com>.
my yourkit version expired :)... but here is the jstack when it
happens: http://pastebin.com/5v6mHg3t
On Mon, May 2, 2011 at 1:00 PM, Todd Lipcon <to...@cloudera.com> wrote:
> On Mon, May 2, 2011 at 12:56 PM, Jack Levin <ma...@gmail.com> wrote:
>
>> Tried removing yourkit and run on javasun, same thing. We have some
>> threads blocked, does anyone know what they block on?
>>
>
> Which threads are blocked? Can you get some jstacks without yourkit?
>
> -Todd
Re: one of our datanodes stops working after few hours
Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, May 2, 2011 at 12:56 PM, Jack Levin <ma...@gmail.com> wrote:
> Tried removing yourkit and run on javasun, same thing. We have some
> threads blocked, does anyone know what they block on?
>
Which threads are blocked? Can you get some jstacks without yourkit?
-Todd
Re: one of our datanodes stops working after few hours
Posted by Jack Levin <ma...@gmail.com>.
Tried removing yourkit and running on the Sun JDK, same thing. We have some
threads blocked; does anyone know what they are blocking on?
-Jack
On Mon, May 2, 2011 at 7:53 AM, Todd Lipcon <to...@cloudera.com> wrote:
> Hi Jack,
>
> Does this happen even if you aren't running Yourkit on the DN?
>
> Can you try using a Sun JDK instead of OpenJDK?
>
> -Todd
Re: one of our datanodes stops working after few hours
Posted by Todd Lipcon <to...@cloudera.com>.
Hi Jack,
Does this happen even if you aren't running Yourkit on the DN?
Can you try using a Sun JDK instead of OpenJDK?
-Todd
On Sun, May 1, 2011 at 7:34 PM, Jack Levin <ma...@gmail.com> wrote:
> Version: 0.20.2+320 hdfs
> .89 HBASE
>
> ulimit is 32k
> xcievers is 5k
>
> Note from the jstack, I am not exceeding xcievers.
>
> -Jack
Re: one of our datanodes stops working after few hours
Posted by Jack Levin <ma...@gmail.com>.
HDFS version: 0.20.2+320
HBase version: 0.89
ulimit is 32k
xcievers is 5k
Note from the jstack that I am not exceeding xcievers.
-Jack
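The claim "not exceeding xcievers" can be checked mechanically. Count the xceiver threads in the dump and compare them against the configured dfs.datanode.max.xcievers value, which is 5k here. The property name really is spelled "xcievers". Below is a minimal Python sketch under one loud assumption: xceiver worker threads have "DataXceiver" in their name, while the accepting thread is called "DataXceiverServer". Exact thread names vary by Hadoop version. The function names and SAMPLE headers are hypothetical.

```python
import re

THREAD_HEADER = re.compile(r'^"(?P<name>[^"]*)"', re.MULTILINE)
# Assumption: worker threads contain "DataXceiver" in their name; the
# negative lookahead skips the "DataXceiverServer" acceptor thread.
XCEIVER_NAME = re.compile(r"DataXceiver(?!Server)")


def count_xceivers(dump):
    """Count thread headers in a jstack dump that look like xceiver workers."""
    return sum(1 for m in THREAD_HEADER.finditer(dump)
               if XCEIVER_NAME.search(m.group("name")))


def xceiver_headroom(dump, limit):
    """Return (active xceivers, remaining capacity under `limit`)."""
    active = count_xceivers(dump)
    return active, limit - active


# Hypothetical thread headers; real names depend on the Hadoop version.
SAMPLE = '''
"org.apache.hadoop.hdfs.server.datanode.DataXceiver@1f2e3d" daemon prio=10
"org.apache.hadoop.hdfs.server.datanode.DataXceiver@4c5b6a" daemon prio=10
"org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@7a8b9c" daemon prio=10
"IPC Server handler 0 on 50020" daemon prio=10
'''

if __name__ == "__main__":
    active, left = xceiver_headroom(SAMPLE, limit=5000)  # 5k, as configured in this thread
    print(f"{active} xceiver threads, {left} below the configured limit")
```

Staying well under the limit, as Jack reports, suggests the slowdown is contention elsewhere rather than xceiver exhaustion.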
On Sun, May 1, 2011 at 6:19 PM, Michael Segel <mi...@hotmail.com> wrote:
>
>
> What's your xceivers set to?
> What's the ulimit -n set for hdfs/hadoop user... (You didn't say which release/version you were using.)
RE: one of our datanodes stops working after few hours
Posted by Michael Segel <mi...@hotmail.com>.
What's your xceivers set to?
What's ulimit -n set to for the hdfs/hadoop user? (You didn't say which release/version you were using.)
> Date: Sun, 1 May 2011 17:47:18 -0700
> Subject: one of our datanodes stops working after few hours
> From: magnito@gmail.com
> To: user@hbase.apache.org
>
> I took a jstack (http://pastebin.com/5v6mHg3t). After few hours, its
> literally staggers to a halt and gets very very slow... Any ideas
> whats its blocking on?
> (main issue is that fsreads for RS get really slow when that happens).
>
> -Jack
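The ulimit -n question is about the open-file limit in effect for the DataNode process. A process inherits that limit from the account and session that started it. Below is a minimal Python sketch of how to check it, assuming a Linux host. The standard resource module reports the limit for the process that runs it; ulimit -n prints the soft value. For another process such as the DataNode, /proc/<pid>/limits shows the same information, where you supply the pid. Reading another user's /proc entries may require running as that user or root. The helper names are made up for illustration.

```python
import os
import resource  # Unix-only standard module


def own_nofile_limit():
    """(soft, hard) open-file limit of this process; soft is what `ulimit -n` prints."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def proc_nofile_limit(pid="self"):
    """(soft, hard) 'Max open files' of a process via /proc (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                fields = line.split()
                return fields[3], fields[4]  # strings; may be "unlimited"
    return None


def open_fd_count(pid="self"):
    """Number of descriptors the process currently has open (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    print("this process:", own_nofile_limit())
    if os.path.exists("/proc/self/limits"):
        print("via /proc:", proc_nofile_limit(), "open now:", open_fd_count())
```

Passing the DataNode's pid to proc_nofile_limit and open_fd_count shows both the limit it actually runs with and how close it is to that limit.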