Posted to mapreduce-user@hadoop.apache.org by Stephen Boesch <ja...@gmail.com> on 2011/11/29 15:43:41 UTC

MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second

I am just trying to get off the ground with MRv2.  The first node (in
pseudo-distributed mode) is working fine; I ran a couple of TeraSort jobs
on it.

The second node has a serious issue with its single DataNode: it consumes
100% of one of the CPUs.  Looking at it through JVisualVM, there are over
8 million invocations of isBPServiceAlive in a matter of a minute or so, and
the count keeps climbing at a steady clip.  A screenshot of the JVisualVM
CPU profile, showing just shy of 8M invocations, is attached.

What kind of configuration error could lead to this?  The conf/masters and
conf/slaves files simply say localhost.  If need be I'll copy the *-site.xml
files.  They are boilerplate from the Cloudera page by Ahmed Radwan.
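
For a sense of scale, one common way a profile like the one described above can arise (a single core pinned, with millions of calls to one cheap status method) is a polling loop that never pauses between checks. The following is a minimal Java sketch of that pattern only. It is not the DataNode's actual code, and the names SpinLoopSketch, isServiceAlive, and poll are hypothetical stand-ins chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SpinLoopSketch {
    // Counts how often the status check is invoked.
    static final AtomicLong calls = new AtomicLong();

    // Hypothetical stand-in for a cheap liveness check such as isBPServiceAlive().
    static boolean isServiceAlive() {
        calls.incrementAndGet();
        return true;
    }

    // Polls the status check for roughly durationMillis, sleeping sleepMillis
    // between polls (0 means no pause at all). Returns the number of calls made.
    static long poll(long durationMillis, long sleepMillis) {
        calls.set(0);
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        while (System.nanoTime() < deadline && isServiceAlive()) {
            if (sleepMillis > 0) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop polling.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        long spinning = poll(200, 0);    // unthrottled: burns one core
        long throttled = poll(200, 50);  // throttled: a handful of calls
        System.out.println("no sleep:    " + spinning + " calls in 200 ms");
        System.out.println("50 ms sleep: " + throttled + " calls in 200 ms");
    }
}
```

The unthrottled run makes far more calls than the throttled one over the same interval, which is the shape of the JVisualVM profile in the report. Whether the DataNode here is spinning for this reason is an assumption, not something the thread confirms.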

Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second

Posted by Stephen Boesch <ja...@gmail.com>.
The problem seems to have gone away, but I cannot offer a solid
explanation.  At some point after I removed the working directories for
the datanode, reformatted the namenode, and restarted the cluster, the
issue stopped manifesting.  However, I had already done those same steps
well before posting, so it is not clear what small detail was different
this time.  If this problem were to recur, I would not be able to
prescribe a precise solution.


Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second

Posted by Stephen Boesch <ja...@gmail.com>.
I verified the DN was down via both jps and java.  In any case, "top" was
enough to tell, since, as mentioned, the DN was consuming 100% of one CPU
when running.


Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second

Posted by Stephen Boesch <ja...@gmail.com>.
Hi Uma,
   I mentioned that I have restarted the datanode *many* times, and in fact
the entire cluster more than ten times.


RE: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
It looks like you are hitting HDFS-2553.

The cause might be that you cleared the data directories directly without restarting the DN. The workaround would be to restart the DNs.



Regards,

Uma




Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times per second

Posted by Stephen Boesch <ja...@gmail.com>.
Update on this:  I've shut down all the servers multiple times.  I also
cleared the data directories and reformatted the namenode.  After restarting,
I get the same results: 100% CPU and millions of these calls to
isBPServiceAlive.

