Posted to mapreduce-user@hadoop.apache.org by Stephen Boesch <ja...@gmail.com> on 2011/11/29 15:43:41 UTC
MRv2 DataNode problem: isBPServiceAlive invoked order of 200K times
per second
I am just trying to get off the ground with MRv2. The first node (in
pseudo-distributed mode) is working fine: I ran a couple of TeraSort jobs on
it.
The second node has a serious issue with its single DataNode: it consumes
100% of one of the CPUs. Looking at it through JVisualVM, there are over
8 million invocations of isBPServiceAlive in about a minute, and the count
keeps climbing at a steady clip. A screenshot of the JVisualVM CPU profile,
showing just shy of 8M invocations, is attached.
What kind of configuration error could lead to this? The conf/masters and
conf/slaves files simply say localhost. If need be I'll copy the *-site.xml
files. They are boilerplate from the Cloudera page by Ahmed Radwan.
Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K
times per second
Posted by Stephen Boesch <ja...@gmail.com>.
The problem seems to have gone away, but I cannot offer a solid
explanation. At some point, after I removed the DataNode's working
directories, reformatted the NameNode, and restarted the cluster, the issue
stopped appearing. However, I had already done those same steps well before
posting, so it is not clear what small detail was different this time. If
this problem were to recur, I would not be able to prescribe a precise
solution.
Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K
times per second
Posted by Stephen Boesch <ja...@gmail.com>.
I verified the DN was down via both jps and java. In any case, "top" was
enough to tell since, as mentioned, the DN was consuming 100% of one CPU
when running.
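The same "is a DataNode JVM still running?" check can be scripted. What follows is only a minimal sketch, assuming Java 9 or later (for `ProcessHandle`) and assuming the DataNode's command line contains its main class name, org.apache.hadoop.hdfs.server.datanode.DataNode. The class `DataNodeProcessCheck` and its helper methods are hypothetical names made up for this illustration. On some platforms, or without sufficient permissions, the command line of other processes is not visible, so an empty result is not proof that the DataNode is down.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hadoop: lists PIDs whose command line
// mentions the DataNode main class, roughly what one reads off jps output.
class DataNodeProcessCheck {
    // Assumption: the DataNode JVM was started with this main class.
    static final String DATANODE_MAIN =
            "org.apache.hadoop.hdfs.server.datanode.DataNode";

    // True if the (possibly unavailable) command line contains the marker.
    static boolean mentions(String commandLine, String needle) {
        return commandLine != null && commandLine.contains(needle);
    }

    // Scans all processes visible to the current user.
    static List<Long> findPids(String needle) {
        return ProcessHandle.allProcesses()
                .filter(p -> mentions(p.info().commandLine().orElse(null), needle))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> pids = findPids(DATANODE_MAIN);
        System.out.println(pids.isEmpty()
                ? "No DataNode JVM visible to this user"
                : "DataNode PIDs: " + pids);
    }
}
```

If a PID is reported, per-thread CPU usage for that process can then be inspected with the usual tools (for example "top", as above).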
Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K
times per second
Posted by Stephen Boesch <ja...@gmail.com>.
Hi Uma,
I mentioned that I have restarted the datanode *many* times, and in fact
the entire cluster more than ten times.
RE: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K
times per second
Posted by Uma Maheswara Rao G <ma...@huawei.com>.
Looks like you are hitting HDFS-2553.
The cause might be that you cleared the data directories directly, without restarting the DN. The workaround would be to restart the DNs.
Regards,
Uma
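The symptom in this thread (one core pegged and a cheap liveness check called millions of times) is what a polling loop looks like when nothing inside it ever waits. The following is a minimal Java sketch of that general pattern only. It is not the HDFS code and makes no claim about what HDFS-2553 actually changed; the class `SpinLoopSketch`, the method `isServiceAlive`, and the timings are all hypothetical and chosen for illustration.

```java
// Hypothetical illustration only: this is NOT the Hadoop DataNode source.
class SpinLoopSketch {
    // Stand-in for the state behind a cheap check such as isBPServiceAlive.
    private volatile boolean alive = true;
    private long checks = 0;

    // Counts every call so the two loop styles can be compared.
    boolean isServiceAlive() {
        checks++;
        return alive;
    }

    // Polls the liveness check for roughly runMillis, sleeping pauseMillis
    // between checks. pauseMillis == 0 models a loop that finds no work and
    // goes straight back to polling, which keeps one core fully busy.
    long poll(long runMillis, long pauseMillis) {
        checks = 0;
        long deadline = System.nanoTime() + runMillis * 1_000_000L;
        while (isServiceAlive() && System.nanoTime() < deadline) {
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        SpinLoopSketch sketch = new SpinLoopSketch();
        long spinning = sketch.poll(200, 0);
        long paced = sketch.poll(200, 10);
        System.out.println("checks without a pause: " + spinning);
        System.out.println("checks with a 10 ms pause: " + paced);
    }
}
```

In a CPU profiler, the first variant shows up much like the report above: a very high, steadily growing invocation count on the liveness check while the thread never blocks.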
Re: MRv2 DataNode problem: isBPServiceAlive invoked order of 200K
times per second
Posted by Stephen Boesch <ja...@gmail.com>.
Update on this: I've shut down all the servers multiple times. I also
cleared the data directories and reformatted the NameNode. After a restart,
the results are the same: 100% CPU and millions of calls to isBPServiceAlive.