Posted to mapreduce-user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2014/10/20 04:56:04 UTC

Can add a regular check in DataNode on free disk space?

Hi Experts and Developers,

At present, if a DataNode runs out of free disk space, there is no way to
learn about this condition from anywhere, including the DataNode log. In
this situation, HDFS write operations fail and return the error message
below. However, from that message the user cannot tell that the root cause
is the only DataNode running out of disk space, and the DataNode log offers
no useful hint either. So I believe it would be better to add a regular
free-disk-space check in the DataNode that writes a WARNING or ERROR
message to the DataNode log when the DataNode runs out of space. What's
your opinion?
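
To illustrate the idea, here is a rough standalone sketch of the kind of
periodic check I mean (not actual DataNode code; the directory, thresholds,
and interval below are just placeholders):

import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Standalone sketch only -- not actual DataNode code.
public class FreeSpaceCheck {
    private static final Logger LOG = Logger.getLogger("FreeSpaceCheck");

    // Placeholder thresholds: WARN below 5 GB usable, ERROR below 1 GB usable.
    private static final long WARN_BYTES  = 5L * 1024 * 1024 * 1024;
    private static final long ERROR_BYTES = 1L * 1024 * 1024 * 1024;

    public static void main(String[] args) {
        // Placeholder volume; a real check would loop over every configured data directory.
        final File volume = new File(args.length > 0 ? args[0] : "/tmp");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                long usable = volume.getUsableSpace();
                if (usable < ERROR_BYTES) {
                    LOG.severe("Volume " + volume + " is out of space: " + usable + " bytes usable");
                } else if (usable < WARN_BYTES) {
                    LOG.warning("Volume " + volume + " is low on space: " + usable + " bytes usable");
                }
            }
        }, 0, 60, TimeUnit.SECONDS); // check once a minute
    }
}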

Error Msg:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated
to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running
and no node(s) are excluded in this operation.
        at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
        at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)


Thanks!

Re: Can add a regular check in DataNode on free disk space?

Posted by Nitin Pawar <ni...@gmail.com>.
Hi Sam,

Monitoring disks and other server-related activities can easily be handled
by Nagios.
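
For example, a standard Nagios check_disk probe pointed at the DataNode
data directory would cover this case (the path and thresholds here are
just placeholders):

check_disk -w 10% -c 5% -p /hadoop/dfs/data

This raises a WARNING when less than 10% of the volume is free and a
CRITICAL below 5%.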

On Mon, Oct 20, 2014 at 11:58 AM, Dhiraj Kamble <Dh...@sandisk.com>
wrote:

> Formatting the NameNode will cause data loss: in effect you will lose all
> your data on the DataNodes (or rather, access to the data on the DataNodes).
> The NameNode will have no idea where your data (files) are stored. I don't
> think that's what you're looking for.
>
> I am wondering why there isn't any log information on the DataNode for a
> full disk. What version of Hadoop are you using, and what is your
> configuration (single node, single-node pseudo-distributed, or cluster)?
>
>
>
> Regards,
>
> Dhiraj
>
>
>
> *From:* sam liu [mailto:samliuhadoop@gmail.com]
> *Sent:* Monday, October 20, 2014 11:51 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Can add a regular check in DataNode on free disk space?
>
>
>
> Hi unmesha,
>
> Thanks for your response, but I am not clear what effect the above
> operations will have on the Hadoop cluster. Could you please explain
> further?
>
>
>
> 2014-10-19 21:37 GMT-07:00 unmesha sreeveni <un...@gmail.com>:
>
> 1. Stop all Hadoop daemons
>
> 2. Remove all files from
>
>                               /var/lib/hadoop-hdfs/cache/hdfs/dfs/name
>
> 3. Format namenode
>
> 4. Start all Hadoop daemons.
>
>
>
> On Mon, Oct 20, 2014 at 8:26 AM, sam liu <sa...@gmail.com> wrote:
>
> Hi Experts and Developers,
>
> At present, if a DataNode runs out of free disk space, there is no way to
> learn about this condition from anywhere, including the DataNode log. In
> this situation, HDFS write operations fail and return the error message
> below. However, from that message the user cannot tell that the root cause
> is the only DataNode running out of disk space, and the DataNode log offers
> no useful hint either. So I believe it would be better to add a regular
> free-disk-space check in the DataNode that writes a WARNING or ERROR
> message to the DataNode log when the DataNode runs out of space. What's
> your opinion?
>
> Error Msg:
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated
> to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running
> and no node(s) are excluded in this operation.
>         at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>
>  Thanks!
>
>
>
>
>
> --
>
> *Thanks & Regards *
>
>
>
> *Unmesha Sreeveni U.B*
>
> *Hadoop, Bigdata Developer*
>
> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
>
> http://www.unmeshasreeveni.blogspot.in/
>
>
>
>
>
>
>
>


-- 
Nitin Pawar

Re: Can add a regular check in DataNode on free disk space?

Posted by sam liu <sa...@gmail.com>.
Hi Aitor,

Actually, I already did that in my test. But the issue is that I did not
find any disk-full information in any log.

2014-10-20 4:00 GMT-07:00 Aitor Cedres <ac...@pivotal.io>:

>
> Hi Sam,
>
> You can set the property "dfs.datanode.du.reserved" to reserve some space
> for non-DFS use. By doing that, Hadoop daemons will keep writing to log
> files, and it will help you diagnose the issue.
>
> Hope it helps.
>
> Regards,
> Aitor
>
> On 20 October 2014 11:27, sam liu <sa...@gmail.com> wrote:
>
>> Hi Dhiraj,
>>
>> My cluster only includes one DataNode, and its log does not include any
>> warning/error messages about the lack of free disk space. That cost me
>> some time in finding the root cause.
>>
>> Also, I did not find any free-disk-space checking code in DataNode.java.
>> So it would be better if the DataNode could check the free disk space
>> regularly and write warning/error info into its log.
>>
>>
>> 2014-10-19 23:28 GMT-07:00 Dhiraj Kamble <Dh...@sandisk.com>:
>>
>>> Formatting the NameNode will cause data loss: in effect you will lose all
>>> your data on the DataNodes (or rather, access to the data on the DataNodes).
>>> The NameNode will have no idea where your data (files) are stored. I don't
>>> think that's what you're looking for.
>>>
>>> I am wondering why there isn't any log information on the DataNode for a
>>> full disk. What version of Hadoop are you using, and what is your
>>> configuration (single node, single-node pseudo-distributed, or cluster)?
>>>
>>>
>>>
>>> Regards,
>>>
>>> Dhiraj
>>>
>>>
>>>
>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>> *Sent:* Monday, October 20, 2014 11:51 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Can add a regular check in DataNode on free disk space?
>>>
>>>
>>>
>>> Hi unmesha,
>>>
>>> Thanks for your response, but I am not clear what effect the above
>>> operations will have on the Hadoop cluster. Could you please explain
>>> further?
>>>
>>>
>>>
>>> 2014-10-19 21:37 GMT-07:00 unmesha sreeveni <un...@gmail.com>:
>>>
>>> 1. Stop all Hadoop daemons
>>>
>>> 2. Remove all files from
>>>
>>>                               /var/lib/hadoop-hdfs/cache/hdfs/dfs/name
>>>
>>> 3. Format namenode
>>>
>>> 4. Start all Hadoop daemons.
>>>
>>>
>>>
>>> On Mon, Oct 20, 2014 at 8:26 AM, sam liu <sa...@gmail.com> wrote:
>>>
>>> Hi Experts and Developers,
>>>
>>> At present, if a DataNode runs out of free disk space, there is no way to
>>> learn about this condition from anywhere, including the DataNode log. In
>>> this situation, HDFS write operations fail and return the error message
>>> below. However, from that message the user cannot tell that the root cause
>>> is the only DataNode running out of disk space, and the DataNode log offers
>>> no useful hint either. So I believe it would be better to add a regular
>>> free-disk-space check in the DataNode that writes a WARNING or ERROR
>>> message to the DataNode log when the DataNode runs out of space. What's
>>> your opinion?
>>>
>>> Error Msg:
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>> /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated
>>> to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running
>>> and no node(s) are excluded in this operation.
>>>         at
>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>>
>>>  Thanks!
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> *Thanks & Regards *
>>>
>>>
>>>
>>> *Unmesha Sreeveni U.B*
>>>
>>> *Hadoop, Bigdata Developer*
>>>
>>> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
>>>
>>> http://www.unmeshasreeveni.blogspot.in/
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>

Re: Can add a regular check in DataNode on free disk space?

Posted by Aitor Cedres <ac...@pivotal.io>.
Hi Sam,

You can set the property "dfs.datanode.du.reserved" to reserve some space
for non-DFS use. By doing that, Hadoop daemons will keep writing to log
files, and it will help you diagnose the issue.
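
For reference, the property goes in hdfs-site.xml and is given in bytes per
volume; for example, to reserve roughly 10 GB for non-DFS use (the value is
only an example):

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- Reserved space in bytes per volume, kept free for non-DFS use -->
  <value>10737418240</value>
</property>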

Hope it helps.

Regards,
Aitor

On 20 October 2014 11:27, sam liu <sa...@gmail.com> wrote:

> Hi Dhiraj,
>
> My cluster only includes one DataNode, and its log does not include any
> warning/error messages about the lack of free disk space. That cost me
> some time in finding the root cause.
>
> Also, I did not find any free-disk-space checking code in DataNode.java.
> So it would be better if the DataNode could check the free disk space
> regularly and write warning/error info into its log.
>
>
> 2014-10-19 23:28 GMT-07:00 Dhiraj Kamble <Dh...@sandisk.com>:
>
>> Formatting the NameNode will cause data loss: in effect you will lose all
>> your data on the DataNodes (or rather, access to the data on the DataNodes).
>> The NameNode will have no idea where your data (files) are stored. I don't
>> think that's what you're looking for.
>>
>> I am wondering why there isn't any log information on the DataNode for a
>> full disk. What version of Hadoop are you using, and what is your
>> configuration (single node, single-node pseudo-distributed, or cluster)?
>>
>>
>>
>> Regards,
>>
>> Dhiraj
>>
>>
>>
>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>> *Sent:* Monday, October 20, 2014 11:51 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Can add a regular check in DataNode on free disk space?
>>
>>
>>
>> Hi unmesha,
>>
>> Thanks for your response, but I am not clear what effect the above
>> operations will have on the Hadoop cluster. Could you please explain
>> further?
>>
>>
>>
>> 2014-10-19 21:37 GMT-07:00 unmesha sreeveni <un...@gmail.com>:
>>
>> 1. Stop all Hadoop daemons
>>
>> 2. Remove all files from
>>
>>                               /var/lib/hadoop-hdfs/cache/hdfs/dfs/name
>>
>> 3. Format namenode
>>
>> 4. Start all Hadoop daemons.
>>
>>
>>
>> On Mon, Oct 20, 2014 at 8:26 AM, sam liu <sa...@gmail.com> wrote:
>>
>> Hi Experts and Developers,
>>
>> At present, if a DataNode runs out of free disk space, there is no way to
>> learn about this condition from anywhere, including the DataNode log. In
>> this situation, HDFS write operations fail and return the error message
>> below. However, from that message the user cannot tell that the root cause
>> is the only DataNode running out of disk space, and the DataNode log offers
>> no useful hint either. So I believe it would be better to add a regular
>> free-disk-space check in the DataNode that writes a WARNING or ERROR
>> message to the DataNode log when the DataNode runs out of space. What's
>> your opinion?
>>
>> Error Msg:
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>> /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated
>> to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running
>> and no node(s) are excluded in this operation.
>>         at
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>
>>  Thanks!
>>
>>
>>
>>
>>
>> --
>>
>> *Thanks & Regards *
>>
>>
>>
>> *Unmesha Sreeveni U.B*
>>
>> *Hadoop, Bigdata Developer*
>>
>> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
>>
>> http://www.unmeshasreeveni.blogspot.in/
>>
>>
>>
>>
>>
>>
>>
>>
>

AW: Best number of mappers and reducers when processing data to and from HBase?

Posted by "Kleegrewe, Christian" <ch...@siemens.com>.
Hello Rolf,

in the last week of October, but I don't think I have to be there the whole
time.

Kind regards
Christian Kleegrewe

Siemens AG
Corporate Technology
Research and Technology Center
CT RTC BAM KMR-DE
Otto-Hahn-Ring 6
81739 Munich, Germany
Tel.: +49 89 636-633785
mailto:christian.kleegrewe@siemens.com

-----Original Message-----
From: peterm_second [mailto:regestrer@gmail.com]
Sent: Monday, 20 October 2014 16:09
To: user@hadoop.apache.org
Subject: Best number of mappers and reducers when processing data to and from HBase?

Hi Guys,
I have a somewhat abstract question to ask. I am reading data from HBase and I was wondering how I am supposed to know the best mapper and reducer count, I mean what criteria need to be taken into consideration when determining the mapper and reducer counts. My MR job is reading data from an HBase table; that data is processed in the mapper, and the reducer takes the data and outputs some stuff to another HBase table. I want to be able to dynamically deduce the correct number of mappers to initially process the data (actually map it to a specific criterion) and the reducers to later do some other magic on it and output a new dataset which is then saved to a new HBase table. I've read that when reading data from files I should have something like 10 mappers per DFS block, but I have no clue how to translate that to my case where the input is an HBase table. Any ideas would be appreciated, even if it's a book or an article I should read.

Regards,
Peter

Re: Best number of mappers and reducers when processing data to and from HBase?

Posted by nqkoi nqkoev <re...@gmail.com>.
Yes, it's effectively reading in the mapper and writing in the reducer. The
mapper does more than just read the data, but as per my initial tests the
average map function time is around 1 ms to 3 ms, so it's not a big problem.
The reducer is a bit slower, but it's still pretty fast. I am trying to
optimize the memory consumption and the speed of the MR job. I don't want
to just randomly change settings, so if you can give me a hint on what I
should read, that would be great.

Thanks,
Peter

On Mon, Oct 20, 2014 at 5:22 PM, Ted Yu <yu...@gmail.com> wrote:

> For number of mappers, take a look at the following
> in TableInputFormatBase:
>
>   public List<InputSplit> getSplits(JobContext context) throws
> IOException {
>
> Is reducer required in your model ?
>
> Can you write to second hbase table from the mappers ?
>
>
> Cheers
>
> On Mon, Oct 20, 2014 at 7:08 AM, peterm_second <re...@gmail.com>
> wrote:
>
>> Hi Guys,
>> I have a somewhat abstract question to ask. I am reading data from HBase
>> and I was wondering how I am supposed to know the best mapper and reducer
>> count, I mean what criteria need to be taken into consideration when
>> determining the mapper and reducer counts. My MR job is reading data from
>> an HBase table; that data is processed in the mapper, and the reducer
>> takes the data and outputs some stuff to another HBase table. I want to be
>> able to dynamically deduce the correct number of mappers to initially
>> process the data (actually map it to a specific criterion) and the
>> reducers to later do some other magic on it and output a new dataset which
>> is then saved to a new HBase table. I've read that when reading data from
>> files I should have something like 10 mappers per DFS block, but I have no
>> clue how to translate that to my case where the input is an HBase table.
>> Any ideas would be appreciated, even if it's a book or an article I should
>> read.
>>
>> Regards,
>> Peter
>>
>
>

Re: Best number of mappers and reducers when processing data to and from HBase?

Posted by Ted Yu <yu...@gmail.com>.
For number of mappers, take a look at the following in TableInputFormatBase:

  public List<InputSplit> getSplits(JobContext context) throws IOException {

Is reducer required in your model ?

Can you write to second hbase table from the mappers ?


Cheers
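
For the second question, a map-only job that reads one table and writes
Puts to another straight from the mappers could be wired up roughly like
this (a sketch; the table names and the copy logic in the mapper are
placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class SourceToTargetCopy {

  // Reads rows from the source table and emits one Put per row for the target table.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      // Placeholder "processing": copy every cell of the row unchanged.
      for (Cell cell : value.rawCells()) {
        put.add(cell);
      }
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "source-to-target-copy");
    job.setJarByClass(SourceToTargetCopy.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching usually helps MR scans
    scan.setCacheBlocks(false);  // don't churn the region server block cache

    // One map task per region of the source table.
    TableMapReduceUtil.initTableMapperJob("source_table", scan, CopyMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    // Route mapper output straight to the target table; no reduce phase.
    TableMapReduceUtil.initTableReducerJob("target_table", null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

getSplits() in TableInputFormatBase produces roughly one split per region,
so the map count follows the source table's region count.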

On Mon, Oct 20, 2014 at 7:08 AM, peterm_second <re...@gmail.com> wrote:

> Hi Guys,
> I have a somewhat abstract question to ask. I am reading data from HBase
> and I was wondering how I am supposed to know the best mapper and reducer
> count, I mean what criteria need to be taken into consideration when
> determining the mapper and reducer counts. My MR job is reading data from
> an HBase table; that data is processed in the mapper, and the reducer
> takes the data and outputs some stuff to another HBase table. I want to be
> able to dynamically deduce the correct number of mappers to initially
> process the data (actually map it to a specific criterion) and the
> reducers to later do some other magic on it and output a new dataset which
> is then saved to a new HBase table. I've read that when reading data from
> files I should have something like 10 mappers per DFS block, but I have no
> clue how to translate that to my case where the input is an HBase table.
> Any ideas would be appreciated, even if it's a book or an article I should
> read.
>
> Regards,
> Peter
>

Best number of mappers and reducers when processing data to and from HBase?

Posted by peterm_second <re...@gmail.com>.
Hi Guys,
I have a somewhat abstract question to ask. I am reading data from HBase
and I was wondering how I am supposed to know the best mapper and reducer
count, I mean what criteria need to be taken into consideration when
determining the mapper and reducer counts. My MR job is reading data from
an HBase table; that data is processed in the mapper, and the reducer
takes the data and outputs some stuff to another HBase table. I want to be
able to dynamically deduce the correct number of mappers to initially
process the data (actually map it to a specific criterion) and the
reducers to later do some other magic on it and output a new dataset which
is then saved to a new HBase table. I've read that when reading data from
files I should have something like 10 mappers per DFS block, but I have no
clue how to translate that to my case where the input is an HBase table.
Any ideas would be appreciated, even if it's a book or an article I should
read.

Regards,
Peter

Best number of mappers and reducers when processing data to and from HBase?

Posted by peterm_second <re...@gmail.com>.
Hi Guys,
I have a somewhat abstract question to ask. I am reading data from Hbase 
and I was wondering how am I to know what's the best mapper and reducer 
count, I mean what are the criteria that need to be taken into 
consideration when determining the mapper and reducer counts. My MR job 
is reeding data from a Hbase table, said data is processed in the mapper 
and the reducer takes the data and outputs some stuff to another Hbase 
table. I want to be able to dinamicly deduce what's the correct number 
of mappers to initially process the data (actually map it to a specific 
criterion ) and the reducers to later do some other magic on it and 
output a new dataset which then saved to a new Hbase Table. I've read 
that when reading data from files I should have something like 10 
mappers per DFS block, but I have no clue how to translate that in my 
case where the input is a HBase table. Any ideas would be appreciated, 
even if it's a book or an article I should read.

Regards,
Peter

Re: Can add a regular check in DataNode on free disk space?

Posted by Aitor Cedres <ac...@pivotal.io>.
Hi Sam,

You can set the property "dfs.datanode.du.reserved" to reserve some space
for non-DFS use. By doing that, Hadoop daemons will keep writing to log
files, and it will help you diagnose the issue.

Hope it helps.

Regards,
Aitor

On 20 October 2014 11:27, sam liu <sa...@gmail.com> wrote:

> Hi Dhiraj,
>
> My cluster only includes 1 datanode and its log does not include any
> warning/error messages for the out of free disk space. That wastes some of
> my time to find the root cause.
>
> Also I did not find any free disk checking code in DataNode.java. So it
> will be better if the datanode could check the free disk frequently and
> write the warning/error info into its log.

Re: Can add a regular check in DataNode on free disk space?

Posted by sam liu <sa...@gmail.com>.
Hi Dhiraj,

My cluster only includes 1 datanode, and its log does not include any
warning/error messages about running out of free disk space. That cost me
some time in finding the root cause.

Also, I did not find any free-disk-space check in DataNode.java. So it
would be better if the datanode checked free disk space regularly and
wrote a warning/error message to its log.
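
Something along these lines, purely as an illustration (this is not existing
DataNode code; the data directory path and the 1 GB threshold are made-up
values):

import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

public class FreeSpaceWatcher {

  private static final Logger LOG = Logger.getLogger(FreeSpaceWatcher.class.getName());

  // Made-up values for illustration: a data directory and a 1 GB warning threshold.
  private static final String DATA_DIR = "/hadoop/dfs/data";
  private static final long WARN_THRESHOLD_BYTES = 1024L * 1024L * 1024L;

  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        // Check the usable space on the volume holding the data directory
        // and log a warning when it drops below the threshold.
        long usable = new File(DATA_DIR).getUsableSpace();
        if (usable < WARN_THRESHOLD_BYTES) {
          LOG.warning("Free space on " + DATA_DIR + " is down to " + usable + " bytes");
        }
      }
    }, 0, 5, TimeUnit.MINUTES);  // run the check every 5 minutes
  }
}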


2014-10-19 23:28 GMT-07:00 Dhiraj Kamble <Dh...@sandisk.com>:

>  Formatting the NameNode will cause data loss – in effect you will lose all
> your data on the DataNodes (or rather, access to the data on the DataNodes).
> The NameNode will have no idea where your data (files) are stored. I don’t
> think that’s what you’re looking for.
>
> I am wondering why there isn’t any log information on the DataNode for a
> full disk. What version of Hadoop are you using, and what is your
> configuration (single node, single-node pseudo-distributed, or cluster)?
>
>
>
> Regards,
>
> Dhiraj

RE: Can add a regular check in DataNode on free disk space?

Posted by Dhiraj Kamble <Dh...@sandisk.com>.
Formatting the NameNode will cause data loss – in effect you will lose all your data on the DataNodes (or rather, access to the data on the DataNodes). The NameNode will have no idea where your data (files) are stored. I don’t think that’s what you’re looking for.
I am wondering why there isn’t any log information on the DataNode for a full disk. What version of Hadoop are you using, and what is your configuration (single node, single-node pseudo-distributed, or cluster)?

Regards,
Dhiraj
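
As a quick check from the NameNode side, the per-DataNode capacity report can be
pulled on the command line; its output includes fields such as "DFS Remaining"
and "DFS Used%" for every node (exact formatting varies by version):

hdfs dfsadmin -report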

Re: Can add a regular check in DataNode on free disk space?

Posted by sam liu <sa...@gmail.com>.
Hi unmesha,

Thanks for your response, but I am not clear what effect the above
operations will have on the Hadoop cluster. Could you please give more
details?

2014-10-19 21:37 GMT-07:00 unmesha sreeveni <un...@gmail.com>:

> 1. Stop all Hadoop daemons
> 2. Remove all files from
>                               /var/lib/hadoop-hdfs/cache/hdfs/dfs/name
> 3. Format namenode
> 4. Start all Hadoop daemons.
>
> On Mon, Oct 20, 2014 at 8:26 AM, sam liu <sa...@gmail.com> wrote:
>
>> Hi Experts and Developers,
>>
>> At present, if a DataNode does not has free disk space, we can not get
>> this bad situation from anywhere, including DataNode log. At the same time,
>> under this situation, the hdfs writing operation will fail and return error
>> msg as below. However, from the error msg, user could not know the root
>> cause is that the only datanode runs out of disk space, and he also could
>> not get any useful hint in datanode log. So I believe it will be better if
>> we could add a regular check in DataNode on free disk space, and it will
>> add WARNING or ERROR msg in datanode log if that datanode runs out of
>> space. What's your opinion?
>>
>> Error Msg:
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>> /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated
>> to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running
>> and no node(s) are excluded in this operation.
>>         at
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>
>>
>> Thanks!
>>
>
>
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
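
To make the proposal quoted above concrete: the idea is a periodic check of
each configured data directory that writes a WARN or ERROR line to the
DataNode log when usable space falls below some threshold. The sketch below is
not DataNode code and uses none of its internals; the data directory path, the
1 GiB threshold, and the 60-second interval are illustrative assumptions only.

    import java.io.File;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.logging.Logger;

    public class FreeSpaceMonitor {
      private static final Logger LOG = Logger.getLogger(FreeSpaceMonitor.class.getName());

      // Illustrative assumptions: a single data directory and a 1 GiB threshold.
      private static final File DATA_DIR = new File("/hadoop/hdfs/data");
      private static final long MIN_FREE_BYTES = 1L << 30;

      public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Check once a minute and log a warning while space stays below the threshold.
        scheduler.scheduleAtFixedRate(() -> {
          long free = DATA_DIR.getUsableSpace(); // returns 0 if the path does not exist
          if (free < MIN_FREE_BYTES) {
            LOG.warning("Data directory " + DATA_DIR + " is low on space: "
                + free + " bytes usable, threshold " + MIN_FREE_BYTES);
          }
        }, 0, 60, TimeUnit.SECONDS);
      }
    }

A real implementation inside the DataNode would hook into its existing volume
checking and logging rather than run a separate scheduler; the sketch only
shows the shape of the check being asked for.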
>

Re: Can add a regular check in DataNode on free disk space?

Posted by unmesha sreeveni <un...@gmail.com>.
1. Stop all Hadoop daemons.
2. Remove all files from /var/lib/hadoop-hdfs/cache/hdfs/dfs/name
3. Format the NameNode.
4. Start all Hadoop daemons.

On Mon, Oct 20, 2014 at 8:26 AM, sam liu <sa...@gmail.com> wrote:

> Hi Experts and Developers,
>
> At present, if a DataNode does not has free disk space, we can not get
> this bad situation from anywhere, including DataNode log. At the same time,
> under this situation, the hdfs writing operation will fail and return error
> msg as below. However, from the error msg, user could not know the root
> cause is that the only datanode runs out of disk space, and he also could
> not get any useful hint in datanode log. So I believe it will be better if
> we could add a regular check in DataNode on free disk space, and it will
> add WARNING or ERROR msg in datanode log if that datanode runs out of
> space. What's your opinion?
>
> Error Msg:
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated
> to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running
> and no node(s) are excluded in this operation.
>         at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>
>
> Thanks!
>



-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Center for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
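
A caution on the four steps above: formatting the NameNode creates a new,
empty namespace, so references to all existing blocks are lost, and it does
not free space on a full DataNode. If the goal is simply to rebuild a test
cluster, the directory in step 2 is whatever dfs.namenode.name.dir resolves to
on your distribution, which can be checked before deleting anything. A minimal
sketch, assuming the cluster's configuration files are on the classpath (the
class name is arbitrary):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class PrintNameDir {
      public static void main(String[] args) {
        // HdfsConfiguration also loads hdfs-default.xml and hdfs-site.xml.
        Configuration conf = new HdfsConfiguration();
        System.out.println("dfs.namenode.name.dir = " + conf.get("dfs.namenode.name.dir"));
        System.out.println("dfs.datanode.data.dir = " + conf.get("dfs.datanode.data.dir"));
      }
    }

The same values can also be read with hdfs getconf -confKey dfs.namenode.name.dir.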
