Posted to mapreduce-user@hadoop.apache.org by Anfernee Xu <an...@gmail.com> on 2014/03/30 19:33:42 UTC

task is still running on node has no disk space

Hi,

I'm running Hadoop 2.2.0 clusters, and my application is very disk I/O
intensive (it processes huge zip files). Over time I have seen some jobs fail
with "no space on disk". Normally the leftover files get cleaned up, but if
for some reason they are not, I would expect no new tasks to be scheduled on
that node. In fact, I still see new tasks arriving on that node and failing.
My application writes data to /tmp (which is where the disk may fill up), so
I configured the properties below:

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>
    /scratch/usr/software/hadoop2/hadoop-dc/temp/nm-local-dir,
    /tmp/nm-local-dir
  </value>
</property>

<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>1.0</value>
</property>

Since /tmp/nm-local-dir is part of yarn.nodemanager.local-dirs, I expected
this to work, based on the documentation:

yarn.nodemanager.disk-health-checker.min-healthy-disks:

The minimum fraction of number of disks to be healthy for the nodemanager
to launch new containers. This corresponds to both
yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, i.e. if there
are fewer healthy local-dirs (or log-dirs) available, then new
containers will not be launched on this node.

Did I miss anything?

-- 
--Anfernee

Re: task is still running on node has no disk space

Posted by Zhijie Shen <zs...@hortonworks.com>.
Hi Anfernee,

In 2.2, LocalDirsHandlerService doesn't check whether the disk is full or
not. It seems that a disk fullness check will be available in 2.4: YARN-1781
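
For anyone on 2.4 or later, here is a minimal sketch of how the fullness
check might be enabled in yarn-site.xml. The two property names are the ones
I understand YARN-1781 to have added. The threshold values (90.0 and 1024)
are illustrative assumptions, not recommendations from this thread, so please
check yarn-default.xml for your release before relying on them:

<!-- Assumed threshold: treat a local-dir or log-dir as bad once its disk
     is more than 90% used. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>

<!-- Assumed threshold: also treat a dir as bad if less than 1024 MB is
     free on its disk. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <value>1024</value>
</property>

With min-healthy-disks left at 1.0 and two local-dirs configured, a single
dir crossing either threshold should, as far as I understand, be enough for
the node to stop launching new containers.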

- Zhijie


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/

