Posted to user@hadoop.apache.org by Jonathan Bender <jo...@stripe.com.INVALID> on 2018/09/17 17:37:38 UTC

LinuxContainerExecutor mkdir failures causing NodeManagers to become unhealthy

Hello,

We recently started using cgroups with the LinuxContainerExecutor, running
Apache Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a
YARN container will fail with a message like the following:
WARN privileged.PrivilegedOperationExecutor: Shell execution returned exit
code: 35. Privileged Execution Operation Stderr:
Could not create container dirsCould not create local files and directories

Looking at the container-executor source, the failure is traceable to errors here:
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604

And ultimately to
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672

The root failure seems to be in the underlying mkdir call, but that exit
code / errno is swallowed, so we don't have more details. We tend to see
this when many containers for the same application start at the same time
on a host, and we suspect it may be related to a race condition around the
directories those containers share.
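
To illustrate what we think is happening, here's a minimal sketch in C
(ours, not the actual container-executor code) of the pattern we suspect:
two containers of one application racing to create a shared directory,
where the loser of the race is treated as a hard failure. A race-tolerant
variant would treat EEXIST as success and log errno instead of swallowing it:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Create a directory shared by containers of one application.
     * A bare (mkdir(path, mode) == -1) check treats the loser of a
     * concurrent-creation race as a hard failure even though the
     * directory now exists. Treating EEXIST as success makes the call
     * idempotent; logging errno makes real failures diagnosable. */
    static int mkdir_tolerant(const char *path, mode_t mode) {
        if (mkdir(path, mode) == 0 || errno == EEXIST) {
            /* EEXIST: another container won the race, which is fine for
             * shared app-level dirs (a production version should also
             * stat() to confirm the path really is a directory). */
            return 0;
        }
        /* Surface the real reason instead of swallowing it. */
        fprintf(stderr, "mkdir(%s) failed: %s (errno=%d)\n",
                path, strerror(errno), errno);
        return -1;
    }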

Has anyone seen similar failures when using the LinuxContainerExecutor?

This issue is compounded because LinuxContainerExecutor renders the node
unhealthy in these scenarios:
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L566

Under some circumstances this seems appropriate, but since this is a
transient failure (none of these machines were at capacity for disks,
inodes, etc.), we shouldn't take down the NodeManager. The behavior to add
this blacklisting came as part of
https://issues.apache.org/jira/browse/YARN-6302, which seems perfectly
valid, but perhaps we should make it configurable so certain users can opt
out?
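
To make the idea concrete, the opt-out could be a NodeManager property
along these lines (the name and description below are invented purely for
illustration; no such property exists in 3.0.0 as far as I know):

    <property>
      <!-- Hypothetical property name, for illustration only -->
      <name>yarn.nodemanager.linux-container-executor.mark-unhealthy-on-error</name>
      <value>false</value>
      <description>If false, a container-executor failure fails only the
      affected container instead of marking the whole node unhealthy.
      </description>
    </property>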

Cheers,
Jon

Re: LinuxContainerExecutor mkdir failures causing NodeManagers to become unhealthy

Posted by Jonathan Bender <jo...@stripe.com.INVALID>.
Thanks for the responses, all!

@Shane - that's great; we planned to move to 3.1.x soon anyway, so all the
more reason to do that.

@Eric - I opened a JIRA here with my findings:
https://issues.apache.org/jira/browse/YARN-8786

Re: LinuxContainerExecutor mkdir failures causing NodeManagers to become unhealthy

Posted by Shane Kumpf <sh...@gmail.com>.
Hey Jon,

YARN-8751 takes care of the issue that marks the NM unhealthy under these
conditions. If you can open a JIRA with details on the swallowed error,
that would be appreciated. As noted, 3.1.1 has a number of fixes to the
YARN containerization features, so it would be great if you could check
whether the issue still occurs with that release.

Thanks,
-Shane

Re: LinuxContainerExecutor mkdir failures causing NodeManagers to become unhealthy

Posted by Jeff Hubbs <jh...@att.net>.
I would also just suggest moving up to 3.1.1 and trying again. Barring
that, maybe you can take the error message at its word. My experience
with running Hadoop 3.x jobs is a little limited, but I know that jobs
can paint a lot of data into /tmp/hadoop-yarn, and if your nodes can't
absorb a lot of expansion in that directory, things will error out,
albeit softly. Noting the way the terasort example behaves in that
regard, I set up my worker nodes to make /tmp/hadoop-yarn a mount point
for its own disk volume, whose size I can preset and on which I can
optionally enable transparent compression via btrfs. A lot of the time
I'd expect I could give that volume some token small size, but in trying
to make a 1/5-scale (i.e., 200GB) terasort run, 128GiB with compression
enabled across five workers wasn't enough. 1/10th scale I could manage,
but at 1/5 it would fill up one node's /tmp/hadoop-yarn, then the next,
then the next, etc. Makes me think that terasort tries to write the
whole dang thing out to the extra-HDFS file system before making an
output file in HDFS.
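
In case it's useful, the worker setup I described amounts to an fstab
entry along these lines (the device name and compression algorithm are
just examples from my setup; adapt them to your own disks):

    # Dedicated volume for YARN local scratch, sized up front, with
    # btrfs transparent compression (zlib/lzo/zstd all work).
    /dev/sdb1   /tmp/hadoop-yarn   btrfs   compress=zstd,noatime   0 0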

Re: LinuxContainerExecutor mkdir failures causing NodeManagers to become unhealthy

Posted by Eric Badger <eb...@oath.com.INVALID>.
Hi Jonathan,

Have you opened up a YARN JIRA with your findings? If not, that would be
the next step in debugging the issue and coding up a fix. This certainly
sounds like a bug and something that we should get to the bottom of.

As for NodeManagers becoming unhealthy, a config could be added to
prevent this. But if you're only seeing 1 failure out of millions of
tasks, this seems like it would unmask more problems than it fixes. 1
container failing is bad, but a node going bad and failing every container
that runs on it forever until it is shut down is much, much worse. However,
if you think that you have a use case that could benefit from the config
being optional, that is something we could also look into. That would be a
separate YARN JIRA as well.

Thanks,

Eric
