Posted to mapreduce-user@hadoop.apache.org by Public Network Services <pu...@gmail.com> on 2013/05/14 23:20:53 UTC

BlockMissingException

Hi...

I am getting a BlockMissingException in a fairly simple application with a
few mappers and reducers (see end of message).

Searching the web, including JIRA issues HDFS-767 and HDFS-1907, has not
helped much. The configuration variable

   - dfs.client.baseTimeWindow.waitOn.BlockMissingException

does not seem to make a difference, either.
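For reference, this is roughly how I am setting it (a minimal sketch, not my
actual driver; the 10000 ms value and the job name are placeholders, and I am
assuming the key is read by the HDFS client as the base back-off window, in
milliseconds, between read retries):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class RetryWindowConfig {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed semantics: base window (ms) the HDFS client waits before
            // retrying a read that failed with BlockMissingException.
            conf.setInt("dfs.client.baseTimeWindow.waitOn.BlockMissingException",
                        10000);
            // The job picks the setting up from the Configuration it was
            // created with; the rest of the job setup is unchanged.
            Job job = Job.getInstance(conf, "example-job"); // placeholder name
            System.out.println(job.getConfiguration().get(
                    "dfs.client.baseTimeWindow.waitOn.BlockMissingException"));
        }
    }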

The BlockMissingException occurs in some runs, while in others execution
completes normally, which suggests a possible concurrency issue.

Any ideas?

Thanks!


org.apache.hadoop.yarn.YarnException:
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
BP-390546703... file=...job.splitmetainfo
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1159)
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1013)
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:985)
        at
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380)
        at
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
        at
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
        at
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:694)
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:119)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:904)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:854)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1070)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1066)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1025)

Re: BlockMissingException

Posted by Public Network Services <pu...@gmail.com>.
Very reasonable scenario, but the application I run does not delete the
input files, so such a race condition could not manifest itself at any
point.

Funnily enough, while experimenting we changed some local path permissions,
and it now seems to work.

Thanks! :-)


On Tue, May 14, 2013 at 8:39 PM, Chris Nauroth <cn...@hortonworks.com> wrote:

> Is it possible that you have multiple MR jobs (or other HDFS clients)
> operating on the same file paths that could cause a conflict if run
> concurrently?
>
> At MR job submission time, the MR client identifies the set of input
> splits, which roughly correspond to the blocks of the input HDFS files.
>  (This is a simplified description, because CombineFileInputFormat or your
> own custom InputFormat can complicate the picture, but this simplification
> is fine for our purposes.)  When map tasks launch, they read from the input
> splits (the HDFS file blocks).  If you have an MR job that decides one of
> its input splits needs block X, and then another process decides to delete
> the HDFS file containing block X before the map task that would read the
> block launches, then you'd have a race condition that could trigger a
> problem similar to this.
>
> Typically, the solution is to design applications such that concurrent
> deletes while reading from a particular HDFS file are not possible.  For
> example, you might code file deletion after the MR job that consumes those
> files, so that you know nothing else is reading while you're trying to
> delete.
>
> BlockMissingException could also show up if you've lost all replicas of a
> block, but this would be extremely rare for a typical deployment with a
> replication factor of 3.
>
> Hope this helps,
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Tue, May 14, 2013 at 2:20 PM, Public Network Services <
> publicnetworkservices@gmail.com> wrote:
>
>> Hi...
>>
>> I am getting a BlockMissingException in a fairly simple application with
>> a few mappers and reducers (see end of message).
>>
>> Searching the web, including JIRA issues HDFS-767 and HDFS-1907, has not
>> helped much. The configuration variable
>>
>>    - dfs.client.baseTimeWindow.waitOn.BlockMissingException
>>
>> does not seem to make a difference, either.
>>
>> The BlockMissingException occurs in some runs, while in others execution
>> completes normally, which suggests a possible concurrency issue.
>>
>> Any ideas?
>>
>> Thanks!
>>
>>
>> org.apache.hadoop.yarn.YarnException:
>> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
>> BP-390546703... file=...job.splitmetainfo
>>         at
>> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1159)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1013)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:985)
>>         at
>> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380)
>>         at
>> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>>         at
>> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>>         at
>> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:694)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:119)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:904)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:854)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1070)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1066)
>>         at
>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1025)
>>
>>
>>
>

Re: BlockMissingException

Posted by Chris Nauroth <cn...@hortonworks.com>.
Is it possible that you have multiple MR jobs (or other HDFS clients)
operating on the same file paths that could cause a conflict if run
concurrently?

At MR job submission time, the MR client identifies the set of input
splits, which roughly correspond to the blocks of the input HDFS files.
 (This is a simplified description, because CombineFileInputFormat or your
own custom InputFormat can complicate the picture, but this simplification
is fine for our purposes.)  When map tasks launch, they read from the input
splits (the HDFS file blocks).  If you have an MR job that decides one of
its input splits needs block X, and then another process decides to delete
the HDFS file containing block X before the map task that would read the
block launches, then you'd have a race condition that could trigger a
problem similar to this.

Typically, the solution is to design applications such that concurrent
deletes while reading from a particular HDFS file are not possible.  For
example, you might code file deletion after the MR job that consumes those
files, so that you know nothing else is reading while you're trying to
delete.
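
For example, a driver along these lines only deletes the inputs once the job
that consumes them has finished (just a sketch; the paths and job name are
placeholders, and the default identity mapper/reducer stand in for your real
job):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ConsumeThenDelete {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path input = new Path("/data/incoming");   // placeholder paths
            Path output = new Path("/data/processed");

            Job job = Job.getInstance(conf, "consume-then-delete");
            job.setJarByClass(ConsumeThenDelete.class);
            // Default identity mapper/reducer are enough for the sketch; a
            // real job would set its own mapper, reducer and key/value types.
            FileInputFormat.addInputPath(job, input);
            FileOutputFormat.setOutputPath(job, output);

            boolean ok = job.waitForCompletion(true);

            // Delete the input only after the consuming job is done with it,
            // so no map task can race against the delete.
            if (ok) {
                FileSystem.get(conf).delete(input, true);
            }
            System.exit(ok ? 0 : 1);
        }
    }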

BlockMissingException could also show up if you've lost all replicas of a
block, but this would be extremely rare for a typical deployment with a
replication factor of 3.
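
If you want to rule that out, one quick check (again just a sketch; pass in
the file path from the error message) is to ask the NameNode for the block
locations of the affected file and confirm that every block still reports at
least one live host:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckBlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path(args[0]); // e.g. the file= path in the error
            FileStatus status = fs.getFileStatus(file);
            // One BlockLocation per block; an empty host list means the
            // NameNode currently knows of no live replica for that block.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d liveHosts=%d%n",
                        block.getOffset(), block.getLength(),
                        block.getHosts().length);
            }
        }
    }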

Hope this helps,

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Tue, May 14, 2013 at 2:20 PM, Public Network Services <
publicnetworkservices@gmail.com> wrote:

> Hi...
>
> I am getting a BlockMissingException in a fairly simple application with a
> few mappers and reducers (see end of message).
>
> Searching the web, including JIRA issues HDFS-767 and HDFS-1907, has not
> helped much. The configuration variable
>
>    - dfs.client.baseTimeWindow.waitOn.BlockMissingException
>
> does not seem to make a difference, either.
>
> The BlockMissingException occurs in some runs, while in others execution
> completes normally, which suggests a possible concurrency issue.
>
> Any ideas?
>
> Thanks!
>
>
> org.apache.hadoop.yarn.YarnException:
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> BP-390546703... file=...job.splitmetainfo
>         at
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1159)
>         at
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1013)
>         at
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:985)
>         at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380)
>         at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>         at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>         at
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:694)
>         at
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:119)
>         at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:904)
>         at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:854)
>         at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1070)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
>         at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1066)
>         at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1025)
>
>
>
