Posted to common-user@hadoop.apache.org by Stan Rosenberg <st...@gmail.com> on 2013/01/04 22:02:37 UTC
sporadic failure
Hi,
Any ideas why a staging directory would suddenly become unavailable
after the map phase completes but before the reduce phase starts? We
noticed a sporadic failure yesterday in which all the map tasks
completed successfully and all the reduce tasks failed. The task
tracker logs showed the following exception stack trace:
2013-01-03 02:28:17,072 WARN org.apache.hadoop.mapred.TaskTracker:
Error initializing attempt_201211150255_237458_r_000108_1:
java.io.FileNotFoundException: File does not exist:
hdfs://59.bm-hadoop.prod.nym2:54310/user/apache/.staging/job_201211150255_237458/job.xml
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:562)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
at org.apache.hadoop.mapred.TaskTracker.localizeJobConfFile(TaskTracker.java:1434)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1318)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1242)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2541)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2505)
This problem doesn't seem specific to any one distribution,
but for completeness we are running CDH3u3.
Thanks!
stan
Re: sporadic failure
Posted by Harsh J <ha...@cloudera.com>.
Thanks for following up, glad to know it is resolved!
On Mon, Jan 7, 2013 at 6:42 AM, Stan Rosenberg <st...@gmail.com> wrote:
> On Sat, Jan 5, 2013 at 2:44 AM, Harsh J <ha...@cloudera.com> wrote:
>> I'd check the NN audit logs for the file
>> /user/apache/.staging/job_201211150255_237458/job.xml to see when/who
>> deleted it away, perhaps that would give more insight.
>>
>
> The audit logs led to a trail which revealed user error. Thanks Harsh!
--
Harsh J
Re: sporadic failure
Posted by Harsh J <ha...@cloudera.com>.
Hi Stan,
I'd check the NN audit logs for the file
/user/apache/.staging/job_201211150255_237458/job.xml to see when it was
deleted and by whom; perhaps that would give more insight.
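
A rough sketch of that audit-log check, in Python. It assumes the
NameNode audit log writes one event per line with key=value fields
(ugi=, ip=, cmd=, src=, dst=, perm=) after a leading date and time;
verify that against your own log before relying on it. The names
AuditEvent, covers and find_removals are made up for this example, and
the two sample lines (user "someuser", both IP addresses) are invented,
not taken from the cluster in this thread. A delete or rename of any
parent directory also removes job.xml, so the sketch matches ancestors
of the path as well as the path itself.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class AuditEvent(NamedTuple):
    timestamp: str
    user: str
    ip: str
    cmd: str
    src: str


# Assumed field layout: ugi=<user> ip=/<addr> cmd=<op> src=<path> dst=<path> perm=<perm>
FIELD = re.compile(r"\b(ugi|ip|cmd|src|dst|perm)=(\S+)")


def covers(src: str, target: str) -> bool:
    """True if src is the target path itself or one of its parent directories."""
    src = src.rstrip("/")
    return target == src or target.startswith(src + "/")


def find_removals(lines: Iterable[str], target: str) -> Iterator[AuditEvent]:
    """Yield delete/rename events whose src would take the target path away."""
    for line in lines:
        fields = dict(FIELD.findall(line))
        if fields.get("cmd") not in ("delete", "rename"):
            continue
        if not covers(fields.get("src", ""), target):
            continue
        # Assumption: the first two whitespace-separated tokens are date and time.
        timestamp = " ".join(line.split()[:2])
        yield AuditEvent(timestamp, fields.get("ugi", "?"), fields.get("ip", "?"),
                         fields["cmd"], fields["src"])


TARGET = "/user/apache/.staging/job_201211150255_237458/job.xml"

# Invented sample lines, for illustration only.
SAMPLE = [
    "2013-01-03 02:10:00,000 INFO FSNamesystem.audit: ugi=apache\tip=/10.0.0.5"
    "\tcmd=open\tsrc=" + TARGET + "\tdst=null\tperm=null",
    "2013-01-03 02:20:00,000 INFO FSNamesystem.audit: ugi=someuser\tip=/10.0.0.9"
    "\tcmd=delete\tsrc=/user/apache/.staging\tdst=null\tperm=null",
]

if __name__ == "__main__":
    for event in find_removals(SAMPLE, TARGET):
        print(event.timestamp, event.user, event.ip, event.cmd, event.src)
```

To run it against a real log, pass an open file object instead of
SAMPLE; the location of the audit log depends on the log4j
configuration of the NameNode.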
On Sat, Jan 5, 2013 at 2:32 AM, Stan Rosenberg <st...@gmail.com> wrote:
> Hi,
>
> Any ideas why a staging directory would suddenly become unavailable
> after the completion of the map phase but before the start of the
> reduce phase? We noticed a sporadic failure yesterday wherein all the
> map tasks completed
> successfully and all the reduce tasks failed. Upon examining task
> tracker logs, the following exception stack trace was revealed:
>
> 2013-01-03 02:28:17,072 WARN org.apache.hadoop.mapred.TaskTracker:
> Error initializing attempt_201211150255_237458_r_000108_1:
> java.io.FileNotFoundException: File does not exist:
> hdfs://59.bm-hadoop.prod.nym2:54310/user/apache/.staging/job_201211150255_237458/job.xml
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:562)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
> at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
> at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
> at org.apache.hadoop.mapred.TaskTracker.localizeJobConfFile(TaskTracker.java:1434)
> at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1318)
> at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1242)
> at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2541)
> at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2505)
>
> This problem doesn't seem relevant to only a specific distribution,
> but for completeness we are running CDH3u3.
>
> Thanks!
>
> stan
>
--
Harsh J