Posted to mapreduce-user@hadoop.apache.org by Mahmood Naderan <nt...@yahoo.com> on 2014/03/19 20:27:18 UTC

The reduce copier failed

Hi
In the middle of a map-reduce job I get

map 20% reduce 6%
...
The reduce copier failed
....
map 20% reduce 0%
map 20% reduce 1%
map 20% reduce 2%
map 20% reduce 3%
 

Does that imply a *retry* process, or should I be worried about that message?


Regards,
Mahmood
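
For context, a message like this normally does indicate a retry: classic MapReduce re-runs a failed task attempt automatically, and the job as a whole only fails once a single task exhausts its per-task attempt limit (4 by default in MRv1). A sketch of raising those limits at submission time; `myjob.jar`, `MyJob`, and the input/output paths are placeholders, and the `-D` handling assumes the job's main class uses ToolRunner/GenericOptionsParser:

```shell
# Failed attempts are retried up to mapred.{map,reduce}.max.attempts
# times (default 4). The MRv1 limits can be raised per job with -D:
hadoop jar myjob.jar MyJob \
  -D mapred.map.max.attempts=8 \
  -D mapred.reduce.max.attempts=8 \
  input_dir output_dir
```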

Re: The reduce copier failed

Posted by Mahmood Naderan <nt...@yahoo.com>.
Rather than a memory problem, it was a disk problem. I freed up more disk space and that fixed it.


 
Regards,
Mahmood



On Saturday, March 22, 2014 8:58 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
I am really stuck at this step. I have tested with a smaller data set and it works. Now I am using Wikipedia articles (46 GB) split into 600 chunks (64 MB each).

I have set the number of mappers and reducers to 1 to ensure consistency, and I am running on a local node. Why doesn't the reducer report anything within 600 seconds?
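
(The 600 seconds is the MRv1 task timeout, `mapred.task.timeout`, specified in milliseconds: an attempt that neither reads input, writes output, nor reports status for that long is killed. A sketch of raising it for a long-running job; the jar, class, paths, and the chosen value are placeholders, and `-D` handling assumes the job uses ToolRunner/GenericOptionsParser:)

```shell
# mapred.task.timeout is in milliseconds; 600000 (10 minutes) is the
# default. Raising it gives slow tasks more headroom, at the cost of
# slower detection of genuinely hung tasks.
hadoop jar myjob.jar MyJob \
  -D mapred.task.timeout=1200000 \
  input_dir output_dir
```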


14/03/22 15:00:51 INFO mapred.JobClient:  map 15% reduce 5%
14/03/22 15:18:43 INFO mapred.JobClient:  map 16% reduce 5%
14/03/22 15:46:38 INFO mapred.JobClient: Task Id : attempt_201403212248_0002_m_000118_0, Status : FAILED
Task attempt_201403212248_0002_m_000118_0 failed to report status for 600 seconds. Killing!
14/03/22 15:48:54 INFO mapred.JobClient:  map 17% reduce 5%
14/03/22 16:06:32 INFO mapred.JobClient:  map 18% reduce 5%
14/03/22 16:07:08 INFO mapred.JobClient:  map 18% reduce 6%
14/03/22 16:24:09 INFO mapred.JobClient:  map 19% reduce 6%
14/03/22 16:41:58 INFO mapred.JobClient:  map 20% reduce 6%
14/03/22 16:55:13 INFO mapred.JobClient: Task Id : attempt_201403212248_0002_r_000000_0, Status : FAILED
java.io.IOException: Task: attempt_201403212248_0002_r_000000_0 - The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201403212248_0002/attempt_201403212248_0002_r_000000_0/output/map_107.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2690)

attempt_201403212248_0002_r_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
attempt_201403212248_0002_r_000000_0: log4j:WARN Please initialize the log4j system properly.
14/03/22 16:55:15 INFO mapred.JobClient:  map 20% reduce 0%
14/03/22 16:55:34 INFO mapred.JobClient:  map 20% reduce 1%





 
Regards,
Mahmood



On Saturday, March 22, 2014 10:27 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
Again I got the same error; it says

The reduce copier failed
...
could not find any valid local directory for file /tmp/hadoop-hadoop/....map_150.out

Searching the web suggests that I should clean up the /tmp/hadoop-hadoop folder, but the total size of that folder is only 800 KB across 1100 files. Does that really matter?
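
One thing worth checking before deleting anything: the `DiskErrorException` in the log is thrown when `LocalDirAllocator` finds no configured local directory with enough free space for the next output file, so the number that matters is free space on the filesystem holding /tmp/hadoop-hadoop, not the 800 KB the folder itself occupies. A sketch of checking both; the path is the one from this thread and should be adjusted to your `mapred.local.dir`:

```shell
# Compare free space on the filesystem (what the allocator cares about)
# with the size of the folder contents (the 800 KB reported above).
dir=/tmp/hadoop-hadoop        # adjust to your mapred.local.dir
if [ -d "$dir" ]; then
  df -h "$dir"                # free space on the underlying filesystem
  du -sh "$dir"               # space used by the folder itself
fi
```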


 
Regards,
Mahmood



On Friday, March 21, 2014 3:52 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
OK, it seems there was a free-disk-space issue.
I freed up more space and am running the job again.


 
Regards,
Mahmood



On Friday, March 21, 2014 11:43 AM, shashwat shriparv <dw...@gmail.com> wrote:
 
Check whether the tmp dir, the remaining HDFS space, or the log directory is filling up while this job runs.
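
That check can be scripted as a small watch loop; a sketch, where the directory list is an example and should be replaced with the actual `mapred.local.dir`, `hadoop.tmp.dir`, and log locations:

```shell
#!/bin/sh
# Sample free space every 60 seconds for each directory the job writes
# to, so a filesystem that is filling up is visible before a task dies.
# Stop with Ctrl-C.
while true; do
  date
  for d in /tmp/hadoop-hadoop /var/log/hadoop; do
    [ -d "$d" ] || continue     # skip paths that do not exist here
    df -h "$d" | tail -1        # free space on that filesystem
  done
  sleep 60
done
```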

On Fri, Mar 21, 2014 at 12:11 PM, Mahmood Naderan <nt...@yahoo.com> wrote:

that imply a *retry* process? Or I have to be wo




Warm Regards_∞_
Shashwat Shriparv

Re: The reduce copier failed

Posted by Mahmood Naderan <nt...@yahoo.com>.
I am really stuck at this step. I have tested with a smaller data set and it works. Now I am using Wikipedia articles (46 GB) split into 600 chunks (64 MB each).

I have set the number of mappers and reducers to 1 to ensure consistency, and I am running on a local node. Why doesn't the reducer report anything within 600 seconds?


14/03/22 15:00:51 INFO mapred.JobClient:  map 15% reduce 5%
14/03/22 15:18:43 INFO mapred.JobClient:  map 16% reduce 5%
14/03/22 15:46:38 INFO mapred.JobClient: Task Id : attempt_201403212248_0002_m_000118_0, Status : FAILED
Task attempt_201403212248_0002_m_000118_0 failed to report status for 600 seconds. Killing!
14/03/22 15:48:54 INFO mapred.JobClient:  map 17% reduce 5%
14/03/22 16:06:32 INFO mapred.JobClient:  map 18% reduce 5%
14/03/22 16:07:08 INFO mapred.JobClient:  map 18% reduce 6%
14/03/22 16:24:09 INFO mapred.JobClient:  map 19% reduce 6%
14/03/22 16:41:58 INFO mapred.JobClient:  map 20% reduce 6%
14/03/22 16:55:13 INFO mapred.JobClient: Task Id : attempt_201403212248_0002_r_000000_0, Status : FAILED
java.io.IOException: Task: attempt_201403212248_0002_r_000000_0 - The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201403212248_0002/attempt_201403212248_0002_r_000000_0/output/map_107.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2690)

attempt_201403212248_0002_r_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
attempt_201403212248_0002_r_000000_0: log4j:WARN Please initialize the log4j system properly.
14/03/22 16:55:15 INFO mapred.JobClient:  map 20% reduce 0%
14/03/22 16:55:34 INFO mapred.JobClient:  map 20% reduce 1%





 
Regards,
Mahmood



On Saturday, March 22, 2014 10:27 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
Again I got the same error; it says

The reduce copier failed
...
could not find any valid local directory for file /tmp/hadoop-hadoop/....map_150.out

Searching the web suggests that I should clean up the /tmp/hadoop-hadoop folder, but the total size of that folder is only 800 KB across 1100 files. Does that really matter?


 
Regards,
Mahmood



On Friday, March 21, 2014 3:52 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
OK, it seems there was a free-disk-space issue.
I freed up more space and am running the job again.


 
Regards,
Mahmood



On Friday, March 21, 2014 11:43 AM, shashwat shriparv <dw...@gmail.com> wrote:
 
Check whether the tmp dir, the remaining HDFS space, or the log directory is filling up while this job runs.

On Fri, Mar 21, 2014 at 12:11 PM, Mahmood Naderan <nt...@yahoo.com> wrote:

that imply a *retry* process? Or I have to be wo




Warm Regards_∞_
Shashwat Shriparv


Re: The reduce copier failed

Posted by Mahmood Naderan <nt...@yahoo.com>.
Again I got the same error; it says

The reduce copier failed
...
could not find any valid local directory for file /tmp/hadoop-hadoop/....map_150.out

Searching the web suggests that I should clean up the /tmp/hadoop-hadoop folder, but the total size of that folder is only 800 KB across 1100 files. Does that really matter?


 
Regards,
Mahmood



On Friday, March 21, 2014 3:52 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
OK, it seems there was a free-disk-space issue.
I freed up more space and am running the job again.


 
Regards,
Mahmood



On Friday, March 21, 2014 11:43 AM, shashwat shriparv <dw...@gmail.com> wrote:
 
Check whether the tmp dir, the remaining HDFS space, or the log directory is filling up while this job runs.

On Fri, Mar 21, 2014 at 12:11 PM, Mahmood Naderan <nt...@yahoo.com> wrote:

that imply a *retry* process? Or I have to be wo




Warm Regards_∞_
Shashwat Shriparv


Re: The reduce copier failed

Posted by Mahmood Naderan <nt...@yahoo.com>.
OK, it seems there was a free-disk-space issue.
I freed up more space and am running the job again.
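Freeing space worked here; a longer-term option is to point the MapReduce local directories at one or more larger partitions, since the shuffle's intermediate files land under mapred.local.dir. A minimal mapred-site.xml sketch (the /data paths are hypothetical examples):

```xml
<!-- mapred-site.xml: sketch only; paths below are placeholders -->
<property>
  <name>mapred.local.dir</name>
  <!-- comma-separated list; map output spills are spread across these dirs -->
  <value>/data1/mapred/local,/data2/mapred/local</value>
</property>
```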


 
Regards,
Mahmood



On Friday, March 21, 2014 11:43 AM, shashwat shriparv <dw...@gmail.com> wrote:
 
Check whether the tmp dir, remaining HDFS space, or the log directory is filling up while this job runs.

On Fri, Mar 21, 2014 at 12:11 PM, Mahmood Naderan <nt...@yahoo.com> wrote:

that imply a *retry* process? Or I have to be wo




Warm Regards_∞_
Shashwat Shriparv


Re: The reduce copier failed

Posted by shashwat shriparv <dw...@gmail.com>.
Check whether the tmp dir, remaining HDFS space, or the log directory is filling up while this job runs.
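A one-shot snapshot of those three things with standard tools; rerun it (or wrap it in a loop) while the job is running. The paths are typical defaults, not guaranteed, and the HDFS line is commented because it assumes the hadoop CLI is on PATH:

```shell
# Watch local tmp, the log directory, and HDFS remaining space during the job.
df -h /tmp                                        # partition backing mapred.local.dir by default
du -sh "${HADOOP_LOG_DIR:-/var/log/hadoop}" 2>/dev/null \
  || echo "log dir not found; set HADOOP_LOG_DIR"
# HDFS remaining space (needs the hadoop CLI on PATH):
#   hadoop dfsadmin -report | grep -i remaining
```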

On Fri, Mar 21, 2014 at 12:11 PM, Mahmood Naderan <nt...@yahoo.com> wrote:

> that imply a *retry* process? Or I have to be wo





Warm Regards_∞_
Shashwat Shriparv


Re: The reduce copier failed

Posted by Mahmood Naderan <nt...@yahoo.com>.
How can I find the reason why the reduce copier failed?
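The client-side progress lines only summarize; the underlying exception is usually in the failed attempt's stdout/stderr/syslog under the TaskTracker's userlogs directory on the node that ran it. A sketch, assuming the common default log location (your install may differ):

```shell
# Locate per-attempt task logs; each attempt gets its own subdirectory.
LOG_DIR="${HADOOP_LOG_DIR:-/var/log/hadoop}/userlogs"
if [ -d "$LOG_DIR" ]; then
  ls "$LOG_DIR"                          # one subdirectory per task attempt
else
  echo "no userlogs at $LOG_DIR (adjust HADOOP_LOG_DIR)"
fi
```

The same logs are also reachable from the JobTracker web UI by clicking through the failed task attempt.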


 
Regards,
Mahmood



On Thursday, March 20, 2014 12:17 PM, Harsh J <ha...@cloudera.com> wrote:
 
At the end it says clearly that the job has failed.


On Thu, Mar 20, 2014 at 12:49 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
> After multiple messages, it says that the job has been completed. I really
> wonder if the job has been truly completed or failed.
>
> 14/03/20 03:49:04 INFO mapred.JobClient:  map 50% reduce 0%
> 14/03/20 03:49:20 INFO mapred.JobClient: Job complete: job_201403191916_0001
> 14/03/20 03:49:20 INFO mapred.JobClient: Counters: 20
> 14/03/20 03:49:20 INFO mapred.JobClient:   Job Counters
> 14/03/20 03:49:20 INFO mapred.JobClient:     Launched reduce tasks=4
> 14/03/20 03:49:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=121826447
> 14/03/20 03:49:20 INFO mapred.JobClient:     Total time spent by all reduces
> waiting after reserving slots (ms)=0
> 14/03/20 03:49:20 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 14/03/20 03:49:20 INFO mapred.JobClient:     Launched map tasks=357
> 14/03/20 03:49:20 INFO mapred.JobClient:     Data-local map tasks=357
> 14/03/20 03:49:20 INFO mapred.JobClient:     Failed reduce tasks=1
> 14/03/20 03:49:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=27097157
> 14/03/20 03:49:20 INFO mapred.JobClient:   FileSystemCounters
> 14/03/20 03:49:20 INFO mapred.JobClient:     HDFS_BYTES_READ=23648804348
> 14/03/20 03:49:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4320784806
> 14/03/20 03:49:20 INFO mapred.JobClient:   File Input Format Counters
> 14/03/20 03:49:20 INFO mapred.JobClient:     Bytes Read=23648753804
> 14/03/20 03:49:20 INFO mapred.JobClient:   Map-Reduce Framework
> 14/03/20 03:49:20 INFO mapred.JobClient:     Map output materialized
> bytes=4300573634
> 14/03/20 03:49:20 INFO mapred.JobClient:     Combine output records=0
> 14/03/20 03:49:20 INFO mapred.JobClient:     Map input records=7131117
> 14/03/20 03:49:20 INFO mapred.JobClient:     Spilled Records=903190
> 14/03/20 03:49:20 INFO mapred.JobClient:     Map output bytes=4296978520
> 14/03/20 03:49:20 INFO mapred.JobClient:     Total committed heap usage
> (bytes)=62965284864
> 14/03/20 03:49:20 INFO mapred.JobClient:     Combine input records=0
> 14/03/20 03:49:20 INFO mapred.JobClient:     Map output records=903190
> 14/03/20 03:49:20 INFO mapred.JobClient:     SPLIT_RAW_BYTES=45981
> Exception in thread "main" java.lang.IllegalStateException: Job failed!
>     at
> org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.runJob(WikipediaDatasetCreatorDriver.java:187)
>     at
> org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.main(WikipediaDatasetCreatorDriver.java:115)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
>
> Regards,
> Mahmood
>
>
> On Thursday, March 20, 2014 3:41 AM, Harsh J <ha...@cloudera.com> wrote:
> While it does mean a retry, if the job eventually fails (after finite
> retries all fail as well), then you have a problem to investigate. If
> the job eventually succeeded, then this may have been a transient
> issue. Worth investigating either way.
>
> On Thu, Mar 20, 2014 at 12:57 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>> Hi
>> In the middle of a map-reduce job I get
>>
>> map 20% reduce 6%
>> ...
>> The reduce copier failed
>> ....
>> map 20% reduce 0%
>> map 20% reduce 1%
>> map 20% reduce 2%
>> map 20% reduce 3%
>>
>>
>> Does that imply a *retry* process? Or I have to be worried about that
>> message?
>>
>> Regards,
>> Mahmood
>
>
>
>
> --
> Harsh J
>



-- 
Harsh J


Re: The reduce copier failed

Posted by Harsh J <ha...@cloudera.com>.
At the end it says clearly that the job has failed.

On Thu, Mar 20, 2014 at 12:49 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
> After multiple messages, it says that the job has been completed. I really
> wonder if the job has been truly completed or failed.
>
> 14/03/20 03:49:04 INFO mapred.JobClient:  map 50% reduce 0%
> 14/03/20 03:49:20 INFO mapred.JobClient: Job complete: job_201403191916_0001
> 14/03/20 03:49:20 INFO mapred.JobClient: Counters: 20
> 14/03/20 03:49:20 INFO mapred.JobClient:   Job Counters
> 14/03/20 03:49:20 INFO mapred.JobClient:     Launched reduce tasks=4
> 14/03/20 03:49:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=121826447
> 14/03/20 03:49:20 INFO mapred.JobClient:     Total time spent by all reduces
> waiting after reserving slots (ms)=0
> 14/03/20 03:49:20 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 14/03/20 03:49:20 INFO mapred.JobClient:     Launched map tasks=357
> 14/03/20 03:49:20 INFO mapred.JobClient:     Data-local map tasks=357
> 14/03/20 03:49:20 INFO mapred.JobClient:     Failed reduce tasks=1
> 14/03/20 03:49:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=27097157
> 14/03/20 03:49:20 INFO mapred.JobClient:   FileSystemCounters
> 14/03/20 03:49:20 INFO mapred.JobClient:     HDFS_BYTES_READ=23648804348
> 14/03/20 03:49:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4320784806
> 14/03/20 03:49:20 INFO mapred.JobClient:   File Input Format Counters
> 14/03/20 03:49:20 INFO mapred.JobClient:     Bytes Read=23648753804
> 14/03/20 03:49:20 INFO mapred.JobClient:   Map-Reduce Framework
> 14/03/20 03:49:20 INFO mapred.JobClient:     Map output materialized
> bytes=4300573634
> 14/03/20 03:49:20 INFO mapred.JobClient:     Combine output records=0
> 14/03/20 03:49:20 INFO mapred.JobClient:     Map input records=7131117
> 14/03/20 03:49:20 INFO mapred.JobClient:     Spilled Records=903190
> 14/03/20 03:49:20 INFO mapred.JobClient:     Map output bytes=4296978520
> 14/03/20 03:49:20 INFO mapred.JobClient:     Total committed heap usage
> (bytes)=62965284864
> 14/03/20 03:49:20 INFO mapred.JobClient:     Combine input records=0
> 14/03/20 03:49:20 INFO mapred.JobClient:     Map output records=903190
> 14/03/20 03:49:20 INFO mapred.JobClient:     SPLIT_RAW_BYTES=45981
> Exception in thread "main" java.lang.IllegalStateException: Job failed!
>     at
> org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.runJob(WikipediaDatasetCreatorDriver.java:187)
>     at
> org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.main(WikipediaDatasetCreatorDriver.java:115)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
>
> Regards,
> Mahmood
>
>
> On Thursday, March 20, 2014 3:41 AM, Harsh J <ha...@cloudera.com> wrote:
> While it does mean a retry, if the job eventually fails (after finite
> retries all fail as well), then you have a problem to investigate. If
> the job eventually succeeded, then this may have been a transient
> issue. Worth investigating either way.
>
> On Thu, Mar 20, 2014 at 12:57 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>> Hi
>> In the middle of a map-reduce job I get
>>
>> map 20% reduce 6%
>> ...
>> The reduce copier failed
>> ....
>> map 20% reduce 0%
>> map 20% reduce 1%
>> map 20% reduce 2%
>> map 20% reduce 3%
>>
>>
>> Does that imply a *retry* process? Or I have to be worried about that
>> message?
>>
>> Regards,
>> Mahmood
>
>
>
>
> --
> Harsh J
>



-- 
Harsh J

> issue. Worth investigating either way.
>
> On Thu, Mar 20, 2014 at 12:57 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>> Hi
>> In the middle of a map-reduce job I get
>>
>> map 20% reduce 6%
>> ...
>> The reduce copier failed
>> ....
>> map 20% reduce 0%
>> map 20% reduce 1%
>> map 20% reduce 2%
>> map 20% reduce 3%
>>
>>
>> Does that imply a *retry* process? Or should I be worried about that
>> message?
>>
>> Regards,
>> Mahmood
>
>
>
>
> --
> Harsh J
>



-- 
Harsh J
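
The stack trace above comes from WikipediaDatasetCreatorDriver, which throws
IllegalStateException when the submitted job does not succeed. To find the
underlying task error rather than the driver-level symptom, a minimal sketch
(assuming the console output was captured to a hypothetical file named
job.log; substitute your own capture):

```shell
# Pull out the lines that name the actual failure: the copier error,
# any task attempt marked FAILED, and the final job-failure message.
# "job.log" is a hypothetical file name, not one from this thread.
grep -E "The reduce copier failed|Status : FAILED|Job failed" job.log
```

The per-attempt logs under the TaskTracker's userlogs directory usually carry
the full exception for the failed reduce attempt.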


Re: The reduce copier failed

Posted by Mahmood Naderan <nt...@yahoo.com>.
After multiple such messages, it says that the job is complete. I really wonder whether the job truly completed or failed.

14/03/20 03:49:04 INFO mapred.JobClient:  map 50% reduce 0%
14/03/20 03:49:20 INFO mapred.JobClient: Job complete: job_201403191916_0001
14/03/20 03:49:20 INFO mapred.JobClient: Counters: 20
14/03/20 03:49:20 INFO mapred.JobClient:   Job Counters 
14/03/20 03:49:20 INFO mapred.JobClient:     Launched reduce tasks=4
14/03/20 03:49:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=121826447
14/03/20 03:49:20 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/20 03:49:20 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/20 03:49:20 INFO mapred.JobClient:     Launched map tasks=357
14/03/20 03:49:20 INFO mapred.JobClient:     Data-local map tasks=357
14/03/20 03:49:20 INFO mapred.JobClient:     Failed reduce tasks=1
14/03/20 03:49:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=27097157
14/03/20 03:49:20 INFO mapred.JobClient:   FileSystemCounters
14/03/20 03:49:20 INFO mapred.JobClient:     HDFS_BYTES_READ=23648804348
14/03/20 03:49:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4320784806
14/03/20 03:49:20 INFO mapred.JobClient:   File Input Format Counters 
14/03/20 03:49:20 INFO mapred.JobClient:     Bytes Read=23648753804
14/03/20 03:49:20 INFO mapred.JobClient:   Map-Reduce Framework
14/03/20 03:49:20 INFO mapred.JobClient:     Map output materialized bytes=4300573634
14/03/20 03:49:20 INFO mapred.JobClient:     Combine output records=0
14/03/20 03:49:20 INFO mapred.JobClient:     Map input records=7131117
14/03/20 03:49:20 INFO mapred.JobClient:     Spilled Records=903190
14/03/20 03:49:20 INFO mapred.JobClient:     Map output bytes=4296978520
14/03/20 03:49:20 INFO mapred.JobClient:     Total committed heap usage (bytes)=62965284864
14/03/20 03:49:20 INFO mapred.JobClient:     Combine input records=0
14/03/20 03:49:20 INFO mapred.JobClient:     Map output records=903190
14/03/20 03:49:20 INFO mapred.JobClient:     SPLIT_RAW_BYTES=45981
Exception in thread "main" java.lang.IllegalStateException: Job failed!
    at org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.runJob(WikipediaDatasetCreatorDriver.java:187)
    at org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.main(WikipediaDatasetCreatorDriver.java:115)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)


 
Regards,
Mahmood



On Thursday, March 20, 2014 3:41 AM, Harsh J <ha...@cloudera.com> wrote:
While it does mean a retry, if the job eventually fails (after finite
retries all fail as well), then you have a problem to investigate. If
the job eventually succeeded, then this may have been a transient
issue. Worth investigating either way.


On Thu, Mar 20, 2014 at 12:57 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
> Hi
> In the middle of a map-reduce job I get
>
> map 20% reduce 6%
> ...
> The reduce copier failed
> ....
> map 20% reduce 0%
> map 20% reduce 1%
> map 20% reduce 2%
> map 20% reduce 3%
>
>
> Does that imply a *retry* process? Or should I be worried about that
> message?
>
> Regards,
> Mahmood



-- 
Harsh J
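
The counters above show roughly 4.3 GB of map output written to local disk
(FILE_BYTES_WRITTEN=4320784806), and the shuffle copies that data onto the
reducer's local disk as well, so a failed reduce copier is often a full
mapred.local.dir. A minimal free-space check, with /tmp standing in for this
cluster's actual mapred.local.dir (an assumption; substitute the value from
your mapred-site.xml):

```shell
# Report available space on the directory Hadoop uses for map-output
# spills and shuffle data. /tmp is a placeholder for mapred.local.dir.
DIR=/tmp
avail_kb=$(df -Pk "$DIR" | awk 'NR==2 {print $4}')
echo "available in $DIR: ${avail_kb} KB"
```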


Re: The reduce copier failed

Posted by Harsh J <ha...@cloudera.com>.
While it does mean a retry, if the job eventually fails (after all of the
finite retries fail as well), then you have a problem to investigate. If
the job eventually succeeded, then this may have been a transient
issue. Worth investigating either way.

On Thu, Mar 20, 2014 at 12:57 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
> Hi
> In the middle of a map-reduce job I get
>
> map 20% reduce 6%
> ...
> The reduce copier failed
> ....
> map 20% reduce 0%
> map 20% reduce 1%
> map 20% reduce 2%
> map 20% reduce 3%
>
>
> Does that imply a *retry* process? Or should I be worried about that
> message?
>
> Regards,
> Mahmood



-- 
Harsh J
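
The retries described above are bounded by per-task attempt limits. As a
sketch of where those bounds live, assuming the Hadoop 1.x property names
this cluster's mapred.JobClient output implies (the values shown are the
stock defaults, not settings taken from this thread):

```xml
<!-- mapred-site.xml: attempt limits that bound the per-task retries.
     A task is marked failed, and the job with it, only after this many
     attempts fail. Values shown are the defaults, not from this thread. -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>4</value>
</property>
```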
