Posted to mapreduce-user@hadoop.apache.org by David Parks <da...@yahoo.com> on 2013/02/11 07:02:43 UTC

File does not exist on part-r-00000 file after reducer runs

Are there any rules against writing results to Reducer.Context while in the
cleanup() method?

 

I’ve got a reducer that is downloading a few tens of millions of images from
a set of URLs fed to it.

 

To be efficient I run many connections in parallel, but limit the number of
connections per domain and the frequency of requests to each domain.
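The per-domain limit is nothing fancy: roughly a map of semaphores plus a minimum delay between requests to the same host. A simplified sketch (the class name, constants, and exact bookkeeping are illustrative, not my real code):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

// Illustrative throttle: at most MAX_PER_DOMAIN concurrent connections per
// host, and at least MIN_INTERVAL_MS between requests to the same host.
public class DomainThrottle {
    private static final int MAX_PER_DOMAIN = 2;
    private static final long MIN_INTERVAL_MS = 500;
    private final ConcurrentMap<String, Semaphore> permits =
            new ConcurrentHashMap<String, Semaphore>();
    private final ConcurrentMap<String, Long> lastRequest =
            new ConcurrentHashMap<String, Long>();

    public void acquire(String host) throws InterruptedException {
        permits.putIfAbsent(host, new Semaphore(MAX_PER_DOMAIN));
        permits.get(host).acquire();
        Long last = lastRequest.get(host);
        long wait = (last == null) ? 0 : last + MIN_INTERVAL_MS - System.currentTimeMillis();
        if (wait > 0) Thread.sleep(wait);                  // space out requests to this host
        lastRequest.put(host, System.currentTimeMillis()); // good enough for throttling, not strictly race-free
    }

    public void release(String host) {
        permits.get(host).release();
    }
}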

 

To do that efficiently I read in many URLs in the reduce method and add them
to a processing queue; at some point all of the input has been read and
Hadoop calls the cleanup() method, where I block until all threads have
finished processing.
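Roughly, the reducer is structured like this (a simplified sketch; the worker count, the fetch() helper, and the output format are placeholders for my real code):

import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ImageFetchReducer extends Reducer<Text, Text, Text, Text> {
    private static final String DONE = "\u0000DONE";       // poison pill for shutdown
    private final BlockingQueue<String> urls = new LinkedBlockingQueue<String>();
    private Thread[] workers;

    @Override
    protected void setup(final Context context) {
        workers = new Thread[10];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (String url = urls.take(); !url.equals(DONE); url = urls.take()) {
                            String status = fetch(url);      // download the image
                            synchronized (context) {         // serialize writes from worker threads
                                context.write(new Text(url), new Text(status));
                            }
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            workers[i].start();
        }
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) {
        for (Text v : values) {
            urls.add(v.toString());                          // just enqueue; workers do the I/O
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        for (int i = 0; i < workers.length; i++) {
            urls.add(DONE);                                  // one pill per worker
        }
        for (Thread t : workers) {
            t.join();                                        // block until every queued URL is written
        }
    }

    private String fetch(String url) {
        return "fetched";                                    // placeholder for the real HTTP download
    }
}

The only writes to the context happen inside the synchronized block, both while reduce() is still being called and later from cleanup().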

 

We may continue processing and writing results (in a synchronized manner)
for 20 or 30 minutes after Hadoop reports 100% of input records delivered;
then, at the end, my code appears to exit normally and I get this exception
immediately afterwards:

 

2013-02-11 05:15:23,606 INFO com.frugg.mapreduce.UrlProcessor (URL Processor Main Loop): Processing complete, shut down normally
2013-02-11 05:15:23,653 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
2013-02-11 05:15:23,687 ERROR org.apache.hadoop.security.UserGroupInformation (main): PriviledgedActionException as:hadoop cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /frugg/image-cache-stage1/_temporary/_attempt_201302110210_0019_r_000002_0/part-r-00002 File does not exist. Holder DFSClient_attempt_201302110210_0019_r_000002_0 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1642)

 

I suspect there are some subtle rules of Hadoop's that I'm violating here.


Re: File does not exist on part-r-00000 file after reducer runs

Posted by Robert Evans <ev...@yahoo-inc.com>.
I am not sure of everything that may be causing this, especially because the stack trace is cut off. Your file lease has expired on the output file. Typically the client is supposed to keep the file lease up to date, so if RPC had a very long hiccup you may be getting this problem. It could also be related to the OutputCommitter in another task deleting the file out from under the task.
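One quick thing to rule out along those lines: with speculative execution on, a second attempt of the same reduce can be committed or cleaned up while your slow attempt is still writing. It may be worth disabling it for the reduces while you debug. A minimal driver snippet, assuming the 1.x property name (in MRv2 the equivalent is mapreduce.reduce.speculative) and an illustrative job name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Disable speculative reduce attempts so a second attempt's cleanup cannot
// remove this attempt's _temporary output directory.
conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
Job job = new Job(conf, "image fetch");   // then set up and submit the job as usual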

--Bobby

From: David Parks <da...@yahoo.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Monday, February 11, 2013 12:02 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: File does not exist on part-r-00000 file after reducer runs

Are there any rules against writing results to Reducer.Context while in the cleanup() method?

I’ve got a reducer that is downloading a few tens of millions of images from a set of URLs fed to it.

To be efficient I run many connections in parallel, but limit the number of connections per domain and the frequency of requests to each domain.

To do that efficiently I read in many URLs in the reduce method and add them to a processing queue; at some point all of the input has been read and Hadoop calls the cleanup() method, where I block until all threads have finished processing.

We may continue processing and writing results (in a synchronized manner) for 20 or 30 minutes after Hadoop reports 100% of input records delivered; then, at the end, my code appears to exit normally and I get this exception immediately afterwards:

2013-02-11 05:15:23,606 INFO com.frugg.mapreduce.UrlProcessor (URL Processor Main Loop): Processing complete, shut down normally
2013-02-11 05:15:23,653 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
2013-02-11 05:15:23,687 ERROR org.apache.hadoop.security.UserGroupInformation (main): PriviledgedActionException as:hadoop cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /frugg/image-cache-stage1/_temporary/_attempt_201302110210_0019_r_000002_0/part-r-00002 File does not exist. Holder DFSClient_attempt_201302110210_0019_r_000002_0 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1642)

I suspect there are some subtle rules of Hadoop's that I'm violating here.
