Posted to common-user@hadoop.apache.org by James Moore <ja...@gmail.com> on 2008/05/12 04:32:14 UTC

Problems saving to s3 using distcp

I just upgraded to 0.16.4 and tried a distcp to S3.  I'm seeing many
errors: according to the jobtracker, 8,008 files were copied, but
5,880 were skipped.  I assume the number of skipped files needs to be
0 for a successful copy.  In addition, 56 maps failed (log file given
below).

Is there something special you need to do to get distcp to work?  Run
with only a few maps?  Is S3 not prepared to handle that many
simultaneous connections from a single user (the cluster runs 74 maps
on 19 machines right now)?

2008-05-11 18:55:12,821 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2008-05-11 18:55:13,367 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2008-05-11 18:55:32,812 WARN org.jets3t.service.S3Service: Encountered 1 S3 Internal Server error(s), will retry in 50ms
2008-05-11 18:55:32,904 INFO org.apache.hadoop.util.CopyFiles: FAIL linkdb/current/part-00028/data : org.jets3t.service.io.UnrecoverableIOException: Input stream is not repeatable as 33554432 bytes have been written, exceeding the available buffer size of 131072
	at org.jets3t.service.io.RepeatableInputStream.repeatInputStream(RepeatableInputStream.java:92)
	at org.jets3t.service.impl.rest.httpclient.RepeatableRequestEntity.writeRequest(RepeatableRequestEntity.java:127)
	at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:495)
	at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:1973)
	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:993)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
	at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:255)
	at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestPut(RestS3Service.java:565)
	at org.jets3t.service.impl.rest.httpclient.RestS3Service.createObjectImpl(RestS3Service.java:962)
	at org.jets3t.service.impl.rest.httpclient.RestS3Service.putObjectImpl(RestS3Service.java:918)
	at org.jets3t.service.S3Service.putObject(S3Service.java:706)
	at org.jets3t.service.S3Service.putObject(S3Service.java:731)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.put(Jets3tFileSystemStore.java:336)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.storeBlock(Jets3tFileSystemStore.java:353)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy2.storeBlock(Unknown Source)
	at org.apache.hadoop.fs.s3.S3OutputStream.endBlock(S3OutputStream.java:172)
	at org.apache.hadoop.fs.s3.S3OutputStream.flush(S3OutputStream.java:138)
	at org.apache.hadoop.fs.s3.S3OutputStream.write(S3OutputStream.java:123)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.copy(CopyFiles.java:310)
	at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:421)
	at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:221)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)

2008-05-11 18:56:18,719 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Copied: 3 Skipped: 0 Failed: 1
	at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.close(CopyFiles.java:455)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:53)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
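
Reading the trace above, the failure seems to follow a retry after the S3
internal server error: the request body has to be sent again, but more
bytes (33554432) had already been written than the stream could replay
(131072). The following is only a minimal sketch of that general
mechanism, written to illustrate the error message. ReplayLimitSketch
and ReplayableStream are hypothetical names, and this is not the jets3t
implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

class ReplayLimitSketch {

    // Hypothetical stand-in for a "repeatable" stream; not the jets3t class.
    static class ReplayableStream {
        private InputStream in;
        private final int bufferSize;
        private final ByteArrayOutputStream kept = new ByteArrayOutputStream();
        private long bytesRead = 0; // bytes handed out since the last replay

        ReplayableStream(InputStream in, int bufferSize) {
            this.in = in;
            this.bufferSize = bufferSize;
        }

        // Read one byte, remembering it only while the buffer has room.
        int read() throws IOException {
            int b = in.read();
            if (b >= 0) {
                if (bytesRead < bufferSize) {
                    kept.write(b);
                }
                bytesRead++;
            }
            return b;
        }

        // Rewind to the start. This only works if every byte read so far
        // still fits in the buffer; otherwise the early data is gone.
        void replay() throws IOException {
            if (bytesRead > bufferSize) {
                throw new IOException("Input stream is not repeatable as "
                        + bytesRead + " bytes have been written, exceeding"
                        + " the available buffer size of " + bufferSize);
            }
            // Serve the remembered bytes first, then continue with the source.
            in = new SequenceInputStream(
                    new ByteArrayInputStream(kept.toByteArray()), in);
            kept.reset();
            bytesRead = 0;
        }
    }

    public static void main(String[] args) throws IOException {
        ReplayableStream s =
                new ReplayableStream(new ByteArrayInputStream(new byte[8]), 4);
        for (int i = 0; i < 3; i++) s.read();
        s.replay(); // fine: 3 bytes fit in the 4-byte buffer
        for (int i = 0; i < 8; i++) s.read();
        try {
            s.replay(); // fails: 8 bytes were read but only 4 were kept
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real behavior is similar, a retry could only succeed when it
happens within the first buffer's worth of a block, which would explain
why a single S3 error partway through a large block fails the file.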

-- 
James Moore | james@restphone.com
blog.restphone.com