Posted to common-user@hadoop.apache.org by 林家銘 <ro...@gmail.com> on 2015/11/19 13:24:15 UTC

Teragen against OpenStack Swift

Hi

I am trying to generate 100 GB of data in Swift with Teragen.
My environment:
1) All-in-one Devstack
2) 2 TB of space for Swift, backed by LVM
3) 6+1 nodes for the Hadoop cluster
4) 64 GB RAM, 16-core CPU, 1 GbE NIC for the Devstack host
5) 12 GB RAM, 4-core CPU, 1 GbE NIC, 1 local disk for each Hadoop cluster node
6) three containers for each Hadoop node, so there are 18 containers

My mapper count is set to 100, so each split is about 1 GB.
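For reference, a TeraGen run matching these numbers might look like the sketch below. The examples jar path and the Swift service name "sahara" are assumptions on my part; the container name "gigantic" comes from the URL in the error further down. TeraGen rows are 100 bytes each, so 100 GB is roughly 1,000,000,000 rows.

```shell
# Sketch of a 100 GB TeraGen run against Swift (hedged: jar path and the
# Swift service name "sahara" are assumptions, not from the actual setup).
# 1,000,000,000 rows x 100 bytes/row ~= 100 GB; 100 maps -> ~1 GB per split.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  teragen \
  -Dmapreduce.job.maps=100 \
  1000000000 \
  swift://gigantic.sahara/HiBench/Terasort/Input
```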

There are a lot of timeout error messages, but the affected tasks get
restarted, and I think that's fine.

The problem is that right after the map progress reaches 100%, the job
starts moving the generated files from a temporary directory to the
input directory I specified.

It seems to be a single-threaded process, and it always stops at about
the eighth file, no matter how large the splits are. I then end up with
only part of the generated data in Swift.
The following is an excerpt of the error message:

15/11/19 11:01:57 INFO mapreduce.Job:  map 100% reduce 0%
15/11/19 11:10:35 INFO mapreduce.Job: Job job_1447927234554_0002
failed with state FAILED due to: Job commit failed:
java.net.SocketException: HEAD
http://10.10.103.145:8080/v1/AUTH_4e115a735b304c3ca147c4da41e9a5b7/gigantic/HiBench/Terasort/Input1/part-m-00088
failed on exception: java.net.SocketException: Connection reset
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.fs.swift.http.ExceptionDiags.wrapWithMessage(ExceptionDiags.java:90)
        at org.apache.hadoop.fs.swift.http.ExceptionDiags.wrapException(ExceptionDiags.java:76)
        at org.apache.hadoop.fs.swift.http.SwiftRestClient.perform(SwiftRestClient.java:1389)
        at org.apache.hadoop.fs.swift.http.SwiftRestClient.headRequest(SwiftRestClient.java:1016)
        at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.stat(SwiftNativeFileSystemStore.java:258)
        at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:213)
        at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:182)
        at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.getFileStatus(SwiftNativeFileSystem.java:173)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:358)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Connection reset
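
Since the failure is a connection reset on a HEAD request during job
commit, one thing worth checking is the Swift client's timeout and retry
settings, which the hadoop-openstack module reads from core-site.xml. A
sketch follows; the values are illustrative, not recommendations, and
defaults may differ across Hadoop versions.

```xml
<!-- core-site.xml: Swift client tunables from the hadoop-openstack
     module. Values below are illustrative only. -->
<property>
  <name>fs.swift.connect.timeout</name>
  <value>60000</value>  <!-- ms to wait when establishing a connection -->
</property>
<property>
  <name>fs.swift.socket.timeout</name>
  <value>60000</value>  <!-- ms to wait for data on an open socket -->
</property>
<property>
  <name>fs.swift.connect.retry.count</name>
  <value>5</value>      <!-- retries per HTTP operation -->
</property>
```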


So I am wondering what the cause of this problem is. Does anyone have
experience with Swift/Hadoop integration?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org