Posted to user@gobblin.apache.org by "Zhang, Xiuzhu(AWF)" <xi...@paypal.com> on 2017/12/14 11:58:03 UTC
Distcp
Hi friends,
I am running distcp between two Hadoop clusters to copy data; unfortunately it fails, although it works within a single Hadoop cluster. Could you help me look into it? Thank you very much for your reply.
It looks like the file has been copied to task-staging in HDFS and then the job failed. I have spent a lot of time on this and am very puzzled.
******************** copyablefile origin path hdfs://10.176.0.184:8020/data/test.txt
******************** target path hdfs://10.176.3.115:8020/home/etn/programs/gobblin-dist/workdir/task-staging/job_etndistcp_1513251826306/task_etndistcp_1513251826306_0/attempt_local305940860_0001_m_000000_0/test.txt
Found configured writer builder as gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
2017-12-14 03:43:52 PST INFO [ForkExecutor-0] gobblin.runtime.fork.Fork 452 - Wrapping writer gobblin.writer.PartitionedDataWriter@299fc14f
2017-12-14 03:43:52 PST WARN [ForkExecutor-0] gobblin.writer.RetryWriter$1 95 - Caught exception. This may be retried.
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.176.0.184:8020/data/test.txt, expected: hdfs://hadoop-master:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:644)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:464)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:222)
Configuration:
job.name=etndistcp
job.group=etn
job.description=Distcpexample.......
source.filebased.fs.uri=hdfs://10.176.0.184:8020
source.filebased.data.directory=/data
source.class=gobblin.data.management.copy.CopySource
gobblin.dataset.profile.class=gobblin.data.management.copy.CopyableGlobDatasetFinder
gobblin.dataset.pattern=/data
extract.namespace=gobblin.data.management.copy.extractor
writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher
writer.destination.type=HDFS
writer.fs.uri=hdfs://10.176.3.115:8020
data.publisher.final.dir=hdfs://10.176.3.115:8020/demo/etnwork/distcp_dest
Thanks,
Ethan
Re: Distcp
Posted by Zhixiong Chen <zh...@linkedin.com>.
Hi Ethan,
If we look at the log:
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.176.0.184:8020/demo/etnwork/distcp_src/twitter9d99m1.avro, expected: hdfs://hadoop-master:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:644)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:464)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:217)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:166)
...
The actual cause of the exception is line 217 of FileAwareInputStreamDataWriter, where you added a print statement:
217 System.out.println("******************** copyablefile origin path1 long "+FileSystem.get(new Configuration()).makeQualified(copyableFile.getOrigin().getPath()).toUri());
Here, it turns out your default file system is `hadoop-master` (10.176.3.115), but `copyableFile` has its origin path on the source file system, 10.176.0.184. `FileSystem.get(new Configuration())` returns the default (destination) file system, and `makeQualified` rejects any fully qualified path that belongs to a different file system. Hence the error message: Wrong FS: hdfs://10.176.0.184:8020/data/test.txt, expected: hdfs://hadoop-master:8020
If you remove line 217, the file should be copied to the destination successfully.
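The failing check can be illustrated with plain `java.net.URI`. This is a simplified, hypothetical stand-in for Hadoop's `FileSystem.checkPath` (the real method also normalizes default ports and handles other cases), sketched only to show why a fully qualified path from another cluster is rejected:

```java
import java.net.URI;

// Simplified stand-in for the scheme/authority check in FileSystem.checkPath.
// Not actual Hadoop code: sketched to show why a path from another cluster
// fails qualification against the default file system.
public class WrongFsSketch {

    // Returns true if pathUri belongs to the file system identified by fsUri.
    static boolean belongsTo(URI fsUri, URI pathUri) {
        // A relative path (no scheme) is always interpreted against fsUri.
        if (pathUri.getScheme() == null) {
            return true;
        }
        // Scheme and authority (host:port) must both match, case-insensitively.
        return fsUri.getScheme().equalsIgnoreCase(pathUri.getScheme())
                && String.valueOf(fsUri.getAuthority())
                         .equalsIgnoreCase(String.valueOf(pathUri.getAuthority()));
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://hadoop-master:8020");

        // Origin path from the source cluster: authority differs -> "Wrong FS".
        System.out.println(belongsTo(defaultFs,
                URI.create("hdfs://10.176.0.184:8020/data/test.txt"))); // false

        // A path on the default file system itself is accepted.
        System.out.println(belongsTo(defaultFs,
                URI.create("hdfs://hadoop-master:8020/data/test.txt"))); // true
    }
}
```

Note that the comparison is purely textual: even though `hadoop-master` resolves to 10.176.3.115, a path qualified with the IP would still be rejected, which is why an origin path must be qualified against its own (source) file system rather than the default one.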
Zhixiong
RE: Distcp
Posted by "Zhang, Xiuzhu(AWF)" <xi...@paypal.com>.
Hi Zhixiong,
The version of Gobblin is 0.10.0, from https://github.com/apache/incubator-gobblin/releases
The IP address of 'hadoop-master' is 10.176.3.115, which is the destination address; 10.176.0.184 is the source address.
I added the System.out.println statements after line 214 in FileAwareInputStreamDataWriter.java:
214 try {
215
216 System.out.println("******************** copyablefile origin path2 "+copyableFile.getOrigin().getPath());
217 System.out.println("******************** copyablefile origin path1 long "+FileSystem.get(new Configuration()).makeQualified(copyableFile.getOrigin().getPath()).toUri());
218 System.out.println("******************** target path "+this.fs.makeQualified(writeAt).toUri());
219
220 StreamThrottler<GobblinScopeTypes> throttler =
221 this.taskBroker.getSharedResource(new StreamThrottler.Factory<GobblinScopeTypes>(), new EmptyKey());
Detailed log:
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.176.0.184:8020/demo/etnwork/distcp_src/twitter9d99m1.avro, expected: hdfs://hadoop-master:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:644)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:464)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:217)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:166)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:82)
at gobblin.instrumented.writer.InstrumentedDataWriterBase.write(InstrumentedDataWriterBase.java:165)
at gobblin.instrumented.writer.InstrumentedDataWriter.write(InstrumentedDataWriter.java:38)
at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeImpl(InstrumentedDataWriterDecorator.java:76)
at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.write(InstrumentedDataWriterDecorator.java:68)
at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:127)
at gobblin.writer.RetryWriter$2.call(RetryWriter.java:116)
at gobblin.writer.RetryWriter$2.call(RetryWriter.java:113)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
at gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:140)
at gobblin.writer.RetryWriter.write(RetryWriter.java:121)
at gobblin.runtime.fork.Fork.processRecord(Fork.java:426)
at gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:98)
at gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:81)
at gobblin.runtime.fork.Fork.run(Fork.java:180)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
It is thrown from https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java at line 758, where the two sides do not match:
if (thisScheme.equalsIgnoreCase(thatScheme)) { // schemes match
Do you mean I should compile all of the source code and then just copy the jars into the gobblin lib directory? Do I need to change any other files?
Thanks,
Ethan
Re: Distcp
Posted by Zhixiong Chen <zh...@linkedin.com>.
Hi Ethan,
Can you provide us with the following information:
- The version of Gobblin
- The IP address that corresponds to `hadoop-master`
- More log information after `at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:222)`
Where do you print this info?
******************** copyablefile origin path hdfs://10.176.0.184:8020/data/test.txt
******************** target path hdfs://10.176.3.115:8020/home/etn/programs/gobblin-dist/workdir/task-staging/job_etndistcp_1513251826306/task_etndistcp_1513251826306_0/attempt_local305940860_0001_m_000000_0/test.txt
I saw you're still using Gobblin version 0.x, which is not maintained any more. You might consider upgrading to the incubator version: https://github.com/apache/incubator-gobblin
Zhixiong