Posted to dev@gobblin.apache.org by "mazhiyong (JIRA)" <ji...@apache.org> on 2017/09/07 05:44:00 UTC

[jira] [Created] (GOBBLIN-242) distcp error java.lang.IllegalArgumentException: Wrong FS: hdfs://HDFS_A/data/gobblin-current.log, expected: hdfs://HDFS_B

mazhiyong created GOBBLIN-242:
---------------------------------

             Summary: distcp error java.lang.IllegalArgumentException: Wrong FS: hdfs://HDFS_A/data/gobblin-current.log, expected: hdfs://HDFS_B
                 Key: GOBBLIN-242
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-242
             Project: Apache Gobblin
          Issue Type: Bug
            Reporter: mazhiyong


I am using gobblin-distcp to copy data between HDFS_A and HDFS_B.
My Gobblin is deployed in Hadoop_A (which contains Yarn_A and HDFS_A).
When I run the gobblin-distcp job to copy data from HDFS_A to HDFS_B, it succeeds.
But when I run the gobblin-distcp job to copy data from HDFS_B to HDFS_A, it always fails.

*The container log*
2017-09-07 10:12:56,022 INFO [main] gobblin.runtime.TaskExecutor: Executing task task_distcp-hdfs-to-yarnhdfs_1504750223269_0
2017-09-07 10:12:56,076 INFO [TaskExecutor-0] gobblin.runtime.TaskExecutor: Submitting fork 0 of task task_distcp-hdfs-to-yarnhdfs_1504750223269_0
2017-09-07 10:12:56,089 INFO [main] gobblin.runtime.GobblinMultiTaskAttempt-attempt_1503884889988_9291_m_000000_0: Waiting for submitted tasks of job job_distcp-hdfs-to-yarnhdfs_1504750223269 to complete in container attempt_1503884889988_9291_m_000000_0...
2017-09-07 10:12:56,089 INFO [main] gobblin.runtime.GobblinMultiTaskAttempt-attempt_1503884889988_9291_m_000000_0: 1 out of 1 tasks of job job_distcp-hdfs-to-yarnhdfs_1504750223269 are running in container attempt_1503884889988_9291_m_000000_0
2017-09-07 10:12:56,111 INFO [ForkExecutor-0] gobblin.runtime.TaskContext: Found configured writer builder as gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
2017-09-07 10:12:56,111 INFO [TaskExecutor-0] gobblin.runtime.Task: Extracted 1 data records
2017-09-07 10:12:56,111 INFO [TaskExecutor-0] gobblin.runtime.Task: Row quality checker finished with results: 
2017-09-07 10:12:56,149 INFO [ForkExecutor-0] gobblin.runtime.fork.Fork-0: Wrapping writer gobblin.writer.PartitionedDataWriter@2774ab51
2017-09-07 10:12:56,225 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
java.lang.IllegalArgumentException: Wrong FS: hdfs://HDFS_B/data/test/gobblin-current.log, expected: hdfs://HDFS_A
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:648)
	at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:468)
	at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:218)
	at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:166)
	at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:82)
	at gobblin.instrumented.writer.InstrumentedDataWriterBase.write(InstrumentedDataWriterBase.java:165)
	at gobblin.instrumented.writer.InstrumentedDataWriter.write(InstrumentedDataWriter.java:38)
	at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeImpl(InstrumentedDataWriterDecorator.java:76)
	at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.write(InstrumentedDataWriterDecorator.java:68)
	at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:127)
	at gobblin.writer.RetryWriter$2.call(RetryWriter.java:116)
	at gobblin.writer.RetryWriter$2.call(RetryWriter.java:113)
	at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
	at com.github.rholder.retry.Retryer.call(Retryer.java:160)
	at com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
	at gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:140)
	at gobblin.writer.RetryWriter.write(RetryWriter.java:121)
	at gobblin.runtime.fork.Fork.processRecord(Fork.java:426)
	at gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:98)
	at gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:81)
	at gobblin.runtime.fork.Fork.run(Fork.java:180)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2017-09-07 10:12:57,227 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
java.io.IOException: gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter can only process one file.
	at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:162)
	at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:82)
	at gobblin.instrumented.writer.InstrumentedDataWriterBase.write(InstrumentedDataWriterBase.java:165)
	at gobblin.instrumented.writer.InstrumentedDataWriter.write(InstrumentedDataWriter.java:38)
	at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeImpl(InstrumentedDataWriterDecorator.java:76)
	at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.write(InstrumentedDataWriterDecorator.java:68)
	at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:127)
	at gobblin.writer.RetryWriter$2.call(RetryWriter.java:116)
	at gobblin.writer.RetryWriter$2.call(RetryWriter.java:113)
	at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
	at com.github.rholder.retry.Retryer.call(Retryer.java:160)
	at com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
	at gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:140)
	at gobblin.writer.RetryWriter.write(RetryWriter.java:121)
	at gobblin.runtime.fork.Fork.processRecord(Fork.java:426)
	at gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:98)
	at gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:81)
	at gobblin.runtime.fork.Fork.run(Fork.java:180)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2017-09-07 10:12:59,228 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
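
For readers unfamiliar with the first exception above: "Wrong FS" is raised when a path that names one filesystem (scheme plus authority) is handed to a FileSystem object bound to a different one. The sketch below is a simplified plain-Java imitation of that comparison, not the Hadoop source. The class name WrongFsSketch and its checkPath helper are hypothetical and exist only for illustration; the real check also handles default ports and canonical names, which are left out here.

```java
import java.net.URI;

class WrongFsSketch {

    // Hypothetical stand-in for the scheme/authority comparison.
    // Assumption: a path without a scheme is treated as belonging to any filesystem.
    static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return; // unqualified path such as /data/test/x.log
        }
        String authority = path.getAuthority();
        boolean sameScheme = scheme.equalsIgnoreCase(fsUri.getScheme());
        boolean sameAuthority =
                authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
        if (!(sameScheme && sameAuthority)) {
            throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI writerFs = URI.create("hdfs://HDFS_A");

        // An unqualified path passes the check.
        checkPath(writerFs, URI.create("/data/test/gobblin-current.log"));

        // A path qualified with the other cluster's authority is rejected.
        try {
            checkPath(writerFs,
                    URI.create("hdfs://HDFS_B/data/test/gobblin-current.log"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Read against this job, the trace suggests that a path still qualified with the source cluster (hdfs://HDFS_B) reached makeQualified on the filesystem for hdfs://HDFS_A inside FileAwareInputStreamDataWriter. The log alone does not establish which component supplied that path.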

My job config
job.name=distcp-hdfs-to-yarnhdfs
job.group=distcp-hdfs-to-yarnhdfs
job.description=distcp
job.class=gobblin.azkaban.AzkabanJobLauncher

source.class=gobblin.data.management.copy.CopySource
source.filebased.fs.uri=hdfs://HDFS_B
gobblin.dataset.pattern=/data/test/*.log
#gobblin.dataset.pattern=/data/huiting_3000h_test_set/*.tar.gz
#gobblin.dataset.pattern=/gobblin/distcp/data/*.tar.gz

extract.namespace=gobblin.copy

converter.classes=gobblin.converter.IdentityConverter

writer.destination.type=HDFS
writer.fs.uri=hdfs://HDFS_A
#writer.output.format=txt
writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
writer.file.path.type=tablename

data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher
data.publisher.final.dir=/gobblin/data

data.publisher.final.name=mz

distcp.persist.dir=/gobblin/distcp/data

task.maxretries=0

workunit.retry.enabled=false

# Intermediate steps configuration.
work.dir=/gobblin/distcp
state.store.dir=${work.dir}/state-store
writer.staging.dir=${work.dir}/taskStaging
writer.output.dir=${work.dir}/taskOutput

mr.job.root.dir=${work.dir}/working

job.lock.enabled=true
job.lock.dir=${work.dir}/locks



