Posted to dev@gobblin.apache.org by "mazhiyong (JIRA)" <ji...@apache.org> on 2017/09/07 05:44:00 UTC
[jira] [Created] (GOBBLIN-242) distcp error java.lang.IllegalArgumentException: Wrong FS: hdfs://HDFS_A/data/gobblin-current.log, expected: hdfs://HDFS_B
mazhiyong created GOBBLIN-242:
---------------------------------
Summary: distcp error java.lang.IllegalArgumentException: Wrong FS: hdfs://HDFS_A/data/gobblin-current.log, expected: hdfs://HDFS_B
Key: GOBBLIN-242
URL: https://issues.apache.org/jira/browse/GOBBLIN-242
Project: Apache Gobblin
Issue Type: Bug
Reporter: mazhiyong
I am using gobblin-distcp to copy data from HDFS_A to HDFS_B.
My Gobblin deployment runs on Hadoop_A (which contains Yarn_A and HDFS_A).
When I run the gobblin-distcp job to copy data from HDFS_A to HDFS_B, it succeeds.
But when I run the gobblin-distcp job to copy data from HDFS_B to HDFS_A, it always fails.
*the container log*
2017-09-07 10:12:56,022 INFO [main] gobblin.runtime.TaskExecutor: Executing task task_distcp-hdfs-to-yarnhdfs_1504750223269_0
2017-09-07 10:12:56,076 INFO [TaskExecutor-0] gobblin.runtime.TaskExecutor: Submitting fork 0 of task task_distcp-hdfs-to-yarnhdfs_1504750223269_0
2017-09-07 10:12:56,089 INFO [main] gobblin.runtime.GobblinMultiTaskAttempt-attempt_1503884889988_9291_m_000000_0: Waiting for submitted tasks of job job_distcp-hdfs-to-yarnhdfs_1504750223269 to complete in container attempt_1503884889988_9291_m_000000_0...
2017-09-07 10:12:56,089 INFO [main] gobblin.runtime.GobblinMultiTaskAttempt-attempt_1503884889988_9291_m_000000_0: 1 out of 1 tasks of job job_distcp-hdfs-to-yarnhdfs_1504750223269 are running in container attempt_1503884889988_9291_m_000000_0
2017-09-07 10:12:56,111 INFO [ForkExecutor-0] gobblin.runtime.TaskContext: Found configured writer builder as gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
2017-09-07 10:12:56,111 INFO [TaskExecutor-0] gobblin.runtime.Task: Extracted 1 data records
2017-09-07 10:12:56,111 INFO [TaskExecutor-0] gobblin.runtime.Task: Row quality checker finished with results:
2017-09-07 10:12:56,149 INFO [ForkExecutor-0] gobblin.runtime.fork.Fork-0: Wrapping writer gobblin.writer.PartitionedDataWriter@2774ab51
2017-09-07 10:12:56,225 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
{color:red}java.lang.IllegalArgumentException: Wrong FS: hdfs://HDFS_B/data/test/gobblin-current.log, expected: hdfs://HDFS_A{color}
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:648)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:468)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:218)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:166)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:82)
at gobblin.instrumented.writer.InstrumentedDataWriterBase.write(InstrumentedDataWriterBase.java:165)
at gobblin.instrumented.writer.InstrumentedDataWriter.write(InstrumentedDataWriter.java:38)
at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeImpl(InstrumentedDataWriterDecorator.java:76)
at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.write(InstrumentedDataWriterDecorator.java:68)
at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:127)
at gobblin.writer.RetryWriter$2.call(RetryWriter.java:116)
at gobblin.writer.RetryWriter$2.call(RetryWriter.java:113)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
at gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:140)
at gobblin.writer.RetryWriter.write(RetryWriter.java:121)
at gobblin.runtime.fork.Fork.processRecord(Fork.java:426)
at gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:98)
at gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:81)
at gobblin.runtime.fork.Fork.run(Fork.java:180)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2017-09-07 10:12:57,227 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
java.io.IOException: gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter can only process one file.
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:162)
at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:82)
at gobblin.instrumented.writer.InstrumentedDataWriterBase.write(InstrumentedDataWriterBase.java:165)
at gobblin.instrumented.writer.InstrumentedDataWriter.write(InstrumentedDataWriter.java:38)
at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeImpl(InstrumentedDataWriterDecorator.java:76)
at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.write(InstrumentedDataWriterDecorator.java:68)
at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:127)
at gobblin.writer.RetryWriter$2.call(RetryWriter.java:116)
at gobblin.writer.RetryWriter$2.call(RetryWriter.java:113)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
at gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:140)
at gobblin.writer.RetryWriter.write(RetryWriter.java:121)
at gobblin.runtime.fork.Fork.processRecord(Fork.java:426)
at gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:98)
at gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:81)
at gobblin.runtime.fork.Fork.run(Fork.java:180)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2017-09-07 10:12:59,228 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
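The two traces above can be modelled with a small, self-contained Java sketch. It is a simplification, not Gobblin or Hadoop code: `checkPath` mimics the scheme/authority comparison that Hadoop's `FileSystem.checkPath` performs before throwing `Wrong FS: <path>, expected: <fs uri>` (the real method's default-FS fallback and URI canonicalization are omitted), and `SingleFileWriter` is a hypothetical stand-in for the writer's one-file guard. The trace line numbers (the retry fails at the guard in `FileAwareInputStreamDataWriter.java:162`, while the first attempt got further, to line 218) suggest the writer records its file before the path check, so the sketch assumes that ordering.

```java
import java.io.IOException;
import java.net.URI;

public class WrongFsSketch {

    /**
     * Simplified stand-in for Hadoop's FileSystem.checkPath: a fully qualified path must
     * share the filesystem's scheme and authority (compared case-insensitively).
     * The default-FS fallback and URI canonicalization of the real method are omitted.
     */
    static void checkPath(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            return; // relative path: resolved against this filesystem, nothing to check
        }
        String thisAuthority = fsUri.getAuthority();
        String thatAuthority = path.getAuthority();
        boolean sameScheme = fsUri.getScheme().equalsIgnoreCase(path.getScheme());
        boolean sameAuthority = thisAuthority == null
                ? thatAuthority == null
                : thisAuthority.equalsIgnoreCase(thatAuthority);
        if (!sameScheme || !sameAuthority) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    /** Hypothetical writer that, like the one in the log, accepts only one file per instance. */
    static class SingleFileWriter {
        private final URI fsUri;
        private int filesStarted = 0;

        SingleFileWriter(URI fsUri) {
            this.fsUri = fsUri;
        }

        void write(URI file) throws IOException {
            if (filesStarted > 0) {
                throw new IOException("SingleFileWriter can only process one file.");
            }
            filesStarted++; // assumption: the file is recorded before its path is checked
            checkPath(fsUri, file);
            // ... copying the bytes would happen here ...
        }
    }

    public static void main(String[] args) {
        SingleFileWriter writer = new SingleFileWriter(URI.create("hdfs://HDFS_A"));
        URI file = URI.create("hdfs://HDFS_B/data/test/gobblin-current.log");
        // Retry the same writer instance, which is what the one-file error in the log implies.
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                writer.write(file);
                System.out.println("attempt " + attempt + ": ok");
                break;
            } catch (IllegalArgumentException | IOException e) {
                System.out.println("attempt " + attempt + ": " + e.getMessage());
            }
        }
    }
}
```

Running `main` prints the Wrong FS message for attempt 1 and the one-file `IOException` message for attempts 2 and 3, the same sequence as the log: once the first attempt fails, retrying on the same writer instance cannot succeed.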
*My job config*
job.name=distcp-hdfs-to-yarnhdfs
job.group=distcp-hdfs-to-yarnhdfs
job.description=distcp
job.class=gobblin.azkaban.AzkabanJobLauncher
source.class=gobblin.data.management.copy.CopySource
source.filebased.fs.uri=hdfs://HDFS_B
gobblin.dataset.pattern=/data/test/*.log
#gobblin.dataset.pattern=/data/huiting_3000h_test_set/*.tar.gz
#gobblin.dataset.pattern=/gobblin/distcp/data/*.tar.gz
extract.namespace=gobblin.copy
converter.classes=gobblin.converter.IdentityConverter
writer.destination.type=HDFS
writer.fs.uri=hdfs://HDFS_A
#writer.output.format=txt
writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
writer.file.path.type=tablename
data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher
data.publisher.final.dir=/gobblin/data
data.publisher.final.name=mz
distcp.persist.dir=/gobblin/distcp/data
task.maxretries=0
workunit.retry.enabled=false
# Intermediate steps configuration.
work.dir=/gobblin/distcp
state.store.dir=${work.dir}/state-store
writer.staging.dir=${work.dir}/taskStaging
writer.output.dir=${work.dir}/taskOutput
mr.job.root.dir=${work.dir}/working
job.lock.enabled=true
job.lock.dir=${work.dir}/locks
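To see which cluster each side of this job points at, the relevant keys can be loaded with `java.util.Properties`. The sketch below is illustrative only: it loads a subset of the config above (writing the source URI as `hdfs://HDFS_B`, the cluster the log shows the file being read from), and its `resolve` helper is a hypothetical, single-pass stand-in for Gobblin's own `${...}` substitution. It also assumes, as the `writer.*` key names suggest, that the staging and output directories live on `writer.fs.uri`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobConfigSketch {

    // Subset of the job config; source URI written as hdfs://HDFS_B, the cluster the log reads from.
    static final String CONFIG = String.join("\n",
            "source.filebased.fs.uri=hdfs://HDFS_B",
            "writer.fs.uri=hdfs://HDFS_A",
            "work.dir=/gobblin/distcp",
            "writer.staging.dir=${work.dir}/taskStaging",
            "writer.output.dir=${work.dir}/taskOutput");

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    /** Hypothetical helper: replaces each ${key} with its value (single pass, unknown keys kept). */
    static String resolve(Properties props, String value) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = load(CONFIG);
        URI source = URI.create(props.getProperty("source.filebased.fs.uri"));
        URI writer = URI.create(props.getProperty("writer.fs.uri"));
        System.out.println("source FS: " + source + ", writer FS: " + writer);
        System.out.println("cross-cluster copy: " + !source.getAuthority().equalsIgnoreCase(writer.getAuthority()));
        // Assumption: staging and output dirs are resolved on the writer filesystem.
        System.out.println("staging dir: " + writer + resolve(props, props.getProperty("writer.staging.dir")));
        System.out.println("output dir: " + writer + resolve(props, props.getProperty("writer.output.dir")));
    }
}
```

With these values the sketch reports a cross-cluster copy whose staging directory resolves to `hdfs://HDFS_A/gobblin/distcp/taskStaging`, while every source file is a fully qualified `hdfs://HDFS_B/...` path; that is the combination in which a source path checked against an `HDFS_A` filesystem produces the Wrong FS error in the log.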
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)