Posted to user@gobblin.apache.org by "Zhang, Xiuzhu(AWF)" <xi...@paypal.com> on 2017/12/13 01:54:28 UTC

distcp in hadoop cluster

Hi guys,

I am using the Gobblin distcp feature to copy files between two Hadoop clusters, and I have run into a strange problem. The same job works within a single Hadoop cluster.

I would really appreciate any pointers, and I am happy to discuss the technical details with all of you.

[Attached screenshots: image001.png, image002.png]

The job is configured as follows:

job.name=ETN DISTCP
job.group=etn
job.description=Distcp example.......
#job.lock.enabled=false

#source.filebased.fs.uri=hdfs://10.176.20.39:9000
#source.filebased.data.directory=/user/ethan/distcp_src/

source.filebased.fs.uri=hdfs://10.176.0.184:8020
source.filebased.data.directory=/data/

source.class=gobblin.data.management.copy.CopySource
gobblin.dataset.profile.class=gobblin.data.management.copy.CopyableGlobDatasetFinder
#gobblin.dataset.pattern=/user/ethan/distcp_src/
gobblin.dataset.pattern=/data/

extract.namespace=gobblin.data.management.copy.extractor
writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
#writer.builder.class=gobblin.data.management.copy.writer.TarArchiveInputStreamDataWriterBuilder
data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher

writer.destination.type=HDFS

data.publisher.final.dir=hdfs://10.176.4.40:9000/user/ethan/dest/
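To make the cross-cluster part explicit, here is a small Python sketch that lists which HDFS endpoints the active (uncommented) properties point at. It is only an illustration and not part of Gobblin: the helper names `parse_properties` and `hdfs_endpoints` are made up for this example, and the config is pasted inline as a string rather than read from the real job file.

```python
from urllib.parse import urlparse

# The properties from the job above that mention a location (copied inline).
CONFIG = """\
#source.filebased.fs.uri=hdfs://10.176.20.39:9000
source.filebased.fs.uri=hdfs://10.176.0.184:8020
source.filebased.data.directory=/data/
writer.destination.type=HDFS
data.publisher.final.dir=hdfs://10.176.4.40:9000/user/ethan/dest/
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def hdfs_endpoints(props):
    """Map each property whose value is an hdfs:// URI to its host:port."""
    endpoints = {}
    for key, value in props.items():
        parsed = urlparse(value)
        if parsed.scheme == "hdfs":
            endpoints[key] = parsed.netloc
    return endpoints


endpoints = hdfs_endpoints(parse_properties(CONFIG))
for key, hostport in endpoints.items():
    print(f"{key} -> {hostport}")
```

With the config above, the source and the publish directory resolve to two different NameNode addresses, which is the cross-cluster case I am asking about.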

Thanks,
Ethan