Posted to common-user@hadoop.apache.org by Mark Selby <ms...@unseelie.name> on 2016/02/25 02:16:54 UTC

Hadoop distcp and HBase ExportSnapshot: HDFS replication factor question

I have a primary Hadoop cluster (2.6.0) running MapReduce and HBase. I
am backing up to a remote data center that has far fewer machines, each
with higher disk density.

The default HDFS replication factor on the primary is 3.
The default HDFS replication factor on the backup is 2.

When I run distcp on the primary cluster with the remote cluster as the
destination, and I DO NOT pass the option to preserve replication
factor (-pr), I still get 3 replicas on the remote.
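A minimal sketch of the push described above. It assumes, without confirmation in this thread, that HDFS takes the replication factor from the writing client's dfs.replication setting when a file is created. Under that assumption, distcp map tasks configured from the primary would write 3 replicas regardless of the backup's default. The function only builds the command line. The host names, port, and paths are hypothetical, and the -Ddfs.replication override is an assumption to test, not a confirmed fix.

```python
import shlex
from typing import List, Optional


def distcp_push_command(src: str, dst: str,
                        preserve_replication: bool = False,
                        replication: Optional[int] = None) -> List[str]:
    """Build a 'hadoop distcp' argv for a push from the primary cluster.

    Assumption: replication is chosen by the writing client (the distcp
    map tasks), so without an override the copies inherit the primary's
    dfs.replication instead of the backup's default.
    """
    if preserve_replication and replication is not None:
        # -pr copies each source file's replication, defeating any override.
        raise ValueError("use either -pr or an explicit dfs.replication, not both")
    cmd = ["hadoop", "distcp"]
    if replication is not None:
        # Generic -D options must come before distcp's own options and paths.
        cmd.append(f"-Ddfs.replication={replication}")
    if preserve_replication:
        cmd.append("-pr")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # Hypothetical NameNode hosts and paths, for illustration only.
    src = "hdfs://primary-nn:8020/data/events"
    dst = "hdfs://backup-nn:8020/data/events"
    print(shlex.join(distcp_push_command(src, dst)))
    print(shlex.join(distcp_push_command(src, dst, replication=2)))
```

The two options are mutually exclusive in this sketch because preserving the source replication (3) would cancel any explicit lower value.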

All my HBase snapshots that are copied from the primary to the backup
also end up with HFiles that have a replication factor of 3.
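To confirm which copied files kept the primary's factor, a small helper can build an hdfs dfs -stat command and parse its output. This is a sketch assuming a Hadoop release whose -stat supports the %r (replication) and %n (name) format specifiers. Note that -stat reports on each given path itself and does not recurse into directories, so pass file paths or globs. The function names are hypothetical.

```python
from typing import Dict, List


def stat_replication_command(paths: List[str]) -> List[str]:
    """argv printing '<replication> <name>' for each path via 'hdfs dfs -stat'."""
    return ["hdfs", "dfs", "-stat", "%r %n", *paths]


def parse_replication(output: str) -> Dict[str, int]:
    """Map file name -> replication factor from '-stat "%r %n"' output lines."""
    result: Dict[str, int] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep, name = line.split(" ", 1)
        result[name] = int(rep)
    return result


def over_replicated(reps: Dict[str, int], expected: int) -> List[str]:
    """Sorted names whose replication exceeds the backup's expected default."""
    return sorted(name for name, rep in reps.items() if rep > expected)
```

For example, feeding the parsed output into over_replicated with expected=2 lists the files that still carry 3 replicas.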

As a test, I ran distcp on the backup, pulling from the primary, and this
did result in a replication factor of 2. However, the backup has far
fewer resources, and I think the large copy would run faster on the
primary, with its larger number of machines.

Also, I cannot pull HBase snapshots from the backup cluster; the
ExportSnapshot utility does not support this.

Does anyone know whether it is possible to distcp to another cluster
that has a smaller default replication factor and have that default
take effect?

Thanks!
