Posted to common-user@hadoop.apache.org by Mark Selby <ms...@unseelie.name> on 2016/02/25 02:16:54 UTC
hadoop distcp and hbase ExportSnapshot hdfs replication factor
question.
I have a primary Hadoop cluster (2.6.0) running MapReduce and HBase. I
am backing up to a remote data center that has far fewer machines, each
with a higher per-disk density.
The default HDFS replication factor on the primary is 3.
The default HDFS replication factor on the backup is 2.
When I run distcp on the primary cluster, specifying the remote as the
destination, and I DO NOT specify the preserve replication factor
option, I still get 3 replicas on the remote.
All my HBase snapshots that are copied from the primary to the backup
also end up with HFiles that have a replication factor of 3.
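For reference, the two copies described above look roughly like the sketch below. The NameNode addresses, the /data and /hbase paths, and the snapshot name are placeholders I made up for illustration, not the real values. The script only prints the commands, so it can be read or run without a cluster.

```shell
#!/bin/sh
# Hypothetical NameNode addresses, paths and snapshot name; substitute real values.
PRIMARY="hdfs://primary-nn:8020"
BACKUP="hdfs://backup-nn:8020"

# distcp launched on the primary, pushing to the backup, with no -p (preserve) flags.
DISTCP_CMD="hadoop distcp $PRIMARY/data $BACKUP/data"

# ExportSnapshot launched on the primary, copying a snapshot to the backup.
EXPORT_CMD="hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot my_snapshot -copy-to $BACKUP/hbase"

# Print the commands rather than running them, so the sketch works without a cluster.
echo "$DISTCP_CMD"
echo "$EXPORT_CMD"
```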
As a test I ran distcp from the backup, pulling from the primary, and
this did result in a replication factor of 2. However, I have far fewer
resources on the backup and think that it would be faster to perform the
large copy with a larger number of machines.
Also, I cannot pull HBase snapshots from the backup cluster. The
ExportSnapshot utility does not support this.
Does anyone know if it is possible to distcp to another cluster that has
a smaller replication factor and have that take effect?
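One thing that might be worth testing, as an untested assumption and not a confirmed fix: HDFS takes the replication factor from the client that creates the file, and both DistCp and ExportSnapshot accept generic -D options, so overriding dfs.replication on the command line may be enough. The addresses, paths and snapshot name below are hypothetical, and the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical values; substitute the real NameNode addresses, paths and snapshot name.
PRIMARY="hdfs://primary-nn:8020"
BACKUP="hdfs://backup-nn:8020"
REPL=2   # replication factor wanted on the backup cluster

# Generic -D options go before the tool-specific arguments.
DISTCP_CMD="hadoop distcp -D dfs.replication=$REPL $PRIMARY/data $BACKUP/data"
EXPORT_CMD="hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -D dfs.replication=$REPL -snapshot my_snapshot -copy-to $BACKUP/hbase"

# Print the commands rather than running them, so the sketch works without a cluster.
echo "$DISTCP_CMD"
echo "$EXPORT_CMD"
```

If this works, files written on the backup should report 2 replicas, which can be checked with hdfs dfs -stat %r on a copied file.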
Thanks!