Posted to issues@hbase.apache.org by "Xiang Li (JIRA)" <ji...@apache.org> on 2016/10/28 02:10:58 UTC

[jira] [Updated] (HBASE-16959) Export snapshot to local file system of a single node

     [ https://issues.apache.org/jira/browse/HBASE-16959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiang Li updated HBASE-16959:
-----------------------------
    Summary: Export snapshot to local file system of a single node  (was: Export snapshot to local file system)

> Export snapshot to local file system of a single node
> -----------------------------------------------------
>
>                 Key: HBASE-16959
>                 URL: https://issues.apache.org/jira/browse/HBASE-16959
>             Project: HBase
>          Issue Type: New Feature
>          Components: snapshots
>            Reporter: Xiang Li
>            Priority: Critical
>
> ExportSnapshot allows users to specify a "file://" URI in "copy-to".
> Because the implementation runs map tasks, the export works as follows:
> (1) The manifest of the snapshot (.hbase-snapshot) is exported to the local file system of the HBase client node where the command is issued.
> (2) The data of the snapshot (archive) is exported to the local file systems of the nodes where the map tasks run, so it ends up spread across those nodes.
> That causes 2 problems that we have met so far:
> (1) The last step, which verifies the snapshot integrity, fails because not all of the data can be found on the HBase client node where the command is issued. "-no-target-verify" can suppress the verification, but that is not a good idea.
> (2) When the HBase client node (where the command is issued) is also a YARN NodeManager and happens to run a map task (which writes snapshot data), the "copy-to" directory is first created by user=hbase when the manifest is written, and then user=yarn (if it is not controlled) tries to write data into it. If the directory permission is not set properly, for example umask = 022 with both hbase and yarn in the hadoop group, "copy-to" is created without write permission for the group (777-022=755, so rwxr-xr-x). user=yarn then cannot write data into the "copy-to" directory, because it was created by user=hbase. We get the following exception:
> {code}
> Error: java.io.IOException: Mkdirs failed to create file:/tmp/snap_export/archive/data/default/table_xxx/regionid_xxx/info (exists=false, cwd=file:/hadoop/yarn/local/usercache/hbase/appcache/application_1477577812726_0001/container_1477577812726_0001_01_000004)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:275)
> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:193)
> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:119)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
> We can adjust the permissions to resolve that, but it is not a good idea either.
> *Proposal*
> Add a reduce step to move all "distributed" data of the snapshot to the HBase client node where the command is issued, so that it sits together with the manifest of the snapshot. That can resolve the verification problem in (1) above.
> For problem (2), I have no idea so far.
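
The permission arithmetic quoted in problem (2) can be checked with a small standalone sketch. It is not taken from ExportSnapshot or Hadoop; the class name UmaskSketch and its helper methods are hypothetical and only illustrate the numbers above. The issue writes the calculation as 777-022=755; the effective mode is really the requested mode with the umask bits cleared, which gives the same result for these values.

```java
public class UmaskSketch {
    // Effective mode of a newly created directory: the requested mode
    // with every bit set in the umask cleared (octal literals).
    public static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    // Render an octal mode such as 0755 in symbolic form, e.g. "rwxr-xr-x".
    public static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 9; i++) {
            boolean set = (mode & (1 << (8 - i))) != 0;
            sb.append(set ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    // True if members of the owning group may write (bit 020).
    public static boolean groupCanWrite(int mode) {
        return (mode & 0020) != 0;
    }

    public static void main(String[] args) {
        // Case from the issue: umask 022, directory requested with 777.
        int mode = applyUmask(0777, 022);
        System.out.println(Integer.toOctalString(mode) + " " + toSymbolic(mode)
                + " groupCanWrite=" + groupCanWrite(mode));

        // For comparison: umask 002 leaves the group write bit set.
        int relaxed = applyUmask(0777, 002);
        System.out.println(Integer.toOctalString(relaxed) + " " + toSymbolic(relaxed)
                + " groupCanWrite=" + groupCanWrite(relaxed));
    }
}
```

With umask 022 the group write bit is cleared, which matches the case where user=yarn, sharing only the hadoop group with user=hbase, cannot create files under "copy-to". The sketch models only the mode bits; it does not reproduce how Hadoop's local file system or YARN containers actually create directories.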



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)