You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by 王勇强 <wa...@163.com> on 2015/11/07 07:12:08 UTC
improvements on snapshot
the version is hbase0.98.10, find some improvements on snapshot:
1. in class SnapshotManager, line 725, we make a clone of HTableDescriptor, the clone is not so need to make
2 class HBaseAdmin, method
public void restoreSnapshot(final String snapshotName, boolean takeFailSafeSnapshot)
{
TableName tableName = null;
for (SnapshotDescription snapshotInfo: listSnapshots()) {
if (snapshotInfo.getName().equals(snapshotName)) {
tableName = TableName.valueOf(snapshotInfo.getTable());
break;
}
}
......
}
must get all snapshot from master and then get the one we need, can improve this, master just return the snapshot client needed
3. there should better has a method 'isSnapShotExists' in HBaseAdmin
Re: Re: improvements on snapshot
Posted by Jesse Yates <je...@gmail.com>.
I'd be fine if you wanted to file a JIRA and post a patch for #4. Would
certainly be cleaner. However, as Ted said, I don't know if its going to be
any faster or better, otherwise.
On Sat, Nov 7, 2015 at 10:55 AM Ted Yu <yu...@gmail.com> wrote:
> For #1, note that the code is in else block. Meaning tableName table
> doesn't exist.
>
> cloneTableSchema() is needed because table name would be different.
>
> For #4, is checkAndDeleteFiles() a hot spot ? If not, there is not much
> gain in refactoring.
>
> Cheers
>
> On Sat, Nov 7, 2015 at 10:21 AM, WangYQ <wa...@163.com> wrote:
>
> > 1. i download the src code from:
> > http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
> > class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
> > method: public void restoreSnapshot(SnapshotDescription reqSnapshot)
> > throws IOException
> > line: 725
> > code:
> >
> >
> > if (cpHost != null) {
> > cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
> > }
> > } else {
> > HTableDescriptor htd =
> > RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> > if (cpHost != null) {
> > cpHost.preCloneSnapshot(reqSnapshot, htd);
> > }
> > cloneSnapshot(fsSnapshot, htd);
> > LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" +
> > tableName);
> >
> >
> > if (cpHost != null) {
> > cpHost.postCloneSnapshot(reqSnapshot, htd);
> > }
> >
> > i think the clone:
> > HTableDescriptor htd =
> > RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> > can be removed.
> >
> >
> > 2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
> > 3. add method 'isSnapShotExists' in HBaseAdmin to help us to know if
> > there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
> > before we create a snopshot of a table, we can check if there is a
> > snopshot with same name
> > 4. simplify the code
> > version: hbase 0.98.15
> > class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
> > method:private boolean checkAndDeleteFiles(List<FileStatus> files)
> > line: 235-243
> > code:
> >
> >
> > // trace which cleaner is holding on to each file
> > if (LOG.isTraceEnabled()) {
> > ImmutableSet<FileStatus> filteredFileSet =
> > ImmutableSet.copyOf(filteredFiles);
> > for (FileStatus file : deletableValidFiles) {
> > if (!filteredFileSet.contains(file)) {
> > LOG.trace(file.getPath() + " is not deletable according to:"
> +
> > cleaner);
> > }
> > }
> > }
> > i think these code can use set's difference operation instead, which
> > is more straightforward
> >
> >
> > At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
> > >For #1, I took a look at current 0.98 code but didn't find it.
> > >Can you double check and tell us the updated line number ?
> > >
> > >For #2, I think it makes sense - considering there may be large number
> of
> > >snapshots in the cluster.
> > >
> > >For #3, can you tell us the use case for the new API ?
> > >
> > >Cheers
> > >
> > >On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> > >
> > >> the version is hbase0.98.10, find some improvements on snapshot:
> > >> 1. in class SnapshotManager, line 725, we make a clone of
> > >> HTableDescriptor, the clone is not so need to make
> > >> 2 class HBaseAdmin, method
> > >> public void restoreSnapshot(final String snapshotName, boolean
> > >> takeFailSafeSnapshot)
> > >> {
> > >> TableName tableName = null;
> > >> for (SnapshotDescription snapshotInfo: listSnapshots()) {
> > >> if (snapshotInfo.getName().equals(snapshotName)) {
> > >> tableName = TableName.valueOf(snapshotInfo.getTable());
> > >> break;
> > >> }
> > >> }
> > >> ......
> > >> }
> > >> must get all snapshot from master and then get the one we need, can
> > >> improve this, master just return the snapshot client needed
> > >>
> > >>
> > >> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> >
>
Re: Re: improvements on snapshot
Posted by Ted Yu <yu...@gmail.com>.
For #1, note that the code is in else block. Meaning tableName table
doesn't exist.
cloneTableSchema() is needed because table name would be different.
For #4, is checkAndDeleteFiles() a hot spot ? If not, there is not much
gain in refactoring.
Cheers
On Sat, Nov 7, 2015 at 10:21 AM, WangYQ <wa...@163.com> wrote:
> 1. i download the src code from:
> http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
> class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
> method: public void restoreSnapshot(SnapshotDescription reqSnapshot)
> throws IOException
> line: 725
> code:
>
>
> if (cpHost != null) {
> cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
> }
> } else {
> HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> if (cpHost != null) {
> cpHost.preCloneSnapshot(reqSnapshot, htd);
> }
> cloneSnapshot(fsSnapshot, htd);
> LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" +
> tableName);
>
>
> if (cpHost != null) {
> cpHost.postCloneSnapshot(reqSnapshot, htd);
> }
>
> i think the clone:
> HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> can be removed.
>
>
> 2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
> 3. add method 'isSnapShotExists' in HBaseAdmin to help us to know if
> there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
> before we create a snopshot of a table, we can check if there is a
> snopshot with same name
> 4. simplify the code
> version: hbase 0.98.15
> class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
> method:private boolean checkAndDeleteFiles(List<FileStatus> files)
> line: 235-243
> code:
>
>
> // trace which cleaner is holding on to each file
> if (LOG.isTraceEnabled()) {
> ImmutableSet<FileStatus> filteredFileSet =
> ImmutableSet.copyOf(filteredFiles);
> for (FileStatus file : deletableValidFiles) {
> if (!filteredFileSet.contains(file)) {
> LOG.trace(file.getPath() + " is not deletable according to:" +
> cleaner);
> }
> }
> }
> i think these code can use set's difference operation instead, which
> is more straightforward
>
>
> At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
> >For #1, I took a look at current 0.98 code but didn't find it.
> >Can you double check and tell us the updated line number ?
> >
> >For #2, I think it makes sense - considering there may be large number of
> >snapshots in the cluster.
> >
> >For #3, can you tell us the use case for the new API ?
> >
> >Cheers
> >
> >On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> >
> >> the version is hbase0.98.10, find some improvements on snapshot:
> >> 1. in class SnapshotManager, line 725, we make a clone of
> >> HTableDescriptor, the clone is not so need to make
> >> 2 class HBaseAdmin, method
> >> public void restoreSnapshot(final String snapshotName, boolean
> >> takeFailSafeSnapshot)
> >> {
> >> TableName tableName = null;
> >> for (SnapshotDescription snapshotInfo: listSnapshots()) {
> >> if (snapshotInfo.getName().equals(snapshotName)) {
> >> tableName = TableName.valueOf(snapshotInfo.getTable());
> >> break;
> >> }
> >> }
> >> ......
> >> }
> >> must get all snapshot from master and then get the one we need, can
> >> improve this, master just return the snapshot client needed
> >>
> >>
> >> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
回复:Re: Re: improvements on snapshot
Posted by WangYQ <wa...@163.com>.
thanks for all your suggesstions
发自 网易邮箱大师
在2015年11月08日 03:47,Ted Yu 写道:
Current behavior for creating duplicate snapshot is:
hbase(main):005:0> snapshot 'IntegrationTestBigLinkedList', 'snap1'
ERROR: org.apache.hadoop.hbase.snapshot.SnapshotExistsException: Snapshot
'snap1' already stored on the filesystem.
at
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.takeSnapshot(SnapshotManager.java:530)
at
org.apache.hadoop.hbase.master.MasterRpcServices.snapshot(MasterRpcServices.java:1275)
at
org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:51123)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
bq. before we create a snopshot of a table, we can check if there is a
snopshot with same name
Suppose snapshot name has been taken, are you going to try with different
name blindly ?
The second attempt may receive SnapshotExistsException, right ?
Better approach is to use list_snapshots command to obtain existing
snapshots so that your attempt is more guided.
Cheers
On Sat, Nov 7, 2015 at 10:21 AM, WangYQ <wa...@163.com> wrote:
> 1. i download the src code from:
> http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
> class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
> method: public void restoreSnapshot(SnapshotDescription reqSnapshot)
> throws IOException
> line: 725
> code:
>
>
> if (cpHost != null) {
> cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
> }
> } else {
> HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> if (cpHost != null) {
> cpHost.preCloneSnapshot(reqSnapshot, htd);
> }
> cloneSnapshot(fsSnapshot, htd);
> LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" +
> tableName);
>
>
> if (cpHost != null) {
> cpHost.postCloneSnapshot(reqSnapshot, htd);
> }
>
> i think the clone:
> HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> can be removed.
>
>
> 2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
> 3. add method 'isSnapShotExists' in HBaseAdmin to help us to know if
> there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
> before we create a snopshot of a table, we can check if there is a
> snopshot with same name
> 4. simplify the code
> version: hbase 0.98.15
> class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
> method:private boolean checkAndDeleteFiles(List<FileStatus> files)
> line: 235-243
> code:
>
>
> // trace which cleaner is holding on to each file
> if (LOG.isTraceEnabled()) {
> ImmutableSet<FileStatus> filteredFileSet =
> ImmutableSet.copyOf(filteredFiles);
> for (FileStatus file : deletableValidFiles) {
> if (!filteredFileSet.contains(file)) {
> LOG.trace(file.getPath() + " is not deletable according to:" +
> cleaner);
> }
> }
> }
> i think these code can use set's difference operation instead, which
> is more straightforward
>
>
> At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
> >For #1, I took a look at current 0.98 code but didn't find it.
> >Can you double check and tell us the updated line number ?
> >
> >For #2, I think it makes sense - considering there may be large number of
> >snapshots in the cluster.
> >
> >For #3, can you tell us the use case for the new API ?
> >
> >Cheers
> >
> >On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> >
> >> the version is hbase0.98.10, find some improvements on snapshot:
> >> 1. in class SnapshotManager, line 725, we make a clone of
> >> HTableDescriptor, the clone is not so need to make
> >> 2 class HBaseAdmin, method
> >> public void restoreSnapshot(final String snapshotName, boolean
> >> takeFailSafeSnapshot)
> >> {
> >> TableName tableName = null;
> >> for (SnapshotDescription snapshotInfo: listSnapshots()) {
> >> if (snapshotInfo.getName().equals(snapshotName)) {
> >> tableName = TableName.valueOf(snapshotInfo.getTable());
> >> break;
> >> }
> >> }
> >> ......
> >> }
> >> must get all snapshot from master and then get the one we need, can
> >> improve this, master just return the snapshot client needed
> >>
> >>
> >> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
Re: Re: improvements on snapshot
Posted by Ted Yu <yu...@gmail.com>.
Current behavior for creating duplicate snapshot is:
hbase(main):005:0> snapshot 'IntegrationTestBigLinkedList', 'snap1'
ERROR: org.apache.hadoop.hbase.snapshot.SnapshotExistsException: Snapshot
'snap1' already stored on the filesystem.
at
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.takeSnapshot(SnapshotManager.java:530)
at
org.apache.hadoop.hbase.master.MasterRpcServices.snapshot(MasterRpcServices.java:1275)
at
org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:51123)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
bq. before we create a snopshot of a table, we can check if there is a
snopshot with same name
Suppose snapshot name has been taken, are you going to try with different
name blindly ?
The second attempt may receive SnapshotExistsException, right ?
Better approach is to use list_snapshots command to obtain existing
snapshots so that your attempt is more guided.
Cheers
On Sat, Nov 7, 2015 at 10:21 AM, WangYQ <wa...@163.com> wrote:
> 1. i download the src code from:
> http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
> class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
> method: public void restoreSnapshot(SnapshotDescription reqSnapshot)
> throws IOException
> line: 725
> code:
>
>
> if (cpHost != null) {
> cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
> }
> } else {
> HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> if (cpHost != null) {
> cpHost.preCloneSnapshot(reqSnapshot, htd);
> }
> cloneSnapshot(fsSnapshot, htd);
> LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" +
> tableName);
>
>
> if (cpHost != null) {
> cpHost.postCloneSnapshot(reqSnapshot, htd);
> }
>
> i think the clone:
> HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> can be removed.
>
>
> 2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
> 3. add method 'isSnapShotExists' in HBaseAdmin to help us to know if
> there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
> before we create a snopshot of a table, we can check if there is a
> snopshot with same name
> 4. simplify the code
> version: hbase 0.98.15
> class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
> method:private boolean checkAndDeleteFiles(List<FileStatus> files)
> line: 235-243
> code:
>
>
> // trace which cleaner is holding on to each file
> if (LOG.isTraceEnabled()) {
> ImmutableSet<FileStatus> filteredFileSet =
> ImmutableSet.copyOf(filteredFiles);
> for (FileStatus file : deletableValidFiles) {
> if (!filteredFileSet.contains(file)) {
> LOG.trace(file.getPath() + " is not deletable according to:" +
> cleaner);
> }
> }
> }
> i think these code can use set's difference operation instead, which
> is more straightforward
>
>
> At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
> >For #1, I took a look at current 0.98 code but didn't find it.
> >Can you double check and tell us the updated line number ?
> >
> >For #2, I think it makes sense - considering there may be large number of
> >snapshots in the cluster.
> >
> >For #3, can you tell us the use case for the new API ?
> >
> >Cheers
> >
> >On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> >
> >> the version is hbase0.98.10, find some improvements on snapshot:
> >> 1. in class SnapshotManager, line 725, we make a clone of
> >> HTableDescriptor, the clone is not so need to make
> >> 2 class HBaseAdmin, method
> >> public void restoreSnapshot(final String snapshotName, boolean
> >> takeFailSafeSnapshot)
> >> {
> >> TableName tableName = null;
> >> for (SnapshotDescription snapshotInfo: listSnapshots()) {
> >> if (snapshotInfo.getName().equals(snapshotName)) {
> >> tableName = TableName.valueOf(snapshotInfo.getTable());
> >> break;
> >> }
> >> }
> >> ......
> >> }
> >> must get all snapshot from master and then get the one we need, can
> >> improve this, master just return the snapshot client needed
> >>
> >>
> >> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
Re:Re: improvements on snapshot
Posted by WangYQ <wa...@163.com>.
1. i download the src code from: http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
method: public void restoreSnapshot(SnapshotDescription reqSnapshot) throws IOException
line: 725
code:
if (cpHost != null) {
cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
}
} else {
HTableDescriptor htd = RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
if (cpHost != null) {
cpHost.preCloneSnapshot(reqSnapshot, htd);
}
cloneSnapshot(fsSnapshot, htd);
LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" + tableName);
if (cpHost != null) {
cpHost.postCloneSnapshot(reqSnapshot, htd);
}
i think the clone:
HTableDescriptor htd = RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
can be removed.
2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
3. add method 'isSnapShotExists' in HBaseAdmin to help us to know if there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
before we create a snopshot of a table, we can check if there is a snopshot with same name
4. simplify the code
version: hbase 0.98.15
class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
method:private boolean checkAndDeleteFiles(List<FileStatus> files)
line: 235-243
code:
// trace which cleaner is holding on to each file
if (LOG.isTraceEnabled()) {
ImmutableSet<FileStatus> filteredFileSet = ImmutableSet.copyOf(filteredFiles);
for (FileStatus file : deletableValidFiles) {
if (!filteredFileSet.contains(file)) {
LOG.trace(file.getPath() + " is not deletable according to:" + cleaner);
}
}
}
i think these code can use set's difference operation instead, which is more straightforward
At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
>For #1, I took a look at current 0.98 code but didn't find it.
>Can you double check and tell us the updated line number ?
>
>For #2, I think it makes sense - considering there may be large number of
>snapshots in the cluster.
>
>For #3, can you tell us the use case for the new API ?
>
>Cheers
>
>On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
>
>> the version is hbase0.98.10, find some improvements on snapshot:
>> 1. in class SnapshotManager, line 725, we make a clone of
>> HTableDescriptor, the clone is not so need to make
>> 2 class HBaseAdmin, method
>> public void restoreSnapshot(final String snapshotName, boolean
>> takeFailSafeSnapshot)
>> {
>> TableName tableName = null;
>> for (SnapshotDescription snapshotInfo: listSnapshots()) {
>> if (snapshotInfo.getName().equals(snapshotName)) {
>> tableName = TableName.valueOf(snapshotInfo.getTable());
>> break;
>> }
>> }
>> ......
>> }
>> must get all snapshot from master and then get the one we need, can
>> improve this, master just return the snapshot client needed
>>
>>
>> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
>>
>>
>>
>>
>>
>>
>>
>>
Re: improvements on snapshot
Posted by Ted Yu <yu...@gmail.com>.
For #1, I took a look at current 0.98 code but didn't find it.
Can you double check and tell us the updated line number ?
For #2, I think it makes sense - considering there may be large number of
snapshots in the cluster.
For #3, can you tell us the use case for the new API ?
Cheers
On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> the version is hbase0.98.10, find some improvements on snapshot:
> 1. in class SnapshotManager, line 725, we make a clone of
> HTableDescriptor, the clone is not so need to make
> 2 class HBaseAdmin, method
> public void restoreSnapshot(final String snapshotName, boolean
> takeFailSafeSnapshot)
> {
> TableName tableName = null;
> for (SnapshotDescription snapshotInfo: listSnapshots()) {
> if (snapshotInfo.getName().equals(snapshotName)) {
> tableName = TableName.valueOf(snapshotInfo.getTable());
> break;
> }
> }
> ......
> }
> must get all snapshot from master and then get the one we need, can
> improve this, master just return the snapshot client needed
>
>
> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
>
>
>
>
>
>
>
>