You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by 王勇强 <wa...@163.com> on 2015/11/07 07:12:08 UTC

improvements on snapshot

the version is hbase0.98.10, find some improvements on snapshot:
1. in class SnapshotManager, line 725, we make a clone of HTableDescriptor,  the clone is not so need to make
2 class HBaseAdmin, method   
    public void restoreSnapshot(final String snapshotName, boolean takeFailSafeSnapshot)
{
    TableName tableName = null;
    for (SnapshotDescription snapshotInfo: listSnapshots()) {
      if (snapshotInfo.getName().equals(snapshotName)) {
        tableName = TableName.valueOf(snapshotInfo.getTable());
        break;
      }
    }
......
}
  must get all snapshot from master and then get the one we need, can improve this, master just return the snapshot client needed


3. there should better has a method 'isSnapShotExists' in HBaseAdmin








Re: Re: improvements on snapshot

Posted by Jesse Yates <je...@gmail.com>.
I'd be fine if you wanted to file a JIRA and post a patch for #4. Would
certainly be cleaner. However, as Ted said, I don't know if its going to be
any faster or better, otherwise.

On Sat, Nov 7, 2015 at 10:55 AM Ted Yu <yu...@gmail.com> wrote:

> For #1, note that the code is in else block. Meaning tableName table
> doesn't exist.
>
> cloneTableSchema() is needed because table name would be different.
>
> For #4, is checkAndDeleteFiles() a hot spot ? If not, there is not much
> gain in refactoring.
>
> Cheers
>
> On Sat, Nov 7, 2015 at 10:21 AM, WangYQ <wa...@163.com> wrote:
>
> > 1. i download the src code from:
> > http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
> >     class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
> >    method:   public void restoreSnapshot(SnapshotDescription reqSnapshot)
> > throws IOException
> >    line: 725
> >   code:
> >
> >
> >       if (cpHost != null) {
> >         cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
> >       }
> >     } else {
> >       HTableDescriptor htd =
> > RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> >       if (cpHost != null) {
> >         cpHost.preCloneSnapshot(reqSnapshot, htd);
> >       }
> >       cloneSnapshot(fsSnapshot, htd);
> >       LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" +
> > tableName);
> >
> >
> >       if (cpHost != null) {
> >         cpHost.postCloneSnapshot(reqSnapshot, htd);
> >       }
> >
> >      i think the clone:
> >       HTableDescriptor htd =
> > RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
> >       can be removed.
> >
> >
> > 2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
> > 3.  add method 'isSnapShotExists' in HBaseAdmin to help us to know if
> > there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
> > before we create a snopshot of a table, we can check if there is a
> > snopshot with same name
> > 4. simplify the code
> > version: hbase 0.98.15
> > class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
> > method:private boolean checkAndDeleteFiles(List<FileStatus> files)
> >     line: 235-243
> >     code:
> >
> >
> >       // trace which cleaner is holding on to each file
> >       if (LOG.isTraceEnabled()) {
> >         ImmutableSet<FileStatus> filteredFileSet =
> > ImmutableSet.copyOf(filteredFiles);
> >         for (FileStatus file : deletableValidFiles) {
> >           if (!filteredFileSet.contains(file)) {
> >             LOG.trace(file.getPath() + " is not deletable according to:"
> +
> > cleaner);
> >           }
> >         }
> >       }
> >      i think these code can use set's difference operation instead, which
> > is more straightforward
> >
> >
> > At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
> > >For #1, I took a look at current 0.98 code but didn't find it.
> > >Can you double check and tell us the updated line number ?
> > >
> > >For #2, I think it makes sense - considering there may be large number
> of
> > >snapshots in the cluster.
> > >
> > >For #3, can you tell us the use case for the new API ?
> > >
> > >Cheers
> > >
> > >On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> > >
> > >> the version is hbase0.98.10, find some improvements on snapshot:
> > >> 1. in class SnapshotManager, line 725, we make a clone of
> > >> HTableDescriptor,  the clone is not so need to make
> > >> 2 class HBaseAdmin, method
> > >>     public void restoreSnapshot(final String snapshotName, boolean
> > >> takeFailSafeSnapshot)
> > >> {
> > >>     TableName tableName = null;
> > >>     for (SnapshotDescription snapshotInfo: listSnapshots()) {
> > >>       if (snapshotInfo.getName().equals(snapshotName)) {
> > >>         tableName = TableName.valueOf(snapshotInfo.getTable());
> > >>         break;
> > >>       }
> > >>     }
> > >> ......
> > >> }
> > >>   must get all snapshot from master and then get the one we need, can
> > >> improve this, master just return the snapshot client needed
> > >>
> > >>
> > >> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> >
>

Re: Re: improvements on snapshot

Posted by Ted Yu <yu...@gmail.com>.
For #1, note that the code is in else block. Meaning tableName table
doesn't exist.

cloneTableSchema() is needed because table name would be different.

For #4, is checkAndDeleteFiles() a hot spot ? If not, there is not much
gain in refactoring.

Cheers

On Sat, Nov 7, 2015 at 10:21 AM, WangYQ <wa...@163.com> wrote:

> 1. i download the src code from:
> http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
>     class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
>    method:   public void restoreSnapshot(SnapshotDescription reqSnapshot)
> throws IOException
>    line: 725
>   code:
>
>
>       if (cpHost != null) {
>         cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
>       }
>     } else {
>       HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
>       if (cpHost != null) {
>         cpHost.preCloneSnapshot(reqSnapshot, htd);
>       }
>       cloneSnapshot(fsSnapshot, htd);
>       LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" +
> tableName);
>
>
>       if (cpHost != null) {
>         cpHost.postCloneSnapshot(reqSnapshot, htd);
>       }
>
>      i think the clone:
>       HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
>       can be removed.
>
>
> 2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
> 3.  add method 'isSnapShotExists' in HBaseAdmin to help us to know if
> there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
> before we create a snopshot of a table, we can check if there is a
> snopshot with same name
> 4. simplify the code
> version: hbase 0.98.15
> class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
> method:private boolean checkAndDeleteFiles(List<FileStatus> files)
>     line: 235-243
>     code:
>
>
>       // trace which cleaner is holding on to each file
>       if (LOG.isTraceEnabled()) {
>         ImmutableSet<FileStatus> filteredFileSet =
> ImmutableSet.copyOf(filteredFiles);
>         for (FileStatus file : deletableValidFiles) {
>           if (!filteredFileSet.contains(file)) {
>             LOG.trace(file.getPath() + " is not deletable according to:" +
> cleaner);
>           }
>         }
>       }
>      i think these code can use set's difference operation instead, which
> is more straightforward
>
>
> At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
> >For #1, I took a look at current 0.98 code but didn't find it.
> >Can you double check and tell us the updated line number ?
> >
> >For #2, I think it makes sense - considering there may be large number of
> >snapshots in the cluster.
> >
> >For #3, can you tell us the use case for the new API ?
> >
> >Cheers
> >
> >On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> >
> >> the version is hbase0.98.10, find some improvements on snapshot:
> >> 1. in class SnapshotManager, line 725, we make a clone of
> >> HTableDescriptor,  the clone is not so need to make
> >> 2 class HBaseAdmin, method
> >>     public void restoreSnapshot(final String snapshotName, boolean
> >> takeFailSafeSnapshot)
> >> {
> >>     TableName tableName = null;
> >>     for (SnapshotDescription snapshotInfo: listSnapshots()) {
> >>       if (snapshotInfo.getName().equals(snapshotName)) {
> >>         tableName = TableName.valueOf(snapshotInfo.getTable());
> >>         break;
> >>       }
> >>     }
> >> ......
> >> }
> >>   must get all snapshot from master and then get the one we need, can
> >> improve this, master just return the snapshot client needed
> >>
> >>
> >> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>

回复:Re: Re: improvements on snapshot

Posted by WangYQ <wa...@163.com>.
thanks for all your suggesstions




发自 网易邮箱大师
在2015年11月08日 03:47,Ted Yu 写道:
Current behavior for creating duplicate snapshot is:

hbase(main):005:0> snapshot 'IntegrationTestBigLinkedList', 'snap1'

ERROR: org.apache.hadoop.hbase.snapshot.SnapshotExistsException: Snapshot
'snap1' already stored on the filesystem.
at
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.takeSnapshot(SnapshotManager.java:530)
at
org.apache.hadoop.hbase.master.MasterRpcServices.snapshot(MasterRpcServices.java:1275)
at
org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:51123)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)

bq. before we create a snopshot of a table, we can check if there is a
snopshot with same name

Suppose snapshot name has been taken, are you going to try with different
name blindly ?
The second attempt may receive SnapshotExistsException, right ?

Better approach is to use list_snapshots command to obtain existing
snapshots so that your attempt is more guided.

Cheers

On Sat, Nov 7, 2015 at 10:21 AM, WangYQ <wa...@163.com> wrote:

> 1. i download the src code from:
> http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
>     class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
>    method:   public void restoreSnapshot(SnapshotDescription reqSnapshot)
> throws IOException
>    line: 725
>   code:
>
>
>       if (cpHost != null) {
>         cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
>       }
>     } else {
>       HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
>       if (cpHost != null) {
>         cpHost.preCloneSnapshot(reqSnapshot, htd);
>       }
>       cloneSnapshot(fsSnapshot, htd);
>       LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" +
> tableName);
>
>
>       if (cpHost != null) {
>         cpHost.postCloneSnapshot(reqSnapshot, htd);
>       }
>
>      i think the clone:
>       HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
>       can be removed.
>
>
> 2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
> 3.  add method 'isSnapShotExists' in HBaseAdmin to help us to know if
> there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
> before we create a snopshot of a table, we can check if there is a
> snopshot with same name
> 4. simplify the code
> version: hbase 0.98.15
> class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
> method:private boolean checkAndDeleteFiles(List<FileStatus> files)
>     line: 235-243
>     code:
>
>
>       // trace which cleaner is holding on to each file
>       if (LOG.isTraceEnabled()) {
>         ImmutableSet<FileStatus> filteredFileSet =
> ImmutableSet.copyOf(filteredFiles);
>         for (FileStatus file : deletableValidFiles) {
>           if (!filteredFileSet.contains(file)) {
>             LOG.trace(file.getPath() + " is not deletable according to:" +
> cleaner);
>           }
>         }
>       }
>      i think these code can use set's difference operation instead, which
> is more straightforward
>
>
> At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
> >For #1, I took a look at current 0.98 code but didn't find it.
> >Can you double check and tell us the updated line number ?
> >
> >For #2, I think it makes sense - considering there may be large number of
> >snapshots in the cluster.
> >
> >For #3, can you tell us the use case for the new API ?
> >
> >Cheers
> >
> >On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> >
> >> the version is hbase0.98.10, find some improvements on snapshot:
> >> 1. in class SnapshotManager, line 725, we make a clone of
> >> HTableDescriptor,  the clone is not so need to make
> >> 2 class HBaseAdmin, method
> >>     public void restoreSnapshot(final String snapshotName, boolean
> >> takeFailSafeSnapshot)
> >> {
> >>     TableName tableName = null;
> >>     for (SnapshotDescription snapshotInfo: listSnapshots()) {
> >>       if (snapshotInfo.getName().equals(snapshotName)) {
> >>         tableName = TableName.valueOf(snapshotInfo.getTable());
> >>         break;
> >>       }
> >>     }
> >> ......
> >> }
> >>   must get all snapshot from master and then get the one we need, can
> >> improve this, master just return the snapshot client needed
> >>
> >>
> >> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>

Re: Re: improvements on snapshot

Posted by Ted Yu <yu...@gmail.com>.
Current behavior for creating duplicate snapshot is:

hbase(main):005:0> snapshot 'IntegrationTestBigLinkedList', 'snap1'

ERROR: org.apache.hadoop.hbase.snapshot.SnapshotExistsException: Snapshot
'snap1' already stored on the filesystem.
at
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.takeSnapshot(SnapshotManager.java:530)
at
org.apache.hadoop.hbase.master.MasterRpcServices.snapshot(MasterRpcServices.java:1275)
at
org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:51123)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)

bq. before we create a snopshot of a table, we can check if there is a
snopshot with same name

Suppose snapshot name has been taken, are you going to try with different
name blindly ?
The second attempt may receive SnapshotExistsException, right ?

Better approach is to use list_snapshots command to obtain existing
snapshots so that your attempt is more guided.

Cheers

On Sat, Nov 7, 2015 at 10:21 AM, WangYQ <wa...@163.com> wrote:

> 1. i download the src code from:
> http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
>     class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
>    method:   public void restoreSnapshot(SnapshotDescription reqSnapshot)
> throws IOException
>    line: 725
>   code:
>
>
>       if (cpHost != null) {
>         cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
>       }
>     } else {
>       HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
>       if (cpHost != null) {
>         cpHost.preCloneSnapshot(reqSnapshot, htd);
>       }
>       cloneSnapshot(fsSnapshot, htd);
>       LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" +
> tableName);
>
>
>       if (cpHost != null) {
>         cpHost.postCloneSnapshot(reqSnapshot, htd);
>       }
>
>      i think the clone:
>       HTableDescriptor htd =
> RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
>       can be removed.
>
>
> 2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
> 3.  add method 'isSnapShotExists' in HBaseAdmin to help us to know if
> there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
> before we create a snopshot of a table, we can check if there is a
> snopshot with same name
> 4. simplify the code
> version: hbase 0.98.15
> class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
> method:private boolean checkAndDeleteFiles(List<FileStatus> files)
>     line: 235-243
>     code:
>
>
>       // trace which cleaner is holding on to each file
>       if (LOG.isTraceEnabled()) {
>         ImmutableSet<FileStatus> filteredFileSet =
> ImmutableSet.copyOf(filteredFiles);
>         for (FileStatus file : deletableValidFiles) {
>           if (!filteredFileSet.contains(file)) {
>             LOG.trace(file.getPath() + " is not deletable according to:" +
> cleaner);
>           }
>         }
>       }
>      i think these code can use set's difference operation instead, which
> is more straightforward
>
>
> At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
> >For #1, I took a look at current 0.98 code but didn't find it.
> >Can you double check and tell us the updated line number ?
> >
> >For #2, I think it makes sense - considering there may be large number of
> >snapshots in the cluster.
> >
> >For #3, can you tell us the use case for the new API ?
> >
> >Cheers
> >
> >On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
> >
> >> the version is hbase0.98.10, find some improvements on snapshot:
> >> 1. in class SnapshotManager, line 725, we make a clone of
> >> HTableDescriptor,  the clone is not so need to make
> >> 2 class HBaseAdmin, method
> >>     public void restoreSnapshot(final String snapshotName, boolean
> >> takeFailSafeSnapshot)
> >> {
> >>     TableName tableName = null;
> >>     for (SnapshotDescription snapshotInfo: listSnapshots()) {
> >>       if (snapshotInfo.getName().equals(snapshotName)) {
> >>         tableName = TableName.valueOf(snapshotInfo.getTable());
> >>         break;
> >>       }
> >>     }
> >> ......
> >> }
> >>   must get all snapshot from master and then get the one we need, can
> >> improve this, master just return the snapshot client needed
> >>
> >>
> >> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>

Re:Re: improvements on snapshot

Posted by WangYQ <wa...@163.com>.
1. i download the src code from:    http://mirrors.hust.edu.cn/apache/hbase/0.98.15/
    class: org.apache.hadoop.hbase.master.snapshot.SnapshotManager
   method:   public void restoreSnapshot(SnapshotDescription reqSnapshot) throws IOException 
   line: 725
  code:


      if (cpHost != null) {
        cpHost.postRestoreSnapshot(reqSnapshot, snapshotTableDesc);
      }
    } else {
      HTableDescriptor htd = RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
      if (cpHost != null) {
        cpHost.preCloneSnapshot(reqSnapshot, htd);
      }
      cloneSnapshot(fsSnapshot, htd);
      LOG.info("Clone snapshot=" + fsSnapshot.getName() + " as table=" + tableName);


      if (cpHost != null) {
        cpHost.postCloneSnapshot(reqSnapshot, htd);
      }
      
     i think the clone:
      HTableDescriptor htd = RestoreSnapshotHelper.cloneTableSchema(snapshotTableDesc, tableName);
      can be removed.


2. this one is similar to the one in HBaseAdmin.getTableDecsriptor
3.  add method 'isSnapShotExists' in HBaseAdmin to help us to know if there is a snopshot on HDFS, just like HBaseAdmin.tableExists,
before we create a snopshot of a table, we can check if there is a snopshot with same name
4. simplify the code
version: hbase 0.98.15
class: org.apache.hadoop.hbase.master.cleaner.CleanerChore
method:private boolean checkAndDeleteFiles(List<FileStatus> files)
    line: 235-243
    code:


      // trace which cleaner is holding on to each file
      if (LOG.isTraceEnabled()) {
        ImmutableSet<FileStatus> filteredFileSet = ImmutableSet.copyOf(filteredFiles);
        for (FileStatus file : deletableValidFiles) {
          if (!filteredFileSet.contains(file)) {
            LOG.trace(file.getPath() + " is not deletable according to:" + cleaner);
          }
        }
      }
     i think these code can use set's difference operation instead, which is more straightforward


At 2015-11-07 21:10:21, "Ted Yu" <yu...@gmail.com> wrote:
>For #1, I took a look at current 0.98 code but didn't find it.
>Can you double check and tell us the updated line number ?
>
>For #2, I think it makes sense - considering there may be large number of
>snapshots in the cluster.
>
>For #3, can you tell us the use case for the new API ?
>
>Cheers
>
>On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:
>
>> the version is hbase0.98.10, find some improvements on snapshot:
>> 1. in class SnapshotManager, line 725, we make a clone of
>> HTableDescriptor,  the clone is not so need to make
>> 2 class HBaseAdmin, method
>>     public void restoreSnapshot(final String snapshotName, boolean
>> takeFailSafeSnapshot)
>> {
>>     TableName tableName = null;
>>     for (SnapshotDescription snapshotInfo: listSnapshots()) {
>>       if (snapshotInfo.getName().equals(snapshotName)) {
>>         tableName = TableName.valueOf(snapshotInfo.getTable());
>>         break;
>>       }
>>     }
>> ......
>> }
>>   must get all snapshot from master and then get the one we need, can
>> improve this, master just return the snapshot client needed
>>
>>
>> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
>>
>>
>>
>>
>>
>>
>>
>>

Re: improvements on snapshot

Posted by Ted Yu <yu...@gmail.com>.
For #1, I took a look at current 0.98 code but didn't find it.
Can you double check and tell us the updated line number ?

For #2, I think it makes sense - considering there may be large number of
snapshots in the cluster.

For #3, can you tell us the use case for the new API ?

Cheers

On Fri, Nov 6, 2015 at 10:12 PM, 王勇强 <wa...@163.com> wrote:

> the version is hbase0.98.10, find some improvements on snapshot:
> 1. in class SnapshotManager, line 725, we make a clone of
> HTableDescriptor,  the clone is not so need to make
> 2 class HBaseAdmin, method
>     public void restoreSnapshot(final String snapshotName, boolean
> takeFailSafeSnapshot)
> {
>     TableName tableName = null;
>     for (SnapshotDescription snapshotInfo: listSnapshots()) {
>       if (snapshotInfo.getName().equals(snapshotName)) {
>         tableName = TableName.valueOf(snapshotInfo.getTable());
>         break;
>       }
>     }
> ......
> }
>   must get all snapshot from master and then get the one we need, can
> improve this, master just return the snapshot client needed
>
>
> 3. there should better has a method 'isSnapShotExists' in HBaseAdmin
>
>
>
>
>
>
>
>