Posted to dev@hbase.apache.org by rahul gidwani <ra...@gmail.com> on 2015/07/09 21:12:31 UTC

hbase hdfs snapshots

HBase snapshots are a very useful feature, but they were implemented back
before HDFS had the ability to take snapshots.

Newer versions of Hadoop support HDFS snapshots.  I was wondering if the
community would be interested in something like a Snapshot V3 where we use
HDFS to take these snapshots.

Even with manifests (Snapshot V2) for our larger tables it can take hours
to Snapshot and Clone a table.

Would this feature be of use to anyone?

thanks
rahul
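
For context, "snapshot via HDFS" here means the filesystem-level snapshots
that Hadoop exposes on snapshottable directories. A minimal sketch of what
taking one over a table's directory could look like, using the plain Hadoop
FileSystem API (the path and snapshot name below are illustrative, and this
is not an existing HBase facility):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class HdfsTableSnapshot {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Illustrative path; the real location depends on hbase.rootdir and the namespace.
        Path tableDir = new Path("/hbase/data/default/mytable");

        FileSystem fs = FileSystem.get(conf);
        if (fs instanceof DistributedFileSystem) {
          // One-time admin step: mark the directory snapshottable.
          ((DistributedFileSystem) fs).allowSnapshot(tableDir);
        }

        // A NameNode metadata operation: no data is copied. Like SKIP_FLUSH,
        // it only captures what is already on disk (memstores are not included).
        Path snapshotPath = fs.createSnapshot(tableDir, "mytable-snap-20150709");
        System.out.println("HDFS snapshot created at " + snapshotPath);
      }
    }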

Re: hbase hdfs snapshots

Posted by rahul gidwani <ra...@gmail.com>.
Hey guys,

I am going to take a snapshot of that table very soon, with some jstacks
running on both the regionservers and the HMaster.

I'll post my results.

rahul

Re: hbase hdfs snapshots

Posted by Vladimir Rodionov <vl...@gmail.com>.
Matteo, thanks for reminding me about the verification stage - this is it.
The master verifying 60K files and regions in a live cluster ... that can
explain 30 minutes.

If even one region splits during a snapshot, the snapshot will fail. It's
amazing that these guys are able to finish a snapshot at all.

-Vlad


Re: hbase hdfs snapshots

Posted by Matteo Bertozzi <th...@gmail.com>.
Yeah, something along that line, but I doubt the problem is on the RS side,
or in the communication between the master and the RSs.

In theory the problem may be the verification step, where the master
is checking the snapshot. I was just trying to figure out where he is
spending the time, and that "30 minutes to snapshot" does not sound right to me,
because the snapshot phase, where each RS takes a manifest, should not take
that long.

Matteo


Re: hbase hdfs snapshots

Posted by Vladimir Rodionov <vl...@gmail.com>.
Matteo, there should be some explanation for the 30 min skip-flush snapshot. I
think it should be somewhere in the NN/HDFS. This is a huge cluster and the NN
load is extreme; it probably does not scale well with the # of DNs and # of
files per directory. I presume that NN performance on file operations degrades
when the # of DNs and/or directory sizes increase.

-Vlad

Re: hbase hdfs snapshots

Posted by Matteo Bertozzi <th...@gmail.com>.
Manifest per region, not per family.
We couldn't send them back to the master/table, to keep compatibility.
60k regions on 1200 RS is ~50 manifests per RS; that alone should not take
30 sec.


Re: hbase hdfs snapshots

Posted by Vladimir Rodionov <vl...@gmail.com>.
OK, even with 1 manifest file per region (per column family?) - 60K x 4 =
240,000 new files.
That is 8,000 per minute, ~133 per second. That is probably the NN limit.

Anyway, the root cause is the same as with reference files during region
split:

HDFS does not do well on file create/open/close/delete.

-Vlad

Re: hbase hdfs snapshots

Posted by Matteo Bertozzi <th...@gmail.com>.
@Vladimir there is no HFileLink creation on snapshot. We create 1 manifest
per region.

Matteo


Re: hbase hdfs snapshots

Posted by Vladimir Rodionov <vl...@gmail.com>.
Being not very familiar with the snapshot code, I can only speculate on where
most of the time is spent ...

In creating 60K x 4 x K (K is the average # of store files per region) small
HFileLinks? That can be a very large # of files.

-Vlad



Re: hbase hdfs snapshots

Posted by Matteo Bertozzi <th...@gmail.com>.
The total time taken by a snapshot should be bounded by the slowest machine.
We send a notification to each RS, and each RS executes the snapshot
operation for each of its regions.
Can you track down what is slow in your case?

Clone has to create a reference for each file, and that is a master
operation; these calls may all go away if we change the layout in a proper
way instead of doing what is proposed in HBASE-13991.
Most of the time should be spent on the enableTable phase of the clone.
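
As a rough client-side way to separate the two phases being discussed, one
could time the snapshot and the clone independently. A minimal sketch against
the standard Admin client API, with illustrative table and snapshot names
(the finer breakdown - RS manifests vs. master verification vs. enableTable -
would still need master/RS logs or jstacks):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class SnapshotCloneTiming {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          TableName table = TableName.valueOf("mytable");         // illustrative names
          TableName clone = TableName.valueOf("mytable_clone");
          String snapshotName = "mytable-snap";

          long t0 = System.currentTimeMillis();
          admin.snapshot(snapshotName, table);      // snapshot: RS manifests + master verification
          long t1 = System.currentTimeMillis();
          admin.cloneSnapshot(snapshotName, clone); // clone: reference creation + enableTable
          long t2 = System.currentTimeMillis();

          System.out.println("snapshot: " + (t1 - t0) + " ms, clone: " + (t2 - t1) + " ms");
        }
      }
    }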


Re: hbase hdfs snapshots

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Rahul,

Have you identified why it takes those 30 minutes? Is the table balanced
correctly across the servers? From the logs, are you able to identify what
takes that much time?

JM

Re: hbase hdfs snapshots

Posted by rahul gidwani <ra...@gmail.com>.
Hi Matteo,

We do SKIP_FLUSH.  We have 1200+ regionservers with a single table with 60k
regions and 4 column families.  It takes around 30 minutes to snapshot this
table using manifests, compared to just seconds doing this with HDFS.
Cloning this table takes considerably longer.

For cases where someone would want to run MapReduce over snapshots, this
could be much faster, as we could take an HDFS snapshot and bypass the clone.

rahul
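
For reference, the existing MapReduce-over-snapshot path looks roughly like
the sketch below; the restore directory it needs is the clone-like step that
an HDFS snapshot could potentially skip. The snapshot name, restore dir, and
the trivial mapper are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class SnapshotScanJob {

      // Trivial mapper that just counts rows read from the snapshot files.
      public static class RowCounter extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
          context.getCounter("snapshot", "rows").increment(1);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "scan-over-snapshot");
        job.setJarByClass(SnapshotScanJob.class);

        // The restore dir is where snapshot references are materialized so the
        // mappers can read HFiles directly, without going through the RegionServers.
        TableMapReduceUtil.initTableSnapshotMapperJob(
            "mytable-snap", new Scan(), RowCounter.class,
            NullWritable.class, NullWritable.class, job,
            true, new Path("/tmp/snapshot-restore"));

        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }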


Re: hbase hdfs snapshots

Posted by Matteo Bertozzi <th...@gmail.com>.
On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <ra...@gmail.com>
 wrote:

> Even with manifests (Snapshot V2) for our larger tables it can take hours
> to Snapshot and Clone a table.
>

On the snapshot side, the only thing that can take hours is the "flush".
If you don't need that (which is what you get with HDFS snapshots), you can
specify SKIP_FLUSH => true.


Matteo
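
For reference, the shell form of that is snapshot 'mytable', 'mysnap',
{SKIP_FLUSH => true}. A rough Java equivalent, assuming the newer client
Admin API that accepts a SnapshotType (table and snapshot names are
illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.SnapshotType;

    public class SkipFlushSnapshot {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // SKIPFLUSH: only data already on disk is captured; memstores are not
          // flushed (the same trade-off you get with an HDFS-level snapshot).
          admin.snapshot("mytable-snap", TableName.valueOf("mytable"), SnapshotType.SKIPFLUSH);
        }
      }
    }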


On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <ra...@gmail.com>
wrote:

> HBase snapshots are a very useful feature. but it was implemented back
> before there was the ability to snapshot via HDFS.
>
> Newer versions of Hadoop support HDFS snapshots.  I was wondering if the
> community would be interested in something like a Snapshot V3 where we use
> HDFS to take these snapshots.
>
> Even with manifests (Snapshot V2) for our larger tables it can take hours
> to Snapshot and Clone a table.
>
> Would this feature be of use to anyone?
>
> thanks
> rahul
>