Posted to dev@hbase.apache.org by Vladimir Rodionov <vl...@gmail.com> on 2015/03/03 18:18:05 UTC
Online snapshots are very slow?
Some discussions:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/43616
Any ideas why? A snapshot should not take tens of seconds (unless we flush
several GBs per server).
I got info from my coworker that it is indeed slow (20+ sec on an almost
empty table).
I have not started testing it myself yet, but before I start digging into it
I would like to collect opinions from HBase folks.
-Vlad
Re: Online snapshots are very slow?
Posted by Matteo Bertozzi <th...@gmail.com>.
On Tue, Mar 3, 2015 at 5:37 PM, Vladimir Rodionov <vl...@gmail.com>
wrote:
> Matteo,
>
> For a large cluster/table, this step:
>
> - the master will aggregate the result and verify the integrity
>
> looks like a real bottleneck.
>
The integrity verification only checks file names; it does not verify the
actual data.
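A minimal sketch of what a name-only check looks like, as opposed to a data check. This is not HBase's verifier; the function name and the file names are hypothetical and exist only to illustrate the idea that nothing is opened or checksummed:

```python
def missing_files(manifest_names, present_names):
    """Return manifest entries that have no file with the same name.

    Only names are compared; file contents are never read or checksummed.
    """
    present = set(present_names)
    return sorted(name for name in manifest_names if name not in present)


# Hypothetical store file names, for illustration only.
manifest = ["region-a/cf/file1", "region-a/cf/file2", "region-b/cf/file3"]
on_disk = ["region-a/cf/file1", "region-b/cf/file3"]

missing = missing_files(manifest, on_disk)
print(missing)
```

The cost of such a check grows with the number of files, not with the amount of data they hold.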
> Any other hidden serialized parts of the implementation?
>
Nothing else I can think of.
Re: Online snapshots are very slow?
Posted by Vladimir Rodionov <vl...@gmail.com>.
Matteo,
For a large cluster/table, this step:
- the master will aggregate the result and verify the integrity
looks like a real bottleneck.
Any other hidden serialized parts of the implementation?
-Vlad
Re: Online snapshots are very slow?
Posted by Matteo Bertozzi <th...@gmail.com>.
The high-level overview of a snapshot is:
- the client asks the master to take a snapshot
- the master looks up the RSs that are hosting the regions of the specified
table
- the master creates a znode to notify the RSs to take a snapshot
- each RS involved will get notified and take the snapshot, which is a flush
+ writing a manifest
- each RS involved will respond to the master
- the master will aggregate the result and verify the integrity
- snapshot complete
So, the time required to take a snapshot is bounded by the slowest region
to flush/respond.
You can try with SKIP_FLUSH = true
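As a rough illustration of that bound and of what skipping the flush changes, here is a toy model. It is not HBase code: the class, the function, and all the timings are hypothetical, and it assumes the RSs work in parallel while the master waits for all of them.

```python
from dataclasses import dataclass


@dataclass
class RegionServerWork:
    """Hypothetical per-RS costs, in seconds."""
    name: str
    flush_seconds: float     # time to flush the table's regions on this RS
    manifest_seconds: float  # time to write the snapshot manifest


def estimate_snapshot_seconds(servers, master_overhead_seconds=0.0, skip_flush=False):
    """Estimate wall-clock time when all RSs work in parallel.

    The master waits for every RS, so the slowest one sets the total.
    """
    if not servers:
        return master_overhead_seconds
    per_server = [
        (0.0 if skip_flush else s.flush_seconds) + s.manifest_seconds
        for s in servers
    ]
    return master_overhead_seconds + max(per_server)


# Made-up numbers: one RS has much more to flush than the others.
servers = [
    RegionServerWork("rs1", flush_seconds=0.5, manifest_seconds=0.25),
    RegionServerWork("rs2", flush_seconds=8.0, manifest_seconds=0.25),
    RegionServerWork("rs3", flush_seconds=1.0, manifest_seconds=0.5),
]

with_flush = estimate_snapshot_seconds(servers, master_overhead_seconds=1.0)
without_flush = estimate_snapshot_seconds(
    servers, master_overhead_seconds=1.0, skip_flush=True
)
print(with_flush, without_flush)
```

In this model the total follows the maximum over the RSs rather than the sum, so a single slow flush dominates. If a snapshot of an almost empty table is still slow with the flush skipped, the time is going somewhere other than the flush.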
Also, if you grep for Snapshot in the master log, you can see what is taking
so long.
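A small sketch of that log check, which also prints the time gap between consecutive matching lines. It assumes each line starts with a log4j-style timestamp such as `2015-03-03 17:18:05,000`; the sample lines are placeholders written for this example, not real HBase log output.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # assumed log4j-style prefix
TIMESTAMP_LENGTH = 23                      # len("2015-03-03 17:18:05,000")


def snapshot_gaps(lines, keyword="Snapshot"):
    """Return (gap_seconds, line) for each line containing keyword.

    The gap is measured from the previous matching line; the first match
    gets a gap of 0.0. Lines without the assumed timestamp are skipped.
    """
    results = []
    previous = None
    for line in lines:
        if keyword not in line:
            continue
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue
        gap = 0.0 if previous is None else (stamp - previous).total_seconds()
        results.append((gap, line.rstrip()))
        previous = stamp
    return results


# Placeholder lines, not real master log messages.
sample = [
    "2015-03-03 17:18:05,000 INFO placeholder: Snapshot requested",
    "2015-03-03 17:18:05,500 INFO placeholder: unrelated line",
    "2015-03-03 17:18:25,250 INFO placeholder: Snapshot step finished",
]

gaps = snapshot_gaps(sample)
for gap, line in gaps:
    print(f"+{gap:7.2f}s  {line}")
```

To run it against a real log, pass the open file instead of `sample`; the largest gap points at the step to look into first.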
Matteo