Posted to user@hbase.apache.org by Gautam <ga...@gmail.com> on 2014/08/12 23:58:30 UTC

Hbase Scan/Snapshot Performance...

Hello,

We've been using and loving HBase for a couple of months now. Our primary
use case is writing a stream of events to an online time-series HBase
table. Every so often we run medium to large batch-scan MR jobs on sections
(1 hour, 1 day, 1 week) of this same time-series table. The online table
now shows spikes whenever these large batch read jobs run: write throughput
drops while the sequential scans are running on the table.
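
To make the job windows concrete, here is a minimal sketch of how the
1 hour / 1 day / 1 week bounds could be computed. The class and method
names are hypothetical, and it assumes cell timestamps track event time, in
which case the two bounds could be passed to Scan.setTimeRange(minStamp,
maxStamp). It uses java.time, so it needs Java 8 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: a half-open time window [startMillis, endMillis).
final class ScanWindow {
    final long startMillis; // inclusive lower bound, epoch milliseconds
    final long endMillis;   // exclusive upper bound, epoch milliseconds

    ScanWindow(long startMillis, long endMillis) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    // Window of the given length that ends at the last full hour at or before 'now'.
    static ScanWindow endingAtLastFullHour(Instant now, Duration length) {
        Instant end = now.truncatedTo(ChronoUnit.HOURS);
        Instant start = end.minus(length);
        return new ScanWindow(start.toEpochMilli(), end.toEpochMilli());
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Duration[] lengths = {
            Duration.ofHours(1), Duration.ofDays(1), Duration.ofDays(7)
        };
        for (Duration length : lengths) {
            ScanWindow w = endingAtLastFullHour(now, length);
            System.out.println(length + " -> [" + w.startMillis + ", " + w.endMillis + ")");
        }
    }
}
```

Aligning the end of the window to a full hour keeps consecutive hourly jobs
from overlapping or leaving gaps; that alignment is a design choice of this
sketch, not something the setup above requires.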

We've been playing around with snapshots and are considering running these
scheduled hourly, daily, and weekly jobs against snapshots so that the
online table isn't affected. From preliminary tests, online snapshots take
far too long: the snapshot job times out after 60 seconds. The time is
spent flushing the memstores on all region servers (as expected), which
seems to take too long. It also appears from the RS logs that the flushes
are done serially.

Offline snapshots aren't an option, since we can't disable this table,
which serves the event writes.

*We're running HBase 0.94.6. We tried benchmarking snapshots on a 9 TB
table with 240 regions, 1 column family, and 4 region servers.*

All in all, I'd like to ask whether things would improve if we upgraded to
HBase 0.98+. Are there known benchmark numbers for expected snapshot
performance on 0.94.x vs. 0.98.x? In an ideal scenario, we'd like these MR
jobs to dynamically take a snapshot, run the job, and then delete or reuse
the snapshot based on freshness. At the least, we need the snapshot to be
fresh to within the last hour.
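
The reuse-or-retake decision described above can be sketched as follows.
This is only an illustration: the class, enum, and method names are
hypothetical, the one-hour limit is the freshness bound mentioned above,
and taking or deleting the snapshot itself is left out.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical policy: reuse a snapshot while it is fresh enough, otherwise retake it.
final class SnapshotFreshness {
    enum Action { REUSE, REFRESH }

    // lastSnapshotAt may be null when no snapshot has been taken yet.
    static Action decide(Instant lastSnapshotAt, Instant now, Duration maxAge) {
        if (lastSnapshotAt == null) {
            return Action.REFRESH;
        }
        Duration age = Duration.between(lastSnapshotAt, now);
        // A snapshot exactly maxAge old still counts as fresh.
        return age.compareTo(maxAge) <= 0 ? Action.REUSE : Action.REFRESH;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Duration maxAge = Duration.ofHours(1);
        System.out.println(decide(now.minus(Duration.ofMinutes(20)), now, maxAge));
        System.out.println(decide(now.minus(Duration.ofHours(3)), now, maxAge));
        System.out.println(decide(null, now, maxAge));
    }
}
```

A job would call decide() before starting; on REFRESH it would delete the
stale snapshot and take a new one, and on REUSE it would run against the
existing one.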

Also, from what I understand, HBase scans are consistent at the row level
but not at the table level. Are there other ways I can query the online
table without hurting write throughput?

Cheers,
-Gautam.

Re: Hbase Scan/Snapshot Performance...

Posted by Ted Yu <yu...@gmail.com>.

Gautam:
See also HBASE-10642, which went into 0.94.18.

You can do a rolling upgrade from 0.94.6 to 0.94.21.

Cheers



Re: Hbase Scan/Snapshot Performance...

Posted by Gautam <ga...@gmail.com>.

Thanks for the replies.

Matteo,

We've been running 0.94.6 since February, so sadly the prod cluster doesn't
have the SKIP_FLUSH option right now. It would be great if there are
options I could use now, until we upgrade to 0.98.

Ted,

Thanks for the JIRA. That is exactly what we intend to use for running the
MR jobs over snapshots. I just wanted to know how easy and lightweight
snapshotting can be before we set our sights on moving the whole thing over.


Cheers,
-Gautam.



-- 
"If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers..."

Re: Hbase Scan/Snapshot Performance...

Posted by Ted Yu <yu...@gmail.com>.

Gautam:
Please take a look at this:
HBASE-8369 MapReduce over snapshot files

Cheers



Re: Hbase Scan/Snapshot Performance...

Posted by Matteo Bertozzi <th...@gmail.com>.

There is HBASE-10935, included in 0.94.21, which lets you skip the
memstore flush; the result is the online version of an "offline
snapshot":


snapshot 'sourceTable', 'snapshotName', {SKIP_FLUSH => true}


