Posted to user@hbase.apache.org by Bing Jiang <ji...@gmail.com> on 2015/02/05 04:26:14 UTC

Re: Hbase scan using TIMERANGE

Hi Ted,

Do you know whether there is any optimization for scans with a TimeRange?

If I set a sparse TimeRange and large scan cache, the scan sometimes causes
an RPC timeout.

I would also like to know whether the scan has to read each KV to check
its timestamp.

Thanks,
-Bing

2014-06-28 21:25 GMT+08:00 Ted Yu <yu...@gmail.com>:

> Have you looked at the following method in AggregationClient?
>
>   long rowCount(final HTable table,
>       final ColumnInterpreter<R, S, P, Q, T> ci, final Scan scan)
>       throws Throwable {
>
> You can specify the time range through the scan parameter.
>
> See this method of Scan:
>
>   public Scan setTimeRange(long minStamp, long maxStamp)
>
> Cheers
>
>
> On Sat, Jun 28, 2014 at 3:42 AM, yogi <yo...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a requirement to write a shell script that scans some 6 huge HBase
> > tables and gets the count of records present in them. I also need the
> > counts per day, where I pass the date as a parameter to the shell script
> > that calls these scan commands. I did find a way to convert the date to
> > epoch time and pass it to the scan command, but the scan keeps running
> > forever. Can someone help me make this faster?
> >
> > Note: I am scanning the tables based on TIMERANGE, as all the tables have
> > this field.
> >
> > Thanks,
> > Yogi
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-hbase.679495.n3.nabble.com/Hbase-scan-using-TIMERANGE-tp4060851.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
>
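
The date-to-epoch step that yogi describes can be sketched as below. This is
a minimal Java illustration, not code from the thread: the class name
DayTimeRange and the helper dayRangeMillis are hypothetical, and it assumes
that day boundaries are taken in UTC and that cell timestamps are epoch
milliseconds. The two values it prints are the kind of minStamp/maxStamp pair
that Scan.setTimeRange(long minStamp, long maxStamp), quoted above, accepts;
the range is half-open, [minStamp, maxStamp).

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper class, introduced only for this illustration.
final class DayTimeRange {

    /**
     * Returns {minStamp, maxStamp} in epoch milliseconds covering one calendar
     * day in the given zone: start of the day (inclusive) to start of the next
     * day (exclusive).
     */
    static long[] dayRangeMillis(String isoDate, ZoneId zone) {
        LocalDate day = LocalDate.parse(isoDate); // expects yyyy-MM-dd
        long minStamp = day.atStartOfDay(zone).toInstant().toEpochMilli();
        long maxStamp = day.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
        return new long[] {minStamp, maxStamp};
    }

    public static void main(String[] args) {
        // Assumption: the date arrives as the first argument, as it would from
        // a shell script; UTC is assumed for the day boundaries.
        String date = args.length > 0 ? args[0] : "2014-06-28";
        long[] range = dayRangeMillis(date, ZoneOffset.UTC);
        // These are the values one would hand to Scan.setTimeRange(min, max).
        System.out.println(range[0] + " " + range[1]);
    }
}
```

For 2014-06-28 this prints 1403913600000 1404000000000. If the tables were
written with timestamps in a different unit or zone convention, the helper
would need adjusting accordingly.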

Re: Hbase scan using TIMERANGE

Posted by Bing Jiang <ji...@gmail.com>.
Many thanks for Ted's pointers.

Yes, a tight time range makes the scan very slow to fill the cache.

I will investigate HBASE-5032 further and will report back if there is any
progress or improvement.

Thank you!

-Bing

2015-02-05 11:34 GMT+08:00 Ted Yu <yu...@gmail.com>:

> bq. set a sparse TimeRange
>
> You mean a TimeRange whose span is short?
>
> bq. and large scan cache
>
> Can you try a smaller number of rows for caching?
>
> A preliminary search led me to HBASE-5032 'Add other DELETE type
> information into the delete bloom filter to optimize the time range query'
>
> Cheers
>



-- 
Bing Jiang

Re: Hbase scan using TIMERANGE

Posted by Ted Yu <yu...@gmail.com>.
bq. set a sparse TimeRange

You mean a TimeRange whose span is short?

bq. and large scan cache

Can you try a smaller number of rows for caching?

A preliminary search led me to HBASE-5032 'Add other DELETE type
information into the delete bloom filter to optimize the time range query'

Cheers
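
Why a smaller caching value may help can be illustrated with a toy model.
This is not HBase code, and it does not settle the question of whether every
KV is read: it simply assumes that rows are inspected one by one and that a
batch is returned to the client only once `caching` matching rows have been
collected or the rows run out. The class CachingModel and the synthetic data
are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names come from HBase.
final class CachingModel {

    /**
     * Visits rows in order and closes a batch (one simulated RPC) once
     * `caching` rows with a timestamp in [minStamp, maxStamp) were collected.
     * Returns how many rows had to be inspected for each batch.
     */
    static List<Integer> rowsInspectedPerBatch(
            long[] rowTimestamps, long minStamp, long maxStamp, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be positive");
        }
        List<Integer> inspectedPerBatch = new ArrayList<>();
        int inspected = 0;
        int matched = 0;
        for (long ts : rowTimestamps) {
            inspected++;
            if (ts >= minStamp && ts < maxStamp) {
                matched++;
            }
            if (matched == caching) {
                inspectedPerBatch.add(inspected);
                inspected = 0;
                matched = 0;
            }
        }
        if (inspected > 0) {
            // Trailing partial batch: the rows ran out before it filled up.
            inspectedPerBatch.add(inspected);
        }
        return inspectedPerBatch;
    }

    public static void main(String[] args) {
        // Synthetic data: 1,000,000 rows, one in every 1,000 inside [0, 1).
        long[] timestamps = new long[1_000_000];
        for (int i = 0; i < timestamps.length; i++) {
            timestamps[i] = (i % 1000 == 0) ? 0L : 5L;
        }
        System.out.println(rowsInspectedPerBatch(timestamps, 0L, 1L, 1000).get(0));
        System.out.println(rowsInspectedPerBatch(timestamps, 0L, 1L, 10).get(0));
    }
}
```

Under these assumptions, with one matching row in every 1,000, the first
batch requires inspecting 999,001 rows at caching = 1000 but only 9,001 at
caching = 10, so each call has far less work to finish before an RPC timeout
can expire.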
