Posted to user@hbase.apache.org by Liam Slusser <ls...@gmail.com> on 2014/06/30 23:59:22 UTC

first scan returns nothing and how big is big?

Hey HBase list,

First question: the first time I run a scan with a few filters, it
returns nothing and takes a long time (20-30 seconds). If I then run the
exact same scan request again, it is much quicker (2-3 seconds for the
whole scan; I assume things are cached the second time, which is fine),
and this time I get results. I don't get any errors, and there is nothing
in the log files.

Has anybody else seen anything like this? I'm running HBase
0.94.15-cdh4.6.0, and my scan uses FuzzyRowFilter together with
SingleColumnValueFilter.
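
To reason about which rows a scan like this should return, the two filters' documented semantics can be modeled in memory and used to build the expected result set for a test. This is a sketch, not HBase code. It assumes the 0.94 FuzzyRowFilter convention, where a fuzzy-info byte of 0 means the template byte must match and 1 means any value is accepted. It also models SingleColumnValueFilter as an EQUAL comparison with the default filterIfMissing=false, so rows lacking the column pass. The key layout (5-byte host, "-", yyyyMMddHH), the single `status` name standing in for family:qualifier, and all data are hypothetical.

```python
from typing import Dict, List, Tuple

# A row is modeled as (row_key, {column: value}); everything is bytes.
Row = Tuple[bytes, Dict[bytes, bytes]]


def fuzzy_matches(row_key: bytes, template: bytes, fuzzy_info: bytes) -> bool:
    """Model of one FuzzyRowFilter (template, fuzzy info) pair:
    fuzzy_info[i] == 0 -> row_key[i] must equal template[i];
    fuzzy_info[i] == 1 -> any byte is accepted at position i."""
    if len(template) != len(fuzzy_info):
        raise ValueError("template and fuzzy_info must have the same length")
    if len(row_key) < len(template):
        return False  # simplification: this sketch assumes fixed-length keys
    return all(
        info == 1 or row_key[i] == template[i]
        for i, info in enumerate(fuzzy_info)
    )


def column_value_passes(columns: Dict[bytes, bytes], column: bytes,
                        expected: bytes, filter_if_missing: bool = False) -> bool:
    """Model of SingleColumnValueFilter with an EQUAL comparison.
    A row without the column passes unless filter_if_missing is True."""
    if column not in columns:
        return not filter_if_missing
    return columns[column] == expected


def expected_results(rows: List[Row], template: bytes, fuzzy_info: bytes,
                     column: bytes, expected: bytes,
                     filter_if_missing: bool = False) -> List[bytes]:
    """Row keys passing both filters, as in a MUST_PASS_ALL FilterList."""
    return [
        key for key, cols in rows
        if fuzzy_matches(key, template, fuzzy_info)
        and column_value_passes(cols, column, expected, filter_if_missing)
    ]


# Hypothetical layout: 5-byte host id, "-", then yyyyMMddHH.
template = b"?????-2014063014"          # '?' placeholders sit at fuzzy positions
fuzzy_info = bytes([1] * 5 + [0] * 11)  # host fuzzy; "-" and hour fixed
rows: List[Row] = [
    (b"web01-2014063014", {b"status": b"500"}),
    (b"web02-2014063014", {b"status": b"200"}),  # wrong value
    (b"web03-2014063015", {b"status": b"500"}),  # wrong hour
    (b"web04-2014063014", {}),                   # no status column
]

print(expected_results(rows, template, fuzzy_info, b"status", b"500"))
# [b'web01-2014063014', b'web04-2014063014']
print(expected_results(rows, template, fuzzy_info, b"status", b"500",
                       filter_if_missing=True))
# [b'web01-2014063014']
```

The last row shows why the filterIfMissing setting matters: with the default, a row that has no status column at all is still returned. The real FuzzyRowFilter also uses the mask to seek ahead to the next candidate key, which this model does not attempt.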

Second question: how big is too big? I am using HBase to store parsed
logs, and I currently split them into monthly tables. I ingest around
350 million log entries a day, so near the end of the month a table holds
an estimated 8-10 billion rows. Everything seems fine so far: with
FuzzyRowFilter + SingleColumnValueFilter I can scan an hour of logs in
about 10 seconds, so performance is still very decent. Is there any
advantage to splitting the data into daily tables instead? Is there a
best-practices guide for tables this big?
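
A rough way to weigh monthly against daily tables is to count rows per table and to list which tables a time-range scan must open. This sketch assumes one row per log entry at the stated rate of about 350 million a day. The table names (logs_YYYYMM, logs_YYYYMMDD) are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

ROWS_PER_DAY = 350_000_000  # stated ingest rate; assumes one row per log entry


def rows_after(days: int) -> int:
    """Rows a table has accumulated after `days` full days of ingest."""
    return ROWS_PER_DAY * days


def days_to_reach(rows: int) -> int:
    """Whole days of ingest before a table holds at least `rows` rows."""
    return -(-rows // ROWS_PER_DAY)  # ceiling division on integers


def tables_for_range(start: datetime, end: datetime, daily: bool) -> List[str]:
    """Hypothetical table names that a scan over [start, end) must open."""
    if end <= start:
        return []
    fmt = "logs_%Y%m%d" if daily else "logs_%Y%m"
    names: List[str] = []
    day = datetime(start.year, start.month, start.day)  # midnight of start
    while day < end:
        name = day.strftime(fmt)
        if name not in names:
            names.append(name)
        day += timedelta(days=1)
    return names


print(rows_after(1), rows_after(31))  # 350000000 10850000000
print(days_to_reach(8_000_000_000), days_to_reach(10_000_000_000))  # 23 29
# One hour of logs: a single table under either scheme.
print(tables_for_range(datetime(2014, 6, 30, 14), datetime(2014, 6, 30, 15), daily=False))
# ['logs_201406']
# Three days spanning a month boundary.
print(tables_for_range(datetime(2014, 6, 29), datetime(2014, 7, 2), daily=True))
# ['logs_20140629', 'logs_20140630', 'logs_20140701']
```

An hour-long scan touches a single table under either scheme. Daily tables mainly change how much data each table accumulates and how many tables a multi-day query has to span.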

thanks!
liam

Re: first scan returns nothing and how big is big?

Posted by Liam Slusser <ls...@gmail.com>.

I'll try to put together a unit test and report back.

thanks,
liam

(In reply to Ted Yu's message of Mon, Jun 30, 2014 at 3:25 PM.)

Re: first scan returns nothing and how big is big?

Posted by Ted Yu <yu...@gmail.com>.

FuzzyRowFilter is an interesting filter, and users have given feedback on
how it behaves in various scenarios.

If you can write a unit test which exhibits the problem in your first
point, that would help us track down the root cause.
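
One possible shape for such a test is to run the identical scan several times and flag any run whose row keys differ from the first. In this sketch, `run_scan` is a hypothetical callable standing in for the real client code that opens a scanner with the FuzzyRowFilter and SingleColumnValueFilter and collects the row keys. `make_fake_scan` only imitates the reported symptom (an empty first run) so the harness can be exercised without a cluster.

```python
from typing import Callable, List, Sequence


def find_inconsistent_runs(run_scan: Callable[[], Sequence[bytes]],
                           attempts: int = 3) -> List[int]:
    """Run the same scan `attempts` times and return the indexes of runs
    whose row keys differ from run 0 (an empty list means repeatable)."""
    if attempts < 2:
        raise ValueError("need at least two runs to compare")
    first = list(run_scan())
    return [i for i in range(1, attempts) if list(run_scan()) != first]


def make_fake_scan(rows: Sequence[bytes]) -> Callable[[], List[bytes]]:
    """Hypothetical stand-in for a real scan: returns nothing on the first
    call and `rows` afterwards, imitating the symptom in this thread."""
    calls = 0

    def scan() -> List[bytes]:
        nonlocal calls
        calls += 1
        return [] if calls == 1 else list(rows)

    return scan


def stable_scan() -> List[bytes]:
    """Hypothetical scan that always returns the same rows."""
    return [b"row1", b"row2"]


print(find_inconsistent_runs(stable_scan))                # []
print(find_inconsistent_runs(make_fake_scan([b"row1"])))  # [1, 2]
```

A consistently empty scan would pass this check, so a real test should also compare the first run against an independently computed expected set.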

I checked FuzzyRowFilter in the 0.94 branch: the last fix for it was
HBASE-7628, which you already have in 0.94.15.

Cheers

(In reply to Liam Slusser's message of Mon, Jun 30, 2014 at 2:59 PM.)