Posted to user@hbase.apache.org by Jason Rutherglen <ja...@gmail.com> on 2011/02/18 15:48:57 UTC

Scanning over key values > timestamp?

For search integration, on server reboot we need to scan over the key
values written since the last Lucene commit and add them to the index.
Is there an efficient way to do this?

Re: Scanning over key values > timestamp?

Posted by Jason Rutherglen <ja...@gmail.com>.

Ryan, thanks. I think a full scan will be fine, since it's a one-time
event on startup/recovery, and I am curious either way.

On Fri, Feb 18, 2011 at 10:08 AM, Ryan Rawson <ry...@gmail.com> wrote:
> There is minimal/no underlying efficiency. It's basically a full
> table/region scan with a filter to discard the uninteresting values.
> We do have timestamp filtering techniques that avoid reading from
> files, e.g. if you specify a time range [100,200) and an HFile only
> contains [0,50), we won't include that file.  So perhaps in your case
> this might help.  Compactions will merge files, and thus their
> timestamp ranges, together, so you'll lose some of that efficiency,
> assuming you COULD otherwise have done a query involving only the
> most recent HFiles.

Re: Scanning over key values > timestamp?

Posted by Ryan Rawson <ry...@gmail.com>.

There is minimal/no underlying efficiency. It's basically a full
table/region scan with a filter to discard the uninteresting values.
We do have timestamp filtering techniques that avoid reading from
files, e.g. if you specify a time range [100,200) and an HFile only
contains [0,50), we won't include that file.  So perhaps in your case
this might help.  Compactions will merge files, and thus their
timestamp ranges, together, so you'll lose some of that efficiency,
assuming you COULD otherwise have done a query involving only the
most recent HFiles.
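
To make the file-skipping idea above concrete, here is a minimal sketch
in plain Java. It is not HBase code: TimeRangeSketch, FileRange and
filesToRead are hypothetical names, and each "file" is modeled only by
the half-open range [minTs, maxTs) of the timestamps it holds. It shows
the overlap test for the [100,200) vs [0,50) case and how a compaction
that merges ranges removes the chance to skip anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; this is not the HBase implementation.
final class TimeRangeSketch {

    // A store file reduced to the half-open timestamp range it covers.
    static final class FileRange {
        final String name;
        final long minTs; // inclusive
        final long maxTs; // exclusive

        FileRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Two half-open ranges [aMin, aMax) and [bMin, bMax) overlap
    // exactly when each one starts before the other one ends.
    static boolean overlaps(long aMin, long aMax, long bMin, long bMax) {
        return aMin < bMax && bMin < aMax;
    }

    // Return the names of the files whose range overlaps the query range;
    // all other files can be skipped without being read.
    static List<String> filesToRead(List<FileRange> files, long queryMin, long queryMax) {
        List<String> selected = new ArrayList<>();
        for (FileRange f : files) {
            if (overlaps(f.minTs, f.maxTs, queryMin, queryMax)) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Before compaction: the old file [0,50) is skipped for query [100,200).
        List<FileRange> files = new ArrayList<>();
        files.add(new FileRange("old", 0, 50));
        files.add(new FileRange("recent", 150, 300));
        System.out.println(filesToRead(files, 100, 200)); // prints [recent]

        // After compaction: one merged file covers [0,300), so nothing is skipped.
        List<FileRange> compacted = new ArrayList<>();
        compacted.add(new FileRange("merged", 0, 300));
        System.out.println(filesToRead(compacted, 100, 200)); // prints [merged]
    }
}
```

In this model the saving depends entirely on how the ranges are split
across files, which is why a startup scan for "everything since the
last commit" may or may not end up touching only the recent files.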

On Fri, Feb 18, 2011 at 10:02 AM, Jason Rutherglen
<ja...@gmail.com> wrote:
> Thanks Ted!  Is there some underlying efficiency to this, or will it
> be scanning all of the rows underneath?

Re: Scanning over key values > timestamp?

Posted by Jason Rutherglen <ja...@gmail.com>.

Thanks Ted!  Is there some underlying efficiency to this, or will it
be scanning all of the rows underneath?

On Fri, Feb 18, 2011 at 7:47 AM, Ted Yu <yu...@gmail.com> wrote:
> From Scan.java:
>  * To only retrieve columns within a specific range of version timestamps,
>  * execute {@link #setTimeRange(long, long) setTimeRange}.

Re: Scanning over key values > timestamp?

Posted by Ted Yu <yu...@gmail.com>.

From Scan.java:
 * To only retrieve columns within a specific range of version timestamps,
 * execute {@link #setTimeRange(long, long) setTimeRange}.

On Fri, Feb 18, 2011 at 6:48 AM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> For search integration, on server reboot we need to scan over the key
> values written since the last Lucene commit and add them to the index.
> Is there an efficient way to do this?
>