
Posted to user@hbase.apache.org by S Ahmed <sa...@gmail.com> on 2013/01/13 16:24:41 UTC

best read path explanation

What is the best explanation of the HBase read path?

I understand that HBase stores data in immutable files and doesn't allow
in-place mutations, so I'm confused as to how a read can get the latest data.

I'm guessing there are merges done between the immutable file stores and
the in-memory stores?

Re: best read path explanation

Posted by Anoop John <an...@gmail.com>.

At read time, if a store has more than one HFile, HBase reads that row
from all of the HFiles (checking whether each one contains the row and
reading it if so) and also from the memstore. That is how it gets the
latest data.
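
A minimal sketch of that read-time merge, written in Python rather than HBase's Java and using made-up data structures. Each HFile and the memstore is modeled as a list of (row, column, timestamp, value) cells; `read_row` and `sort_key` are hypothetical names for illustration, not HBase APIs.

```python
import heapq

# Hypothetical model: a cell is (row, column, timestamp, value).
# Ordering is by row, then column, then newest timestamp first, as a rough
# stand-in for the order HBase keeps its KeyValues in.

def sort_key(cell):
    row, column, timestamp, _value = cell
    return (row, column, -timestamp)

def read_row(row, memstore, hfiles):
    """Return {column: value} holding the newest value of each column in `row`."""
    sources = [memstore] + list(hfiles)
    # Check every source for the row; a source may not contain it at all.
    per_source = [
        sorted((c for c in source if c[0] == row), key=sort_key)
        for source in sources
    ]
    latest = {}
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    for _row, column, _ts, value in heapq.merge(*per_source, key=sort_key):
        # The first cell seen for a column carries the highest timestamp.
        latest.setdefault(column, value)
    return latest

hfile_1 = [("r1", "cf:a", 100, "old"), ("r2", "cf:a", 100, "x")]
hfile_2 = [("r1", "cf:a", 200, "newer"), ("r1", "cf:b", 150, "b1")]
memstore = [("r1", "cf:a", 300, "newest")]

result = read_row("r1", memstore, [hfile_1, hfile_2])
print(result)
```

The sketch leaves out delete markers, filters, and tie-breaking between cells with equal timestamps, all of which the real read path has to handle.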

Also remember that compactions run on the HFiles and merge multiple files
into a single file.
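
As a rough illustration of the compaction idea only, here is a Python sketch under simplifying assumptions: an "HFile" is an immutable tuple of (row, column, timestamp, value) cells, already sorted, and `compact` is a hypothetical helper, not an HBase API.

```python
import heapq

# Hypothetical model: a cell is (row, column, timestamp, value), and a file
# is an immutable tuple of cells sorted by row, column, newest first.

def sort_key(cell):
    row, column, timestamp, _value = cell
    return (row, column, -timestamp)

def compact(hfiles):
    """Merge several sorted, immutable files into one new sorted file."""
    # The inputs are never modified; the merge produces a new file, after
    # which the old files could be discarded.
    return tuple(heapq.merge(*hfiles, key=sort_key))

hfile_1 = (("r1", "cf:a", 100, "old"), ("r2", "cf:a", 100, "x"))
hfile_2 = (("r1", "cf:a", 200, "newer"), ("r1", "cf:b", 150, "b1"))

compacted = compact([hfile_1, hfile_2])
for cell in compacted:
    print(cell)
```

After the merge a read has one file to consult instead of two. A real compaction can also drop cells that are no longer needed; this sketch shows only the merge.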

-Anoop-

Re: best read path explanation

Posted by lars hofhansl <la...@apache.org>.

This is enforced in the server-side scanner framework (ScanQueryMatcher, called by StoreScanner).
So while expired KeyValues are physically removed only once a compaction runs, they are logically hidden by the scanner framework.
In fact, the same scanner framework is used to decide whether KeyValues are visible, both to a user scan and during a compaction.
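
To make the "one rule, two callers" point concrete, here is a small Python sketch. It is not the ScanQueryMatcher code; `visible_cells`, `user_scan`, `compact`, and the `max_versions`/`ttl`/`now` parameters are all hypothetical names, and the order in which the two checks are applied is an assumption of the sketch.

```python
# Hypothetical stand-in for the matcher idea: one function decides which
# cells are visible, and both the scan path and the compaction path call it.

def visible_cells(cells, max_versions, ttl, now):
    """Yield visible cells from `cells`, sorted by row, column, newest first.

    A cell is (row, column, timestamp, value). At most `max_versions` cells
    are kept per (row, column), and cells older than `ttl` are hidden.
    """
    current = None  # the (row, column) being processed
    kept = 0        # versions already emitted for `current`
    for cell in cells:
        row, column, timestamp, _value = cell
        if (row, column) != current:
            current, kept = (row, column), 0
        if now - timestamp > ttl:
            continue  # expired: hidden even though it is still stored
        if kept >= max_versions:
            continue  # beyond the configured number of versions
        kept += 1
        yield cell

def user_scan(store, **limits):
    # A scan hides extra or expired cells but leaves the store untouched.
    return list(visible_cells(store, **limits))

def compact(store, **limits):
    # A compaction applies the same rule and keeps only the survivors.
    return list(visible_cells(store, **limits))

store = [
    ("r1", "cf:a", 300, "v3"),
    ("r1", "cf:a", 200, "v2"),
    ("r1", "cf:a", 100, "v1"),
    ("r1", "cf:b", 50, "stale"),
]
limits = dict(max_versions=2, ttl=250, now=310)

scanned = user_scan(store, **limits)
new_store = compact(store, **limits)
print(scanned)
print(len(store), len(new_store))
```

Here the scan already hides "v1" (a third version) and "stale" (expired) while all four cells are still stored; only the compaction result actually omits them.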

As for Ahmed's question, you can run the tests locally by just applying the patch to a svn checkout (I doubt it will still apply, though).

-- Lars

________________________________
 From: Asaf Mesika <as...@gmail.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org> 
Cc: lars hofhansl <la...@apache.org> 
Sent: Monday, January 14, 2013 10:47 AM
Subject: Re: best read path explanation

I have a follow up question here: 
A column family can be defined to have a maximum number of versions per column qualifier value. Is this enforced only by the client side code (HTable) or also by the InternalScanner implementations?

Re: best read path explanation

Posted by Asaf Mesika <as...@gmail.com>.

I have a follow-up question here:
A column family can be defined to have a maximum number of versions per
column qualifier. Is this enforced only by the client-side code
(HTable) or also by the InternalScanner implementations?

Re: best read path explanation

Posted by S Ahmed <sa...@gmail.com>.

Thanks Lars!

Sort of a side question after following your proposed patch:
https://issues.apache.org/jira/secure/attachment/12511771/5268-v5.txt

Locally on your computer (laptop?), can those tests run in isolation, or do
you need a fairly complicated setup to run them (all the various HBase
dependencies such as ZooKeeper, etc.)?


Re: best read path explanation

Posted by lars hofhansl <la...@apache.org>.

Does this help: http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html ?
