Posted to dev@hbase.apache.org by Weishung Chung <we...@gmail.com> on 2011/03/26 16:57:02 UTC

disk seek in range search

Dear fellow HBase developers,

Could someone educate me and let me know how to figure out the number of
disk seeks involved in a range search (startRow to endRow specified in
Scan)? Also, could anyone give me the details of all the steps involved once
the Scan for range retrieval is called? I know it somehow needs to figure
out which regionservers host the rows, but I still don't have a
clear understanding of all the steps involved :( :( Also, there is a data
index block in HFile; I was wondering how the index block is used to
figure out the location of all the rows.

Thank you so much for satisfying my curiosity :)

Have a good weekend and enjoy :)

Wei Shung
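
Editorial sketch for the index-block question above. It is a simplified model, not HBase's actual code. It assumes only that a file's block index keeps the first key of each data block in sorted order, so a range scan can binary-search the index for the block containing startRow and then read forward until a block starts at or past endRow. The function name `blocks_for_range` and the sample keys are hypothetical, and caching, multiple store files, and the memstore are ignored.

```python
from bisect import bisect_right


def blocks_for_range(first_keys, start_row, stop_row):
    """Return indices of data blocks that may hold rows in [start_row, stop_row).

    first_keys is a sorted list with the first row key of each data block,
    standing in for the block index (an assumption of this sketch).
    """
    if not first_keys or start_row >= stop_row:
        return []
    # Binary search: last block whose first key is <= start_row.
    # Clamp to 0 when start_row sorts before every key in the file.
    first = max(bisect_right(first_keys, start_row) - 1, 0)
    touched = []
    for i in range(first, len(first_keys)):
        # A block that starts at or after stop_row cannot contain wanted rows.
        if first_keys[i] >= stop_row:
            break
        touched.append(i)
    return touched


index = ["a", "f", "m", "t"]  # hypothetical first keys of four blocks
print(blocks_for_range(index, "g", "n"))  # [1, 2]
```

In this model, the scan does one positioning step to reach the first block and then reads the following blocks sequentially, so the number of blocks touched grows with the width of the range rather than with the size of the file.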

Re: disk seek in range search

Posted by Jean-Daniel Cryans <jd...@apache.org>.
This isn't "Iron Chef American Idol All Stars HBase Edition", showing
that you can actually provide some form of answer is already a #win by
itself.

J-D


On Mon, Mar 28, 2011 at 9:52 AM, Weishung Chung <we...@gmail.com> wrote:
> Thank you Jean for the reading materials, I've been reading the source codes
> and searching on the internet and have a very vague idea how everything is
> working. Give me a few more days(don't want to embarrass myself), I will
> check back with you guys to see if my understanding is correct or not :D
> Have a good day, thanks again :)
>
> On Mon, Mar 28, 2011 at 11:23 AM, Jean-Daniel Cryans <jd...@apache.org>
> wrote:
>>
>> I think you are asking for a bit too much :)
>>
>> Let's do it the other way, show us what you think are the answers to
>> your questions based on currently available documentation and by
>> looking at the source code, then I'm pretty sure someone will be happy
>> to verify it.
>>
>> Start by looking at the bigtable paper, then use Lars George's blog posts
>> like:
>>
>> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>>
>> http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>> http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
>>
>> And then try diving into the code.
>>
>> Good luck!
>>
>> J-D
>>
>> On Sat, Mar 26, 2011 at 8:57 AM, Weishung Chung <we...@gmail.com>
>> wrote:
>> > Dear fellow HBase developers,
>> >
>> > Could someone educate me and let me know how to figure out the number of
>> > disk seeks involved in a range search (startRow to endRow specified in
>> > Scan). Also, could anyone give me the details of all the steps involved once
>> > the Scan for range retrieval is called? I know somehow it needs to figure
>> > out the regionservers used in hosting the rows but I still don't have a
>> > clear understanding the whole steps involved :( :( Also, there is a data
>> > index block in HFile, I was wondering how the index block is utilized in
>> > figuring out the location of all the rows.
>> >
>> > Thank you so much for satisfying my curiosity :)
>> >
>> > Have a good weekend and enjoy :)
>> >
>> > Wei Shung
>> >
>
>

Re: disk seek in range search

Posted by Weishung Chung <we...@gmail.com>.
Thank you Jean for the reading materials. I've been reading the source code
and searching on the internet, and I have a very vague idea of how everything
works. Give me a few more days (I don't want to embarrass myself) and I will
check back with you guys to see if my understanding is correct or not :D

Have a good day, thanks again :)

On Mon, Mar 28, 2011 at 11:23 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> I think you are asking for a bit too much :)
>
> Let's do it the other way, show us what you think are the answers to
> your questions based on currently available documentation and by
> looking at the source code, then I'm pretty sure someone will be happy
> to verify it.
>
> Start by looking at the bigtable paper, then use Lars George's blog posts
> like:
>
> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>
> http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
> http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
>
> And then try diving into the code.
>
> Good luck!
>
> J-D
>
> On Sat, Mar 26, 2011 at 8:57 AM, Weishung Chung <we...@gmail.com>
> wrote:
> > Dear fellow HBase developers,
> >
> > Could someone educate me and let me know how to figure out the number of
> > disk seeks involved in a range search (startRow to endRow specified in
> > Scan). Also, could anyone give me the details of all the steps involved once
> > the Scan for range retrieval is called? I know somehow it needs to figure
> > out the regionservers used in hosting the rows but I still don't have a
> > clear understanding the whole steps involved :( :( Also, there is a data
> > index block in HFile, I was wondering how the index block is utilized in
> > figuring out the location of all the rows.
> >
> > Thank you so much for satisfying my curiosity :)
> >
> > Have a good weekend and enjoy :)
> >
> > Wei Shung
> >
>

Re: disk seek in range search

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I think you are asking for a bit too much :)

Let's do it the other way: show us what you think are the answers to
your questions based on currently available documentation and by
looking at the source code, then I'm pretty sure someone will be happy
to verify it.

Start by looking at the Bigtable paper, then use Lars George's blog posts like:

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

And then try diving into the code.

Good luck!

J-D

On Sat, Mar 26, 2011 at 8:57 AM, Weishung Chung <we...@gmail.com> wrote:
> Dear fellow HBase developers,
>
> Could someone educate me and let me know how to figure out the number of
> disk seeks involved in a range search (startRow to endRow specified in
> Scan). Also, could anyone give me the details of all the steps involved once
> the Scan for range retrieval is called? I know somehow it needs to figure
> out the regionservers used in hosting the rows but I still don't have a
> clear understanding the whole steps involved :( :( Also, there is a data
> index block in HFile, I was wondering how the index block is utilized in
> figuring out the location of all the rows.
>
> Thank you so much for satisfying my curiosity :)
>
> Have a good weekend and enjoy :)
>
> Wei Shung
>