Posted to user@hbase.apache.org by Tao Xie <xi...@gmail.com> on 2011/01/13 04:51:16 UTC

Will all HFiles managed by a regionserver kept open

Hi, I know that a regionserver generally manages HRegions and that, at the
HDFS layer, the data in an HRegion is stored in HFile format. I want to know
whether all HFiles are kept open, with things like the block index loaded up
front to improve lookup performance. If so, what happens if this exceeds the
memory limit?

Thanks.

Re: Will all HFiles managed by a regionserver kept open

Posted by Jean-Daniel Cryans <jd...@apache.org>.
There should be as many seeks as there are store files in the region
that's serving the data. There's also the family dimension, e.g. if you
read from only 1 family then only that family's store files are read.

So on average, I'd say you'll do 3 seeks since you do a minor
compaction once you reach 4 store files in a family.
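
As a rough illustration of the counting rule above (a sketch only, not
HBase code): assume each store file consulted costs one seek, and ignore
the block cache and anything else that might let a read skip a file. The
function name and the example region layout are made up for illustration.

```python
def seeks_for_get(store_files_per_family, families_read=None):
    """Upper-bound estimate: one seek per store file in each family read.

    store_files_per_family: dict mapping family name -> number of store files.
    families_read: iterable of family names the Get touches; None means all.
    """
    if families_read is None:
        families_read = store_files_per_family.keys()
    return sum(store_files_per_family[f] for f in families_read)


# Hypothetical region: family "a" has 3 store files, family "b" has 2.
region = {"a": 3, "b": 2}
print(seeks_for_get(region))         # Get that reads both families
print(seeks_for_get(region, ["a"]))  # Get that reads only family "a"
```

Under this model, restricting a Get to one family is the only thing that
lowers the count; the number of store files per family is what compaction
keeps bounded.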

What he meant by memory copying is just that the data has to be copied
from the socket when you read from HDFS and then into the outbound
socket for the client after the region server does whatever processing
it needs to do. I guess the more data you read, the longer it takes to
copy in RAM?

J-D

On Fri, Jan 14, 2011 at 12:43 AM, Tao Xie <xi...@gmail.com> wrote:
> Is the HDFS seek the dominant cost in retrieving data? If records are small
> (~1k) and most requests are random Gets, how many seeks will happen on
> average during a Get? Btw, what do you mean by memory copying? When will it
> cause large overhead? Thanks.

Re: Will all HFiles managed by a regionserver kept open

Posted by Tao Xie <xi...@gmail.com>.
Is the HDFS seek the dominant cost in retrieving data? If records are small
(~1k) and most requests are random Gets, how many seeks will happen on
average during a Get? Btw, what do you mean by memory copying? When will it
cause large overhead? Thanks.

2011/1/13 Ryan Rawson <ry...@gmail.com>

> Retrieving data from disk is the dominant cost until you are fully
> cached, in which case other factors inside the regionserver become
> dominant. At that point copying memory, GC, algorithmic complexity,
> etc. become important.

Re: Will all HFiles managed by a regionserver kept open

Posted by Ryan Rawson <ry...@gmail.com>.
Retrieving data from disk is the dominant cost until you are fully
cached, in which case other factors inside the regionserver become
dominant. At that point copying memory, GC, algorithmic complexity,
etc. become important.
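
To make the "dominant until fully cached" point concrete, here is a small
back-of-envelope sketch. It treats the cost of a Get as a weighted average
of a cached cost and a disk cost. The function name and both latency
figures are invented for illustration and are not measurements.

```python
def expected_get_latency_ms(cache_hit_ratio, cached_ms, disk_ms):
    """Weighted average of the cached and uncached cost of a Get."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return cache_hit_ratio * cached_ms + (1.0 - cache_hit_ratio) * disk_ms


# Illustrative, made-up costs: 0.1 ms when cached, 10 ms when going to disk.
for ratio in (0.0, 0.5, 0.9, 1.0):
    print(ratio, expected_get_latency_ms(ratio, cached_ms=0.1, disk_ms=10.0))
```

With numbers like these the disk term swamps everything else until the hit
ratio gets very close to 1, which is when the in-process costs start to
show.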

On Wed, Jan 12, 2011 at 10:54 PM, Tao Xie <xi...@gmail.com> wrote:
> Thanks for your response, Stack. I have a follow-up question as I try to
> understand HBase.
> As I understand it, a Get is executed as follows:
>
> hbase client <=> RS <=> DN
>
> 1) the hbase client finds the RS managing the key; 2) the RS knows the HFile
> and fetches the data from the DataNode, which may involve a pread plus a scan
> within the HBase data block; 3) the resulting record is returned to the client.
>
> Is this correct? If so, is step 2 the most expensive operation? Are there any
> other time-consuming places?

Re: Will all HFiles managed by a regionserver kept open

Posted by Tao Xie <xi...@gmail.com>.
Thanks for your response, Stack. I have a follow-up question as I try to
understand HBase.
As I understand it, a Get is executed as follows:

hbase client <=> RS <=> DN

1) the hbase client finds the RS managing the key; 2) the RS knows the HFile
and fetches the data from the DataNode, which may involve a pread plus a scan
within the HBase data block; 3) the resulting record is returned to the client.

Is this correct? If so, is step 2 the most expensive operation? Are there any
other time-consuming places?
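
A toy model of the three steps as I describe them above might look like
the following. It is only a sketch of the idea, not HBase's actual code:
the function names, the sorted-list layout, and the sample keys are all
hypothetical, and the in-memory lists stand in for the network and HDFS
reads.

```python
import bisect


def find_region(region_start_keys, row_key):
    """Step 1: index of the region whose key range contains row_key.

    region_start_keys is sorted; the first region starts at the empty key.
    """
    return bisect.bisect_right(region_start_keys, row_key) - 1


def read_from_file(block_first_keys, blocks, row_key):
    """Step 2: use the in-memory block index to pick one block, then scan it."""
    i = bisect.bisect_right(block_first_keys, row_key) - 1
    if i < 0:
        return None  # row_key sorts before the first block
    for key, value in blocks[i]:  # stands in for the pread + in-block scan
        if key == row_key:
            return value
    return None


# Hypothetical layout: two regions, and one file with two data blocks.
region_start_keys = ["", "m"]
block_first_keys = ["a", "f"]
blocks = [[("a", 1), ("c", 2)], [("f", 3), ("k", 4)]]

print(find_region(region_start_keys, "k"))            # which region serves "k"
print(read_from_file(block_first_keys, blocks, "k"))  # step 3: value returned
```

The two binary searches touch only small in-memory structures; in this
picture the expensive part is fetching the chosen block, which is the
piece the lists here gloss over.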


2011/1/13 Stack <st...@duboce.net>

> Yes, all files are opened on startup and kept open.  Opening an HBase
> storefile/hfile includes loading the file index and metadata.
> In our experience, this overhead has been small.  It's currently not
> accounted for in our general memory-counting.  We should for sure add
> it.
>
> St.Ack

Re: Will all HFiles managed by a regionserver kept open

Posted by Stack <st...@duboce.net>.
Yes, all files are opened on startup and kept open.  Opening an HBase
storefile/hfile includes loading the file index and metadata.
In our experience, this overhead has been small.  It's currently not
accounted for in our general memory-counting.  We should for sure add
it.
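
For a feel of why the overhead tends to be small, here is a
back-of-envelope sketch that assumes one index entry per data block. All
the sizes are assumptions chosen for illustration (64 KiB blocks, 50-byte
average keys, 12 bytes per entry for an offset and a length), and the
function name is hypothetical; real numbers depend on your schema and
settings.

```python
def estimate_block_index_bytes(file_bytes, block_bytes=64 * 1024,
                               avg_key_bytes=50, entry_overhead_bytes=12):
    """Rough size of one file's block index: one entry per data block."""
    num_blocks = -(-file_bytes // block_bytes)  # ceiling division
    return num_blocks * (avg_key_bytes + entry_overhead_bytes)


one_gib = 1024 ** 3
per_file = estimate_block_index_bytes(one_gib)
print(per_file)         # estimated index bytes for one 1 GiB file
print(per_file * 1000)  # same estimate summed over 1000 such open files
```

Note how the estimate scales: smaller blocks or larger keys grow the index
proportionally, so those are the cases where it would be worth watching.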

St.Ack

On Wed, Jan 12, 2011 at 7:51 PM, Tao Xie <xi...@gmail.com> wrote:
> Hi, I know that a regionserver generally manages HRegions and that, at the
> HDFS layer, the data in an HRegion is stored in HFile format. I want to know
> whether all HFiles are kept open, with things like the block index loaded up
> front to improve lookup performance. If so, what happens if this exceeds the
> memory limit?
>
> Thanks.
>