Posted to user@hbase.apache.org by Iulia Zidaru <iu...@1and1.ro> on 2011/04/22 11:17:55 UTC

HDFS and HBase heap

  Hi all,

Supposing we have to constantly hit all of the stored data, what is a good 
ratio between the HDFS space used and the HBase heap size allocated per 
node? Do you calculate it somehow?
Also, is there a ratio between the Hadoop heap size and the HBase heap 
size that we should take into consideration?

Thank you,
Iulia



Re: HDFS and HBase heap

Posted by Iulia Zidaru <iu...@1and1.ro>.
  Thank you, J-D. It's much clearer now.

On 04/26/2011 08:36 PM, Jean-Daniel Cryans wrote:
> This is the doc for that parameter:
>
> Percentage of maximum heap (-Xmx setting) to allocate to block cache
> used by HFile/StoreFile. Default of 0.2 means allocate 20%. Set to 0
> to disable.
>
> When reading, it first checks the block cache to see if the block is
> available; if not, it will go to HDFS and add the block to the block
> cache (unless told not to, e.g. by setCacheBlocks(false) on Get and
> Scan). This is possible since the files are immutable.
>
> When doing a Put, it will never read any block. When doing a Delete
> without specifying a timestamp, it does have to read, so block caching
> helps there too.
>
> Hope this helps,
>
> J-D
>
> On Tue, Apr 26, 2011 at 12:36 AM, Iulia Zidaru<iu...@1and1.ro>  wrote:
>> Thank you very much, J-D. I'll definitely try the patch.
>> Regarding the block cache, could you give me some details?
>> Is it controlled by the hfile.block.cache.size parameter? I've checked most of
>> the parameters related to store and heap, and this one I really don't understand.
>> Is there a disk access for loading every block? When adding data to a
>> region (randomly), is the cache important, or only for random get/scan and
>> delete?
>>
>> Thank you,
>> Iulia
>>
>>
>> On 04/22/2011 09:21 PM, Jean-Daniel Cryans wrote:
>>
>> The datanodes don't consume much memory; we run ours with 1GB and give
>> the rest to the region servers.
>>
>> BTW, if you want to serve the whole dataset, depending on your SLA, you
>> might want to try HDFS-347, since concurrent HDFS access is rather
>> slow. The other choice would be to make sure you can hold everything
>> in the block cache, which means very little data per region server.
>>
>> J-D
>>
>> On Fri, Apr 22, 2011 at 2:17 AM, Iulia Zidaru<iu...@1and1.ro>  wrote:
>>
>>   Hi all,
>>
>> Supposing we have to constantly hit all of the stored data, what is a good ratio
>> between the HDFS space used and the HBase heap size allocated per node? Do
>> you calculate it somehow?
>> Also, is there a ratio between the Hadoop heap size and the HBase heap size
>> that we should take into consideration?
>>
>> Thank you,
>> Iulia
>>
>>
>>
>>
>>


-- 
Iulia Zidaru
Java Developer

1&1 Internet AG - Bucharest/Romania - Web Components Romania
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
iulia.zidaru@1and1.ro
0040 31 223 9153

  


Re: HDFS and HBase heap

Posted by Jean-Daniel Cryans <jd...@apache.org>.
This is the doc for that parameter:

Percentage of maximum heap (-Xmx setting) to allocate to block cache
used by HFile/StoreFile. Default of 0.2 means allocate 20%. Set to 0
to disable.
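
For a concrete feel of what that fraction means in bytes, here is a minimal
sketch assuming the standard Hadoop Configuration / HBaseConfiguration API
(it only reads the setting; run it inside a JVM with the region server's -Xmx
to see the actual figure):

    // Illustrative sketch: how the fraction maps to block cache memory.
    // Assumes the usual Hadoop/HBase Configuration API; values are examples.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BlockCacheSizeCheck {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // 0.2f mirrors the documented default of 20% of -Xmx; 0 disables the cache.
            float fraction = conf.getFloat("hfile.block.cache.size", 0.2f);
            long maxHeapBytes = Runtime.getRuntime().maxMemory(); // roughly the -Xmx value
            double cacheMb = fraction * maxHeapBytes / (1024.0 * 1024.0);
            System.out.printf("hfile.block.cache.size = %.2f -> ~%.0f MB of block cache%n",
                    fraction, cacheMb);
        }
    }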

When reading, it first checks the block cache to see if the block is
available; if not, it will go to HDFS and add the block to the block
cache (unless told not to, e.g. by setCacheBlocks(false) on Get and
Scan). This is possible since the files are immutable.
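
For illustration, a minimal client-side sketch assuming the Java client API
of that era; the table name is just a placeholder:

    // Full scan that skips the block cache so it does not evict the hot
    // random-read working set. Table name "mytable" is a made-up example.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanWithoutBlockCache {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");
            try {
                Scan scan = new Scan();
                scan.setCacheBlocks(false); // blocks read by this scan are not added to the cache
                ResultScanner scanner = table.getScanner(scan);
                try {
                    for (Result r : scanner) {
                        // process r
                    }
                } finally {
                    scanner.close();
                }
            } finally {
                table.close();
            }
        }
    }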

When doing a Put, it will never read any block. When doing a Delete
without specifying a timestamp, it does have to read, so block caching
helps there too.

Hope this helps,

J-D

On Tue, Apr 26, 2011 at 12:36 AM, Iulia Zidaru <iu...@1and1.ro> wrote:
> Thank you very much, J-D. I'll definitely try the patch.
> Regarding the block cache, could you give me some details?
> Is it controlled by the hfile.block.cache.size parameter? I've checked most of
> the parameters related to store and heap, and this one I really don't understand.
> Is there a disk access for loading every block? When adding data to a
> region (randomly), is the cache important, or only for random get/scan and
> delete?
>
> Thank you,
> Iulia
>
>
> On 04/22/2011 09:21 PM, Jean-Daniel Cryans wrote:
>
> The datanodes don't consume much memory; we run ours with 1GB and give
> the rest to the region servers.
>
> BTW, if you want to serve the whole dataset, depending on your SLA, you
> might want to try HDFS-347, since concurrent HDFS access is rather
> slow. The other choice would be to make sure you can hold everything
> in the block cache, which means very little data per region server.
>
> J-D
>
> On Fri, Apr 22, 2011 at 2:17 AM, Iulia Zidaru <iu...@1and1.ro> wrote:
>
>  Hi all,
>
> Supposing we have to constantly hit all of the stored data, what is a good ratio
> between the HDFS space used and the HBase heap size allocated per node? Do
> you calculate it somehow?
> Also, is there a ratio between the Hadoop heap size and the HBase heap size
> that we should take into consideration?
>
> Thank you,
> Iulia
>
>
>
>
>

Re: HDFS and HBase heap

Posted by Iulia Zidaru <iu...@1and1.ro>.
  Thank you very much, J-D. I'll definitely try the patch.
Regarding the block cache, could you give me some details?
Is it controlled by the hfile.block.cache.size parameter? I've checked 
most of the parameters related to store and heap, and this one I really don't 
understand. Is there a disk access for loading every block? When adding 
data to a region (randomly), is the cache important, or only for random 
get/scan and delete?

Thank you,
Iulia


On 04/22/2011 09:21 PM, Jean-Daniel Cryans wrote:
> The datanodes don't consume much memory; we run ours with 1GB and give
> the rest to the region servers.
>
> BTW, if you want to serve the whole dataset, depending on your SLA, you
> might want to try HDFS-347, since concurrent HDFS access is rather
> slow. The other choice would be to make sure you can hold everything
> in the block cache, which means very little data per region server.
>
> J-D
>
> On Fri, Apr 22, 2011 at 2:17 AM, Iulia Zidaru<iu...@1and1.ro>  wrote:
>>   Hi all,
>>
>> Supposing we have to constantly hit all of the stored data, what is a good ratio
>> between the HDFS space used and the HBase heap size allocated per node? Do
>> you calculate it somehow?
>> Also, is there a ratio between the Hadoop heap size and the HBase heap size
>> that we should take into consideration?
>>
>> Thank you,
>> Iulia
>>
>>
>>

  


Re: HDFS and HBase heap

Posted by Jean-Daniel Cryans <jd...@apache.org>.
The datanodes don't consume much memory; we run ours with 1GB and give
the rest to the region servers.

BTW, if you want to serve the whole dataset, depending on your SLA, you
might want to try HDFS-347, since concurrent HDFS access is rather
slow. The other choice would be to make sure you can hold everything
in the block cache, which means very little data per region server.
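
As a rough back-of-the-envelope sketch of that last point (the 8 GB region
server heap below is an assumed example, 0.2 is the documented block cache
default, and 1GB matches the datanode heap mentioned above):

    // Back-of-the-envelope sizing: how much data one region server could
    // realistically serve entirely from its block cache.
    public class CacheSizingSketch {
        public static void main(String[] args) {
            double regionServerHeapGb = 8.0;  // assumed -Xmx of the region server
            double blockCacheFraction = 0.2;  // hfile.block.cache.size
            double datanodeHeapGb = 1.0;      // datanodes need little heap

            double blockCacheGb = regionServerHeapGb * blockCacheFraction;
            System.out.printf("Block cache per region server: ~%.1f GB%n", blockCacheGb);
            System.out.printf("Datanode heap per node:         %.1f GB%n", datanodeHeapGb);
            // Serving everything from cache means the data hosted by one region
            // server (before HDFS replication) must fit in that ~1.6 GB here,
            // i.e. "very little data per region server".
        }
    }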

J-D

On Fri, Apr 22, 2011 at 2:17 AM, Iulia Zidaru <iu...@1and1.ro> wrote:
>  Hi all,
>
> Supposing we have to constantly hit all of the stored data, what is a good ratio
> between the HDFS space used and the HBase heap size allocated per node? Do
> you calculate it somehow?
> Also, is there a ratio between the Hadoop heap size and the HBase heap size
> that we should take into consideration?
>
> Thank you,
> Iulia
>
>
>