Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2011/01/19 19:40:46 UTC

Tables "in memory", not cached, etc. - performance advice needed

Hi,

I've searched high and low (including our own search-hadoop.com, Google, and the
HBase wiki), but could not find details on marking a table as being "in
memory".  This recent ML response from Lars was the best I could find:
http://search-hadoop.com/m/0S2mB1QDpIh

Also, my use case is similar to what the original poster in that thread described:

* 1 big table with raw data that is constantly being written to and from which an
MR job reads some small percentage of rows every N minutes.  The MR job never
reads the same row twice - it reads only rows inserted after its last run.

* 1 smaller table that the MR job writes to every N minutes and from which data 
is read via scans by users.


It sounds like the main suggestions are:
1) don't foolishly waste precious RAM/heap on the big table whose rows are read 
just once
2) mark the smaller and frequently scanned table as "in memory"
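
To make those two suggestions concrete, here is a rough sketch of what I believe
they look like in the HBase shell.  The table names ('raw', 'summary') and the
column family name ('d') are made up for illustration, and as far as I can tell
both settings are per column family rather than per table:

    # (1) big table: keep its blocks out of the block cache
    disable 'raw'
    alter 'raw', {NAME => 'd', BLOCKCACHE => 'false'}
    enable 'raw'

    # (2) small table: give its blocks in-memory priority in the block cache
    disable 'summary'
    alter 'summary', {NAME => 'd', IN_MEMORY => 'true'}
    enable 'summary'

My understanding is that IN_MEMORY only raises the priority of those blocks in
the block cache and does not guarantee they stay resident.  An alternative for
(1) would be to leave the table alone and have the MR job call
setCacheBlocks(false) on its Scan, so that only that job's reads bypass the
cache.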

Would anyone have any other performance-related advice that may be applicable to 
this specific setup?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/