Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2011/01/19 19:40:46 UTC
Tables "in memory", not cached, etc. - performance advice needed
Hi,
I've searched high and low (including on our own search-hadoop.com, Google,
HBase wiki), but could not find the details around marking a table as being "in
memory". This recent ML response from Lars was the best I could find:
http://search-hadoop.com/m/0S2mB1QDpIh
Also, my use case is similar to what the original poster in that thread described:
* 1 big table with raw data that is constantly being written to and from which a
MR job reads some small percentage of rows every N minutes. The MR job never
reads the same row twice - it reads only rows inserted after its last run.
* 1 smaller table that the MR job writes to every N minutes and from which data
is read via scans by users.
It sounds like the main suggestions are:
1) don't foolishly waste precious RAM/heap on the big table whose rows are read
just once
2) mark the smaller and frequently scanned table as "in memory"
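For concreteness, here is how I understand those two suggestions would look in
the HBase shell. This is only a sketch and I would appreciate corrections: the
table names 'raw_data' and 'summary' and the column family name 'd' are
placeholders I made up, and I am assuming the tables have to be disabled before
their column family attributes can be altered.

```
# Big table: rows are read once, so keep its blocks out of the block cache.
disable 'raw_data'
alter 'raw_data', {NAME => 'd', BLOCKCACHE => 'false'}
enable 'raw_data'

# Small table: frequently scanned, so give its blocks in-memory priority.
disable 'summary'
alter 'summary', {NAME => 'd', IN_MEMORY => 'true'}
enable 'summary'
```

As far as I can tell, IN_MEMORY only gives the column family's blocks higher
priority in the block cache rather than pinning the table in RAM. On the MR
side, the job's Scan could presumably also call setCacheBlocks(false), so that
its one-time reads skip the cache even if BLOCKCACHE is left enabled.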
Would anyone have any other performance-related advice that may be applicable to
this specific setup?
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/