Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2008/12/09 20:02:25 UTC

[Hadoop Wiki] Update of "Hbase/ColumnCaching" by jgray

The following page has been changed by jgray:
http://wiki.apache.org/hadoop/Hbase/ColumnCaching

New page:
Key/Val Cache

Overall goal: 
	To exit early at as many places as possible, so that reads served from the cache skip the mapfiles.

Overall structure:
	The cache is implemented in Memcache.java as an extra step when looking up the requested columns.
	Add early outs in HStore.java that check whether all the needed data was already returned from memCache.
	
	Add a logger for stats, to see hits and misses in the cache.
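
	The stats idea above could start as a pair of counters. This is only a sketch: the class name CacheStats and its methods are hypothetical and not part of HBase, and the summary string is meant to be handed to whatever logger the surrounding code already uses.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical hit/miss counter for the k/v cache; not an HBase class. */
final class CacheStats {
  private final AtomicLong hits = new AtomicLong();
  private final AtomicLong misses = new AtomicLong();

  void recordHit() { hits.incrementAndGet(); }
  void recordMiss() { misses.incrementAndGet(); }

  long hits() { return hits.get(); }
  long misses() { return misses.get(); }

  /** Fraction of lookups served from the cache; 0.0 before any lookup. */
  double hitRatio() {
    long h = hits.get();
    long total = h + misses.get();
    return total == 0 ? 0.0 : (double) h / total;
  }

  /** One-line summary that can be passed to a logger. */
  String summary() {
    return "k/v cache hits=" + hits.get() + " misses=" + misses.get();
  }
}
```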

Structure of cache:
	cacheKey(row+column) -> Value
	//and maybe row:family -> fullFlag (to indicate that all columns of the latest timestamp are in the cache)
	and a HashSet of the row:family entries that are full.
	A cacheKey class with byte[] row and byte[] column,
	and compareTo functions to sort the inputs. It should look like HStoreKey, but minimized to save memory,
	since there is no need for timestamps, versions and so on.
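
	A minimal sketch of such a key, assuming an unsigned byte-wise ordering by row and then by column. The class name CacheKey and the ordering are assumptions made for illustration; the real HStoreKey comparison may differ.

```java
import java.util.Arrays;

/** Hypothetical minimal key: row and column only, no timestamp or versions. */
final class CacheKey implements Comparable<CacheKey> {
  private final byte[] row;
  private final byte[] column;

  CacheKey(byte[] row, byte[] column) {
    // Defensive copies so later changes to the caller's arrays cannot alter the key.
    this.row = row.clone();
    this.column = column.clone();
  }

  /** Orders by row first, then by column, comparing bytes as unsigned values. */
  @Override
  public int compareTo(CacheKey other) {
    int c = compareBytes(row, other.row);
    return c != 0 ? c : compareBytes(column, other.column);
  }

  private static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) return d;
    }
    return a.length - b.length;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof CacheKey)) return false;
    CacheKey k = (CacheKey) o;
    return Arrays.equals(row, k.row) && Arrays.equals(column, k.column);
  }

  @Override
  public int hashCode() {
    return 31 * Arrays.hashCode(row) + Arrays.hashCode(column);
  }
}
```

	Defining equals and hashCode alongside compareTo lets the same key work in both hash-based and sorted collections.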
	
	//Need a way to remove data from the cache when it is full; easiest is probably a separate FIFO list holding just the
	//keys.
	Start with soft references.
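
	The soft-reference starting point could look roughly like this. It is a sketch under assumptions: SoftValueCache is a hypothetical name, and a plain String stands in for the cacheKey to keep the example short. Values are wrapped in java.lang.ref.SoftReference so the JVM may reclaim them under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical value store whose entries the JVM may reclaim when memory is tight. */
final class SoftValueCache {
  private final ConcurrentMap<String, SoftReference<byte[]>> map =
      new ConcurrentHashMap<>();

  void put(String cacheKey, byte[] value) {
    map.put(cacheKey, new SoftReference<>(value));
  }

  /** Returns the cached value, or null if it is absent or was already reclaimed. */
  byte[] get(String cacheKey) {
    SoftReference<byte[]> ref = map.get(cacheKey);
    if (ref == null) return null;
    byte[] value = ref.get();
    if (value == null) {
      // The referent was collected; drop the stale map entry as well.
      map.remove(cacheKey, ref);
    }
    return value;
  }

  void remove(String cacheKey) {
    map.remove(cacheKey);
  }
}
```

	If a hard size bound is wanted later, the FIFO idea from the comment above could be built on a LinkedHashMap that overrides removeEldestEntry.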


Input From:
	get - memCache
		Check if the key is in the cache; otherwise get it from the mapfiles and put it in the cache on the way out.
		The k/v cache can only be used when the number of versions wanted is 1, so that needs to be checked.
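
	The get path described above can be sketched as a read-through lookup. Everything named here is hypothetical: ReadThroughGet is not an HBase class, String keys stand in for cacheKey, and the mapfileLookup function is a stand-in for the real mapfile read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through cache for single-version gets. */
final class ReadThroughGet {
  private final Map<String, byte[]> cache = new HashMap<>();
  private final Function<String, byte[]> mapfileLookup; // stand-in for the mapfile read

  ReadThroughGet(Function<String, byte[]> mapfileLookup) {
    this.mapfileLookup = mapfileLookup;
  }

  byte[] get(String row, String column, int numVersions) {
    String key = row + "/" + column;
    if (numVersions != 1) {
      // The cache holds one version only, so multi-version reads bypass it.
      return mapfileLookup.apply(key);
    }
    byte[] cached = cache.get(key);
    if (cached != null) {
      return cached; // early out: no mapfile access needed
    }
    byte[] loaded = mapfileLookup.apply(key);
    if (loaded != null) {
      cache.put(key, loaded); // populate the cache on the way out
    }
    return loaded;
  }
}
```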
	
	getFull(multiple columns) - memCache
		Check if the keys are in the cache; otherwise get them from the mapfiles and put them in the cache on the way out.
		
	getFull(all columns) - memCache
		Check if fullFlag is set. If it is, get all columns from the cache and return early; otherwise fetch the remaining columns from the mapfiles, put them in the cache and set the flag on the way out.
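
	A rough sketch of the fullFlag check for getFull(all columns), assuming the flag is kept as a set of row:family strings. FullRowCache and mapfileScan are hypothetical names, and mapfileScan stands in for reading the mapfiles.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical getFull(all columns) path guarded by a per-row:family full flag. */
final class FullRowCache {
  private final Map<String, Map<String, byte[]>> columns = new HashMap<>();
  private final Set<String> full = new HashSet<>(); // row:family entries that are complete
  private final Function<String, Map<String, byte[]>> mapfileScan; // stand-in for mapfiles

  FullRowCache(Function<String, Map<String, byte[]>> mapfileScan) {
    this.mapfileScan = mapfileScan;
  }

  Map<String, byte[]> getFull(String row, String family) {
    String rowFamily = row + ":" + family;
    Map<String, byte[]> cached =
        columns.computeIfAbsent(rowFamily, k -> new HashMap<>());
    if (full.contains(rowFamily)) {
      return new HashMap<>(cached); // early out: everything is already cached
    }
    // Fetch from the mapfiles, keeping any values that are already cached.
    for (Map.Entry<String, byte[]> e : mapfileScan.apply(rowFamily).entrySet()) {
      cached.putIfAbsent(e.getKey(), e.getValue());
    }
    full.add(rowFamily); // set the flag on the way out
    return new HashMap<>(cached);
  }
}
```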
	
	deleteAll - done via add in memCache
		Delete row from cache, return ?
	
	deleteFamily - done via add in memCache
		Delete row from cache, return ?	
	
	add - memCache
		Check fullFlag; if it is set, apply all the changes to the cached row, otherwise do nothing to the cache.
	
	batchUpdate - done via add in memCache
		Check fullFlag; if it is set, apply all the changes to the cached row, otherwise do nothing to the cache.
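
	The write-side rules above (deleteAll, deleteFamily, add, batchUpdate) might be sketched as two hooks. WriteHooks is a hypothetical name, rows are keyed by a plain String for brevity, and clearing the full flag on delete is an assumption that the notes do not state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical write-side hooks for the k/v cache. */
final class WriteHooks {
  final Map<String, Map<String, byte[]>> rows = new HashMap<>();
  final Set<String> full = new HashSet<>(); // rows whose columns are all cached

  /** add / batchUpdate: touch the cache only when the row is flagged full. */
  void onAdd(String row, String column, byte[] value) {
    if (full.contains(row)) {
      rows.computeIfAbsent(row, k -> new HashMap<>()).put(column, value);
    }
  }

  /** deleteAll / deleteFamily: drop the row from the cache. */
  void onDelete(String row) {
    rows.remove(row);
    full.remove(row); // assumption: a deleted row is no longer considered full
  }
}
```

	Taken literally, a row that is only partially cached keeps its old cached values after an add, since the flag is not set; the notes do not say whether such entries should also be updated or invalidated.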
		
		
		
		
		
		
Things to change: 
	In MemCache.getFull(), add an early out if everything was found in memCache. Is there then no need to check snapShot?