Posted to common-user@hadoop.apache.org by Phantom <gh...@gmail.com> on 2007/06/20 17:50:29 UTC

MapFile inner workings

Hi All

I know this is a tall ask. I am going through the source code, but could
someone please tell me the intuition behind the design of the MapFile class?
If I were using a MapFile against the local file system, are there any
limitations on the number of items I can store? That is, can I have a MapFile
on the local filesystem that holds, say, 10GB of data? I ask because the
documentation says it behooves one to keep the keys small, since the index is
kept entirely in memory. Could someone please enlighten me?

Thanks
Avinash
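
[For readers who want to try the scenario being asked about, here is a
minimal sketch of writing a MapFile to the local filesystem and doing a
random lookup with the org.apache.hadoop.io.MapFile API. The path,
key/value types, and record count are made up for illustration.]

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class LocalMapFileSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);   // plain local filesystem, no HDFS
    String dir = "/tmp/example.map";             // a MapFile is a directory: data + index files

    // Keys must be appended in sorted order; MapFile.Writer enforces this.
    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, dir, IntWritable.class, Text.class);
    for (int i = 0; i < 1000000; i++) {
      writer.append(new IntWritable(i), new Text("value-" + i));
    }
    writer.close();

    // Random access: the reader loads the sparse index into memory,
    // binary-searches it, then seeks into the data file.
    MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
    Text value = new Text();
    reader.get(new IntWritable(123456), value);
    System.out.println(value);
    reader.close();
  }
}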

Re: MapFile inner workings

Posted by Doug Cutting <cu...@apache.org>.
Every 128th key is held in memory.  So if you've got 1M keys in a
MapFile, then opening a MapFile.Reader would read roughly 8,000 keys
(1,000,000 / 128) into memory.  Binary search is used on these in-memory
keys, so that a maximum of 127 entries must be scanned per random access.

Doug
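
[To make the memory trade-off concrete, here is a rough sketch of widening
the index interval Doug describes, using MapFile.Writer.setIndexInterval.
The path, key/value types, and record count are again made up for
illustration.]

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class IndexIntervalSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    String dir = "/tmp/sparse.map";

    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, dir, LongWritable.class, Text.class);
    // Index every 1024th key instead of every 128th: for 1M keys that is
    // roughly 1,000 in-memory index entries instead of roughly 8,000.
    writer.setIndexInterval(1024);
    for (long i = 0; i < 1000000; i++) {
      writer.append(new LongWritable(i), new Text("v" + i));
    }
    writer.close();

    // A lookup now binary-searches the ~1,000 in-memory keys, then scans
    // at most 1023 data-file entries to reach the exact record.
    MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
    Text value = new Text();
    reader.get(new LongWritable(999999L), value);
    reader.close();
  }
}

[Note that the index holds only keys and file offsets, never values, which
is why the documentation advises keeping keys small: each in-memory index
entry costs roughly the serialized key plus a long offset.]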
