Posted to hdfs-dev@hadoop.apache.org by 曹楠楠 <mi...@gmail.com> on 2009/10/12 07:37:35 UTC

About the memory file system, any suggestions?

Hi all,

I am trying to use a memory file system in Hadoop. The idea is very simple: I want to keep the map intermediate files in a memory file system, which works like this:

1. Memory is limited; when it fills up, data is written to disk.
2. If a file in memory is deleted and space becomes free, a thread prefetches data from disk back into memory.
3. If the data is not in memory, it is read directly from disk.
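
As a rough sketch of the store described above (plain Java, not part of Hadoop; all class and method names are hypothetical), it could look roughly like this, with the background prefetch thread of step 2 left out:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class TieredIntermediateStore {
    private final long memoryBudgetBytes;        // how much data may stay in memory
    private long usedBytes = 0;
    private final Map<String, byte[]> inMemory = new LinkedHashMap<String, byte[]>();
    private final Path spillDir;                 // on-disk fallback directory

    public TieredIntermediateStore(long memoryBudgetBytes, Path spillDir) throws IOException {
        this.memoryBudgetBytes = memoryBudgetBytes;
        this.spillDir = Files.createDirectories(spillDir);
    }

    // Step 1: keep the chunk in memory while it fits, otherwise write it to disk.
    public synchronized void write(String name, byte[] data) throws IOException {
        if (usedBytes + data.length <= memoryBudgetBytes) {
            inMemory.put(name, data);
            usedBytes += data.length;
        } else {
            Files.write(spillDir.resolve(name), data);
        }
    }

    // Step 3: prefer the in-memory copy, fall back to the on-disk copy.
    public synchronized byte[] read(String name) throws IOException {
        byte[] cached = inMemory.get(name);
        return cached != null ? cached : Files.readAllBytes(spillDir.resolve(name));
    }

    // Step 2 (partially): deleting a chunk frees memory; a background thread
    // could then prefetch spilled files back into the freed space.
    public synchronized void delete(String name) throws IOException {
        byte[] cached = inMemory.remove(name);
        if (cached != null) {
            usedBytes -= cached.length;
        }
        Files.deleteIfExists(spillDir.resolve(name));
    }
}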

But when I tried to implement this in Hadoop, I found that when the TaskTracker receives a new map or reduce task, it starts a new process. If I use a memory file system, the intermediate files end up in the map task process's address space, and the TaskTracker cannot access them. So, any suggestions?

Thanks a lot :)

Re: About the memory file system, any suggestions?

Posted by Jason Venner <ja...@gmail.com>.
You could use the JVM reuse feature; static objects will then persist across tasks.
They will not persist across jobs.
In the Pro Hadoop book example code, there is a JVM reuse example that demonstrates this:
com.apress.hadoopbook.examples.advancedtechniques.JVMReuseAndStaticInitializers
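
As a minimal sketch of that pattern (assuming the old org.apache.hadoop.mapred API, which the Pro Hadoop examples use; the loader method here is hypothetical):

import org.apache.hadoop.mapred.JobConf;

public class StaticCacheExample {
    // With JVM reuse enabled, this field survives across tasks that run in the
    // same reused JVM, but it is gone once the job's JVMs exit.
    private static volatile byte[] cachedData;

    public static void enableJvmReuse(JobConf conf) {
        // -1 = reuse each task JVM for an unlimited number of this job's tasks
        // (the mapred.job.reuse.jvm.num.tasks property).
        conf.setNumTasksToExecutePerJvm(-1);
    }

    // The first task in a JVM pays the load cost; later tasks in that JVM reuse it.
    public static byte[] getOrLoad() {
        if (cachedData == null) {
            synchronized (StaticCacheExample.class) {
                if (cachedData == null) {
                    cachedData = loadIntermediateData();  // hypothetical loader
                }
            }
        }
        return cachedData;
    }

    private static byte[] loadIntermediateData() {
        return new byte[0];  // placeholder
    }
}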

-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals