Posted to user@hadoop.apache.org by Lin Ma <li...@gmail.com> on 2012/11/06 02:12:38 UTC

One mapper/reducer runs on a single JVM

Hello Hadoop experts,

I have had a question in my mind for a long time. Suppose I am developing an M-R
program in Java (a Java UDF that implements the mapper or reducer interface). In
this scenario, is each mapper or reducer a separate JVM process? For example, if
there are 4 mappers on a machine, are they 4 individual processes? I am also
wondering whether the processes on a single machine will impact each other when
each JVM wants more memory to run faster.

thanks in advance,
Lin

Re: One mapper/reducer runs on a single JVM

Posted by Lin Ma <li...@gmail.com>.
Thanks Mike.

1. So I think you mean that for Hadoop, which runs batch jobs, latency is not
the key concern, so some time spent on swap is acceptable; but for HBase, the
normal use case is on-demand, near-real-time queries, so we need to keep memory
swapping from hurting latency?
2. Suppose I have 4 mappers running as 4 JVMs on one machine. Does each of them
get a dedicated, exclusive slice of physical memory for its heap (meaning that
if one process consumes too much memory and causes swapping, it will NOT impact
the others)? Or do all the JVMs share the same physical memory pool (meaning
that if one process consumes too much memory and causes swapping, it WILL
impact the others)? (See the rough budget sketch below.)
3. Are there any best practices to avoid swap in Hadoop and HBase use cases?
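
For concreteness, here is the kind of budget I have in mind (the numbers are
made up purely for illustration):

    4 map slots    x 1 GB heap (-Xmx1g)      ~4 GB
    2 reduce slots x 1 GB heap (-Xmx1g)      ~2 GB
    DataNode + TaskTracker daemons           ~1 GB
    OS, page cache, everything else          ~1 GB
    ----------------------------------------------
    total                                    ~8 GB  -> should fit in physical RAM

That is, my assumption is that each JVM's heap is capped by its own -Xmx, but
all of the JVMs draw from the same pool of physical memory, so over-committing
that pool causes machine-wide swapping. Is that right?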

regards,
Lin


On Wed, Nov 7, 2012 at 12:27 AM, Michael Segel <mi...@hotmail.com> wrote:

> If you exceed the amount of physical memory available, memory pages will
> be written to disk in a temp space. The act of 'swapping' the memory pages
> from memory to disk and back again is known as 'swap'.
>
> HBase is highly sensitive to the latency of swapping memory in and out of
> physical memory to disk. You need to avoid swap when running HBase.  It
> will crash a region server and ultimately you can end up with a cascading
> failure and HBase will go down.
>
> HTH
>
> -Mike
>
> On Nov 5, 2012, at 11:06 PM, Lin Ma <li...@gmail.com> wrote:
>
> Thanks Michael,
>
> "If you are running just Hadoop, you could have a little swap. Running
> HBase, fuggit about it." -- could you give a bit more information about
> what do you mean swap and why forget for HBase?
>
> regards,
> Lin
>
>
> On Tue, Nov 6, 2012 at 12:46 PM, Michael Segel <mi...@hotmail.com> wrote:
>
>> Mappers and Reducers are separate JVM processes.
>> And yes you need to take in to account the amount of memory the
>> machine(s) when you configure the number of slots.
>>
>> If you are running just Hadoop, you could have a little swap. Running
>> HBase, fuggit about it.
>>
>>
>> On Nov 5, 2012, at 7:12 PM, Lin Ma <li...@gmail.com> wrote:
>>
>> > Hello Hadoop experts,
>> >
>> > I have a question in my mind for a long time. Supposing I am developing
>> M-R program, and it is Java based (Java UDF, implements mapper or reducer
>> interface). My question is, in this scenario, whether a mapper or a reducer
>> is a separate JVM process? E.g. supposing on a machine, there are 4
>> mappers, they are 4 individual processes? I am also wondering whether the
>> processes on a single machine will impact each other when each JVM wants to
>> get more memory to run faster?
>> >
>> > thanks in advance,
>> > Lin
>> >
>> >
>>
>>
>
>

Re: One mapper/reducer runs on a single JVM

Posted by Michael Segel <mi...@hotmail.com>.
If you exceed the amount of physical memory available, memory pages get written to disk in a temp space. The act of moving memory pages between RAM and disk and back again is known as 'swapping'.

HBase is highly sensitive to the latency of swapping memory in and out of physical memory to disk. You need to avoid swap when running HBase. It will crash a region server, and ultimately you can end up with a cascading failure and HBase will go down.
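
As a rough sketch of what that means in practice (standard Linux tuning rather
than anything specific to one cluster), you can lower the kernel's tendency to
swap on the Hadoop/HBase nodes:

    # show the current swappiness (the default is often 60; higher means the
    # kernel swaps more eagerly)
    sysctl vm.swappiness

    # only swap under severe memory pressure
    sudo sysctl -w vm.swappiness=0

    # persist the setting across reboots
    echo "vm.swappiness = 0" | sudo tee -a /etc/sysctl.conf

The other half is simply not over-committing RAM: size the task slots, the task
heaps, and the RegionServer heap so that their sum stays comfortably below the
node's physical memory.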

HTH

-Mike

On Nov 5, 2012, at 11:06 PM, Lin Ma <li...@gmail.com> wrote:

> Thanks Michael,
> 
> "If you are running just Hadoop, you could have a little swap. Running HBase, fuggit about it." -- could you give a bit more information about what do you mean swap and why forget for HBase?
> 
> regards,
> Lin
> 
> 
> On Tue, Nov 6, 2012 at 12:46 PM, Michael Segel <mi...@hotmail.com> wrote:
> Mappers and Reducers are separate JVM processes.
> And yes you need to take in to account the amount of memory the machine(s) when you configure the number of slots.
> 
> If you are running just Hadoop, you could have a little swap. Running HBase, fuggit about it.
> 
> 
> On Nov 5, 2012, at 7:12 PM, Lin Ma <li...@gmail.com> wrote:
> 
> > Hello Hadoop experts,
> >
> > I have a question in my mind for a long time. Supposing I am developing M-R program, and it is Java based (Java UDF, implements mapper or reducer interface). My question is, in this scenario, whether a mapper or a reducer is a separate JVM process? E.g. supposing on a machine, there are 4 mappers, they are 4 individual processes? I am also wondering whether the processes on a single machine will impact each other when each JVM wants to get more memory to run faster?
> >
> > thanks in advance,
> > Lin
> >
> >
> 
> 


Re: One mapper/reducer runs on a single JVM

Posted by Lin Ma <li...@gmail.com>.
Thanks Michael,

"If you are running just Hadoop, you could have a little swap. Running
HBase, fuggit about it." -- could you give a bit more information about
what do you mean swap and why forget for HBase?

regards,
Lin


On Tue, Nov 6, 2012 at 12:46 PM, Michael Segel <mi...@hotmail.com> wrote:

> Mappers and Reducers are separate JVM processes.
> And yes you need to take in to account the amount of memory the machine(s)
> when you configure the number of slots.
>
> If you are running just Hadoop, you could have a little swap. Running
> HBase, fuggit about it.
>
>
> On Nov 5, 2012, at 7:12 PM, Lin Ma <li...@gmail.com> wrote:
>
> > Hello Hadoop experts,
> >
> > I have a question in my mind for a long time. Supposing I am developing
> M-R program, and it is Java based (Java UDF, implements mapper or reducer
> interface). My question is, in this scenario, whether a mapper or a reducer
> is a separate JVM process? E.g. supposing on a machine, there are 4
> mappers, they are 4 individual processes? I am also wondering whether the
> processes on a single machine will impact each other when each JVM wants to
> get more memory to run faster?
> >
> > thanks in advance,
> > Lin
> >
> >
>
>

Re: One mapper/reducer runs on a single JVM

Posted by Michael Segel <mi...@hotmail.com>.
Mappers and Reducers are separate JVM processes.
And yes, you need to take into account the amount of memory on the machine(s) when you configure the number of slots.

If you are running just Hadoop, you could have a little swap. Running HBase, fuggit about it. 
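
To make the slot/heap sizing concrete, here is a minimal mapred-site.xml sketch
for classic MapReduce (MRv1); the values are purely illustrative and need to be
sized to the node's physical RAM:

    <configuration>
      <!-- how many map task JVMs may run concurrently on each TaskTracker -->
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>4</value>
      </property>
      <!-- how many reduce task JVMs may run concurrently on each TaskTracker -->
      <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>2</value>
      </property>
      <!-- heap cap passed to every child task JVM -->
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx1024m</value>
      </property>
    </configuration>

(slots x heap) plus the DataNode, the TaskTracker, and the OS all have to fit
in physical RAM, otherwise you are back to swapping.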


On Nov 5, 2012, at 7:12 PM, Lin Ma <li...@gmail.com> wrote:

> Hello Hadoop experts,
> 
> I have a question in my mind for a long time. Supposing I am developing M-R program, and it is Java based (Java UDF, implements mapper or reducer interface). My question is, in this scenario, whether a mapper or a reducer is a separate JVM process? E.g. supposing on a machine, there are 4 mappers, they are 4 individual processes? I am also wondering whether the processes on a single machine will impact each other when each JVM wants to get more memory to run faster?
> 
> thanks in advance,
> Lin
> 
> 

