Posted to common-user@hadoop.apache.org by Chih-Hsien Wu <ch...@gmail.com> on 2013/11/25 15:52:42 UTC

Relationship between heap sizes and mapred.child.java.opt configuration

I'm learning about Hadoop configuration. What is the connection between the
datanode/tasktracker heap sizes and "mapred.child.java.opts"? Does one have
to exceed the other?

Re: Relationship between heap sizes and mapred.child.java.opt configuration

Posted by Adam Kawa <ka...@gmail.com>.
> Thanks for the reply. So what is the purpose of heap sizes for
> tasktrackers and datanodes then?
>

TaskTrackers and DataNodes are long-running daemons (written in Java) that
run on the slave nodes in separate JVMs. I usually give at least 1 GB to
each of them in production clusters.
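
The daemon heaps are configured separately, in hadoop-env.sh rather than in
the job configuration. A minimal sketch, assuming Hadoop 1.x variable names
(the 1g values are illustrative; adjust them to your cluster):

  # hadoop-env.sh
  # Default maximum heap size (in MB) for Hadoop daemons started on this node.
  export HADOOP_HEAPSIZE=1000
  # Per-daemon JVM options can be appended as well:
  export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"
  export HADOOP_TASKTRACKER_OPTS="-Xmx1g $HADOOP_TASKTRACKER_OPTS"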


> In other words, if I want to speed up the map/reduce cycle, can I just
> minimize the heap size and maximize "mapred.child.java.opts", or will
> minimizing the heap sizes cause an out-of-memory exception?
>

The higher the memory in mapred.child.java.opts, the less frequently your
tasks spill key-value pairs to disk (so they run a bit more efficiently -
see also the configuration property "io.sort.mb"). However, the higher the
memory in mapred.child.java.opts, the fewer tasks you can run concurrently
on a slave node. It is a trade-off.
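
For illustration, here is a minimal mapred-site.xml sketch of these knobs.
The property names are from Hadoop 1.x; the values are hypothetical and
need to be adapted to your hardware and workload:

  <!-- mapred-site.xml -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1g</value>  <!-- JVM heap for each spawned task -->
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>256</value>  <!-- in-memory sort buffer per task, in MB -->
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>  <!-- concurrent map slots per TaskTracker -->
  </property>

Note that io.sort.mb must fit comfortably inside the task heap given by
mapred.child.java.opts, or the task itself can run out of memory.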

If you do not tune "mapred.child.java.opts" correctly, then you might get an
Out-Of-Memory error (if your job consumes more memory than
mapred.child.java.opts allows). If you run too many tasks on your slave
node and exceed the amount of physical memory available on the node, then

1) the node can start swapping (in Hadoop, heavy swapping usually makes a
node so slow that it is often marked "dead"),
2) or the kernel Out-Of-Memory Killer can start killing your TaskTracker
and the tasks it spawned.
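
To avoid both problems, size everything so that the sum of all heaps fits in
physical RAM. A rough, hypothetical per-node budget (all numbers here are
assumptions for illustration only):

  Physical RAM on the slave node:            16 GB
  minus DataNode heap:                      - 1 GB
  minus TaskTracker heap:                   - 1 GB
  minus OS, page cache, other services:     - 2 GB
  left for task JVMs:                       = 12 GB

  mapred.child.java.opts = -Xmx1g  ->  up to ~12 task slots per node
  mapred.child.java.opts = -Xmx2g  ->  up to ~6 task slots per node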

You can read more about issues with swapping and the OOM killer in my blog post:
http://hakunamapdata.com/two-memory-related-issues-on-the-apache-hadoop-cluster/


>
>
> On Mon, Nov 25, 2013 at 10:02 AM, Kai Voigt <k...@123.org> wrote:
>
>> mapred.child.java.opts refers to the settings for the JVMs spawned
>> by the TaskTracker. These JVMs actually run the tasks (mappers and
>> reducers).
>>
>> The heap sizes for TaskTrackers and DataNodes are unrelated to those.
>> They each run in their own JVM.
>>
>> Kai
>>
>> On 25.11.2013 at 15:52, Chih-Hsien Wu <ch...@gmail.com> wrote:
>>
>> I'm learning about Hadoop configuration. What is the connection between
>> the datanode/tasktracker heap sizes and "mapred.child.java.opts"? Does
>> one have to exceed the other?
>>
>>
>>   ------------------------------
>> Kai Voigt | Am Germaniahafen 1, 24143 Kiel, Germany
>> +49 160 96683050 | k@123.org | @KaiVoigt
>>
>>
>

Re: Relationship between heap sizes and mapred.child.java.opt configuration

Posted by Chih-Hsien Wu <ch...@gmail.com>.
Thanks for the reply. So what is the purpose of the heap sizes for
TaskTrackers and DataNodes then? In other words, if I want to speed up the
map/reduce cycle, can I just minimize the heap sizes and maximize
"mapred.child.java.opts", or will minimizing the heap sizes cause an
out-of-memory exception?


On Mon, Nov 25, 2013 at 10:02 AM, Kai Voigt <k...@123.org> wrote:

> mapred.child.java.opts refers to the settings for the JVMs spawned
> by the TaskTracker. These JVMs actually run the tasks (mappers and
> reducers).
>
> The heap sizes for TaskTrackers and DataNodes are unrelated to those. They
> each run in their own JVM.
>
> Kai
>
> On 25.11.2013 at 15:52, Chih-Hsien Wu <ch...@gmail.com> wrote:
>
> I'm learning about Hadoop configuration. What is the connection between
> the datanode/tasktracker heap sizes and "mapred.child.java.opts"? Does
> one have to exceed the other?
>
>
> ------------------------------
> Kai Voigt | Am Germaniahafen 1, 24143 Kiel, Germany
> +49 160 96683050 | k@123.org | @KaiVoigt
>
>

Re: Relationship between heap sizes and mapred.child.java.opt configuration

Posted by Kai Voigt <k...@123.org>.
mapred.child.java.opts refers to the settings for the JVMs spawned by the TaskTracker. These JVMs actually run the tasks (mappers and reducers).

The heap sizes for TaskTrackers and DataNodes are unrelated to those. They each run in their own JVM.
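
For reference, the shipped default is quite small. A sketch of the property
as it appears in mapred-site.xml (the -Xmx200m default is the Hadoop 1.x
one; treat it as an assumption for other versions):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200m</value>  <!-- JVM options for every spawned task -->
  </property>

Anything a task allocates has to fit in that heap, which is why jobs with
larger in-memory state usually raise it.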

Kai

On 25.11.2013 at 15:52, Chih-Hsien Wu <ch...@gmail.com> wrote:

> I'm learning about Hadoop configuration. What is the connection between the datanode/tasktracker heap sizes and "mapred.child.java.opts"? Does one have to exceed the other?

Kai Voigt
Am Germaniahafen 1
24143 Kiel, Germany
+49 160 96683050
k@123.org
@KaiVoigt

