Posted to mapreduce-dev@hadoop.apache.org by "Tudor Vlad (JIRA)" <ji...@apache.org> on 2010/05/10 21:18:30 UTC
[jira] Created: (MAPREDUCE-1781) option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when the number of mappers is bigger than the number of nodes - always spawns 2 mappers/node
option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when the number of mappers is bigger than the number of nodes - always spawns 2 mappers/node
--------------------------------------------------------------------------------------------------------------------------------------------
Key: MAPREDUCE-1781
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1781
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 0.20.2
Environment: Debian Lenny x64, and Hadoop 0.20.2, 2GB RAM
Reporter: Tudor Vlad
Hello,
I am a new user of Hadoop and I am having trouble using Hadoop Streaming with the "-D mapred.tasktracker.map.tasks.maximum" option.
I'm experimenting with an unmanaged application (C++) that I want to run over several nodes in 2 scenarios:
1) the number of maps (input splits) is equal to the number of nodes
2) the number of maps is a multiple of the number of nodes (5, 10, 20, ...)
Initially, when running the tests in scenario 1, I would sometimes get 2 processes/node on half the nodes. However, I fixed this by adding the option "-D mapred.tasktracker.map.tasks.maximum=1", and everything worked fine.
In scenario 2 (more maps than nodes) this directive no longer works: I always get 2 processes/node. I even tested with maximum=5 and I still get 2 processes/node.
The entire command I use is:
/usr/bin/time --format="-duration:\t%e |\t-MFaults:\t%F |\t-ContxtSwitch:\t%w" \
/opt/hadoop/bin/hadoop jar /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
-D mapred.tasktracker.map.tasks.maximum=1 \
-D mapred.map.tasks=30 \
-D mapred.reduce.tasks=0 \
-D io.file.buffer.size=5242880 \
-libjars "/opt/hadoop/contrib/streaming/hadoop-7debug.jar" \
-input input/test \
-output out1 \
-mapper "/opt/jobdata/script_1k" \
-inputformat "me.MyInputFormat"
Why is this happening, and how can I make it work properly (i.e. be able to limit exactly how many mappers can run at one time per node)?
Thank you in advance
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1781) option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when the number of mappers is bigger than the number of nodes - always spawns 2 mappers/node
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu resolved MAPREDUCE-1781.
------------------------------------------------
Resolution: Invalid
bq. Regarding the initial problem, I think it would help a lot of people (especially new users) to specify in the config page[ http://hadoop.apache.org/common/docs/current/mapred-default.html ] which parameters are set at startup and which at job runtime.
In branch 0.21, the configuration names are standardized through MAPREDUCE-849. Configuration names with the prefix mapreduce.cluster/mapreduce.jobtracker/mapreduce.tasktracker are server-level configurations and need to be set up before the cluster is brought up. The other configurations, with the prefix mapreduce.job/mapreduce.task/mapreduce.map/mapreduce.reduce, are job-level configurations.
Documenting all of them in mapred-default is being tracked in MAPREDUCE-1021.
Closing this as invalid.
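To make the resolution concrete: since mapred.tasktracker.map.tasks.maximum is a server-level (TaskTracker) setting, passing it via -D on a streaming job has no effect. It must instead go into mapred-site.xml on every worker node before the TaskTracker starts. A minimal sketch, using the 0.20-era property name from this issue:

```xml
<!-- mapred-site.xml on each TaskTracker node (not a per-job -D flag) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <!-- Maximum number of map tasks this TaskTracker runs simultaneously.
       The default is 2, which is why the reporter always saw 2 mappers/node. -->
  <value>1</value>
</property>
```

Each TaskTracker must be restarted for the change to take effect; the value is read once at daemon startup, not per job.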