
Posted to mapreduce-dev@hadoop.apache.org by "Devaraj K (JIRA)" <ji...@apache.org> on 2011/07/06 13:00:16 UTC

[jira] [Created] (MAPREDUCE-2647) Memory sharing across all the Tasks in the Task Tracker to improve the job performance

Memory sharing across all the Tasks in the Task Tracker to improve the job performance
--------------------------------------------------------------------------------------

Key: MAPREDUCE-2647
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2647
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: tasktracker
Reporter: Devaraj K
Assignee: Devaraj K

When all the tasks (maps/reduces) of a job work with the same additional data, each task currently has to load that data into memory individually before reading it. Every task repeats the same loading work. Instead, the data can be loaded into main memory once and used by all the tasks.

h5.Proposed Solution:
1. Provide a mechanism to load the data into shared memory and to read that data from main memory.
2. Provide a Java API that internally uses a native implementation to read the data from memory. All the maps/reducers can use this API to read the data from main memory.
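
A minimal sketch of what a read API along these lines might look like. The issue does not specify the mechanism, so this sketch swaps in a read-only memory-mapped file (java.nio) in place of the proposed native shared-memory implementation; the class name SharedDataReader and its methods are hypothetical and are not part of Hadoop.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical reader: names are illustrative only, not a Hadoop API.
public final class SharedDataReader {
    private final MappedByteBuffer buffer;

    private SharedDataReader(MappedByteBuffer buffer) {
        this.buffer = buffer;
    }

    // Maps the whole file read-only. Processes on the same host that map the
    // same file typically share the underlying pages via the OS page cache.
    public static SharedDataReader open(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return new SharedDataReader(
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()));
        }
    }

    // Number of bytes available in the mapped region.
    public int size() {
        return buffer.capacity();
    }

    // Copies `length` bytes starting at `offset`. A duplicate view is used so
    // the shared buffer's own position is never modified.
    public byte[] read(int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        view.get(out);
        return out;
    }
}
```

In this sketch a task would call SharedDataReader.open(path) once and then read(offset, length) as needed. A single mapping is limited to 2 GB, so larger data would need several mappings.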

h5.Example:
Suppose a map task receives an IP address as the key and needs to look up the location of that IP address in a local file. In this case each map task has to open the file, load it into main memory, read from it, and close it, and this cost is paid again by every task. Instead, we can load the file into the task tracker's memory once, and each task can read from that memory directly.
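
To make the per-task cost concrete, here is a rough sketch of the lookup table each map task would build from the local file. The issue does not describe the file, so the "ip,location" line format, the class name IpLocationTable, and its methods are all assumptions made for illustration. Today every task would run load() itself; under the proposal this work would happen once and the result would be shared.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the names and the "ip,location" format are assumptions.
public final class IpLocationTable {
    private final Map<String, String> locations;

    private IpLocationTable(Map<String, String> locations) {
        this.locations = locations;
    }

    // Opens the file, reads every line into memory, and closes the file.
    // This is the work that is repeated by each task without sharing.
    public static IpLocationTable load(Path file) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma <= 0) {
                    continue; // skip blank or malformed lines
                }
                table.put(line.substring(0, comma).trim(),
                          line.substring(comma + 1).trim());
            }
        }
        return new IpLocationTable(table);
    }

    // Returns the location for the given IP address, or null if unknown.
    public String locationOf(String ip) {
        return locations.get(ip);
    }
}
```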

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira