Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/04/26 05:53:02 UTC

[jira] Created: (HADOOP-167) reducing the number of Configuration & JobConf objects created

reducing the number of Configuration & JobConf objects created
--------------------------------------------------------------

         Key: HADOOP-167
         URL: http://issues.apache.org/jira/browse/HADOOP-167
     Project: Hadoop
        Type: Improvement

  Components: conf  
    Versions: 0.1.1    
    Reporter: Owen O'Malley
 Assigned to: Owen O'Malley 
     Fix For: 0.2


Currently, Configuration and JobConf objects are created many times during the execution of a job. In particular, the TaskTracker creates a lot of them. They clutter up the logs, and each one parses the XML config files over and over again.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Resolved: (HADOOP-167) reducing the number of Configuration & JobConf objects created

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-167?page=all ]
     
Doug Cutting resolved HADOOP-167:
---------------------------------

    Resolution: Fixed

I just committed this.  Thanks, Owen!



[jira] Commented: (HADOOP-167) reducing the number of Configuration & JobConf objects created

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376516 ] 

Michel Tourn commented on HADOOP-167:
-------------------------------------

Avoiding the multiple config-loading messages is a good thing.
This could also be controlled with a verbosity / logging level setting.

Please don't remove the JobConf(Configuration) constructor.

This is the only mechanism available to programmatically change your Configuration.
We rely on this in two ways:
1. to select from multiple XML config files that correspond to multiple Hadoop systems.
2. to make some properties (paths) user-dependent.

Example for 1:
    config_ = new Configuration();
    config_.addFinalResource(getHadoopAliasConfFile());
    jobConf_ = new JobConf(config_);

In fact, it is important for Hadoop to maintain this property:
 ALL uses of a JobConf must be configurable at the outset by the caller by passing in a Configuration object.
 Common examples of such top-level Hadoop entry points: 
     Job submission, MapRed in local mode, DFS client calls.

In general, we should make sure that we don't FORCE
a long lifetime on a 'cached' JobConf object.
Some applications need to create new JobConfs along the way:
1. because they must first discover properties of the Hadoop cluster (list files, then submit a job);
2. because they talk to multiple Hadoop systems (import / export files).




[jira] Commented: (HADOOP-167) reducing the number of Configuration & JobConf objects created

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376520 ] 

Doug Cutting commented on HADOOP-167:
-------------------------------------

Can we stop the extra reads caused by addFinalResource() and 'new JobConf(Configuration)' by re-using the hash table instead of re-reading the files?  addFinalResource could simply read that single file, rather than re-read everything.  And 'new JobConf(Configuration)' could clone the contents of the configuration rather than re-reading it, no?  Or even use nested Properties...
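Doug's two suggestions (parse only the newly added resource, and clone already-parsed contents instead of re-reading files) can be illustrated with a toy class. This is a minimal sketch, not Hadoop's Configuration API: `ToyConf`, its static parse counter, and the use of `key=value` text as the resource format (real Hadoop config files are XML) are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/**
 * Hypothetical, simplified stand-in for Configuration (NOT the Hadoop API).
 * Resources are key=value text here; real Hadoop config files are XML.
 */
public final class ToyConf {
    // Counts resource parses across all instances, to make re-reading visible.
    static int parseCount = 0;

    private final Properties props = new Properties();

    public ToyConf(String... defaultResources) {
        for (String r : defaultResources) {
            addFinalResource(r);
        }
    }

    /** Copy constructor: clone the parsed table; nothing is re-parsed. */
    public ToyConf(ToyConf other) {
        props.putAll(other.props);
    }

    /** Parse only the new resource; its keys override earlier values. */
    public void addFinalResource(String resourceText) {
        props.putAll(parse(resourceText));
    }

    public String get(String key) {
        return props.getProperty(key);
    }

    private static Properties parse(String text) {
        parseCount++;
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        ToyConf base = new ToyConf("fs.default.name=local\nmapred.job.tracker=local\n");
        base.addFinalResource("fs.default.name=cluster-a:9000\n");
        ToyConf copy = new ToyConf(base); // no parse happens here
        System.out.println(copy.get("fs.default.name") + " parses=" + parseCount);
    }
}
```

Running `main` prints `cluster-a:9000 parses=2`: each resource is parsed exactly once, and the copy costs no parsing at all.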

One feature that's currently supported is that Configuration.write() only writes things that differ from the defaults.  This isn't essential, but it's nice.  The way it distinguishes them is that defaults are always in a nested Properties object and non-defaults are always in the top-level Properties.
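The default/non-default split described here maps directly onto `java.util.Properties`, whose constructor accepts a nested defaults table. In the sketch below, `nonDefaults()` is a hypothetical helper standing in for what a `write()` would emit, and the property names and values are illustrative only. The key point is that Hashtable methods such as `keySet()` see only the top-level entries, while lookups such as `getProperty()` and `stringPropertyNames()` also consult the nested defaults.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class NestedDefaultsDemo {
    /** Hypothetical write() stand-in: collect only top-level (non-default) entries. */
    static Map<String, String> nonDefaults(Properties conf) {
        Map<String, String> out = new TreeMap<>();
        // keySet() comes from Hashtable and ignores the nested defaults table.
        for (Object key : conf.keySet()) {
            out.put((String) key, conf.getProperty((String) key));
        }
        return out;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("io.sort.mb", "100");
        defaults.setProperty("mapred.reduce.tasks", "1");

        Properties conf = new Properties(defaults);
        conf.setProperty("mapred.reduce.tasks", "4"); // overrides the default

        System.out.println(conf.getProperty("io.sort.mb"));    // 100, found in defaults
        System.out.println(conf.stringPropertyNames().size()); // 2, includes defaults
        System.out.println(nonDefaults(conf));                 // {mapred.reduce.tasks=4}
    }
}
```

Because the overridden value lives in the top-level table, only it is written out, while reads still fall back to the defaults transparently.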



[jira] Commented: (HADOOP-167) reducing the number of Configuration & JobConf objects created

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-167?page=comments#action_12376517 ] 

Owen O'Malley commented on HADOOP-167:
--------------------------------------

Your example is exactly the same as:

jobConf = new JobConf();
jobConf.addFinalResource(getHadoopAliasConfFile());

just without reading the xml files an extra time.



[jira] Updated: (HADOOP-167) reducing the number of Configuration & JobConf objects created

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-167?page=all ]

Owen O'Malley updated HADOOP-167:
---------------------------------

    Attachment: remove-confs.patch

This patch removes a lot of the extra JobConfs from the TaskTracker. In particular, it no longer reads the config files for each map output as it is transferred to a reduce. It also adds a new JobConf(Class) constructor and makes the old JobConf() constructor read mapred-defaults.xml. Unless Nutch makes heavy use of the JobConf(Configuration) and JobConf(Configuration,Class) constructors, I think we should deprecate them. There really isn't any advantage to having a separate Configuration object.
