Posted to common-dev@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2008/04/21 08:03:22 UTC

[jira] Created: (HADOOP-3287) Being able to set default job configuration values on the jobtracker

Being able to set default job configuration values on the jobtracker
--------------------------------------------------------------------

                 Key: HADOOP-3287
                 URL: https://issues.apache.org/jira/browse/HADOOP-3287
             Project: Hadoop Core
          Issue Type: Bug
          Components: conf, mapred
         Environment: all
            Reporter: Alejandro Abdelnur
            Priority: Critical


The jobtracker hadoop-site.xml carries custom configuration for the cluster, and the 'final' flag allows fixing a value so that any override by a client submitting a job is ignored.
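
For example, an entry along these lines in the jobtracker hadoop-site.xml pins a value against client overrides (the property name and value here are only illustrative):

  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
    <final>true</final>
  </property>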

There are several properties for which a cluster may want to set some default values (different from the ones in the hadoop-default.xml), for example:

 * enabling/disabling compression
 * type of compression (record or block)
 * number of task retries
 * block replication factor
 * job priority
 * task JVM options

The cluster default values should apply to submitted jobs when the job submitter does not care about those values; when the job submitter does care, it should include its preferred values. Using the final flag in the jobtracker hadoop-site.xml would instead lock the value, ignoring whatever is set in the client jobconf.

Currently the only way of doing this is to distribute the jobtracker hadoop-site.xml to all clients and make sure they use it when creating the job configuration.
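
That means each client has to do something like the following when building its job configuration (the file location below is hypothetical):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobConf;

  public class SubmitWithClusterDefaults {
    public static void main(String[] args) {
      // Load the usual classpath resources, then layer a locally
      // distributed copy of the jobtracker's hadoop-site.xml on top.
      JobConf conf = new JobConf();
      conf.addResource(new Path("/etc/hadoop/jt-hadoop-site.xml")); // hypothetical path
      conf.setJobName("example");
    }
  }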

There are situations where this is not practical:

 * In a shared cluster with several clients submitting jobs, it requires redistributing the hadoop-site.xml to all clients.
 * In a cluster where jobs are dispatched by a webapp, it requires rebundling and redeploying the webapp.

The current behavior happens because the jobconf, when serialized to be sent to the jobtracker, includes all the values found in the hadoop-default.xml bundled with the Hadoop JAR file. On the jobtracker side, all those values override all but the 'final' properties of the jobtracker hadoop-site.xml.

According to the javadocs of Configuration.write(OutputStream), this should not happen: 'Writes non-default properties in this configuration.'

If the javadocs are taken as the proper behavior, this is a bug in the current implementation, and it could easily be fixed by not writing default values on write.
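
A minimal sketch of such a fix, assuming the full and default property sets can be compared (this helper is hypothetical, not the actual Configuration internals):

  import java.util.Properties;

  public final class ConfigDiff {
    // Hypothetical sketch: keep only the properties whose values differ
    // from the bundled defaults; the jobtracker resolves everything else.
    public static Properties nonDefaultProperties(Properties all, Properties defaults) {
      Properties out = new Properties();
      for (String name : all.stringPropertyNames()) {
        String value = all.getProperty(name);
        if (!value.equals(defaults.getProperty(name))) {
          out.setProperty(name, value);
        }
      }
      return out;
    }
  }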

This is a generalization of the problem mentioned in HADOOP-3171.




[jira] Commented: (HADOOP-3287) Being able to set default job configuration values on the jobtracker

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598167#action_12598167 ] 

Alejandro Abdelnur commented on HADOOP-3287:
--------------------------------------------

A heterogeneous network would not be affected by the proposed solution.

The TT configuration is not affected by this. The proposed solution is only about which configuration properties a client sends when submitting a job: instead of sending all of them, send only those that have been explicitly set, and let the rest be resolved using the default values set in the JT.
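
On the JT side, resolution would then look roughly like this (a sketch assuming the submitted job.xml is available as a Path; 'final' properties in the JT's own site file would still win):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;

  public final class ResolveJobConf {
    // Sketch: start from the JT's own defaults (hadoop-default.xml and
    // hadoop-site.xml on its classpath), then layer the client's
    // explicitly-set properties on top.
    public static Configuration resolve(Path submittedJobXml) {
      Configuration conf = new Configuration(); // loads the JT defaults
      conf.addResource(submittedJobXml);        // client-set values override
      return conf;
    }
  }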





[jira] Commented: (HADOOP-3287) Being able to set default job configuration values on the jobtracker

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597988#action_12597988 ] 

Allen Wittenauer commented on HADOOP-3287:
------------------------------------------

But what if I'm in a heterogeneous network such that some machines have eight cores and others have two cores?  The TaskTracker config will play a part there, correct?



[jira] Resolved: (HADOOP-3287) Being able to set default job configuration values on the jobtracker

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur resolved HADOOP-3287.
----------------------------------------

    Resolution: Duplicate

HADOOP-3730 enables the functionality this issue was about, in a much simpler way.
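
With that change, a client can build a jobconf that skips loading the bundled defaults entirely, so only explicitly-set values travel with the job (a sketch assuming the JobConf(boolean) constructor added there):

  import org.apache.hadoop.mapred.JobConf;

  public class MinimalSubmit {
    public static void main(String[] args) {
      // Do not load hadoop-default.xml/hadoop-site.xml on the client;
      // only the values set below are serialized with the job, and the
      // cluster's own defaults apply to everything else.
      JobConf conf = new JobConf(false);
      conf.set("mapred.job.priority", "HIGH");
    }
  }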



[jira] Commented: (HADOOP-3287) Being able to set default job configuration values on the jobtracker

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591059#action_12591059 ] 

Owen O'Malley commented on HADOOP-3287:
---------------------------------------

-1. As I wrote on HADOOP-3171, these semantics lead to cases that are very hard to debug. In particular, what was happening was:

client:
  conf 1

job tracker:
  conf 2

task tracker:
  conf 3..n

and depending on which part of the framework looked at a particular value, it would take the value from any of conf 1..n. It was *very* difficult to debug and led to wasted days of developer time.
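
A hypothetical illustration of the ambiguity: the same key read against each layer yields a different answer, and which one "wins" depends on where in the framework the read happens (the property name and values below are made up):

  import org.apache.hadoop.conf.Configuration;

  public final class WhoWins {
    public static void main(String[] args) {
      Configuration client = new Configuration(false);      // conf 1
      Configuration jobTracker = new Configuration(false);  // conf 2
      Configuration taskTracker = new Configuration(false); // conf 3
      client.set("io.sort.mb", "100");
      jobTracker.set("io.sort.mb", "200");
      taskTracker.set("io.sort.mb", "50");
      // Three different answers for one property; the effective value
      // depends on which configuration object the reading code holds.
      System.out.println(client.get("io.sort.mb"));      // 100
      System.out.println(jobTracker.get("io.sort.mb"));  // 200
      System.out.println(taskTracker.get("io.sort.mb")); // 50
    }
  }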



[jira] Commented: (HADOOP-3287) Being able to set default job configuration values on the jobtracker

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591198#action_12591198 ] 

Alejandro Abdelnur commented on HADOOP-3287:
--------------------------------------------

Why are you bringing the TT 3..N configurations into the equation here? They shouldn't play a role in the job settings. Only the JT should.




