You are viewing a plain text version of this content. The canonical link for it is here.

Posted to notifications@accumulo.apache.org by "Christopher Tubbs (JIRA)" <ji...@apache.org> on 2012/09/20 19:32:07 UTC

[jira] [Created] (ACCUMULO-769) Mapreduce API should not use Configuration to set Job state at submission time (ambiguous semantics)

Christopher Tubbs created ACCUMULO-769:
------------------------------------------

Summary: Mapreduce API should not use Configuration to set Job state at submission time (ambiguous semantics)
Key: ACCUMULO-769
URL: https://issues.apache.org/jira/browse/ACCUMULO-769
Project: Accumulo
Issue Type: Bug
Components: client
Affects Versions: 1.4.0, 1.4.1
Reporter: Christopher Tubbs
Assignee: Billie Rinaldi
Priority: Minor
Fix For: 1.5.0

ACCUMULO-267 made this change, but I think it was the wrong way to go about it.

>From the comments on ACCUMULO-267:

his is the wrong way to go about doing this fix. The reason why it took a JobContext is so that it could accept a "Job" object. This was modeled after the pattern Hadoop was using for FileOutputFormat, which is somewhat the standard for conventions in configuring MR jobs.

While JobContext does specifically state that's what it's purpose is, it is a base class, and Job extends JobContext, and includes a comment that describes it as holding the state of the job at submission time. This API should really be taking a "Job" object, rather than a "JobContext" object. Further, because Job is the only JobContext that actually works as intended here, the change from JobContext to Job does not require any deprecation, because Job will still work, and any other JobContext that isn't a Job will still fail. (We would have to deprecate the ones that were added in 1.4 that took a Configuration object, though... because those were never "correct", if we are going off of the conventions set by Hadoop's provided OutputFormats).

It is somewhat annoying to deprecate something in 1.5 that was added in 1.4... especially since it allows people to go back to what they were doing before. But, I think it might be worth it to be consistent with the established conventions, and to clarify the semantics of the methods (we are, after all, modifying the state of a job we are about to submit, and not just an arbitrary configuration, which is used for all sorts of things).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ACCUMULO-769) MapReduce API should not use Configuration to set Job state at submission time (ambiguous semantics)

Posted by "Pradeep Gollakota (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ACCUMULO-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Gollakota updated ACCUMULO-769:
---------------------------------------

    Summary: MapReduce API should not use Configuration to set Job state at submission time (ambiguous semantics)  (was: Mapreduce API should not use Configuration to set Job state at submission time (ambiguous semantics))
    
> MapReduce API should not use Configuration to set Job state at submission time (ambiguous semantics)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-769
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-769
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.1, 1.4.0
>            Reporter: Christopher Tubbs
>            Assignee: Billie Rinaldi
>            Priority: Minor
>             Fix For: 1.5.0
>
>
> ACCUMULO-267 made this change, but I think it was the wrong way to go about it.
> From the comments on ACCUMULO-267:
> This is the wrong way to go about doing this fix. The reason why it took a JobContext is so that it could accept a "Job" object. This was modeled after the pattern Hadoop was using for FileOutputFormat, which is somewhat the standard for conventions in configuring MR jobs.
> While JobContext does specifically state that's what it's purpose is, it is a base class, and Job extends JobContext, and includes a comment that describes it as holding the state of the job at submission time. This API should really be taking a "Job" object, rather than a "JobContext" object. Further, because Job is the only JobContext that actually works as intended here, the change from JobContext to Job does not require any deprecation, because Job will still work, and any other JobContext that isn't a Job will still fail. (We would have to deprecate the ones that were added in 1.4 that took a Configuration object, though... because those were never "correct", if we are going off of the conventions set by Hadoop's provided OutputFormats).
> It is somewhat annoying to deprecate something in 1.5 that was added in 1.4... especially since it allows people to go back to what they were doing before. But, I think it might be worth it to be consistent with the established conventions, and to clarify the semantics of the methods (we are, after all, modifying the state of a job we are about to submit, and not just an arbitrary configuration, which is used for all sorts of things).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ACCUMULO-769) Mapreduce API should not use Configuration to set Job state at submission time (ambiguous semantics)

Posted by "Pradeep Gollakota (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ACCUMULO-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Gollakota updated ACCUMULO-769:
---------------------------------------

    Description: 
ACCUMULO-267 made this change, but I think it was the wrong way to go about it.

>From the comments on ACCUMULO-267:

This is the wrong way to go about doing this fix. The reason why it took a JobContext is so that it could accept a "Job" object. This was modeled after the pattern Hadoop was using for FileOutputFormat, which is somewhat the standard for conventions in configuring MR jobs.

While JobContext does specifically state that's what it's purpose is, it is a base class, and Job extends JobContext, and includes a comment that describes it as holding the state of the job at submission time. This API should really be taking a "Job" object, rather than a "JobContext" object. Further, because Job is the only JobContext that actually works as intended here, the change from JobContext to Job does not require any deprecation, because Job will still work, and any other JobContext that isn't a Job will still fail. (We would have to deprecate the ones that were added in 1.4 that took a Configuration object, though... because those were never "correct", if we are going off of the conventions set by Hadoop's provided OutputFormats).

It is somewhat annoying to deprecate something in 1.5 that was added in 1.4... especially since it allows people to go back to what they were doing before. But, I think it might be worth it to be consistent with the established conventions, and to clarify the semantics of the methods (we are, after all, modifying the state of a job we are about to submit, and not just an arbitrary configuration, which is used for all sorts of things).

  was:
ACCUMULO-267 made this change, but I think it was the wrong way to go about it.

>From the comments on ACCUMULO-267:

his is the wrong way to go about doing this fix. The reason why it took a JobContext is so that it could accept a "Job" object. This was modeled after the pattern Hadoop was using for FileOutputFormat, which is somewhat the standard for conventions in configuring MR jobs.

While JobContext does specifically state that's what it's purpose is, it is a base class, and Job extends JobContext, and includes a comment that describes it as holding the state of the job at submission time. This API should really be taking a "Job" object, rather than a "JobContext" object. Further, because Job is the only JobContext that actually works as intended here, the change from JobContext to Job does not require any deprecation, because Job will still work, and any other JobContext that isn't a Job will still fail. (We would have to deprecate the ones that were added in 1.4 that took a Configuration object, though... because those were never "correct", if we are going off of the conventions set by Hadoop's provided OutputFormats).

It is somewhat annoying to deprecate something in 1.5 that was added in 1.4... especially since it allows people to go back to what they were doing before. But, I think it might be worth it to be consistent with the established conventions, and to clarify the semantics of the methods (we are, after all, modifying the state of a job we are about to submit, and not just an arbitrary configuration, which is used for all sorts of things).

    
> Mapreduce API should not use Configuration to set Job state at submission time (ambiguous semantics)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-769
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-769
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.1, 1.4.0
>            Reporter: Christopher Tubbs
>            Assignee: Billie Rinaldi
>            Priority: Minor
>             Fix For: 1.5.0
>
>
> ACCUMULO-267 made this change, but I think it was the wrong way to go about it.
> From the comments on ACCUMULO-267:
> This is the wrong way to go about doing this fix. The reason why it took a JobContext is so that it could accept a "Job" object. This was modeled after the pattern Hadoop was using for FileOutputFormat, which is somewhat the standard for conventions in configuring MR jobs.
> While JobContext does specifically state that's what it's purpose is, it is a base class, and Job extends JobContext, and includes a comment that describes it as holding the state of the job at submission time. This API should really be taking a "Job" object, rather than a "JobContext" object. Further, because Job is the only JobContext that actually works as intended here, the change from JobContext to Job does not require any deprecation, because Job will still work, and any other JobContext that isn't a Job will still fail. (We would have to deprecate the ones that were added in 1.4 that took a Configuration object, though... because those were never "correct", if we are going off of the conventions set by Hadoop's provided OutputFormats).
> It is somewhat annoying to deprecate something in 1.5 that was added in 1.4... especially since it allows people to go back to what they were doing before. But, I think it might be worth it to be consistent with the established conventions, and to clarify the semantics of the methods (we are, after all, modifying the state of a job we are about to submit, and not just an arbitrary configuration, which is used for all sorts of things).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ACCUMULO-769) MapReduce API should not use Configuration to set Job state at submission time (ambiguous semantics)

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ACCUMULO-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459813#comment-13459813 ] 

Keith Turner commented on ACCUMULO-769:
---------------------------------------

I am not a big fan of the churn in the API this change would make.  However I do agree with the goal of this ticket.  This leaves me a bit conflicted on this issue.  Passing in a Job removes the possibility that the user will do something wrong.  With the new API in 1.4 the user can write the following code that is incorrect.  

{code:java}
  @Override
  public int run(String[] args) throws Exception {

    Configuration conf = getConf();
    Job job = new Job(conf, "Test MR 1");
      .
      .
      .
    job.setOutputFormatClass(AccumuloOutputFormat.class);
    AccumuloOutputFormat.setOutputInfo(conf, "root", "secret".getBytes(), false, "nodes");  //WRONG, modifying conf will not setup the job
  }
{code}

Instead the user needs to write the following with the new API

{code:java}
  @Override
  public int run(String[] args) throws Exception {

    Configuration conf = getConf();
    Job job = new Job(conf, "Test MR 1");
      .
      .
      .
    job.setOutputFormatClass(AccumuloOutputFormat.class);
    AccumuloOutputFormat.setOutputInfo(job.getConfiguration(), "root", "secret".getBytes(), false, "nodes");  //CORRECT, modifying the job conf will setup the job
  }
{code}

If the user just passed in the job object they would not have to think about it and understand that job creates a copy of the config they passed.    


                
> MapReduce API should not use Configuration to set Job state at submission time (ambiguous semantics)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-769
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-769
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.1, 1.4.0
>            Reporter: Christopher Tubbs
>            Assignee: Billie Rinaldi
>            Priority: Minor
>             Fix For: 1.5.0
>
>
> ACCUMULO-267 made this change, but I think it was the wrong way to go about it.
> From the comments on ACCUMULO-267:
> This is the wrong way to go about doing this fix. The reason why it took a JobContext is so that it could accept a "Job" object. This was modeled after the pattern Hadoop was using for FileOutputFormat, which is somewhat the standard for conventions in configuring MR jobs.
> While JobContext does specifically state that's what it's purpose is, it is a base class, and Job extends JobContext, and includes a comment that describes it as holding the state of the job at submission time. This API should really be taking a "Job" object, rather than a "JobContext" object. Further, because Job is the only JobContext that actually works as intended here, the change from JobContext to Job does not require any deprecation, because Job will still work, and any other JobContext that isn't a Job will still fail. (We would have to deprecate the ones that were added in 1.4 that took a Configuration object, though... because those were never "correct", if we are going off of the conventions set by Hadoop's provided OutputFormats).
> It is somewhat annoying to deprecate something in 1.5 that was added in 1.4... especially since it allows people to go back to what they were doing before. But, I think it might be worth it to be consistent with the established conventions, and to clarify the semantics of the methods (we are, after all, modifying the state of a job we are about to submit, and not just an arbitrary configuration, which is used for all sorts of things).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira