Posted to mapreduce-issues@hadoop.apache.org by "Mostafa Elhemali (JIRA)" <ji...@apache.org> on 2012/12/02 20:15:58 UTC

[jira] [Created] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Mostafa Elhemali created MAPREDUCE-4840:
-------------------------------------------

             Summary: Delete dead code and deprecate public API related to skipping bad records
                 Key: MAPREDUCE-4840
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4840
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: trunk
            Reporter: Mostafa Elhemali
            Priority: Minor


It looks like the decision was made in MAPREDUCE-1932 to remove support for skipping bad records rather than fix it (it doesn't work right now in trunk). If that's the case, then we should probably delete all the dead code related to it and deprecate the public APIs for it, right?

Dead code I'm talking about:
1. Task class: skipping, skipRanges, writeSkipRecs
2. MapTask class: SkippingRecordReader inner class
3. ReduceTask class: SkippingReduceValuesIterator inner class
4. Tests: TestBadRecords

Public API:
1. SkipBadRecords class
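
For illustration only, deprecating a public helper while leaving it functional could look roughly like the sketch below. LegacySkipSettings and its key name are hypothetical stand-ins invented for this example; this is not the actual SkipBadRecords source, and a plain Map is assumed in place of a Hadoop configuration object.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a public configuration helper that is being
 * retired. It is NOT the real SkipBadRecords class.
 *
 * @deprecated Record skipping is slated for removal; handle bad records in
 *             user code instead.
 */
@Deprecated
class LegacySkipSettings {
    // Hypothetical key name, used only for this illustration.
    static final String MAX_SKIP_KEY = "example.skip.max.records";

    /** Reads the setting, defaulting to 0 (skipping disabled). */
    static long getMaxSkipRecords(Map<String, String> conf) {
        return Long.parseLong(conf.getOrDefault(MAX_SKIP_KEY, "0"));
    }

    /** Stores the setting; existing callers keep compiling, with a warning. */
    static void setMaxSkipRecords(Map<String, String> conf, long max) {
        conf.put(MAX_SKIP_KEY, Long.toString(max));
    }
}
```

The point of the annotation plus the Javadoc tag is that existing callers continue to build and run, but get a compiler warning and a documented pointer to the replacement approach before the class is eventually deleted.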


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508352#comment-13508352 ] 

Harsh J commented on MAPREDUCE-4840:
------------------------------------

I agree. Would you be willing to provide a patch that deprecates this set of code?
                

[jira] [Commented] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Mostafa Elhemali (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508384#comment-13508384 ] 

Mostafa Elhemali commented on MAPREDUCE-4840:
---------------------------------------------

Ah, OK, looks like I was the one confused. I don't believe it works, though as I mentioned in my earlier comment I can't really test it because of unrelated Windows problems. There's this code in TaskAttemptListenerImpl, though:

{code}
  @Override
  public void reportNextRecordRange(TaskAttemptID taskAttemptID, Range range)
      throws IOException {
    // This is used when the feature of skipping records is enabled.

    // This call exists as a hadoop mapreduce legacy wherein all changes in
    // counters/progress/phase/output-size are reported through statusUpdate()
    // call but not the next record range information.
    throw new IOException("Not yet implemented.");
  }
{code}

So I guess the right thing to do is fix the implementation? Not sure if there's a JIRA tracking that.
                

[jira] [Commented] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508366#comment-13508366 ] 

Harsh J commented on MAPREDUCE-4840:
------------------------------------

Thanks Mostafa!

I actually got a bit confused earlier.

# The skipping feature does work with the old API presently, correct? Have you observed otherwise? MAPREDUCE-1932 was for supporting it in the new API.
# The idea is to deprecate the feature, because it won't be added to the new API and the concept is to become unsupported, in favor of user-end logic. The patch removes it directly without deprecating it first.
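
As a rough idea of what "user-end logic" might mean, here is a minimal sketch in plain Java with no Hadoop dependency. TolerantProcessor, Result and processAll are hypothetical names made up for this example. It only covers failures that surface as runtime exceptions in the per-record function, which is narrower than what a framework-level skipping mode could handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: processes records one by one and skips the bad ones. */
class TolerantProcessor {

    /** Holds the successful outputs plus a count of skipped records. */
    static final class Result<O> {
        final List<O> outputs = new ArrayList<>();
        long badRecords = 0;
    }

    /** Applies fn to each record; a record whose processing throws is counted and skipped. */
    static <I, O> Result<O> processAll(Iterable<I> records, Function<I, O> fn) {
        Result<O> result = new Result<>();
        for (I record : records) {
            try {
                result.outputs.add(fn.apply(record));
            } catch (RuntimeException e) {
                // Count and skip the bad record instead of failing the whole task.
                result.badRecords++;
            }
        }
        return result;
    }
}
```

In a real job the counter would presumably be reported through the framework's own counters so that the number of skipped records stays visible to the user.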
                

[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Mostafa Elhemali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Elhemali updated MAPREDUCE-4840:
----------------------------------------

    Attachment: MAPREDUCE-4840.patch

Patch attached. Disclaimer: the code compiles fine, but I didn't fully test it since I wrote this on a Windows box and trunk isn't really good with Windows these days.
                

[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-4840:
-------------------------------

    Affects Version/s:     (was: trunk)
                       2.0.0-alpha
    

[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Mostafa Elhemali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Elhemali updated MAPREDUCE-4840:
----------------------------------------

    Fix Version/s: trunk
           Status: Patch Available  (was: Open)
    

[jira] [Commented] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Mostafa Elhemali (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508386#comment-13508386 ] 

Mostafa Elhemali commented on MAPREDUCE-4840:
---------------------------------------------

Note that the test for it (TestBadRecords) was disabled in MAPREDUCE-3582.
                

[jira] [Commented] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508357#comment-13508357 ] 

Hadoop QA commented on MAPREDUCE-4840:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12555682/MAPREDUCE-4840.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

      {color:red}-1 javac{color}.  The applied patch generated 2035 javac compiler warnings (more than the trunk's current 2013 warnings).

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3091//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3091//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3091//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3091//console

This message is automatically generated.
                

[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-4840:
-------------------------------

    Fix Version/s:     (was: trunk)
    