Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/01/12 20:20:46 UTC

[jira] Created: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

IFile reader closes stream and compressor in wrong order
--------------------------------------------------------

                 Key: MAPREDUCE-2258
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task
    Affects Versions: 0.20.4, 0.22.0
            Reporter: Todd Lipcon
             Fix For: 0.22.0


In IFile.Reader.close(), we return the decompressor to the pool and then call close() on the input stream. This is backwards and causes a rare race in the case of LzopCodec, since LzopInputStream makes a few calls on the decompressor object inside close(). If another thread pulls the decompressor out of the pool and starts to use it in the meantime, the first thread's close() will cause the second thread to potentially miss pieces of data.
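The ordering described above can be sketched as follows. This is a minimal illustration and not the attached patch: SketchDecompressor, SketchPool, SketchStream and SketchReader are hypothetical stand-ins for Hadoop's decompressor, CodecPool, LzopInputStream and IFile.Reader, and the try/finally is an assumption of the sketch. The stream is closed first, while this reader still owns the decompressor, and only then is the decompressor handed back to the pool.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a pooled decompressor (not Hadoop's Decompressor).
class SketchDecompressor {
    private final List<String> events;
    SketchDecompressor(List<String> events) { this.events = events; }
    void reset() { events.add("decompressor.reset"); }
}

// Hypothetical stand-in for the codec pool; it only records the hand-back.
class SketchPool {
    private final List<String> events;
    SketchPool(List<String> events) { this.events = events; }
    void giveBack(SketchDecompressor d) { events.add("pool.giveBack"); }
}

// Stand-in for a stream that touches its decompressor inside close(),
// which is what the report says LzopInputStream does.
class SketchStream implements Closeable {
    private final SketchDecompressor decompressor;
    private final List<String> events;
    SketchStream(SketchDecompressor decompressor, List<String> events) {
        this.decompressor = decompressor;
        this.events = events;
    }
    @Override
    public void close() throws IOException {
        decompressor.reset();
        events.add("stream.close");
    }
}

// Stand-in for the reader: close the stream, then release the decompressor.
class SketchReader implements Closeable {
    private final SketchStream in;
    private final SketchPool pool;
    private SketchDecompressor decompressor;

    SketchReader(SketchPool pool, List<String> events) {
        this.pool = pool;
        this.decompressor = new SketchDecompressor(events);
        this.in = new SketchStream(decompressor, events);
    }

    @Override
    public void close() throws IOException {
        try {
            // The stream may still call into the decompressor here.
            in.close();
        } finally {
            // Only now is it safe for another thread to borrow it.
            if (decompressor != null) {
                pool.giveBack(decompressor);
                decompressor = null;
            }
        }
    }

    // Closes a fresh reader and returns the recorded order of events.
    static List<String> closeAndRecord() {
        List<String> events = new ArrayList<>();
        try {
            new SketchReader(new SketchPool(events), events).close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(closeAndRecord());
    }
}
```

With this ordering, every call the stream makes on the decompressor happens before the pool can hand that instance to anyone else.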

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2258:
-----------------------------------

    Attachment: mapreduce-2258.txt



[jira] [Commented] (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036258#comment-13036258 ] 

Hudson commented on MAPREDUCE-2258:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #684 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/684/])
    MAPREDUCE-2258. IFile reader closes stream and compressor in wrong order. Contributed by Todd Lipcon.

tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1124383
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/IFile.java



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2258:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-2258:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Todd!


[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980871#action_12980871 ] 

Todd Lipcon commented on MAPREDUCE-2258:
----------------------------------------

The following is a unit test I wrote on the hadoop-lzo side that mimics the behavior of IFile.Reader.close():

https://github.com/toddlipcon/hadoop-lzo/commit/a5af3b93f52f55828dfc05e7503d38383eec9dc5

It fails reliably since some threads only manage to read part of the data in the file.
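For readers without the hadoop-lzo tree at hand, the failure mode can be replayed deterministically in a single thread. The sketch below is not the linked unit test: ToyDecompressor, ToyPool and RaceReplay are hypothetical names, and the toy decompressor only counts pending input bytes. It fixes the unlucky interleaving by hand: reader A returns the decompressor, reader B borrows the same instance and feeds it input, and then A's late stream close resets it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy decompressor: it only tracks how much input is pending.
class ToyDecompressor {
    private byte[] pending = new byte[0];
    void setInput(byte[] data) { pending = data.clone(); }
    // Returns the number of pending bytes and consumes them.
    int drain() {
        int n = pending.length;
        pending = new byte[0];
        return n;
    }
    // Discards any pending input, as a stream's close() might do.
    void reset() { pending = new byte[0]; }
}

// Hypothetical toy pool that reuses returned instances.
class ToyPool {
    private final Deque<ToyDecompressor> idle = new ArrayDeque<>();
    ToyDecompressor borrow() {
        ToyDecompressor d = idle.poll();
        return d != null ? d : new ToyDecompressor();
    }
    void giveBack(ToyDecompressor d) { idle.push(d); }
}

class RaceReplay {
    // Returns how many of the 64 input bytes reader B still sees.
    static int bytesSeenBySecondReader(boolean closeStreamFirst) {
        ToyPool pool = new ToyPool();
        ToyDecompressor ownedByA = pool.borrow();
        byte[] block = new byte[64];
        ToyDecompressor usedByB;
        if (closeStreamFirst) {
            ownedByA.reset();            // A's stream close touches the decompressor
            pool.giveBack(ownedByA);     // then A releases it
            usedByB = pool.borrow();
            usedByB.setInput(block);
        } else {
            pool.giveBack(ownedByA);     // A releases too early
            usedByB = pool.borrow();     // B receives the same instance
            usedByB.setInput(block);     // B starts using it
            ownedByA.reset();            // A's late close wipes B's input
        }
        return usedByB.drain();
    }

    public static void main(String[] args) {
        System.out.println("return-then-close: " + bytesSeenBySecondReader(false));
        System.out.println("close-then-return: " + bytesSeenBySecondReader(true));
    }
}
```

In the return-then-close order, reader B ends up with 0 of its 64 bytes; in the close-then-return order it sees all 64. In a real multi-threaded run the loss depends on timing, which is why the race is rare.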



[jira] Assigned: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned MAPREDUCE-2258:
--------------------------------------

    Assignee: Todd Lipcon



[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003637#comment-13003637 ] 

Tom White commented on MAPREDUCE-2258:
--------------------------------------

+1


[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980885#action_12980885 ] 

Todd Lipcon commented on MAPREDUCE-2258:
----------------------------------------

Hong: I agree LzoCodec is preferable to LzopCodec for use in intermediate compression, but I think the bug you referenced no longer applies. LzopCodec instances can now be pooled properly with Chris's patch you referenced plus changes on the lzo side.

Just to clarify, you agree this code in IFile is wrong and should be fixed, right?



[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999797#comment-12999797 ] 

Hadoop QA commented on MAPREDUCE-2258:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12468714/mapreduce-2258.txt
  against trunk revision 1074251.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/37//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/37//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/37//console

This message is automatically generated.


[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980882#action_12980882 ] 

Hong Tang commented on MAPREDUCE-2258:
--------------------------------------

Such a pattern would in general affect all CompressionCodecs and is similar to a bug I filed earlier: HADOOP-4195.

On the other hand, as explained by Chris D in HADOOP-4162, LzopCodec cannot be safely reused in Hadoop, and thus the problem you described should not actually happen. In fact, repeatedly getting LzopCodec from CodecPool is likely to get you into OOM.



[jira] [Commented] (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035600#comment-13035600 ] 

Hudson commented on MAPREDUCE-2258:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #682 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/682/])
    MAPREDUCE-2258. IFile reader closes stream and compressor in wrong order. Contributed by Todd Lipcon.

tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1124383
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/IFile.java

