
Posted to common-dev@hadoop.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2008/09/11 22:59:44 UTC

[jira] Created: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.
-----------------------------------------------------------------------------

                 Key: HADOOP-4162
                 URL: https://issues.apache.org/jira/browse/HADOOP-4162
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Hong Tang


CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor. I investigated the code, and the reason seems to be the following:
LzopCodec inherits from LzoCodec. The getDecompressorType() method is supposed to return the concrete Decompressor class that the specific Codec class creates. In this case, LzopCodec creates LzopDecompressors and should return LzopDecompressor.class. Instead, it uses the getDecompressorType() method defined in the parent and returns LzoDecompressor.class.

As a result, CodecPool is unable to properly recycle the decompressors.
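
The mismatch described above can be reproduced with a small stand-alone sketch. It assumes a pool that looks up idle instances by the type the codec advertises and files returned instances under their runtime class. Every class name below (PoolKeySketch, LzoLikeCodec, BuggyLzopLikeCodec, FixedLzopLikeCodec, Pool) is a hypothetical stand-in and not the Hadoop code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not the Hadoop classes.
final class PoolKeySketch {
    interface Decompressor {}
    static class LzoLikeDecompressor implements Decompressor {}
    static class LzopLikeDecompressor extends LzoLikeDecompressor {}

    /** Parent codec: the advertised type matches what it creates. */
    static class LzoLikeCodec {
        Class<? extends Decompressor> getDecompressorType() { return LzoLikeDecompressor.class; }
        Decompressor createDecompressor() { return new LzoLikeDecompressor(); }
    }

    /** Creates a different type but inherits the parent's advertised type. */
    static class BuggyLzopLikeCodec extends LzoLikeCodec {
        @Override Decompressor createDecompressor() { return new LzopLikeDecompressor(); }
    }

    /** Same codec with the advertised type overridden to match. */
    static class FixedLzopLikeCodec extends BuggyLzopLikeCodec {
        @Override Class<? extends Decompressor> getDecompressorType() { return LzopLikeDecompressor.class; }
    }

    /** Assumed pool behaviour: borrow by advertised type, return by runtime class. */
    static class Pool {
        private final Map<Class<?>, List<Decompressor>> idle = new HashMap<>();
        int created = 0;

        Decompressor borrow(LzoLikeCodec codec) {
            List<Decompressor> list = idle.get(codec.getDecompressorType());
            if (list != null && !list.isEmpty()) {
                return list.remove(list.size() - 1);
            }
            created++;
            return codec.createDecompressor();
        }

        void giveBack(Decompressor d) {
            idle.computeIfAbsent(d.getClass(), k -> new ArrayList<>()).add(d);
        }
    }

    /** Borrows and returns a decompressor 'rounds' times; reports how many were constructed. */
    static int instancesCreated(LzoLikeCodec codec, int rounds) {
        Pool pool = new Pool();
        for (int i = 0; i < rounds; i++) {
            pool.giveBack(pool.borrow(codec));
        }
        return pool.created;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + instancesCreated(new BuggyLzopLikeCodec(), 10));
        System.out.println("fixed: " + instancesCreated(new FixedLzopLikeCodec(), 10));
    }
}
```

With the inherited type, every returned instance is filed under a key that borrow() never consults, so each round constructs a new object. Overriding the advertised type lets the first instance be reused.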

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630404#action_12630404 ] 

Chris Douglas commented on HADOOP-4162:
---------------------------------------

bq. Shouldn't this patch get applied to 0.18 if one wanted to use 0.18 to do experiments with compression of transient data at large scale?

Intermediate data is compressed using LzoCodec, not LzopCodec. LzopCodec provides compatibility for users reading streams compressed by the [lzop|http://www.lzop.org] tool.

> CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-4162
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4162
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.18.0
>            Reporter: Hong Tang
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4162_0_20080911.patch
>
>
> CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor. I investigated the code, the reason seems to be the following:
> LzopCodec inherits from LzoCodec. The getDecompressorType() method is supposed to return the concrete Decompressor class type the specific Codec class creates. In this case, LzopCodec creates LzopDecompressors and should return LzopDecompressor.class. But instead, it uses the getDecompressorType() method defined in the parent and returns LzoDecompressor.class.
> This leads to CodecPool unable to properly recycle the decompressors.


[jira] Updated: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-4162:
----------------------------------

    Attachment: HADOOP-4162_0_20080911.patch

Hong, can you please try this patch and let me know? Thanks!


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630403#action_12630403 ] 

Chris Douglas commented on HADOOP-4162:
---------------------------------------

This is probably more correct, but what is the use case? LzopDecompressor tracks and verifies block checksums for lzop streams, but without being initialized by enums to which only LzopCodec has access, it cannot be distinguished from LzoDecompressor. Using LzopCodec as anything but a stream doesn't make sense.


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Amir Youssefi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630451#action_12630451 ] 

Amir Youssefi commented on HADOOP-4162:
---------------------------------------

+1 for the patch. 

Arun, thanks for the patch and rapid turn-around.


[jira] Assigned: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned HADOOP-4162:
-------------------------------------

    Assignee: Arun C Murthy


[jira] Updated: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-4162:
----------------------------------

    Status: Patch Available  (was: Open)

Thanks for helping to test this, Amir. Marking this PA.


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Rong-En Fan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648525#action_12648525 ] 

Rong-En Fan commented on HADOOP-4162:
-------------------------------------

Is there any progress on this? 


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631142#action_12631142 ] 

Hadoop QA commented on HADOOP-4162:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12389973/HADOOP-4162_0_20080911.patch
  against trunk revision 695569.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3261/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3261/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3261/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3261/console

This message is automatically generated.


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630413#action_12630413 ] 

Chris Douglas commented on HADOOP-4162:
---------------------------------------

bq. Using LzopCodec as anything but a stream doesn't make sense.

I should probably be clearer. Reusing the decompressor between streams makes sense, but using LzopDecompressor like LzoDecompressor or ZlibDecompressor to effect block compression for a structured file format is not going to work, or at least is unlikely to match the intent. I'm assuming this is related to HADOOP-3315, which, like SequenceFile, shouldn't use LzopCodec.

The patch is good, but I'm concerned about possible (mis)uses.


[jira] Updated: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-4162:
----------------------------------

    Status: Open  (was: Patch Available)

Chris points out that we need to reset state in LzopDecompressor in the 'reset' call...


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630441#action_12630441 ] 

Hong Tang commented on HADOOP-4162:
-----------------------------------

Thanks for the information. The problem was found during the development of TFile (HADOOP-3315). The way we use Hadoop Compression in TFile is to take each compression block as a separate compression stream (each block write concludes with compressor.finish()). It makes no assumptions about the internals of the compression algorithm. The tests show both LZOP and LZO work fine.

Also, based on the information you provided, it seems that LzopDecompressor exists to read lzop-compressed data. So I have now changed TFile to use LZO instead of LZOP internally.
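
The block-per-stream pattern described above can be sketched as follows. This is only an illustration: it uses java.util.zip.Deflater and Inflater as stand-ins for the Hadoop Compressor and Decompressor, and the class and method names (BlockStreamSketch, compressBlock, decompressBlock, roundTrips) are hypothetical and not taken from TFile.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: java.util.zip stands in for the Hadoop compression classes.
final class BlockStreamSketch {
    /** Compresses one block as a complete, self-contained stream. */
    static byte[] compressBlock(Deflater deflater, byte[] block) {
        deflater.reset();      // discard state left over from the previous block
        deflater.setInput(block);
        deflater.finish();     // this block is the entire stream
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        return out.toByteArray();
    }

    /** Decompresses one self-contained stream with a reused Inflater. */
    static byte[] decompressBlock(Inflater inflater, byte[] compressed) {
        inflater.reset();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt block", e);
        }
        return out.toByteArray();
    }

    /** Round-trips every block through one reused compressor/decompressor pair. */
    static boolean roundTrips(String... blocks) {
        Deflater deflater = new Deflater();
        Inflater inflater = new Inflater();
        try {
            for (String s : blocks) {
                byte[] raw = s.getBytes(StandardCharsets.UTF_8);
                byte[] back = decompressBlock(inflater, compressBlock(deflater, raw));
                if (!Arrays.equals(raw, back)) {
                    return false;
                }
            }
            return true;
        } finally {
            deflater.end();
            inflater.end();
        }
    }
}
```

Because each block is finished and the codec object is reset before the next one, the caller depends only on the generic stream contract, which is the property the comment above relies on.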


[jira] Updated: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4162:
--------------------------------

    Fix Version/s:     (was: 0.19.0)


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630387#action_12630387 ] 

Hong Tang commented on HADOOP-4162:
-----------------------------------

Green from me.


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Amir Youssefi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630385#action_12630385 ] 

Amir Youssefi commented on HADOOP-4162:
---------------------------------------

Before patch:
2008-09-11 21:22:33,169 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:33,500 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:33,696 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:33,904 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:34,089 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:34,277 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:34,465 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:34,700 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:34,887 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor
2008-09-11 21:22:35,074 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor


After patch:

2008-09-11 21:20:21,454 INFO  compress.CodecPool (CodecPool.java:getDecompressor(121)) - Got brand-new decompressor 




[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630469#action_12630469 ] 

Chris Douglas commented on HADOOP-4162:
---------------------------------------

bq. The way we use Hadoop Compression in TFile is to take each compression block as a separate compression stream (each block write concludes with compressor.finish()). It makes no assumptions about the internals of the compression algorithm. The tests show both LZOP and LZO work fine.
LZOP works because the streams are generated by LzopCodec, which disables all the block checksums (assuming its target will be HDFS, which keeps its own checksums). In that case, the LzopDecompressor is a passthrough to LzoDecompressor. If someone were to pick up an LzopDecompressor and use it on a stream with block checksums, it would fail if that decompressor were reused to open a TFile. Until LzopDecompressors can be reused without errors (i.e. initHeaderFlags clears the checksum flags before setting them for the next stream), I'm -1 on making them reusable through CodecPool.

bq. it seems that existence of LzopDecompressor is to read lzop compressed data. So I changed to use LZO instead of LZOP internally for TFile now.
That sounds exactly right. Unless one wants to support the C tool, LzoCodec should always be preferred.
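
The reuse hazard described above (checksum flags left over from an earlier stream) can be sketched as follows. This is a simplified model and not the LzopDecompressor source. The method name initHeaderFlags is borrowed from the comment, but its body here, along with ReuseFlagsSketch, ChecksumFlag, FlaggedDecompressor and acceptBlock, is hypothetical.

```java
import java.util.EnumSet;
import java.util.zip.Adler32;

// Simplified model for illustration; not the LzopDecompressor implementation.
final class ReuseFlagsSketch {
    /** Hypothetical per-stream option announced by a stream header. */
    enum ChecksumFlag { ADLER32_DECOMPRESSED }

    static class FlaggedDecompressor {
        private final EnumSet<ChecksumFlag> flags = EnumSet.noneOf(ChecksumFlag.class);
        private final boolean clearOnInit;

        FlaggedDecompressor(boolean clearOnInit) {
            this.clearOnInit = clearOnInit;
        }

        /** Called once per stream with the flags found in that stream's header. */
        void initHeaderFlags(EnumSet<ChecksumFlag> fromHeader) {
            if (clearOnInit) {
                flags.clear();   // forget flags set by the previous stream
            }
            flags.addAll(fromHeader);
        }

        /** Returns true if the block is accepted under the current flags. */
        boolean acceptBlock(byte[] data, Long storedChecksum) {
            if (!flags.contains(ChecksumFlag.ADLER32_DECOMPRESSED)) {
                return true;     // no checksum expected: plain passthrough
            }
            if (storedChecksum == null) {
                return false;    // a checksum is expected but the stream has none
            }
            return adler32(data) == storedChecksum;
        }
    }

    static long adler32(byte[] data) {
        Adler32 a = new Adler32();
        a.update(data, 0, data.length);
        return a.getValue();
    }

    /** Reads a checksummed stream, then reuses the instance on a stream without checksums. */
    static boolean secondStreamAccepted(boolean clearOnInit) {
        FlaggedDecompressor d = new FlaggedDecompressor(clearOnInit);
        byte[] block = {1, 2, 3};
        d.initHeaderFlags(EnumSet.of(ChecksumFlag.ADLER32_DECOMPRESSED));
        d.acceptBlock(block, adler32(block));
        d.initHeaderFlags(EnumSet.noneOf(ChecksumFlag.class));
        return d.acceptBlock(block, null);
    }
}
```

In this model, secondStreamAccepted(false) fails because the stale flag still demands a checksum, while secondStreamAccepted(true) succeeds once the flags are cleared at the start of each stream.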


[jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630400#action_12630400 ] 

Christian Kunz commented on HADOOP-4162:
----------------------------------------

Shouldn't this patch get applied to 0.18 if one wanted to use 0.18 to do experiments with compression of transient data at large scale?


[jira] Issue Comment Edited: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630413#action_12630413 ] 

chris.douglas edited comment on HADOOP-4162 at 9/11/08 4:13 PM:
----------------------------------------------------------------

bq. Using LzopCodec as anything but a stream doesn't make sense.

I should probably be clearer. Reusing the decompressor between streams makes sense, but using LzopDecompressor like LzoDecompressor or ZlibDecompressor to effect block compression for a structured file format is not going to work, or at least is unlikely to match the intent. I'm assuming this is related to HADOOP-3315, which, like SequenceFile, shouldn't use LzopCodec.

As written, an LzopDecompressor instance can't be reused between streams because the checksums aren't reset. LzopDecompressor should clear its checksum maps in initHeaderFlags before adding new ones.
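
A rough illustration of the reset, under stated assumptions: the class, the flag constants, and the single checksum map below are hypothetical and do not reflect the actual LzopDecompressor fields. Only the method name initHeaderFlags comes from the comment above. The point is that checksum state left over from one stream's header leaks into the next unless the map is cleared first.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical model of per-stream checksum setup; not the Hadoop class.
class ChecksumResetSketch {
    enum Kind { ADLER32, CRC32 }

    // Illustrative flag bits only; not the lzop header format's values.
    static final int FLAG_ADLER32 = 0x1;
    static final int FLAG_CRC32 = 0x2;

    private final Map<Kind, Checksum> checksums = new EnumMap<>(Kind.class);
    private final boolean clearFirst;

    ChecksumResetSketch(boolean clearFirst) {
        this.clearFirst = clearFirst;
    }

    // Called once per stream header. Without the clear, entries registered
    // for a previous stream remain active for the new one.
    void initHeaderFlags(int flags) {
        if (clearFirst) {
            checksums.clear();
        }
        if ((flags & FLAG_ADLER32) != 0) {
            checksums.put(Kind.ADLER32, new Adler32());
        }
        if ((flags & FLAG_CRC32) != 0) {
            checksums.put(Kind.CRC32, new CRC32());
        }
    }

    Set<Kind> activeChecksums() {
        return checksums.keySet();
    }

    public static void main(String[] args) {
        ChecksumResetSketch stale = new ChecksumResetSketch(false);
        stale.initHeaderFlags(FLAG_ADLER32); // first stream
        stale.initHeaderFlags(FLAG_CRC32);   // second stream
        System.out.println("without clear: " + stale.activeChecksums());

        ChecksumResetSketch fresh = new ChecksumResetSketch(true);
        fresh.initHeaderFlags(FLAG_ADLER32);
        fresh.initHeaderFlags(FLAG_CRC32);
        System.out.println("with clear: " + fresh.activeChecksums());
    }
}
```

Without the clear, the second stream would be verified against a checksum it never declared. With it, only the checksums named in the current header remain.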

      was (Author: chris.douglas):
    bq. Using LzopCodec as anything but a stream doesn't make sense.

I should probably be clearer. Reusing the decompressor between streams makes sense, but using LzopDecompressor like LzoDecompressor or ZlibDecompressor to effect block compression for a structured file format is not going to work, or at least is unlikely to match the intent. I'm assuming this is related to HADOOP-3315, which, like SequenceFile, shouldn't use LzopCodec.

The patch is good, but I'm concerned about possible (mis)uses.
  

[jira] Updated: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-4162:
----------------------------------

    Affects Version/s: 0.18.0
        Fix Version/s: 0.19.0
