Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/06/12 22:26:01 UTC

[jira] [Comment Edited] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

    [ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584008#comment-14584008 ] 

Rohini Palaniswamy edited comment on PIG-3251 at 6/12/15 8:25 PM:
------------------------------------------------------------------

Hadoop 2.x's splittable bzip2 implementation seems to be more stable, with bug fixes for bzip2. I think we can make it the default for Hadoop 2 (switching back to Pig's implementation when the codec is not splittable, as with the native bz2 implementation, and adding a config setting to switch back in case of issues) and keep Pig's Bzip2TextInputFormat for Hadoop 1.x.


was (Author: rohini):
I think we can make it the default for Hadoop 2 (with a setting to switch back in case of issues) and keep Pig's Bzip2TextInputFormat for Hadoop 1.x. Hadoop 2.x's implementation seems to be more stable, with bug fixes for bzip2.

> Bzip2TextInputFormat requires double the memory of maximum record size
> ----------------------------------------------------------------------
>
>                 Key: PIG-3251
>                 URL: https://issues.apache.org/jira/browse/PIG-3251
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-3251-trunk-v01.patch, pig-3251-trunk-v02.patch, pig-3251-trunk-v03.patch, pig-3251-trunk-v04.patch, pig-3251-trunk-v05.patch
>
>
> While looking at a user's OOM heap dump, I noticed that Pig's Bzip2TextInputFormat holds memory in both
> Bzip2TextInputFormat.buffer (a ByteArrayOutputStream)
> and the actual Text that is returned as the line.
> For example, with a single 160 MB record, buffer was 268 MB and Text was 160 MB.
> We can probably eliminate one of them.
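
One plausible reading of the 268 MB figure in the description: java.io.ByteArrayOutputStream grows its backing array by doubling when data arrives in small writes, so its capacity ends up at the next doubling step above the record size (2^28 = 268,435,456 bytes for a 160 MB record). The sketch below only simulates that growth and allocates nothing. The class and method names are hypothetical, the 32-byte starting size is ByteArrayOutputStream's default and is an assumption about how Pig constructs the buffer, and 160 MB is assumed to mean decimal megabytes.

```java
// Hypothetical sketch, not Pig code: models the capacity of a buffer that
// doubles whenever it runs out of room, as ByteArrayOutputStream does for
// small incremental writes.
class BufferGrowthSketch {

    // Returns the backing-array capacity after bytesWritten bytes have been
    // appended to a buffer that started at initialCapacity and doubled on
    // every overflow.
    static long capacityAfter(long bytesWritten, long initialCapacity) {
        long capacity = initialCapacity;
        while (capacity < bytesWritten) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        long record = 160L * 1000 * 1000;         // one 160 MB record (assumed decimal MB)
        long buffer = capacityAfter(record, 32);  // 32 = ByteArrayOutputStream default size (assumption)

        // The buffer and the Text copy of the line are alive at the same time.
        System.out.println("buffer capacity (bytes): " + buffer);
        System.out.println("buffer + Text (bytes):   " + (buffer + record));
    }
}
```

Under these assumptions the buffer alone is about 1.7 times the record size, and the combined footprint is well over double, which is consistent with the issue title.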


