Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2008/11/27 20:00:44 UTC

[jira] Created: (HIVE-85) separate compression options for different output types

separate compression options for different output types
-------------------------------------------------------

                 Key: HIVE-85
                 URL: https://issues.apache.org/jira/browse/HIVE-85
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Joydeep Sen Sarma


Currently, Hive uses mapred.output.compress to determine compression for all output files. However, not all files are final output. At least three different kinds of output files are generated:
1. intermediate output files for the next map-reduce job
2. files targeted for result HDFS directories or Hive tables/partitions (which are just HDFS directories)
3. files written to user local directories (downloading results)

The plan is to provide three separate options to control 1, 2, and 3 independently. We may want to split (2) in case compression is determined by table metadata (and not by session options).
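
To make the proposal concrete, here is a minimal Python sketch (not Hive code) contrasting the current single-flag behavior with per-kind switches. The names OutputKind, should_compress_legacy and should_compress_proposed are hypothetical, the per-kind settings are passed as a plain dictionary rather than as named configuration keys, and falling back to mapred.output.compress when a kind has no explicit setting is an assumption of the sketch, not part of the issue.

```python
from enum import Enum
from typing import Dict, Optional


class OutputKind(Enum):
    INTERMEDIATE = 1    # (1) intermediate files consumed by the next map-reduce job
    FINAL_HDFS = 2      # (2) result HDFS directories or Hive tables/partitions
    LOCAL_DOWNLOAD = 3  # (3) files written to user local directories


def should_compress_legacy(kind: OutputKind, mapred_output_compress: bool) -> bool:
    # Current behavior: one flag (mapred.output.compress) decides for every kind;
    # the kind argument is deliberately ignored.
    return mapred_output_compress


def should_compress_proposed(kind: OutputKind,
                             per_kind: Dict[OutputKind, Optional[bool]],
                             mapred_output_compress: bool) -> bool:
    # Proposed behavior: each kind has its own switch. Falling back to the
    # legacy flag when a kind is unset is an assumption of this sketch.
    setting = per_kind.get(kind)
    return mapred_output_compress if setting is None else setting
```

With per-kind switches, intermediate data can be compressed while results downloaded to a local directory stay uncompressed, which a single flag cannot express. The possible split of (2) by table metadata is not modeled here.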

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-85) separate compression options for different output types

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653714#action_12653714 ] 

Zheng Shao commented on HIVE-85:
--------------------------------

Committed. svn revision 723687.
Thanks Joydeep.


[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Status: Patch Available  (was: Open)

Two new options are provided: hive.exec.compress.output and hive.exec.compress.intermediate. Documentation is included in conf/hive-default.xml.
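
As an illustration only, the choice between the two new options can be sketched as picking a key by the role of the output: hive.exec.compress.intermediate for files consumed by a later map-reduce job in the same query, hive.exec.compress.output for final results. The helper names below are hypothetical and this is not how Hive's planner is implemented; configuration is modeled as a plain string-to-string mapping, and treating an unset key as false is an assumption that should be checked against conf/hive-default.xml.

```python
from typing import Mapping

INTERMEDIATE_KEY = "hive.exec.compress.intermediate"
OUTPUT_KEY = "hive.exec.compress.output"


def _flag(conf: Mapping[str, str], key: str) -> bool:
    # Assumed default: an unset key means "do not compress".
    return conf.get(key, "false").strip().lower() == "true"


def compress_job_output(conf: Mapping[str, str], is_final_output: bool) -> bool:
    # Final results (result HDFS dirs, tables/partitions) follow hive.exec.compress.output;
    # files handed to the next map-reduce job follow hive.exec.compress.intermediate.
    key = OUTPUT_KEY if is_final_output else INTERMEDIATE_KEY
    return _flag(conf, key)
```

In a CLI session the options would typically be toggled with statements such as: set hive.exec.compress.intermediate=true;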

The patch also includes some testing-related changes (which I found necessary for this work):
- an update to QTestUtil to overwrite files with -Doverwrite=true only if the files actually differ
- an update to SemanticAnalyzer to display boolean fields in the explain plan. This causes some additional items, not previously shown, to appear in explain plan outputs.
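
A minimal Python sketch of the "overwrite only if the files actually differ" rule from the first item above. It is not QTestUtil's Java implementation; overwrite_if_different is a hypothetical helper that compares the existing bytes with the newly generated output and leaves an identical file untouched.

```python
from pathlib import Path


def overwrite_if_different(path: Path, new_content: bytes) -> bool:
    """Write new_content to path only when it differs from the current contents.

    Returns True if the file was (re)written, False if it was left untouched.
    """
    if path.exists() and path.read_bytes() == new_content:
        return False  # identical output: do not rewrite the expected-results file
    path.write_bytes(new_content)
    return True
```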

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Attachment: hive-85.patch.3

[jira] Commented: (HIVE-85) separate compression options for different output types

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652600#action_12652600 ] 

Ashish Thusoo commented on HIVE-85:
-----------------------------------

When I try to apply the patch, I get the following error:

patch: **** malformed patch at line 93: \ No newline at end of fil

I think this is because there is some binary data in the patch. Can you upload the binary files separately?

Thanks

[jira] Assigned: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma reassigned HIVE-85:
-------------------------------------

    Assignee: Joydeep Sen Sarma

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Attachment: hive-85.patch.2

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Attachment: hive-85.patch.3

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-85:
------------------------------

    Component/s: Query Processor

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-85:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

HIVE-85. New compression options for Hive. (Joydeep Sarma through zshao)

[jira] Commented: (HIVE-85) separate compression options for different output types

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653086#action_12653086 ] 

Ashish Thusoo commented on HIVE-85:
-----------------------------------

+1


[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Attachment:     (was: hive-85.patch.3)

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Attachment: hive-85.patch.5

Once more, with MORE javadocs.

[jira] Commented: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652980#action_12652980 ] 

Joydeep Sen Sarma commented on HIVE-85:
---------------------------------------

Attached a new version of the last patch that omits the entry for the binary file.

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Attachment: hive-85.patch.1

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Attachment: lt100.txt.deflate

This should go in data/files/lt100.txt.deflate.
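
For readers regenerating a similar fixture, a minimal Python sketch of writing and reading a .deflate file. It assumes the file is a zlib-wrapped DEFLATE stream, which is the format Hadoop's DefaultCodec associates with the .deflate extension; that lt100.txt.deflate was produced this way is an assumption, and the helper names and sample rows are hypothetical, not the contents of lt100.txt.

```python
import zlib
from pathlib import Path


def write_deflate_fixture(path: Path, text: str) -> None:
    # zlib.compress emits a zlib-wrapped DEFLATE stream:
    # 2-byte header, compressed data, Adler-32 trailer.
    path.write_bytes(zlib.compress(text.encode("utf-8")))


def read_deflate_fixture(path: Path) -> str:
    # Inverse of write_deflate_fixture: inflate and decode back to text.
    return zlib.decompress(path.read_bytes()).decode("utf-8")
```

Writing a few tab-separated rows and reading them back should round-trip exactly. Hadoop chooses a decompression codec from the file extension (via CompressionCodecFactory), so the .deflate suffix presumably matters for the test to read the file as compressed input.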

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-85:
-------------------------------

    Fix Version/s: 0.3.0

[jira] Updated: (HIVE-85) separate compression options for different output types

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-85:
----------------------------------

    Attachment: hive-85.patch.4

Yet another one, after resolving issues from Namit's changes to genmapredtasks. Can we please review/commit this today?
