Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/07/16 21:02:50 UTC

[jira] Created: (HIVE-1468) intermediate data produced for select queries ignores hive.exec.compress.intermediate

intermediate data produced for select queries ignores hive.exec.compress.intermediate
-------------------------------------------------------------------------------------

                 Key: HIVE-1468
                 URL: https://issues.apache.org/jira/browse/HIVE-1468
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Joydeep Sen Sarma


> set hive.exec.compress.intermediate=false;
> explain extended select xxx from yyy;
    ...

            File Output Operator
              compressed: true
              GlobalTableId: 0

looks like only the intermediate locations identified while splitting MR tasks follow this directive. this should be fixed because it forces clients to always decompress the output data (even if the config setting is altered).
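The behavior the report asks for can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not Hive's actual planner code: the class, the `SinkKind` enum and the method names are hypothetical. Only the property names `hive.exec.compress.intermediate` and `hive.exec.compress.output` are real Hive settings, and treating a missing value as `false` is an assumption made here.

```java
import java.util.Map;

public class CompressionPolicy {
    // Hypothetical classification of where a File Output Operator writes.
    enum SinkKind { SPLIT_INTERMEDIATE, SELECT_RESULT, TABLE_INSERT }

    // Reads a boolean property, treating a missing key as false (assumed default).
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Intended behavior per the report: anything that is not a table write,
    // including the temporary output of a plain SELECT, follows
    // hive.exec.compress.intermediate instead of always being compressed.
    static boolean compressed(SinkKind kind, Map<String, String> conf) {
        switch (kind) {
            case TABLE_INSERT:
                return flag(conf, "hive.exec.compress.output");
            default:
                return flag(conf, "hive.exec.compress.intermediate");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
            "hive.exec.compress.intermediate", "false",
            "hive.exec.compress.output", "true");
        // With the setting from the report, SELECT output should not be compressed.
        System.out.println(compressed(SinkKind.SELECT_RESULT, conf)); // false
        System.out.println(compressed(SinkKind.TABLE_INSERT, conf));  // true
    }
}
```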

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1468) intermediate data produced for select queries ignores hive.exec.compress.intermediate

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1468:
------------------------------------

    Issue Type: Bug  (was: Improvement)



[jira] Commented: (HIVE-1468) intermediate data produced for select queries ignores hive.exec.compress.intermediate

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889579#action_12889579 ] 

Joydeep Sen Sarma commented on HIVE-1468:
-----------------------------------------

yes - it does make sense to differentiate result data from intermediate. if anything - there's probably a good argument to be made that we don't need a separate option for intermediate compression. it should default to whatever policy is being applied for map-reduce intermediate traffic. (that would be a better default than either true or false - that way admins have one less option to get right).
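The suggested default could look roughly like this: an explicit `hive.exec.compress.intermediate` value wins, and otherwise Hive inherits the MapReduce map-output (shuffle) compression policy. A hedged sketch only: the class and method are hypothetical, and `mapred.compress.map.output` is assumed to be the relevant Hadoop property (its pre-0.21 name; later Hadoop releases call it `mapreduce.map.output.compress`).

```java
import java.util.Map;

public class IntermediateCompressionDefault {
    // Effective intermediate-compression setting under the proposal:
    // an explicit Hive value wins, otherwise follow the shuffle policy.
    static boolean effective(Map<String, String> conf) {
        String hive = conf.get("hive.exec.compress.intermediate");
        if (hive != null) {
            return Boolean.parseBoolean(hive);
        }
        // No Hive-specific setting: inherit map-output compression (false if unset).
        return Boolean.parseBoolean(conf.getOrDefault("mapred.compress.map.output", "false"));
    }

    public static void main(String[] args) {
        // The admin only configured shuffle compression; Hive inherits it.
        System.out.println(effective(Map.of("mapred.compress.map.output", "true"))); // true
        // An explicit Hive setting still overrides the inherited default.
        System.out.println(effective(Map.of(
            "mapred.compress.map.output", "true",
            "hive.exec.compress.intermediate", "false"))); // false
    }
}
```

The point of the design is the one made in the comment: admins tune a single knob (shuffle compression) and Hive follows it unless someone deliberately overrides it.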

interestingly - result data also needs only minimal replication. the client is single-threaded and cannot exploit multiple replicas for bandwidth purposes. also - the data is temporary in nature and doesn't need reliability.
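The replication point might translate into something like the following, where temporary query-result files get a single replica and all other output keeps the cluster default. Assumptions: the class, the method and the `RESULT_REPLICATION` constant are hypothetical; `dfs.replication` is the HDFS default-replication property, and its usual default of 3 is assumed when it is unset.

```java
import java.util.Map;

public class ResultReplication {
    // Hypothetical replication factor for temporary query-result files: the
    // single-threaded client cannot use extra replicas for bandwidth, and the
    // data is temporary, so one copy is assumed to be enough.
    static final short RESULT_REPLICATION = 1;

    static short replicationFor(boolean isQueryResult, Map<String, String> conf) {
        if (isQueryResult) {
            return RESULT_REPLICATION;
        }
        // Everything else keeps the cluster-wide HDFS default (3 if unset).
        return Short.parseShort(conf.getOrDefault("dfs.replication", "3"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("dfs.replication", "3");
        System.out.println(replicationFor(true, conf));  // 1
        System.out.println(replicationFor(false, conf)); // 3
    }
}
```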





[jira] Commented: (HIVE-1468) intermediate data produced for select queries ignores hive.exec.compress.intermediate

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889528#action_12889528 ] 

Zheng Shao commented on HIVE-1468:
----------------------------------

"select queries" means "SELECT" without "INSERT", correct?

I agree that we should treat these queries differently; specifically, no compression (or maybe LZO to save bandwidth, since clients can be in other data centers) will be a big win.
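The proposal for SELECT-only results (no compression by default, perhaps LZO when the client is far away) can be sketched as a tiny policy function. Everything here is hypothetical and for illustration only: the `Codec` enum, the `clientIsRemote` input and the assumption that Hive could tell whether a client sits in another data center are not existing Hive behavior. Queries with an INSERT are out of scope and would keep the normal output-compression policy.

```java
public class SelectResultCodec {
    // Hypothetical codec choices for result files of SELECT-only queries.
    enum Codec { NONE, LZO }

    // clientIsRemote: assumed signal that the client fetching results is in
    // another data center, where cross-site bandwidth is the bottleneck.
    static Codec resultCodec(boolean clientIsRemote) {
        // A fast codec such as LZO trades a little CPU for less network traffic;
        // for nearby clients, skipping compression avoids client-side decompression.
        return clientIsRemote ? Codec.LZO : Codec.NONE;
    }

    public static void main(String[] args) {
        System.out.println(resultCodec(false)); // NONE
        System.out.println(resultCodec(true));  // LZO
    }
}
```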

