Posted to common-dev@hadoop.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2008/09/14 09:43:44 UTC

[jira] Created: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

'compressed' keyword in DDL syntax misleading and does not compress
-------------------------------------------------------------------

                 Key: HADOOP-4169
                 URL: https://issues.apache.org/jira/browse/HADOOP-4169
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/hive
            Reporter: Joydeep Sen Sarma


Hive produces two types of data files: flat files and sequencefiles. The syntax should reflect this. Currently the 'compressed' keyword is used to choose the sequencefile format, but it does not actually compress the files, which is misleading. In addition, flat files can also be compressed.

The proposal is to replace 'compressed' with 'sequencefile'. Compression should instead be controlled through the standard Hadoop way of specifying whether output should be compressed ('mapred.output.compress'), i.e. session options. (Session options will also define the codec, etc.) The default file format and compression options can be specified in the conf file.
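
To illustrate the proposed split between file format and compression, here is a minimal sketch. The table and column names are hypothetical. The `STORED AS SEQUENCEFILE` clause is the form found in later Hive releases and may not match exactly what the attached patch implements. The codec is one of the standard Hadoop codecs, picked only as an example. The issue names only 'mapred.output.compress'; whether a given Hive version honours these Hadoop options directly, or needs an additional Hive-level switch, is not confirmed here.

```sql
-- Hypothetical table: the storage clause selects the file format only
-- and says nothing about compression.
CREATE TABLE page_views (user_id INT, url STRING)
STORED AS SEQUENCEFILE;

-- Compression is requested separately through session options,
-- using the standard Hadoop output-compression properties.
SET mapred.output.compress=true;
-- Assumption: GzipCodec is used here purely as an example codec.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```

With this separation, a flat-file table could be compressed through the same session options, and a sequencefile table could be left uncompressed.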

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HADOOP-4169:
--------------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HADOOP-4169:
--------------------------------------

    Description: 
Hive produces two types of data files: flat files and sequencefiles. The syntax should reflect this. Currently the 'compressed' keyword is used to choose the sequencefile format, but it does not actually compress the files, which is misleading. In addition, flat files can also be compressed.

The proposal is to replace 'compressed' with 'sequencefile'. Compression should instead be controlled through the standard Hadoop way of specifying whether output should be compressed ('mapred.output.compress'), i.e. session options. (Session options will also define the codec, etc.) The default file format and compression options can be specified in the conf file.

  was:
Hive two types of data files: flat files and sequencefiles. The syntax should reflect this. Currently the 'compressed' keyword is used to choose the sequencefile format, but it does not actually compress the files, which is misleading. In addition, flat files can also be compressed.

The proposal is to replace 'compressed' with 'sequencefile'. Compression should instead be controlled through the standard Hadoop way of specifying whether output should be compressed ('mapred.output.compress'), i.e. session options. (Session options will also define the codec, etc.) The default file format and compression options can be specified in the conf file.



[jira] Commented: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632425#action_12632425 ] 

Ashish Thusoo commented on HADOOP-4169:
---------------------------------------

Sorry, my mistake. The patch is OK.


[jira] Updated: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HADOOP-4169:
--------------------------------------

    Attachment: 4169-1.txt


[jira] Commented: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632431#action_12632431 ] 

Ashish Thusoo commented on HADOOP-4169:
---------------------------------------

+1

Looks fine to me.


[jira] Commented: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632781#action_12632781 ] 

Zheng Shao commented on HADOOP-4169:
------------------------------------

This patch is included in http://issues.apache.org/jira/browse/HADOOP-4205



[jira] Commented: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632424#action_12632424 ] 

Ashish Thusoo commented on HADOOP-4169:
---------------------------------------

Can you generate the patch from the Hadoop root? This one is generated from the Hive root.


[jira] Resolved: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao resolved HADOOP-4169.
--------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

The patch is included and committed with http://issues.apache.org/jira/browse/HADOOP-4205


[jira] Updated: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HADOOP-4169:
--------------------------------------

    Fix Version/s: 0.19.0
           Status: Patch Available  (was: Open)


[jira] Assigned: (HADOOP-4169) 'compressed' keyword in DDL syntax misleading and does not compress

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma reassigned HADOOP-4169:
-----------------------------------------

    Assignee: Joydeep Sen Sarma
