Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2009/11/22 09:34:39 UTC

[jira] Created: (HIVE-947) Add run length encoding into RCFile's block header

Add run length encoding into RCFile's block header 
---------------------------------------------------

                 Key: HIVE-947
                 URL: https://issues.apache.org/jira/browse/HIVE-947
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: He Yongqiang
            Assignee: He Yongqiang
            Priority: Minor


When RCFile constructs rows, it obtains each column value's length by calling readVLong(). This call should be avoided for fixed-length or mostly fixed-length columns.
The change should also not affect existing RCFile files; that is, it must continue to work correctly on files written in the previous RCFile format.
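
To illustrate the idea, here is a minimal sketch of run-length encoding applied to a column's value lengths. It is not the code from the attached patch: the class name LengthRunSketch, the methods encode and decode, and the convention of storing a repeat count as a negative number (its bitwise complement) are all assumptions made for this example. Plain integers stand in for the variable-length longs that readVLong() would read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of run-length encoding value lengths; not the HIVE-947 patch itself. */
final class LengthRunSketch {

    /**
     * Encodes value lengths. Each run is written as the length itself,
     * followed by ~repeats (always negative) when the length repeats.
     */
    static List<Integer> encode(int[] lengths) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < lengths.length) {
            int len = lengths[i];
            int repeats = 0;
            // Count how many following values share the same length.
            while (i + 1 + repeats < lengths.length && lengths[i + 1 + repeats] == len) {
                repeats++;
            }
            out.add(len);
            if (repeats > 0) {
                out.add(~repeats); // assumed marker: negative means "repeat previous length"
            }
            i += 1 + repeats;
        }
        return out;
    }

    /**
     * Decodes into one length per row. A stream with no negative entries
     * (the assumed shape of the previous format) decodes unchanged.
     */
    static int[] decode(List<Integer> encoded, int rowCount) {
        int[] lengths = new int[rowCount];
        int row = 0;
        int prev = -1;
        for (int v : encoded) {
            if (v >= 0) {
                prev = v;
                lengths[row++] = v;
            } else {
                int repeats = ~v;
                for (int k = 0; k < repeats; k++) {
                    lengths[row++] = prev;
                }
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        int[] lengths = {4, 4, 4, 4, 7, 4, 4};
        List<Integer> encoded = encode(lengths);
        System.out.println(encoded);
        System.out.println(Arrays.toString(decode(encoded, lengths.length)));
    }
}
```

Under these assumptions, a column whose values are all the same length needs only two entries per block header, however many rows it holds. Because real lengths are never negative, a reader of this kind would also accept a header that lists every length individually, which is one way the compatibility requirement above could be met.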

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-947) Add run length encoding into RCFile's block header

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-947:
------------------------------

    Attachment: hive-947-2009-11-24.patch



[jira] Updated: (HIVE-947) Add run length encoding into RCFile's block header

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-947:
------------------------------

    Attachment: hive-947-2009-11-22.patch

hive-947-2009-11-22.patch implements the run-length encoding. This is only a minor improvement, so no test was added.



[jira] Resolved: (HIVE-947) Add run length encoding into RCFile's block header

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-947.
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed. Thanks, Yongqiang.



[jira] Commented: (HIVE-947) Add run length encoding into RCFile's block header

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781748#action_12781748 ] 

Namit Jain commented on HIVE-947:
---------------------------------

Got a diff when running the test partition_wise_fileformat.q.





[jira] Updated: (HIVE-947) Add run length encoding into RCFile's block header

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-947:
------------------------------

    Attachment:     (was: hive-947-2009-11-24.patch)



[jira] Updated: (HIVE-947) Add run length encoding into RCFile's block header

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-947:
--------------------------------

    Fix Version/s: 0.5.0
      Component/s: Serializers/Deserializers



[jira] Commented: (HIVE-947) Add run length encoding into RCFile's block header

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781669#action_12781669 ] 

Ning Zhang commented on HIVE-947:
---------------------------------

Yongqiang, can you profile the simple query that we talked about the other day and see how much CPU this can save? We should test on columns whose values are mostly the same length (e.g., type int) and on columns with variable-length values (e.g., string).



[jira] Commented: (HIVE-947) Add run length encoding into RCFile's block header

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781310#action_12781310 ] 

Namit Jain commented on HIVE-947:
---------------------------------

Since I am not very familiar with this, let us go over it together sometime.
