Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2009/09/12 22:16:57 UTC

[jira] Updated: (HIVE-819) Add lazy decompress ability to RCFile

     [ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-819:
------------------------------

    Attachment: hive-819-2009-9-12.patch

A draft version that adds a callback to do lazy decompression. It still needs more profiling.
In one experiment on a 115 MB compressed input file (uservisits), the query
"SELECT sourceip, desturl, visitdate, useragent, countrycode, duration FROM uservisits_rc where duration >9;" went from 20+ s to 14 s.
However, after changing the filter condition from 9 to 8, the execution time increased by 4 s. That's too bad; more profiling is needed to find out why.
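
The patch itself is not reproduced here. The following is only a minimal sketch of the general idea of a decompression callback, assuming a column buffer that keeps its bytes compressed and inflates them on the first read, at most once. The class and method names (LazyDecompressedColumn, getBytes, decompressCount) are hypothetical, not Hive's API, and java.util.zip (zlib) stands in for whatever codec the table actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: a column buffer whose bytes stay compressed until first read.
// zlib (java.util.zip) is an assumed stand-in codec, not necessarily RCFile's.
class LazyDecompressedColumn {
    private final byte[] compressed;
    private byte[] decompressed;   // null until the first call to getBytes()
    private int decompressCount;   // how many times inflation actually ran

    LazyDecompressedColumn(byte[] compressed) {
        this.compressed = compressed;
    }

    // Callback-style accessor: decompression happens here, on demand, at most once.
    byte[] getBytes() {
        if (decompressed == null) {
            decompressed = inflate(compressed);
            decompressCount++;
        }
        return decompressed;
    }

    boolean isDecompressed() { return decompressed != null; }

    int decompressCount() { return decompressCount; }

    // Helper used to build test input: compress raw bytes with zlib.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate a complete zlib stream; fails loudly on truncated or corrupt input.
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated compressed data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Caching the inflated bytes means repeated reads of the same column in a block pay the decompression cost only once, while a column that is never read never pays it at all.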

> Add lazy decompress ability to RCFile
> -------------------------------------
>
>                 Key: HIVE-819
>                 URL: https://issues.apache.org/jira/browse/HIVE-819
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: He Yongqiang
>             Fix For: 0.5.0
>
>         Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for filter scans.
> For example, for the query 'select a, b, c from table_rc_lazydecompress where a>1;' we only need to decompress the block data of columns b and c when at least one row in that block has a value of column 'a' that satisfies the filter condition.
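
The block-skipping idea in the description can be sketched as a two-phase scan, assuming each row group exposes its filter column (a) already decoded and its other columns (b, c) only through a loader that decompresses on demand. RowGroup, LazyFilterScan, and the Supplier-based loaders are hypothetical illustrations, not RCFile's actual reader classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.Supplier;

// Hypothetical row group for 'select a, b, c ... where a>1':
// column a is decoded up front; b and c are decompressed only when a loader is invoked.
class RowGroup {
    final int[] a;
    final Supplier<String[]> bLoader;
    final Supplier<String[]> cLoader;

    RowGroup(int[] a, Supplier<String[]> bLoader, Supplier<String[]> cLoader) {
        this.a = a;
        this.bLoader = bLoader;
        this.cLoader = cLoader;
    }
}

class LazyFilterScan {
    // Returns "a,b,c" strings for matching rows.
    static List<String> scan(List<RowGroup> groups, IntPredicate filter) {
        List<String> out = new ArrayList<>();
        for (RowGroup g : groups) {
            // Phase 1: evaluate the filter on column a only.
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < g.a.length; i++) {
                if (filter.test(g.a[i])) {
                    hits.add(i);
                }
            }
            // No row in this block qualifies: b and c are never decompressed.
            if (hits.isEmpty()) {
                continue;
            }
            // Phase 2: at least one hit, so pay for decompressing b and c once.
            String[] b = g.bLoader.get();
            String[] c = g.cLoader.get();
            for (int i : hits) {
                out.add(g.a[i] + "," + b[i] + "," + c[i]);
            }
        }
        return out;
    }
}
```

The saving depends on selectivity per block: if matching rows are spread across most blocks, nearly every b and c block still gets decompressed, so the benefit shrinks as the filter becomes less selective.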
