Posted to dev@hive.apache.org by "Carl Steinbach (JIRA)" <ji...@apache.org> on 2011/08/22 22:02:29 UTC

[jira] [Reopened] (HIVE-2350) Improve RCFile Read Speed

     [ https://issues.apache.org/jira/browse/HIVE-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach reopened HIVE-2350:
----------------------------------


@Tim: Yes, looks like closing this was a mistake on my part. Your latest patch looks good, but you forgot to click the box that gives license rights to the ASF. Can you please attach the patch again and this time click the box? Thanks.

> Improve RCFile Read Speed
> -------------------------
>
>                 Key: HIVE-2350
>                 URL: https://issues.apache.org/jira/browse/HIVE-2350
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: rcfile-2011-08-04.diff, rcfile_opt_2011-08-05.diff, rcfile_opt_2011-08-05b.diff, rcfile_opt_2011-08-11.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> By tweaking the RCFile$Reader implementation to allow more efficient memory access, I was able to reduce CPU usage. I measured the CPU time required to scan a gzipped RCFile, decompress it, and assemble the data into records. CPU time was reduced by about 7% for a full table scan; an improvement of about 2% was realised when a smaller subset of columns (3-5 out of tens) was selected.
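The measurement described above (CPU time to scan gzipped column data, decompress it, and assemble records, for a full scan versus a few selected columns) can be approximated with a small harness. This is a minimal sketch under stated assumptions, not the patch and not Hive's RCFile$Reader: the class name `ScanCpuSketch`, the per-column gzip layout, and the newline-delimited value encoding are hypothetical stand-ins. It only uses the JDK (`java.util.zip`, and `ThreadMXBean.getCurrentThreadCpuTime()` for per-thread CPU time).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical harness: illustrates the measurement only; it does not use Hive classes. */
public class ScanCpuSketch {

    /** Gzip one column; values are joined with '\n' (an assumed encoding). */
    static byte[] gzipColumn(List<String> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(String.join("\n", values).getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Decompress one column back into its values. */
    static String[] gunzipColumn(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8).split("\n", -1);
        }
    }

    /** Decompress only the selected columns, then assemble them row by row. */
    static List<String[]> scan(byte[][] columns, int[] selected) throws IOException {
        String[][] decoded = new String[selected.length][];
        for (int i = 0; i < selected.length; i++) {
            decoded[i] = gunzipColumn(columns[selected[i]]);
        }
        int rowCount = decoded.length == 0 ? 0 : decoded[0].length;
        List<String[]> rows = new ArrayList<>(rowCount);
        for (int r = 0; r < rowCount; r++) {
            String[] row = new String[selected.length];
            for (int c = 0; c < selected.length; c++) {
                row[c] = decoded[c][r];
            }
            rows.add(row);
        }
        return rows;
    }

    /** CPU time (ns) used by the current thread for one scan; -1-based results mean the JVM lacks support. */
    static long cpuNanos(byte[][] columns, int[] selected) throws IOException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long start = mx.getCurrentThreadCpuTime();
        scan(columns, selected);
        return mx.getCurrentThreadCpuTime() - start;
    }

    public static void main(String[] args) throws IOException {
        int numColumns = 20, numRows = 10_000; // assumed table shape
        byte[][] columns = new byte[numColumns][];
        for (int c = 0; c < numColumns; c++) {
            List<String> values = new ArrayList<>();
            for (int r = 0; r < numRows; r++) values.add("c" + c + "_r" + r);
            columns[c] = gzipColumn(values);
        }
        int[] all = new int[numColumns];
        for (int c = 0; c < numColumns; c++) all[c] = c;
        int[] subset = {0, 5, 10}; // a projection of 3 columns
        System.out.println("full scan CPU ns: " + cpuNanos(columns, all));
        System.out.println("3-column CPU ns:  " + cpuNanos(columns, subset));
    }
}
```

The printed figures depend on the JVM and hardware and are not the results reported above. Because each column is compressed separately in this sketch, the projected scan decompresses only the selected columns, so it does far less work than the full scan; a fair before/after comparison would also warm up the JVM and repeat each run.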

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira