Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2016/05/02 19:37:13 UTC

[jira] [Comment Edited] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

    [ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267043#comment-15267043 ] 

Sergey Shelukhin edited comment on HIVE-9660 at 5/2/16 5:36 PM:
----------------------------------------------------------------

This is exactly what this patch does, except that the coordination would move into each of the RL writers instead of staying in one central place.
So I don't really understand the difference in approach.

Note that the run-length (RL) blocks finish before the CBs (i.e. the RL finishes first, then the CB containing that RL), so the callbacks actually fire in the reverse order.

For uncompressed data, the main concern is that tracking exact boundaries would result in too many calls.
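
As a rough illustration of the ordering described above, here is a minimal sketch. All class and method names (CallbackOrderSketch, CompressedStream, RunLengthWriter, writeRun) are hypothetical stand-ins, not the actual ORC writer classes, and the fixed 8-byte CB size is an assumption chosen only to keep the example small. It only shows that an RL run is complete before the CB that holds its bytes is closed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of callback ordering; these are not the ORC writer classes. */
public class CallbackOrderSketch {

    /** Stand-in for a compressed stream that closes a CB when it fills up or is flushed. */
    static class CompressedStream {
        private final int capacity;        // assumed fixed CB size, in bytes
        private final List<String> events; // records callbacks in firing order
        private int used = 0;              // bytes in the currently open CB
        private long closedBytes = 0;      // total bytes in CBs closed so far

        CompressedStream(int capacity, List<String> events) {
            this.capacity = capacity;
            this.events = events;
        }

        void write(int numBytes) {
            while (numBytes > 0) {
                int n = Math.min(numBytes, capacity - used);
                used += n;
                numBytes -= n;
                if (used == capacity) {
                    closeBuffer();
                }
            }
        }

        void flush() {
            if (used > 0) {
                closeBuffer();
            }
        }

        private void closeBuffer() {
            closedBytes += used;
            used = 0;
            events.add("CB end @" + closedBytes);
        }
    }

    /** Stand-in for an RL writer: a run is complete before its bytes reach the stream. */
    static class RunLengthWriter {
        private final CompressedStream out;
        private final List<String> events;
        private int runs = 0;

        RunLengthWriter(CompressedStream out, List<String> events) {
            this.out = out;
            this.events = events;
        }

        void writeRun(int encodedBytes) {
            runs++;
            events.add("RL end #" + runs); // the RL callback fires first
            out.write(encodedBytes);       // the CB holding this run closes later
        }
    }

    /** Writes two 3-byte runs into an 8-byte CB, then flushes. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        CompressedStream stream = new CompressedStream(8, events);
        RunLengthWriter writer = new RunLengthWriter(stream, events);
        writer.writeRun(3);
        writer.writeRun(3);
        stream.flush();
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch, demo() records both "RL end" events before the single "CB end" event, so any code that pairs the two kinds of callbacks has to expect the RL notification first.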


was (Author: sershe):
This is exactly what this patch does, except the coordination will move into each of the RL writers instead of the central place.
So I don't really understand the difference in approach.

> store end offset of compressed data for RG in RowIndex in ORC
> -------------------------------------------------------------
>
>                 Key: HIVE-9660
>                 URL: https://issues.apache.org/jira/browse/HIVE-9660
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in a lot of extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores the number of compressed buffers for each RG, or the end offset, or something similar, to remove this estimation magic.
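
To make the cost of estimation concrete, here is a small sketch comparing an estimated end offset with a stored one. The estimator shown (next RG start plus a fixed slop, capped at the stream length) is a simplifying assumption for illustration, not the actual ORC reader logic. The class name, method names, offsets, and slop value are all made up.

```java
/** Hypothetical sketch: estimated vs. stored end offsets for row groups (RGs). */
public class EndOffsetSketch {

    /**
     * Assumed estimator: with only start positions known, pad the next RG's
     * start by a worst-case slop, capped at the stream length.
     */
    static long estimateEnd(long[] rgStarts, int rg, long streamLength, long slop) {
        if (rg == rgStarts.length - 1) {
            return streamLength; // last RG: read to the end of the stream
        }
        return Math.min(streamLength, rgStarts[rg + 1] + slop);
    }

    /** Bytes read past the real end when relying on the estimate. */
    static long extraBytes(long estimatedEnd, long exactEnd) {
        return Math.max(0L, estimatedEnd - exactEnd);
    }

    public static void main(String[] args) {
        long[] rgStarts = {0L, 1000L, 2500L};     // made-up start offsets
        long[] exactEnds = {1200L, 2700L, 4000L}; // what a stored end offset would hold
        long streamLength = 4000L;
        long slop = 512L;                         // made-up worst-case padding
        for (int rg = 0; rg < rgStarts.length; rg++) {
            long estimated = estimateEnd(rgStarts, rg, streamLength, slop);
            System.out.println("RG " + rg + ": estimated end " + estimated
                + ", exact end " + exactEnds[rg]
                + ", extra bytes " + extraBytes(estimated, exactEnds[rg]));
        }
    }
}
```

With a stored end offset per RG, the reader in this sketch would request exactly exactEnds[rg] and the extra-bytes term would drop to zero. With the estimate, the padding is read whether or not it is needed.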



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)