Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/09/06 03:46:00 UTC

[jira] [Commented] (ORC-1264) [C++] Add a writer option to align compression block with row group boundary

    [ https://issues.apache.org/jira/browse/ORC-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600566#comment-17600566 ] 

Dongjoon Hyun commented on ORC-1264:
------------------------------------

Thank you for the dev mailing list discussion and for filing this JIRA, [~wgtmac].

> [C++] Add a writer option to align compression block with row group boundary
> ----------------------------------------------------------------------------
>
>                 Key: ORC-1264
>                 URL: https://issues.apache.org/jira/browse/ORC-1264
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Gang Wu
>            Assignee: Gang Wu
>            Priority: Major
>
> To reduce unnecessary I/O and decompression when predicate pushdown (PPD) is in effect, we can force compression blocks to align with row group boundaries. This avoids reading and decompressing filtered-out row groups that precede a surviving row group within the same compression block. The change does not break the format specification and should be transparent to any downstream implementation. The caveat is a potentially larger file size, depending on the data distribution and the compression codec used. Therefore the behavior should be optional and enabled at the user's discretion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)