Posted to dev@parquet.apache.org by "Zoltan Ivanfi (JIRA)" <ji...@apache.org> on 2018/01/19 12:33:00 UTC

[jira] [Commented] (PARQUET-922) Add index pages to the format to support efficient page skipping

    [ https://issues.apache.org/jira/browse/PARQUET-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332194#comment-16332194 ] 

Zoltan Ivanfi commented on PARQUET-922:
---------------------------------------

I was looking for a JIRA for the actual implementation in parquet-mr, but couldn't find it. Does such a JIRA already exist?

> Add index pages to the format to support efficient page skipping
> ----------------------------------------------------------------
>
>                 Key: PARQUET-922
>                 URL: https://issues.apache.org/jira/browse/PARQUET-922
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Julien Le Dem
>            Assignee: Marcel Kornacker
>            Priority: Major
>             Fix For: format-2.4.0
>
>
> When a Parquet file is sorted, we can define an index consisting of the boundary values for the pages of the sorted columns, as well as the offsets and lengths of those pages in the file.
> The goal is to optimize lookup and range scan type queries, using this to read only the pages containing data matching the filter.
> We'd require the pages to be aligned across columns.
> [~marcelk] will add a link to the google doc to discuss the spec
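
For illustration only, and not taken from the issue or from the format spec: a minimal Python sketch of the idea described above. It assumes a single sorted integer column and a hypothetical in-memory index with one entry per page (boundary values, file offset, length). The names PageIndexEntry and pages_for_range are made up for this example and are not part of parquet-format or parquet-mr.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageIndexEntry:
    """Hypothetical index entry for one data page of a sorted column."""
    min_value: int  # smallest value stored in the page
    max_value: int  # largest value stored in the page
    offset: int     # byte offset of the page in the file
    length: int     # byte length of the page


def pages_for_range(index: List[PageIndexEntry], lo: int, hi: int) -> List[PageIndexEntry]:
    """Return the pages that may contain values in the closed range [lo, hi].

    Assumes the column is sorted, so both the min and the max boundary
    values are non-decreasing from one page to the next.
    """
    if lo > hi:
        return []
    maxes = [entry.max_value for entry in index]
    mins = [entry.min_value for entry in index]
    first = bisect_left(maxes, lo)   # first page whose max_value >= lo
    last = bisect_right(mins, hi)    # one past the last page whose min_value <= hi
    return index[first:last]


# Example index: four pages of 1000 bytes each (made-up numbers).
index = [
    PageIndexEntry(0, 99, 4, 1000),
    PageIndexEntry(100, 199, 1004, 1000),
    PageIndexEntry(200, 299, 2004, 1000),
    PageIndexEntry(300, 399, 3004, 1000),
]

# A range scan for 150..250 only needs the second and third pages.
selected = pages_for_range(index, 150, 250)
byte_ranges = [(entry.offset, entry.length) for entry in selected]
```

A reader would then seek to each (offset, length) pair in byte_ranges and skip every other page. Because the pages are required to be aligned across columns, the same page positions could in principle be used to fetch the matching pages of the other columns.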



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)