Posted to dev@parquet.apache.org by "Nezih Yigitbasi (JIRA)" <ji...@apache.org> on 2015/02/16 19:47:12 UTC

[jira] [Commented] (PARQUET-131) Supporting Vectorized APIs in Parquet

    [ https://issues.apache.org/jira/browse/PARQUET-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323131#comment-14323131 ] 

Nezih Yigitbasi commented on PARQUET-131:
-----------------------------------------

Hi all,
Some time ago I sent a message to the parquet dev mailing list about our work on vector support. I want to share it here as well in case some of you missed it. It is still an early work in progress, but any feedback is welcome: https://github.com/zhenxiao/incubator-parquet-mr/pull/1

Thanks

> Supporting Vectorized APIs in Parquet
> -------------------------------------
>
>                 Key: PARQUET-131
>                 URL: https://issues.apache.org/jira/browse/PARQUET-131
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Zhenxiao Luo
>            Assignee: Zhenxiao Luo
>         Attachments: Parquet-Vectorized-APIs.pdf, ParquetInPresto.pdf
>
>
> Vectorized query execution could bring significant performance improvements to SQL engines such as Hive, Drill, and Presto. Instead of processing one row at a time, it processes a batch of rows at a time. Within a batch, each column is represented as a vector of a primitive data type. SQL engines can apply predicates to these vectors very efficiently, rather than pushing a single row through all the operators before the next row can be processed.
> Parquet is an efficient columnar data representation, so it would be nice if it supported vectorized APIs. All SQL engines could then read vectors directly from Parquet files and run vectorized execution on the Parquet file format.
>  
> Detailed proposal:
> https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30
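
To illustrate the batch-at-a-time model described in the issue, here is a minimal Java sketch. It is not the API from the proposal or from the pull request: the class VectorizedFilterSketch, the Batch type, the "id" and "price" columns, and both methods are hypothetical names chosen for illustration. It only shows a batch whose columns are primitive arrays, a predicate evaluated over a whole column in one tight loop, and a downstream step that consumes the resulting selection vector.

```java
import java.util.Arrays;

public class VectorizedFilterSketch {

    /** A batch of rows stored column-wise: one primitive array per column (hypothetical layout). */
    static final class Batch {
        final long[] ids;      // hypothetical column "id"
        final double[] prices; // hypothetical column "price"
        final int size;        // number of rows in this batch

        Batch(long[] ids, double[] prices) {
            if (ids.length != prices.length) {
                throw new IllegalArgumentException("all columns must have the same length");
            }
            this.ids = ids;
            this.prices = prices;
            this.size = ids.length;
        }
    }

    /**
     * Evaluates the predicate price > threshold over the whole column
     * and returns the indices of the rows that pass (a selection vector).
     */
    static int[] selectPriceGreaterThan(Batch batch, double threshold) {
        int[] selected = new int[batch.size];
        int count = 0;
        for (int i = 0; i < batch.size; i++) {
            if (batch.prices[i] > threshold) {
                selected[count++] = i;
            }
        }
        return Arrays.copyOf(selected, count);
    }

    /** A downstream operator that only touches the rows named in the selection vector. */
    static long sumIds(Batch batch, int[] selected) {
        long sum = 0;
        for (int rowIndex : selected) {
            sum += batch.ids[rowIndex];
        }
        return sum;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(
                new long[] {1, 2, 3, 4},
                new double[] {5.0, 12.5, 7.25, 20.0});

        int[] selected = selectPriceGreaterThan(batch, 10.0);
        System.out.println(Arrays.toString(selected)); // prints [1, 3]
        System.out.println(sumIds(batch, selected));   // prints 6
    }
}
```

Each operator here runs to completion over the batch before the next one starts, which is the contrast with row-at-a-time execution that the description draws.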



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)