Posted to dev@parquet.apache.org by "Nezih Yigitbasi (JIRA)" <ji...@apache.org> on 2015/08/18 07:32:46 UTC

[jira] [Assigned] (PARQUET-333) [Vectorized Reader] Add attributes in ColumnVector and RowBatch

     [ https://issues.apache.org/jira/browse/PARQUET-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nezih Yigitbasi reassigned PARQUET-333:
---------------------------------------

    Assignee: Nezih Yigitbasi

> [Vectorized Reader] Add attributes in ColumnVector and RowBatch
> ---------------------------------------------------------------
>
>                 Key: PARQUET-333
>                 URL: https://issues.apache.org/jira/browse/PARQUET-333
>             Project: Parquet
>          Issue Type: Sub-task
>          Components: parquet-mr
>            Reporter: Dong Chen
>            Assignee: Nezih Yigitbasi
>
> As discussed in HIVE-8128, we want to add some attributes to the vector classes.
> * In {{ColumnVector}}, add two attributes: {{boolean noNulls}}, which indicates whether the whole column vector contains no null values, and {{boolean isRepeating}}, which indicates whether the same value repeats for the whole column vector. Both can be calculated while a vector is being read. SQL engines (like Hive) can check these attributes to skip some values.
> * In {{RowBatch}}, add one attribute, {{int size}}, which indicates the number of rows in this batch. This is only for convenience; its value should be the same as {{RowBatch.columns\[0\].numValues}}.
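
A minimal Java sketch of the attributes described above, for illustration only. The field names noNulls, isRepeating, size, columns and numValues come from the issue text. Everything else is an assumption and not the parquet-mr API: the Object[] storage, the constructors, the VectorOps helper, and the choice to treat an empty vector as non-repeating.

```java
import java.util.Objects;

// Hypothetical stand-in for the proposed ColumnVector; not the parquet-mr class.
class ColumnVector {
    final Object[] values;      // assumed storage; a real vector would likely use typed arrays
    final int numValues;
    final boolean noNulls;      // true if no value in the vector is null
    final boolean isRepeating;  // true if every value equals the first one

    ColumnVector(Object[] values) {
        this.values = values;
        this.numValues = values.length;
        boolean sawNull = false;
        boolean repeating = values.length > 0; // assumption: empty vector is not repeating
        // Both flags are computed in the same pass, as the issue suggests.
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                sawNull = true;
            }
            if (i > 0 && !Objects.equals(values[i], values[0])) {
                repeating = false;
            }
        }
        this.noNulls = !sawNull;
        this.isRepeating = repeating;
    }
}

// Hypothetical stand-in for the proposed RowBatch.
class RowBatch {
    final ColumnVector[] columns;
    final int size; // number of rows; mirrors columns[0].numValues

    RowBatch(ColumnVector... columns) {
        this.columns = columns;
        this.size = columns.length == 0 ? 0 : columns[0].numValues;
    }
}

// Hypothetical consumer showing how an engine might use the flags to skip work.
class VectorOps {
    static int countNonNull(ColumnVector v) {
        if (v.noNulls) {
            return v.numValues;                         // no per-value null check needed
        }
        if (v.isRepeating) {
            return v.values[0] == null ? 0 : v.numValues; // inspect one value only
        }
        int count = 0;
        for (Object value : v.values) {
            if (value != null) {
                count++;
            }
        }
        return count;
    }
}
```

With these flags a consumer can answer some questions about a vector from a single value or from numValues alone, and falls back to a full scan only when neither flag applies.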



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)