Posted to dev@parquet.apache.org by "Gabor Szadovszky (JIRA)" <ji...@apache.org> on 2018/04/21 12:39:05 UTC

[jira] [Updated] (PARQUET-341) Improve write performance with wide schema sparse data

     [ https://issues.apache.org/jira/browse/PARQUET-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Szadovszky updated PARQUET-341:
-------------------------------------
    Fix Version/s: 1.8.2

> Improve write performance with wide schema sparse data
> ------------------------------------------------------
>
>                 Key: PARQUET-341
>                 URL: https://issues.apache.org/jira/browse/PARQUET-341
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Tianshuo Deng
>            Assignee: Tianshuo Deng
>            Priority: Major
>             Fix For: 1.9.0, 1.8.2
>
>
> In the write path, when the data is very sparse, most of the time is spent writing nulls.
> Currently, writing nulls goes through the same code path as writing values, which recursively traverses all the leaves whenever a group is null.
> Because every leaf beneath a null group must be written as null with the same repetition level and definition level, we can eliminate the recursive calls used to find the leaves.
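The following is a minimal sketch of the idea above, written in Java (the language of parquet-mr). It is not the actual patch. It assumes a hypothetical toy schema tree: the `Node` class, `writeNullRecursive`, `writeNullCached`, `groupToLeaves` and the `writes` log are illustrative names, not parquet-mr APIs. The sketch contrasts recursing down to the leaves every time a group is null with computing each group's leaf list once and then writing the same (repetition level, definition level) pair to every leaf in a flat loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not parquet-mr's MessageColumnIO code.
public class NullLeafCache {
    // Hypothetical minimal schema node: a leaf column if it has no children, else a group.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node add(Node child) { children.add(child); return this; }

        boolean isLeaf() { return children.isEmpty(); }
    }

    // Log of every null written, as "column:r:d", so both strategies can be compared.
    final List<String> writes = new ArrayList<>();

    // Cache: group -> flat list of all leaf columns beneath it, built once per group.
    private final Map<Node, List<Node>> groupToLeaves = new HashMap<>();

    // Naive path: walk the whole subtree on every null group.
    void writeNullRecursive(Node node, int r, int d) {
        if (node.isLeaf()) {
            writes.add(node.name + ":" + r + ":" + d);
        } else {
            for (Node child : node.children) {
                writeNullRecursive(child, r, d);
            }
        }
    }

    // Optimized path: every leaf under a null group gets the same (r, d),
    // so reuse the precomputed leaf list instead of recursing each time.
    void writeNullCached(Node group, int r, int d) {
        List<Node> leaves = groupToLeaves.computeIfAbsent(group, g -> {
            List<Node> out = new ArrayList<>();
            collectLeaves(g, out);
            return out;
        });
        for (Node leaf : leaves) {
            writes.add(leaf.name + ":" + r + ":" + d);
        }
    }

    // One-time traversal used to populate the cache.
    private static void collectLeaves(Node node, List<Node> out) {
        if (node.isLeaf()) {
            out.add(node);
        } else {
            for (Node child : node.children) {
                collectLeaves(child, out);
            }
        }
    }
}
```

Both methods emit the same writes in the same leaf order. The cached version pays for the tree walk only the first time a given group is null. With a wide, sparse schema, the same groups tend to be null record after record, so the per-record cost becomes a flat loop over that group's leaves.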



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)