
Posted to issues@spark.apache.org by "Chao Sun (Jira)" <ji...@apache.org> on 2021/08/16 18:34:00 UTC

[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

     [ https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated SPARK-36528:
-----------------------------
    Description: Currently, Spark first decodes the encoded data (e.g., RLE/bit-packed, PLAIN) into a column vector and then operates on the decoded data. However, it may be more efficient to operate directly on the encoded data (e.g., when the data uses RLE encoding). This could potentially also work with encodings in the Parquet v2 format, such as DELTA_BYTE_ARRAY.  (was: Currently, Spark first decodes the encoded data (e.g., RLE/bit-packed, PLAIN) into a column vector and then operates on the decoded data. However, it may be more efficient to operate directly on the encoded data (e.g., when the data uses RLE encoding).)

> Implement lazy decoding for the vectorized Parquet reader
> ---------------------------------------------------------
>
>                 Key: SPARK-36528
>                 URL: https://issues.apache.org/jira/browse/SPARK-36528
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently, Spark first decodes the encoded data (e.g., RLE/bit-packed, PLAIN) into a column vector and then operates on the decoded data. However, it may be more efficient to operate directly on the encoded data (e.g., when the data uses RLE encoding). This could potentially also work with encodings in the Parquet v2 format, such as DELTA_BYTE_ARRAY.
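
To make the idea in the description concrete, here is a toy sketch in Python of why operating on RLE-encoded data can be cheaper than decoding first. It is not Spark's reader and does not use any Spark or Parquet API: the function names and the (value, run_length) layout are assumptions chosen for illustration, and real Parquet pages use an RLE/bit-packed hybrid that is more involved than this.

```python
from typing import Callable, List, Tuple

# Hypothetical, simplified run representation: (value, run_length).
Run = Tuple[int, int]


def decode_rle(runs: List[Run]) -> List[int]:
    """Eager path: expand every run into a flat list of values."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def count_matches_eager(runs: List[Run], pred: Callable[[int], bool]) -> int:
    """Decode first, then test the predicate once per decoded value."""
    return sum(1 for v in decode_rle(runs) if pred(v))


def count_matches_lazy(runs: List[Run], pred: Callable[[int], bool]) -> int:
    """Skip decoding: test the predicate once per run and add the run length."""
    return sum(length for value, length in runs if pred(value))


if __name__ == "__main__":
    runs = [(7, 1000), (3, 2), (7, 500), (9, 1)]

    def is_seven(v: int) -> bool:
        return v == 7

    # Both paths agree, but the lazy path evaluates the predicate
    # 4 times (once per run) instead of 1503 times (once per value).
    print(count_matches_eager(runs, is_seven))
    print(count_matches_lazy(runs, is_seven))
```

In this sketch the lazy path does work proportional to the number of runs rather than the number of rows, which is the kind of saving the issue is hoping for when the data is heavily run-length encoded. Whether the same gain holds in Spark depends on the operator and the encoding involved.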



