Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/03/15 22:50:00 UTC

[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD

     [ https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3639:
----------------------------------
    Fix Version/s: 0.12.0

> [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
> -------------------------------------------------------------------------------------
>
>                 Key: HUDI-3639
>                 URL: https://issues.apache.org/jira/browse/HUDI-3639
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.12.0
>
>
> Currently, Hudi's `MergeOnReadIncrementalRelation` relies solely on `ParquetFileReader` to filter out, at the record level, the records that don't belong to the timeline span being queried.
> As a side effect, Hudi has to disable the [VectorizedParquetReader|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-vectorized-parquet-reader.html] (since using it would prevent the Reader from filtering records).
>  
> Instead, we should make sure that proper record-level filtering is performed within the returned RDD, rather than relying solely on the FileReader to do it.
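
For illustration only, the kind of record-level filter described above could look roughly like the sketch below. It is not Hudi's implementation: it uses plain Python iterators instead of a Spark RDD, the function name `filter_incremental` is hypothetical, and it assumes that each record carries its commit instant in the `_hoodie_commit_time` meta field, that instants are fixed-width timestamp strings that compare correctly as strings, and that the queried span is exclusive of the begin instant and inclusive of the end instant.

```python
from typing import Dict, Iterable, Iterator

# Assumption: each record exposes its commit instant under this meta field.
COMMIT_TIME_FIELD = "_hoodie_commit_time"


def filter_incremental(
    records: Iterable[Dict[str, str]],
    begin_instant: str,
    end_instant: str,
) -> Iterator[Dict[str, str]]:
    """Yield only the records committed within (begin_instant, end_instant].

    Hypothetical helper: the filter runs over the records themselves, so it
    does not depend on the underlying file reader having filtered anything.
    """
    for record in records:
        commit_time = record[COMMIT_TIME_FIELD]
        # Fixed-width timestamp strings are assumed, so string comparison
        # orders instants chronologically.
        if begin_instant < commit_time <= end_instant:
            yield record


if __name__ == "__main__":
    sample = [
        {"_hoodie_commit_time": "20220314120000", "key": "a"},
        {"_hoodie_commit_time": "20220315120000", "key": "b"},
        {"_hoodie_commit_time": "20220316120000", "key": "c"},
    ]
    for rec in filter_incremental(sample, "20220314120000", "20220315120000"):
        print(rec["key"])
```

Because the filter is applied to the records after they are read, it works the same way whichever reader produced them, which is the property the issue asks for.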



--
This message was sent by Atlassian Jira
(v8.20.1#820001)