Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2023/02/08 01:03:00 UTC

[jira] [Updated] (HUDI-5557) Wrong candidate files found in metadata table

     [ https://issues.apache.org/jira/browse/HUDI-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-5557:
----------------------------------
    Fix Version/s: 0.13.1

> Wrong candidate files found in metadata table 
> ----------------------------------------------
>
>                 Key: HUDI-5557
>                 URL: https://issues.apache.org/jira/browse/HUDI-5557
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata, spark-sql
>    Affects Versions: 0.12.1
>            Reporter: ruofan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.1
>
>
> Suppose a Hudi table has five columns, but only two of them are indexed in the metadata table's column stats index. When a SQL filter combines predicates on indexed columns with predicates on non-indexed columns, the candidate files returned by the metadata table are wrong: files that contain matching rows are left out.
> For example, take a Hudi table with the following schema:
> {code:java}
> name: varchar(128)
> age: int
> addr: varchar(128)
> city: varchar(32)
> job: varchar(32) {code}
> and the following table properties:
> {code:java}
> hoodie.table.type=MERGE_ON_READ
> hoodie.metadata.enable=true
> hoodie.metadata.index.column.stats.enable=true
> hoodie.metadata.index.column.stats.column.list='name,city'
> hoodie.enable.data.skipping=true {code}
> and run the following query:
> {code:java}
> select * from hudi_table where name='tom' and age=18;  {code}
> With hoodie.enable.data.skipping=false, the query returns the expected data. With hoodie.enable.data.skipping=true, the expected data is not returned.
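The expected behavior can be made concrete with a minimal Java sketch of min/max file pruning. It uses a simplified model and is not Hudi's implementation or API. The class `DataSkippingSketch`, the records `Eq`, `MinMax` and `FileStats`, the file names and the statistics are all hypothetical. Only equality predicates are modeled, and values are compared as strings. The sketch shows one point: a conjunct on a column that is not in the column stats index (here `age`) carries no information, so it must keep a file as a candidate rather than exclude it. The `mayMatchFaulty` variant treats a missing statistic as "no match" and reproduces the reported symptom. The actual cause inside Hudi may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DataSkippingSketch {

    // Hypothetical equality predicate "column = value"; values are strings for simplicity.
    record Eq(String column, String value) {}

    // Hypothetical min/max statistics of one column within one file.
    record MinMax(String min, String max) {}

    // Hypothetical per-file statistics; only indexed columns have an entry.
    record FileStats(String file, Map<String, MinMax> stats) {}

    // Sound rule: a file is skipped only if some conjunct is provably false for it.
    static boolean mayMatch(FileStats f, List<Eq> conjuncts) {
        for (Eq p : conjuncts) {
            MinMax mm = f.stats().get(p.column());
            if (mm == null) {
                continue; // column not indexed: no information, cannot rule the file out
            }
            if (mm.min().compareTo(p.value()) > 0 || mm.max().compareTo(p.value()) < 0) {
                return false; // value lies outside [min, max]: safe to skip this file
            }
        }
        return true;
    }

    // Faulty rule: a missing statistic is treated as "no match", so files are wrongly skipped.
    static boolean mayMatchFaulty(FileStats f, List<Eq> conjuncts) {
        for (Eq p : conjuncts) {
            MinMax mm = f.stats().get(p.column());
            if (mm == null) {
                return false; // BUG: absence of stats is not evidence that no row matches
            }
            if (mm.min().compareTo(p.value()) > 0 || mm.max().compareTo(p.value()) < 0) {
                return false;
            }
        }
        return true;
    }

    // Returns the names of files kept as candidates for the scan.
    static List<String> candidates(List<FileStats> files, List<Eq> conjuncts, boolean faulty) {
        List<String> out = new ArrayList<>();
        for (FileStats f : files) {
            if (faulty ? mayMatchFaulty(f, conjuncts) : mayMatch(f, conjuncts)) {
                out.add(f.file());
            }
        }
        return out;
    }

    // Made-up stats for the indexed columns only, mirroring column.list='name,city'.
    static List<FileStats> exampleFiles() {
        return List.of(
                new FileStats("file-1", Map.of("name", new MinMax("alice", "zoe"),
                                               "city", new MinMax("berlin", "paris"))),
                new FileStats("file-2", Map.of("name", new MinMax("adam", "bob"),
                                               "city", new MinMax("rome", "tokyo"))));
    }

    // where name='tom' and age=18
    static List<Eq> exampleWhere() {
        return List.of(new Eq("name", "tom"), new Eq("age", "18"));
    }

    public static void main(String[] args) {
        System.out.println(candidates(exampleFiles(), exampleWhere(), false)); // [file-1]
        System.out.println(candidates(exampleFiles(), exampleWhere(), true));  // []
    }
}
```

Running `main` prints `[file-1]` for the sound rule and `[]` for the faulty one. This mirrors the report, where the row disappears only when data skipping is enabled.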



--
This message was sent by Atlassian Jira
(v8.20.10#820010)