Posted to commits@hudi.apache.org by "ruofan (Jira)" <ji...@apache.org> on 2023/01/14 03:36:00 UTC

[jira] [Created] (HUDI-5557) Wrong candidate files found in metadata table

ruofan created HUDI-5557:
----------------------------

             Summary: Wrong candidate files found in metadata table 
                 Key: HUDI-5557
                 URL: https://issues.apache.org/jira/browse/HUDI-5557
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata, spark-sql
    Affects Versions: 0.12.1
            Reporter: ruofan


Suppose a Hudi table has five fields, of which only two are indexed in the column stats index. When a SQL filter combines a condition on an indexed field with a condition on a non-indexed field, the candidate files returned from the metadata table are wrong.

For example, consider the following Hudi table schema:
{code:java}
name: varchar(128)
age: int
addr: varchar(128)
city: varchar(32)
job: varchar(32) {code}
Table properties:
{code:java}
hoodie.table.type=MERGE_ON_READ
hoodie.metadata.enable=true
hoodie.metadata.index.column.stats.enable=true
hoodie.metadata.index.column.stats.column.list='name,city'
hoodie.enable.data.skipping=true {code}
SQL query:
{code:java}
select * from hudi_table where name='tom' and age=18;  {code}
If we set hoodie.enable.data.skipping=false, the query returns the expected data. But if we set hoodie.enable.data.skipping=true, the expected data is not found.
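
To illustrate the expected behavior, here is a minimal sketch of min/max-based file pruning for an AND-ed filter. It is not Hudi's implementation: the file names, the statistics, and the helpers may_match and candidate_files are all hypothetical. The point it tries to show is that a condition on a column without stats (age here) should never rule a file out, so pruning should depend only on the indexed columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    min_value: object
    max_value: object


# Hypothetical per-file statistics; only 'name' and 'city' are indexed,
# mirroring hoodie.metadata.index.column.stats.column.list='name,city'.
FILE_STATS = {
    "file_a.parquet": {
        "name": ColumnRange("alice", "zoe"),
        "city": ColumnRange("austin", "york"),
    },
    "file_b.parquet": {
        "name": ColumnRange("adam", "bob"),
        "city": ColumnRange("berlin", "paris"),
    },
}


def may_match(stats, column, value):
    """Return False only when the stats prove no row can have column == value."""
    col_range = stats.get(column)
    if col_range is None:
        # No stats for this column (not indexed): the file cannot be ruled out.
        return True
    return col_range.min_value <= value <= col_range.max_value


def candidate_files(file_stats, equality_filters):
    """Keep every file that may satisfy all AND-ed equality filters."""
    return sorted(
        name
        for name, stats in file_stats.items()
        if all(may_match(stats, col, val) for col, val in equality_filters.items())
    )


# Equivalent of: where name='tom' and age=18
candidates = candidate_files(FILE_STATS, {"name": "tom", "age": 18})
print(candidates)
```

In this sketch file_b.parquet is pruned because 'tom' falls outside its name range, while the age condition prunes nothing. A filter on age alone would keep both files.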



--
This message was sent by Atlassian Jira
(v8.20.10#820010)