Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/06/13 21:00:01 UTC

[jira] [Created] (HUDI-4245) Support nested fields in Column Stats Index

Alexey Kudinkin created HUDI-4245:
-------------------------------------

             Summary: Support nested fields in Column Stats Index
                 Key: HUDI-4245
                 URL: https://issues.apache.org/jira/browse/HUDI-4245
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin


Currently, only root-level fields are supported in the Column Stats Index. There is no reason we cannot support nested fields as well, given that columnar file formats store nested fields as _nested columns_, i.e. as columns named by the field together with the struct it belongs to.

 

For example, the following schema:
{code:scala}
c1: StringType
c2: StructType(Seq(StructField("foo", StringType)))
{code}
would be stored in Parquet as the leaf columns "c1: string" and "c2.foo: string". This means Parquet already collects statistics for all nested fields, and we just need to make sure we propagate them into the Column Stats Index.
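
To illustrate the naming described above, here is a minimal sketch of how a nested schema flattens into dot-separated leaf column names. The Field and NestedColumnPaths classes are hypothetical and exist only for this example; they are not Hudi or Parquet APIs. The sketch handles only structs and primitive leaves, and ignores lists, maps and repetition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hudi or Parquet.
final class NestedColumnPaths {

    // Minimal schema model: a field is either a primitive leaf or a struct with children.
    static final class Field {
        final String name;
        final String primitiveType;  // non-null for leaf fields, e.g. "string"
        final List<Field> children;  // non-empty for struct fields

        private Field(String name, String primitiveType, List<Field> children) {
            this.name = name;
            this.primitiveType = primitiveType;
            this.children = children;
        }

        static Field leaf(String name, String primitiveType) {
            return new Field(name, primitiveType, Collections.<Field>emptyList());
        }

        static Field struct(String name, Field... children) {
            return new Field(name, null, Arrays.asList(children));
        }
    }

    // Flattens a schema into leaf columns, each named by its dot-separated path from the root.
    static List<String> leafColumns(List<Field> fields) {
        List<String> out = new ArrayList<>();
        for (Field field : fields) {
            collect(field, "", out);
        }
        return out;
    }

    private static void collect(Field field, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? field.name : prefix + "." + field.name;
        if (field.primitiveType != null) {
            // Leaf: this is a column for which per-file statistics would exist.
            out.add(path + ": " + field.primitiveType);
        } else {
            // Struct: recurse into children, extending the path.
            for (Field child : field.children) {
                collect(child, path, out);
            }
        }
    }

    public static void main(String[] args) {
        List<Field> schema = Arrays.asList(
            Field.leaf("c1", "string"),
            Field.struct("c2", Field.leaf("foo", "string")));
        System.out.println(leafColumns(schema)); // prints [c1: string, c2.foo: string]
    }
}
```

Under these assumptions, the index would only need to key its entries by such dotted paths rather than by root-level field names.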



--
This message was sent by Atlassian Jira
(v8.20.7#820007)