Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/02/18 23:15:00 UTC

[jira] [Created] (HUDI-3456) Revisit Properties/Config Defaults handling

Alexey Kudinkin created HUDI-3456:
-------------------------------------

             Summary: Revisit Properties/Config Defaults handling
                 Key: HUDI-3456
                 URL: https://issues.apache.org/jira/browse/HUDI-3456
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin


Right now, whenever we compose a configuration, we essentially follow the formula below:

We take the user input, add defaults for missing properties, and seal the result as the complete set of configs.

 

The problem with this approach is that a consumer of the configuration has no way to tell whether a value was provided by the user or filled in from defaults. This masking causes issues wherever a consumer needs to know whether the user supplied a value for a particular property (right now that is simply impossible).
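
A minimal Java sketch of the problem, not Hudi's actual code: the key name "precombine.field", the class name, and both helper methods are hypothetical and exist only for illustration. sealWithDefaults mimics the formula above, while layered shows one possible alternative that keeps defaults in a separate layer using the standard java.util.Properties defaults mechanism.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConfigDefaultsSketch {

    // Hypothetical key and default, loosely mirroring PRECOMBINE_FIELD_NAME.
    static final String PRECOMBINE_KEY = "precombine.field";
    static final String PRECOMBINE_DEFAULT = "ts";

    // The formula described above: start from defaults, overlay user input, seal.
    static Map<String, String> sealWithDefaults(Map<String, String> userInput) {
        Map<String, String> sealed = new HashMap<>();
        sealed.put(PRECOMBINE_KEY, PRECOMBINE_DEFAULT);
        sealed.putAll(userInput);
        return sealed;
    }

    // One possible alternative (an assumption, not a proposal from this issue):
    // keep defaults in a separate layer behind the user-provided entries.
    static Properties layered(Map<String, String> userInput) {
        Properties defaults = new Properties();
        defaults.setProperty(PRECOMBINE_KEY, PRECOMBINE_DEFAULT);
        Properties props = new Properties(defaults);
        userInput.forEach(props::setProperty);
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> noInput = new HashMap<>();
        Map<String, String> explicitTs = new HashMap<>();
        explicitTs.put(PRECOMBINE_KEY, "ts");

        // After sealing, "user said ts" and "user said nothing" are indistinguishable.
        System.out.println(sealWithDefaults(noInput).equals(sealWithDefaults(explicitTs))); // true

        // With layering, containsKey only sees user-provided entries,
        // while getProperty still falls back to the default.
        System.out.println(layered(noInput).containsKey(PRECOMBINE_KEY));    // false
        System.out.println(layered(explicitTs).containsKey(PRECOMBINE_KEY)); // true
        System.out.println(layered(noInput).getProperty(PRECOMBINE_KEY));    // ts
    }
}
```

The layered variant is only meant to show that the user/default distinction can be preserved at all; whether Hudi's config classes should be structured this way is a separate design question.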

 

Take PRECOMBINE_FIELD_NAME as an example: by default it falls back to "ts". But PRECOMBINE_FIELD_NAME is not a _required_ configuration (since the user might opt for custom payload merging), and this masking makes it impossible for, for example, the Spark Relation to know whether the column was specified by the user, and therefore has to be present in the schema, or whether it is a default value that we assumed, with no guarantee that it is present.

 

This leads some places to over-correct for this behavior by injecting empty strings "" as a means of suppressing the fallback to the default (since null would be treated as the condition for falling back).
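
The empty-string workaround can be illustrated with a small Java sketch. Everything here is hypothetical (the key name, the class, and both methods); it is not taken from the Hudi code base and only shows why "" ends up acting as a sentinel when a missing value always triggers the default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class EmptyStringSentinelSketch {

    // Hypothetical key and default, loosely mirroring PRECOMBINE_FIELD_NAME.
    static final String KEY = "precombine.field";
    static final String DEFAULT = "ts";

    // A missing (null) value is treated as the condition for falling back.
    static String getOrDefault(Map<String, String> config) {
        String value = config.get(KEY);
        return value == null ? DEFAULT : value;
    }

    // Every consumer now has to know that "" means "explicitly no field".
    static Optional<String> precombineField(Map<String, String> config) {
        String value = getOrDefault(config);
        return value.isEmpty() ? Optional.empty() : Optional.of(value);
    }

    public static void main(String[] args) {
        Map<String, String> unset = new HashMap<>();
        Map<String, String> suppressed = new HashMap<>();
        suppressed.put(KEY, ""); // injected only to suppress the fallback

        System.out.println(precombineField(unset));      // Optional[ts]
        System.out.println(precombineField(suppressed)); // Optional.empty
    }
}
```

The sentinel works, but it overloads the empty string with a second meaning that each consumer has to decode on its own.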



--
This message was sent by Atlassian Jira
(v8.20.1#820001)