Posted to commits@hudi.apache.org by "Brandon Scheller (Jira)" <ji...@apache.org> on 2019/11/08 01:23:00 UTC

[jira] [Created] (HUDI-327) Introduce "null" supporting ComplexKeyGenerator

Brandon Scheller created HUDI-327:
-------------------------------------

             Summary: Introduce "null" supporting ComplexKeyGenerator
                 Key: HUDI-327
                 URL: https://issues.apache.org/jira/browse/HUDI-327
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
            Reporter: Brandon Scheller


Customers have been running into issues when they want to build a record_key from columns that can contain null values. Currently, this causes Hudi to crash with a cryptic exception. (Improving the error message is a separate but related issue.)

We would like to propose a new KeyGenerator based on ComplexKeyGenerator that allows for null record_keys.

At a basic level, using the key generator without any options would allow a null record_key to be accepted. (The null value could be written as an empty string, as null, or as some predefined "null" string representation.)

This comes with the negative side effect that all records with a null record_key would then be grouped together under the same key. To work around this, you would be able to specify a secondary record_key to be used when the first one is null. You would specify this the same way you do for the ComplexKeyGenerator: as a comma-separated list of record_keys. When the first key is null, the second key is used instead, and so on down the list. We could support an arbitrary number of fallback record_keys here.
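
To make the fallback behaviour concrete, here is a minimal sketch in plain Java. It does not use any Hudi classes: the class name NullFallbackKeySketch, the recordKey method, and the "__null__" placeholder are all hypothetical and chosen only for illustration. It assumes the key is simply the string form of the first non-null field in the list, which may differ from how the actual key generator would format its output.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a Hudi class and not the proposed implementation.
final class NullFallbackKeySketch {

    // Assumed placeholder for the case where every candidate field is null.
    static final String NULL_PLACEHOLDER = "__null__";

    // Walks the configured key fields in order and returns the first non-null
    // value as the record key. Falls back to the placeholder if all are null.
    static String recordKey(Map<String, Object> record, List<String> keyFields) {
        for (String field : keyFields) {
            Object value = record.get(field);
            if (value != null) {
                return value.toString();
            }
        }
        return NULL_PLACEHOLDER;
    }

    public static void main(String[] args) {
        // Equivalent of configuring the record key as the list "id,email".
        List<String> keyFields = Arrays.asList("id", "email");

        Map<String, Object> record = new HashMap<>();
        record.put("id", null);               // primary key column is null
        record.put("email", "a@example.com"); // secondary key column is set

        // The null "id" is skipped and "email" supplies the key.
        System.out.println(recordKey(record, keyFields));

        record.put("email", null);
        // Every candidate is null, so the placeholder is used.
        System.out.println(recordKey(record, keyFields));
    }
}
```

Note that in this sketch all records whose candidate fields are all null still share the placeholder key, which is the side effect described above; adding more fallback fields only makes that case rarer.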

While we are aware that there are many ways to avoid using a null record_key, we believe this will act as a usability improvement so that new users are not forced to clean or update their data in order to use Hudi.

We are hoping to get some feedback on the idea.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)