Posted to commits@hudi.apache.org by "Brandon Scheller (Jira)" <ji...@apache.org> on 2019/11/08 01:23:00 UTC
[jira] [Created] (HUDI-327) Introduce "null" supporting ComplexKeyGenerator
Brandon Scheller created HUDI-327:
-------------------------------------
Summary: Introduce "null" supporting ComplexKeyGenerator
Key: HUDI-327
URL: https://issues.apache.org/jira/browse/HUDI-327
Project: Apache Hudi (incubating)
Issue Type: Improvement
Reporter: Brandon Scheller
Customers have been running into issues where they would like to build a record_key from columns that can contain null values. Currently, this causes Hudi to crash and throw a cryptic exception. (Improving the error messaging is a separate but related issue.)
We would like to propose a new KeyGenerator based on ComplexKeyGenerator that allows for null record_keys.
At a basic level, using the key generator without any options would allow a null record_key to be accepted. (The null value could be replaced with an empty string, null, or some predefined "null" string representation.)
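
To make the basic behavior concrete, here is a minimal sketch in Python rather than Hudi's own Java. It is not Hudi's KeyGenerator API: the function name, the dict-based record, and the "__null__" placeholder are all assumptions made for illustration, since the proposal leaves the null representation open.

```python
from typing import Any, Mapping

# Hypothetical placeholder; the proposal does not fix the representation
# (it could equally be an empty string or another predefined value).
NULL_PLACEHOLDER = "__null__"


def record_key_or_placeholder(
    record: Mapping[str, Any],
    field: str,
    placeholder: str = NULL_PLACEHOLDER,
) -> str:
    """Return the record key for `field`, substituting `placeholder` for null."""
    value = record.get(field)  # a missing field is treated the same as null
    if value is None:
        return placeholder
    return str(value)


# Two different records whose key column is null end up with the same key.
key_a = record_key_or_placeholder({"id": None, "name": "a"}, "id")
key_b = record_key_or_placeholder({"id": None, "name": "b"}, "id")
key_c = record_key_or_placeholder({"id": 7, "name": "c"}, "id")
```

In this sketch `key_a` and `key_b` are identical, so both records would be treated as the same record.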
This comes with the negative side effect that all records with a null record_key would then be associated together. To work around this, you would be able to specify a secondary record_key to be used when the first one is null. You would specify this the same way you do for the ComplexKeyGenerator, as a comma-separated list of record_keys. In this case, when the first key is null, the second key is used instead. We could support an arbitrary number of record_keys here.
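
The fallback behavior could look roughly like the following Python sketch. It only illustrates the idea and is not Hudi code: the function name, the dict-based record, the column names, and the "__null__" placeholder are assumptions, and the real ComplexKeyGenerator may format keys differently.

```python
from typing import Any, Mapping


def first_non_null_key(
    record: Mapping[str, Any],
    fields_csv: str,
    placeholder: str = "__null__",  # hypothetical representation of "all keys null"
) -> str:
    """Return the value of the first non-null field in a comma-separated list."""
    # Parse the comma-separated list, ignoring surrounding whitespace and empty entries.
    fields = [name.strip() for name in fields_csv.split(",") if name.strip()]
    for name in fields:
        value = record.get(name)  # a missing field is treated the same as null
        if value is not None:
            return str(value)
    # Every candidate was null: fall back to the placeholder.
    return placeholder


# Hypothetical column names, listed in priority order.
key_fields = "user_id, email, session_id"

primary = first_non_null_key({"user_id": 42, "email": "a@example.com"}, key_fields)
fallback = first_non_null_key({"user_id": None, "email": "a@example.com"}, key_fields)
all_null = first_non_null_key({"user_id": None, "email": None}, key_fields)
```

One design question this raises: a value taken from a fallback column could coincide with a value from the primary column in another record. Prefixing the key with the field name it came from would be one way to avoid that, if it turns out to matter.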
While we are aware that there are many ways to avoid using a null record_key, we believe this would be a usability improvement, so that new users are not forced to clean or update their data in order to use Hudi.
We are hoping to get some feedback on the idea.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)