Posted to commits@hudi.apache.org by "leesf (Jira)" <ji...@apache.org> on 2019/11/26 10:47:00 UTC

[jira] [Closed] (HUDI-327) Introduce "null" supporting KeyGenerator

     [ https://issues.apache.org/jira/browse/HUDI-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-327.
----------------------
    Fix Version/s: 0.5.1
       Resolution: Fixed

Fixed via master: 60fed21dc7e4cb66b154ae9be77dfada0f3071a5

> Introduce "null" supporting KeyGenerator
> ----------------------------------------
>
>                 Key: HUDI-327
>                 URL: https://issues.apache.org/jira/browse/HUDI-327
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>            Reporter: Brandon Scheller
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.1
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Customers have been running into issues when they want to build a record_key from columns that can contain null values. Currently, this causes Hudi to crash with a cryptic exception. (Improving the error message is a separate but related issue.)
> We would like to propose a new KeyGenerator based on ComplexKeyGenerator that allows for null record_keys.
> At a basic level, using the key generator without any options would allow a null record_key to be accepted. (The null could be replaced with an empty string, kept as null, or mapped to some predefined "null" string representation.)
> This comes with the negative side effect that all records with a null record_key would then be grouped together under the same key. To work around this, you would be able to specify a secondary record_key to be used when the first one is null. You would specify this the same way you do for the ComplexKeyGenerator, as a comma-separated list of record_keys. When the first key is null, the second key is used instead. We could support an arbitrary number of record_keys here.
> While we are aware that there are many ways to avoid using a null record_key, we believe this would be a usability improvement, so that new users are not forced to clean or update their data in order to use Hudi.
> We are hoping to get some feedback on the idea.
>  
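
The fallback behaviour proposed in the description can be sketched roughly as follows. This is only an illustration of the idea, not the code that was merged for HUDI-327 and not Hudi's KeyGenerator API: the class name, the method name, the "__null__" placeholder and the use of a plain Map as the record are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed behaviour; not Hudi's KeyGenerator API.
final class NullTolerantKeySketch {
    // Assumed placeholder used when every candidate key field is null.
    static final String NULL_PLACEHOLDER = "__null__";

    // keyFields is a comma-separated list of candidate record key fields,
    // in priority order. The first non-null value wins.
    static String recordKey(Map<String, Object> record, String keyFields) {
        for (String field : keyFields.split(",")) {
            Object value = record.get(field.trim());
            if (value != null) {
                return value.toString();
            }
        }
        // All candidates were null (or absent): fall back to the placeholder.
        return NULL_PLACEHOLDER;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("user_id", null);
        row.put("event_id", "e-42");

        // Primary key is null, so the secondary field is used.
        System.out.println(recordKey(row, "user_id,event_id")); // prints e-42
        // No fallback configured: every such record shares the placeholder key.
        System.out.println(recordKey(row, "user_id"));          // prints __null__
    }
}
```

The second call shows the side effect mentioned in the description: without a secondary field, all records with a null key collapse onto the same placeholder value.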



--
This message was sent by Atlassian Jira
(v8.3.4#803005)