Posted to commits@hudi.apache.org by "Vamshi Gudavarthi (Jira)" <ji...@apache.org> on 2022/10/20 05:34:00 UTC

[jira] [Updated] (HUDI-5001) Sanitize avro column names for RowSource

     [ https://issues.apache.org/jira/browse/HUDI-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamshi Gudavarthi updated HUDI-5001:
------------------------------------
    Description: This issue is scoped to row sources. Column names in a row source may contain characters that are invalid in Avro names (ref [here|https://avro.apache.org/docs/1.10.2/spec.html#names]). When a configuration option is set, those column names are sanitized in both the schema and the actual data, so ingestion into Hudi does not fail. Schema provider support is limited to the file-based schema registry (filebasedschemaregistry), since other schema registries may not allow an invalid schema to be registered in the first place.
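The sanitization described above can be sketched in Python. This is a minimal illustration, not Hudi's implementation. It assumes the naming rule from the linked Avro 1.10.2 spec: a name starts with [A-Za-z_] and then contains only [A-Za-z0-9_]. The schema is represented as an Avro JSON schema dict and rows as plain dicts. The `mask` replacement string ("__" here) is a hypothetical parameter, and `sanitize_name`, `sanitize_schema` and `sanitize_row` are illustrative names, not Hudi APIs.

```python
import re

# Avro 1.10.2 "Names": start with [A-Za-z_], then only [A-Za-z0-9_].
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9_]")


def sanitize_name(name, mask="__"):
    """Replace every character that is illegal in an Avro name with `mask`.

    `mask` is a hypothetical parameter; it must itself be a valid Avro
    name fragment that starts with a letter or underscore.
    """
    sanitized = _INVALID_CHARS.sub(mask, name)
    # A leading digit (or an empty name) is also invalid, so prefix the mask.
    if not sanitized or sanitized[0].isdigit():
        sanitized = mask + sanitized
    return sanitized


def sanitize_schema(schema, mask="__"):
    """Return a copy of an Avro JSON schema with every record field name sanitized."""
    if isinstance(schema, list):  # union: sanitize each branch
        return [sanitize_schema(s, mask) for s in schema]
    if not isinstance(schema, dict):  # primitive type name such as "string"
        return schema
    out = dict(schema)
    t = schema.get("type")
    if t == "record":
        out["fields"] = [
            {**f, "name": sanitize_name(f["name"], mask),
             "type": sanitize_schema(f["type"], mask)}
            for f in schema["fields"]
        ]
    elif t == "array":
        out["items"] = sanitize_schema(schema["items"], mask)
    elif t == "map":
        out["values"] = sanitize_schema(schema["values"], mask)
    elif isinstance(t, (dict, list)):  # nested type definition
        out["type"] = sanitize_schema(t, mask)
    return out


def sanitize_row(row, mask="__"):
    """Rename keys in a nested row so they match the sanitized schema.

    Assumes rows contain only records (dicts) and arrays (lists); Avro map
    columns are not handled, because their keys are data, not field names.
    """
    if isinstance(row, dict):
        return {sanitize_name(k, mask): sanitize_row(v, mask) for k, v in row.items()}
    if isinstance(row, list):
        return [sanitize_row(v, mask) for v in row]
    return row
```

One design caveat: distinct names can collapse to the same sanitized name (for example `a-b` and `a.b` both become `a__b`), so a real implementation would need to detect such collisions rather than silently merge columns.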

> Sanitize avro column names for RowSource
> ----------------------------------------
>
>                 Key: HUDI-5001
>                 URL: https://issues.apache.org/jira/browse/HUDI-5001
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Vamshi Gudavarthi
>            Assignee: Vamshi Gudavarthi
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)