Posted to commits@hudi.apache.org by "kazdy (Jira)" <ji...@apache.org> on 2023/02/21 14:16:00 UTC

[jira] [Comment Edited] (HUDI-5608) Support decimals w/ precision > 30 in Column Stats

    [ https://issues.apache.org/jira/browse/HUDI-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691615#comment-17691615 ] 

kazdy edited comment on HUDI-5608 at 2/21/23 2:15 PM:
------------------------------------------------------

Hi [~rchertara] and [~alexey.kudinkin],
One of my teammates stumbled upon a similar issue in Hudi 0.12.1, so I wanted to share it with you.
The first write to Hudi used DecimalType(4,0); the second write used DecimalType(2,0).
So the problem is not limited to high-precision decimals.
We got the same exception as in the mentioned GH issue: AvroTypeException("Cannot encode decimal with precision 4 as max precision 2").
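For reference, here's a minimal sketch of the scenario above. It assumes Spark 3.x with the Hudi Spark bundle on the classpath; the table name, base path, and key/precombine fields are made up for illustration:

{code:scala}
import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("decimal-repro").getOrCreate()
val basePath = "/tmp/hudi_decimal_repro" // illustrative path

def writeHudi(df: DataFrame, mode: SaveMode): Unit =
  df.write.format("hudi")
    .option("hoodie.table.name", "decimal_repro")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "id")
    .mode(mode)
    .save(basePath)

// First write: "amount" is DecimalType(4,0)
val schema1 = StructType(Seq(
  StructField("id", IntegerType),
  StructField("amount", DecimalType(4, 0))))
val df1 = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(1, new java.math.BigDecimal(1234)))), schema1)
writeHudi(df1, SaveMode.Overwrite)

// Second write: "amount" narrows to DecimalType(2,0); in our case this failed with
// AvroTypeException: Cannot encode decimal with precision 4 as max precision 2
val schema2 = StructType(Seq(
  StructField("id", IntegerType),
  StructField("amount", DecimalType(2, 0))))
val df2 = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(2, new java.math.BigDecimal(12)))), schema2)
writeHudi(df2, SaveMode.Append)
{code}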

Another thing you might want to consider is how Spark behaves by default when it infers a schema.
Say you're reading JSON and want to write to Hudi, inferring the JSON schema along the way (a pretty common use case, since data producers usually don't provide schemas for JSON files). Spark will then set the decimal type to DecimalType(38, 18), so if only precision up to 30 were supported, it would break pipelines that rely on schema inference.
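A quick way to see this is to let Spark infer a schema from JSON with decimal inference turned on (prefersDecimal is a standard Spark JSON option; the sample record is made up):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-infer").getOrCreate()
import spark.implicits._

// Made-up sample record with a high-precision fractional value
val jsonDs = Seq("""{"id": 1, "amount": 1.23456789012345678901}""").toDS()

val df = spark.read
  .option("prefersDecimal", "true") // infer fractional values as decimals instead of doubles
  .json(jsonDs)

df.printSchema()
// The inferred precision/scale depends on the data; in our pipelines the
// inferred type ended up as DecimalType(38,18), i.e. above a precision-30 limit.
{code}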

[~rchertara] can this be solved by adding DecimalWrapperV2 as well?


> Support decimals w/ precision > 30 in Column Stats
> --------------------------------------------------
>
>                 Key: HUDI-5608
>                 URL: https://issues.apache.org/jira/browse/HUDI-5608
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.12.2
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.13.1
>
>
> As reported in: [https://github.com/apache/hudi/issues/7732]
>  
> Currently we've capped the precision of supported decimals at 30, assuming that this is reasonably high to cover 99% of use cases, but it seems there's still demand for even larger decimals.
> The challenge, however, is to balance the need to support longer decimals against the storage space we have to provision for each of them.
>  
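For context on the storage tradeoff described above: a fixed-width two's-complement encoding (roughly the sizing rule Parquet uses for fixed-length decimals) grows with precision, so every extra digit of supported precision has a per-value cost. A back-of-the-envelope sketch:

{code:scala}
// Minimal bytes needed to hold the unscaled value of a decimal of a given
// precision in two's complement, the same sizing rule Parquet applies to
// fixed-length decimals.
def minBytesForPrecision(precision: Int): Int =
  math.ceil((precision * math.log(10) / math.log(2) + 1) / 8).toInt

// precision 30 -> 13 bytes per value; precision 38 -> 16 bytes per value
Seq(30, 38).foreach(p => println(s"precision $p -> ${minBytesForPrecision(p)} bytes"))
{code}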



--
This message was sent by Atlassian Jira
(v8.20.10#820010)