Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/02/23 16:13:00 UTC

[jira] [Assigned] (HUDI-3490) Timestamp conversion (parquet)

     [ https://issues.apache.org/jira/browse/HUDI-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-3490:
-----------------------------------------

    Assignee: sivabalan narayanan

> Timestamp conversion (parquet)
> ------------------------------
>
>                 Key: HUDI-3490
>                 URL: https://issues.apache.org/jira/browse/HUDI-3490
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Istvan Darvas
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.11.0
>
>
> Hi Guys!
>  
> My environment is Hudi 0.8.0 on AWS EMR 6.4.
>  
> Timestamp conversion seems very confusing and inconsistent across the tools:
> 1.) DeltaStreamer appears to default to TIMESTAMP_MILLIS
> 2.) the PySpark/Hudi API appears to default to TIMESTAMP_MICROS
>  
> The real issue for me, though, is that I cannot control this.
>  
> Neither in DeltaStreamer:
>  --hoodie-conf hoodie.parquet.outputtimestamptype=TIMESTAMP_MICROS
> nor in PySpark:
> {"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MILLIS"}
>  
> So I am not able to set a single default across systems. Of course I can convert the values myself, and I will do so as a workaround, but it would be great to have this convenient feature.
>  
> One more suggestion / idea:
> I do not know whether it is possible, but maybe this parameter (hoodie.parquet.outputtimestamptype) could be removed everywhere, and the framework could rely on the high-level contract from the Spark framework, which is
>    spark.sql.parquet.outputTimestampType = TIMESTAMP_MILLIS / TIMESTAMP_MICROS
>    The default storage type is INT96, which is not compatible with Avro, but here I think you could do some automatic conversion, which would be well documented :)
>  
> To summarize: I am confused, and I am not able to use automatic timestamp conversion consistently across the systems, so this should be standardized.
>  
> Thanks,
>  Darvi
>  
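
For illustration only, the two configuration forms quoted in the report can be generated from one shared value, so that both tools are at least asked for the same timestamp type. This is a minimal sketch: the helper names (shared_hudi_conf, to_deltastreamer_args, to_pyspark_options) are hypothetical and not part of Hudi or Spark, and only the key and the two values quoted in the report are used. It does not address the reported problem that the setting appears to be ignored on Hudi 0.8.0.

```python
# Sketch only: render one shared Hudi setting in the two forms quoted in the report.
# All helper names here are hypothetical; they are not Hudi or Spark APIs.

TIMESTAMP_TYPE_KEY = "hoodie.parquet.outputtimestamptype"  # key quoted in the report
ALLOWED_TYPES = ("TIMESTAMP_MILLIS", "TIMESTAMP_MICROS")   # values quoted in the report


def shared_hudi_conf(timestamp_type):
    """Return the single config dict that both tools should be driven from."""
    if timestamp_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported timestamp type: {timestamp_type!r}")
    return {TIMESTAMP_TYPE_KEY: timestamp_type}


def to_deltastreamer_args(conf):
    """Render the dict as repeated `--hoodie-conf key=value` CLI arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--hoodie-conf", f"{key}={value}"])
    return args


def to_pyspark_options(conf):
    """Return a copy of the dict for use as writer options in PySpark."""
    return dict(conf)


conf = shared_hudi_conf("TIMESTAMP_MICROS")
print(" ".join(to_deltastreamer_args(conf)))
print(to_pyspark_options(conf))
```

In PySpark, such a dict would typically be passed to the writer with df.write.format("hudi").options(**opts). Whether Hudi then honors the value is exactly what this issue questions.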



--
This message was sent by Atlassian Jira
(v8.20.1#820001)