Posted to commits@hudi.apache.org by "Istvan Darvas (Jira)" <ji...@apache.org> on 2022/05/12 11:40:00 UTC

[jira] [Created] (HUDI-4091) Timestamp micro handling

Istvan Darvas created HUDI-4091:
-----------------------------------

             Summary: Timestamp micro handling
                 Key: HUDI-4091
                 URL: https://issues.apache.org/jira/browse/HUDI-4091
             Project: Apache Hudi
          Issue Type: Bug
    Affects Versions: 0.10.1
         Environment: AWS EMR
            Reporter: Istvan Darvas
         Attachments: b97b9e55-58a4-417b-b71c-f6b2d3860da0-0_0-26-1663_20220512111505310.parquet, before-save.png, example-code.txt

Hi Guys!
 
I am not able to save timestamp columns with microsecond precision using Hudi.
I would like to keep microsecond granularity when saving, but only millisecond granularity is kept.
 
I have set this:
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS \
and also this in the Hudi options:
"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
but when I read the data back (with PySpark, using the load API), it has only millisecond precision. Unfortunately, I need microseconds in some cases, because without them I run into a Schrödinger's cat situation :)
An entity ends up with more than one state at the same time :) Can someone enlighten me as to what I should do?
 
Before the save, everything is fine!
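
To make the collision concrete, here is a minimal plain-Python sketch (no Spark or Hudi involved). It assumes the precision loss behaves like a simple truncation to milliseconds, which the report does not confirm. The entity name, timestamps, and states are hypothetical illustration data, and truncate_to_millis and states_by_key are made-up helper names.

```python
from datetime import datetime


def truncate_to_millis(ts: datetime) -> datetime:
    """Drop sub-millisecond digits (assumed behaviour: truncation, not rounding)."""
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)


def states_by_key(rows, normalize=lambda ts: ts):
    """Group states by (entity, timestamp), optionally normalizing the timestamp."""
    grouped = {}
    for entity, ts, state in rows:
        grouped.setdefault((entity, normalize(ts)), []).append(state)
    return grouped


# Hypothetical events for one entity, 300 microseconds apart.
events = [
    ("entity-1", datetime(2022, 5, 12, 11, 15, 5, 310100), "CREATED"),
    ("entity-1", datetime(2022, 5, 12, 11, 15, 5, 310400), "UPDATED"),
]

micro = states_by_key(events)                      # full microsecond precision
milli = states_by_key(events, truncate_to_millis)  # millisecond precision only

# With microseconds each state has its own timestamp; after truncation both
# states share a single (entity, timestamp) key.
print(len(micro), len(milli))
```

With full precision the two states stay distinguishable; once truncated, both land on the same timestamp, which is the "more than one state at the same time" situation described above.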

Darvi
Slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1652347742173779
 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)