Posted to commits@hudi.apache.org by "Sagar Sumit (Jira)" <ji...@apache.org> on 2022/01/11 03:18:00 UTC

[jira] [Commented] (HUDI-2971) Timestamp values being corrupted when using BULK INSERT with row writing enabled

    [ https://issues.apache.org/jira/browse/HUDI-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472431#comment-17472431 ] 

Sagar Sumit commented on HUDI-2971:
-----------------------------------

Fixed by https://github.com/apache/hudi/pull/4203

> Timestamp values being corrupted when using BULK INSERT with row writing enabled
> --------------------------------------------------------------------------------
>
>                 Key: HUDI-2971
>                 URL: https://issues.apache.org/jira/browse/HUDI-2971
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Ryan Pifer
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> We found that after bulk-inserting data that included timestamps and then performing other write operations on the table, the timestamps of the records from the initial load were all corrupted. We narrowed this down to cases where row writing is enabled, which uses Spark Datasource V2. In Hudi 0.9.0, row writing is enabled by default.
> Performing two inserts on a new table: `ts_ts` matches `ts_string` in both records (expected result)
> {code:java}
> scala> spark.read.format("hudi").load("s3://ryanpife-emr-dev/hudi/data/hudi090/timestamp/2/").show()
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|version|partition|          ts_string|              ts_ts|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> |     20211022233434|  20211022233434_0_1|               101|                  2019|0db6c29d-5291-4f7...|101|      1|     2019|2021-05-07 00:00:00|2021-05-07 00:00:00|
> |     20211022233556|  20211022233556_0_1|               102|                  2019|0db6c29d-5291-4f7...|102|      2|     2019|2021-05-07 00:00:00|2021-05-07 00:00:00|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+-------------------+
> {code}
>  
> Performing a bulk insert, then an insert: `ts_ts` does not match `ts_string` in the bulk-inserted record (corrupted result)
> {code:java}
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|version|partition|          ts_string|               ts_ts|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> |     20211022232152|  20211022232152_0_1|               104|                  2019|dbdc2dd9-e870-4cf...|104|      4|     2019|2021-05-07 00:00:00|1970-01-19 18:05:...|
> |     20211022232441|  20211022232441_0_1|               105|                  2019|dbdc2dd9-e870-4cf...|105|      5|     2019|2021-05-07 00:00:00| 2021-05-07 00:00:00|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+-------+---------+-------------------+--------------------+
> {code}
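
The corrupted value in the bulk-inserted row is consistent with a time-unit mismatch: 2021-05-07 00:00:00 UTC is 1620345600000 milliseconds since the epoch, and reading that same number as microseconds lands on 1970-01-19 18:05:45.6 UTC. The report does not state the root cause, so the following is only a sketch of that arithmetic. It assumes the values shown were rendered in UTC; the class and method names are hypothetical and none of this is Hudi code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; not taken from Hudi or from the linked pull request.
class TimestampUnitMismatch {

    // Interpret a raw long as microseconds since the Unix epoch (UTC).
    static Instant fromEpochMicros(long micros) {
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    // Take a correct instant, extract its millisecond count, and then
    // (wrongly) read that count back as if it were microseconds.
    static Instant misreadMillisAsMicros(Instant correct) {
        return fromEpochMicros(correct.toEpochMilli());
    }

    public static void main(String[] args) {
        // Value shown in the ts_string column, assumed here to be UTC.
        Instant expected = LocalDateTime.of(2021, 5, 7, 0, 0, 0).toInstant(ZoneOffset.UTC);

        System.out.println("expected:          " + expected);
        System.out.println("epoch millis:      " + expected.toEpochMilli());
        System.out.println("misread as micros: " + misreadMillisAsMicros(expected));
    }
}
```

The last line prints 1970-01-19T18:05:45.600Z, which lines up with the truncated `1970-01-19 18:05:...` in the `ts_ts` column of record 104.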



--
This message was sent by Atlassian Jira
(v8.20.1#820001)