Posted to commits@hudi.apache.org by "Pratyaksh Sharma (Jira)" <ji...@apache.org> on 2019/09/04 13:15:00 UTC

[jira] [Closed] (HUDI-207) Introduce secondary source ordering field for breaking ties while writing

     [ https://issues.apache.org/jira/browse/HUDI-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pratyaksh Sharma closed HUDI-207.
---------------------------------
    Resolution: Invalid

No code work is needed. The mentioned functionality can be achieved using a Transformer.
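
As a rough illustration of that idea, the sketch below shows a transform step that folds two fields into one sortable value, so a single source ordering field is enough. It is plain Python, not Hudi's Transformer interface; the field names ts_seconds, binlog_position and ordering_key are hypothetical, and the 32-bit width assumed for the position is an assumption as well.

```python
# Hypothetical record layout: these field names are assumptions for
# illustration, not part of Hudi or any binlog reader.
POSITION_BITS = 32  # assumes binlog_position fits in 32 bits


def add_composite_ordering(record: dict) -> dict:
    """Return a copy of record with a single sortable 'ordering_key' field.

    The second-resolution timestamp occupies the high bits and the binlog
    position the low bits, so comparing ordering_key compares the timestamp
    first and falls back to the position on ties.
    """
    ts = record["ts_seconds"]
    pos = record["binlog_position"]
    if ts < 0 or not 0 <= pos < (1 << POSITION_BITS):
        raise ValueError("timestamp or position outside the assumed range")
    out = dict(record)  # leave the input record untouched
    out["ordering_key"] = (ts << POSITION_BITS) | pos
    return out
```

This only works if the positions of the records being compared are themselves comparable, which may not hold when they come from different binlog files.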

> Introduce secondary source ordering field for breaking ties while writing
> -------------------------------------------------------------------------
>
>                 Key: HUDI-207
>                 URL: https://issues.apache.org/jira/browse/HUDI-207
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: Pratyaksh Sharma
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>              Labels: patch
>
> When building CDC pipelines to capture data changes in a SQL database, we need to read the database's binlog to fetch all the modifications made to a particular table. However, in a production environment handling hundreds of transactions per second (TPS), the same table row can be modified multiple times within one second.
> The problem is that the MySQL binlog uses a 32-bit timestamp with only second-level resolution. If we build a CDC pipeline on top of a table with such high TPS, ties between records with the same Hoodie key cannot be broken with a single source-ordering-field (specified in HoodieDeltaStreamer.Config).
> Example - [https://github.com/zendesk/maxwell/issues/925]
> The proposal is to add a secondary-source-ordering-field for breaking ties among incoming records in such cases. For example, ingestion_timestamp or binlog_position could serve as the secondary field.
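
To make the tie-breaking in the description concrete, here is a minimal sketch in plain Python of picking the latest record per key using a primary and a secondary ordering field. It does not use Hudi's API; the function latest_per_key and the sample field names (id, ts, binlog_position, value) are hypothetical.

```python
def latest_per_key(records, key_field, primary_field, secondary_field):
    """Keep one record per key, preferring the highest (primary, secondary) pair.

    Tuples compare element by element, so the secondary field is consulted
    only when two records share the same primary value.
    """
    latest = {}
    for rec in records:
        key = rec[key_field]
        rank = (rec[primary_field], rec[secondary_field])
        current = latest.get(key)
        if current is None or rank > (current[primary_field], current[secondary_field]):
            latest[key] = rec
    return latest


# Two updates to the same row within the same second: the timestamp alone
# cannot order them, but the (assumed) binlog position can.
updates = [
    {"id": 7, "ts": 1567602900, "binlog_position": 4120, "value": "second write"},
    {"id": 7, "ts": 1567602900, "binlog_position": 4001, "value": "first write"},
]
winner = latest_per_key(updates, "id", "ts", "binlog_position")[7]
```

With only ts as the ordering field, both updates rank equally and the surviving record depends on arrival order; with the secondary field, the record with the higher position is kept regardless of order.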



--
This message was sent by Atlassian Jira
(v8.3.2#803003)