Posted to issues@flink.apache.org by "Jingsong Lee (Jira)" <ji...@apache.org> on 2020/10/23 07:26:00 UTC

[jira] [Closed] (FLINK-19706) Add WARN logs when hive table partition has existed before commit in `MetastoreCommitPolicy`

     [ https://issues.apache.org/jira/browse/FLINK-19706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee closed FLINK-19706.
--------------------------------
    Resolution: Fixed

master (1.12): 7b04b29e182c6245298b2f032dcbbaf25fc7dbe2

> Add WARN logs when hive table partition has existed before commit in `MetastoreCommitPolicy`   
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19706
>                 URL: https://issues.apache.org/jira/browse/FLINK-19706
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, Connectors / Hive, Table SQL / Runtime
>            Reporter: Lsw_aka_laplace
>            Assignee: Lsw_aka_laplace
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>         Attachments: image-2020-10-19-16-47-39-354.png, image-2020-10-19-16-57-02-661.png, image-2020-10-19-17-00-27-255.png, image-2020-10-19-17-03-21-558.png, image-2020-10-19-18-16-35-083.png
>
>
> Hi all,
>       Recently we have been using Hive Streaming Writing to speed up data sync into our Hive-based data warehouse, and we eventually got it working.
>        For production use, we added many metrics, logs, and measures to help us analyze runtime information and fix unexpected problems. Among these, we found that checking for repeated partition commits is the most important one, so we would like to contribute it back to the community.
>      If this proposal makes sense, I am happy to introduce my design and implementation.
>  
> Looking forward to ANY opinion~
>  
>  
> ----UPDATE ----
>  
> One of our users (who uses our own platform to build his own Flink job) raised some requests. One of them is that once a partition is committed, the data in that partition is regarded as frozen or complete. [Committing a partition] acts as a kind of guarantee (although we all know it is hard to make it a promise) that the partition is complete. We certainly take many measures to ensure that [partition commit means complete]. So if a partition is committed two or more times, then for us either something is wrong or our measures are insufficient. On the other hand, it also tells us to take remedial action to avoid data loss or incomplete data.
>  
> So first of all, it is important for us to know when a partition is committed repeatedly, so that we can do the following ASAP:
>    1. analyze the cause
>    2. do some trade-off operations
>    3. improve our code/measures
>  
> ---- Design and Implementation ----
> There are basically two approaches; both have been used in our production environment.
> Approach 1
> Add a check to CommitPolicy that is called before partition commit
> !image-2020-10-19-16-47-39-354.png|width=576,height=235!
> //{color:#ffab00}Newly posted, see here{color}
> !image-2020-10-19-18-16-35-083.png|width=725,height=313!
>  1.1 As the picture shows, add `checkPartitionExists` and implement it in the sub-class
>   !image-2020-10-19-17-03-21-558.png|width=1203,height=88!
>  1.2 call checkPartitionExists before partition commit
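
Approach 1 can be sketched roughly as follows. This is a minimal illustration, not the code from the attached screenshots or from the merged commit: `CheckingCommitPolicy`, `InMemoryCommitPolicy`, and `doCommit` are hypothetical names, the in-memory set stands in for a real metastore lookup, and `System.err` stands in for a proper logger. Only `checkPartitionExists` is taken from the description above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a commit policy; this is NOT Flink's actual interface.
abstract class CheckingCommitPolicy {

    /** Step 1.1: sub-classes decide how to look up an existing partition. */
    protected abstract boolean checkPartitionExists(String partition);

    /** The actual commit action, e.g. registering the partition in a metastore. */
    protected abstract void doCommit(String partition);

    /**
     * Step 1.2: check before committing.
     * Returns true if the partition already existed, i.e. this is a repeated commit.
     */
    final boolean commit(String partition) {
        boolean repeated = checkPartitionExists(partition);
        if (repeated) {
            // Stand-in for a WARN-level log statement.
            System.err.println("WARN: partition " + partition + " already exists before commit");
        }
        doCommit(partition);
        return repeated;
    }
}

// In-memory sub-class, assumed here only so that the sketch runs without a metastore.
class InMemoryCommitPolicy extends CheckingCommitPolicy {
    private final Set<String> committed = new HashSet<>();

    @Override
    protected boolean checkPartitionExists(String partition) {
        return committed.contains(partition);
    }

    @Override
    protected void doCommit(String partition) {
        committed.add(partition);
    }
}
```

In this sketch the commit still goes ahead after the warning; whether a repeated commit should instead be blocked is a separate design decision.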
> --- 
> Approach 2
> Build a bounded cache of committed partitions and check it every time before partition commit
> (this cache is actually supposed to be operator state)
> !image-2020-10-19-16-57-02-661.png|width=1298,height=57!
>   2.1 build a cache
> !image-2020-10-19-17-00-27-255.png|width=1235,height=116!
>   2.2 check before commit 
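
A bounded cache like the one in Approach 2 could look roughly like this. It is a sketch under assumptions, not the code from the screenshots: `CommittedPartitionCache` and `recordCommit` are hypothetical names, oldest-first eviction is an assumed policy, and the cache lives in plain heap memory rather than in operator state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache of committed partitions (step 2.1).
class CommittedPartitionCache {
    private final Map<String, Boolean> seen;

    CommittedPartitionCache(final int maxSize) {
        // Insertion-ordered map that drops its oldest entry once maxSize is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxSize;
            }
        };
    }

    /**
     * Step 2.2: call this before each commit.
     * Records the partition and returns true if it was already in the cache.
     */
    boolean recordCommit(String partition) {
        return seen.put(partition, Boolean.TRUE) != null;
    }
}
```

Because the cache is bounded, a partition that has been evicted is reported as new when it is committed again, so this check can miss repeats that are far apart in time.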
>  
>  
> ---- UPDATE ----
> After discussing with [~lzljs3620320], `Repeated partition check` seems a little misleading semantically, so only some WARN logs will be added in `MetastoreCommitPolicy` to flag repeated commits.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)