Posted to issues@flink.apache.org by "Lsw_aka_laplace (Jira)" <ji...@apache.org> on 2020/10/19 08:20:00 UTC

[jira] [Updated] (FLINK-19706) Introduce `Repeated Partition Commit Check` in `org.apache.flink.table.filesystem.PartitionCommitPolicy`

     [ https://issues.apache.org/jira/browse/FLINK-19706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lsw_aka_laplace updated FLINK-19706:
------------------------------------
    Description: 
Hi all,

Recently we have been working on using Hive Streaming Writing to speed up data synchronization for our Hive-based data warehouse, and we have now succeeded.

For production use, we added many metrics, logs, and other measures to help us analyze runtime information and troubleshoot unexpected problems. Among them, we found that checking for repeated partition commits is the most important one, so we would like to contribute this check back to the community.

If this proposal makes sense, I am happy to share my design and implementation.

Looking forward to any opinions!

---- UPDATE ----

One of our users (who builds their own Flink job on our in-house platform) raised several requests. One of them is that once a partition is committed, the data in that partition is regarded as frozen, i.e. complete. In a sense, committing a partition acts as a guarantee (although we all know it is hard to make it a strict promise) that the partition is complete, and we have taken many measures to make sure that a partition commit really means the partition is complete. So if a partition is committed twice or more, something must have gone wrong or our measures are insufficient. A repeated commit also tells us that we need to take remedial action to avoid data loss or incomplete data.

So, first of all, it is important for us to be notified whenever a partition is committed repeatedly, so that we can promptly:

   1. analyze the cause
   2. carry out the necessary trade-off operations
   3. improve our code/measures
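The core of such a check can be sketched in a few lines. The following is a minimal, hypothetical Java sketch: the class `RepeatedPartitionCommitCheck` and its methods are illustrative names, not part of Flink and not the design proposed in this issue. It counts commits per partition, keyed by the table name plus the ordered partition spec, and reports any partition that is committed more than once. It assumes the counts can live in memory; because they are lost when the job restarts, a production version would likely need to persist them (for example in Flink state or an external store), which is not covered here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of Flink) that detects repeated partition commits.
 * Counts are kept only in memory, so they do NOT survive a job restart.
 */
final class RepeatedPartitionCommitCheck {

    // Key: "<table>/<k1>=<v1>/<k2>=<v2>...", value: number of commits seen so far.
    private final Map<String, Integer> commitCounts = new HashMap<>();

    /** Builds a stable key from the table name and the ordered partition spec. */
    static String partitionKey(String table, LinkedHashMap<String, String> partitionSpec) {
        StringJoiner joiner = new StringJoiner("/");
        joiner.add(table);
        for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    /**
     * Records one commit of the given partition.
     *
     * @return how many times this partition has been committed, including this call;
     *         any value greater than 1 indicates a repeated commit.
     */
    public int recordCommit(String table, LinkedHashMap<String, String> partitionSpec) {
        String key = partitionKey(table, partitionSpec);
        // merge() inserts 1 for a new key, otherwise adds 1, and returns the new count.
        int count = commitCounts.merge(key, 1, Integer::sum);
        if (count > 1) {
            // Sketch only: a real job could log a warning and/or increment a metric here.
            System.err.println("Repeated commit #" + count + " of partition " + key);
        }
        return count;
    }
}
```

One plausible place to call `recordCommit` is a custom `PartitionCommitPolicy` (selected with `sink.partition-commit.policy.kind` = `custom` and `sink.partition-commit.policy.class`), whose `commit(Context)` callback is invoked for each partition being committed; wiring it up that way is an assumption, not something this issue specifies. Returning the count rather than a boolean leaves it to the caller to decide whether a repeat should only be logged, exported as a metric, or treated as an error.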


> Introduce `Repeated Partition Commit Check` in `org.apache.flink.table.filesystem.PartitionCommitPolicy` 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19706
>                 URL: https://issues.apache.org/jira/browse/FLINK-19706
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, Connectors / Hive, Table SQL / Runtime
>            Reporter: Lsw_aka_laplace
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)