Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2022/05/24 05:09:00 UTC

[jira] [Commented] (HUDI-4138) Fix the concurrency modification of hoodie table config for flink

    [ https://issues.apache.org/jira/browse/HUDI-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541272#comment-17541272 ] 

Danny Chen commented on HUDI-4138:
----------------------------------

Fixed via master branch: 676d5cefe0294f495db04ec6476afdc00cbf0dd2

> Fix the concurrency modification of hoodie table config for flink
> -----------------------------------------------------------------
>
>                 Key: HUDI-4138
>                 URL: https://issues.apache.org/jira/browse/HUDI-4138
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>            Reporter: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.1
>
>
> From [GH|https://github.com/apache/hudi/issues/5553] (by [~danny0405]):
> ---------
> I have filed a fix for Flink here: [#5660|https://github.com/apache/hudi/pull/5660]
> https://issues.apache.org/jira/browse/HUDI-3782 and
> https://issues.apache.org/jira/browse/HUDI-4138 may cause this bug.
> The {{HoodieTable#getMetadataWriter}} method is used by many async table services, such as cleaning, compaction, and clustering. It currently tries to modify the table config every time it is called, no matter whether the metadata table is enabled or disabled.
> In general, we should never introduce side effects in the read code path of the hoodie table config, nor in that of the hoodie table metadata writer.
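> A minimal Java sketch of the hazard, with illustrative names rather than the real Hudi classes:
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> // Illustrative sketch only; these are not the actual Hudi classes.
> class TableSketch {
>   // A plain HashMap stands in for the table config; it is not thread-safe.
>   private final Map<String, String> tableConfig = new HashMap<>();
>
>   // Anti-pattern: cleaning, compaction, and clustering all call this
>   // concurrently, and every call writes to shared state -- a side
>   // effect hiding in what looks like a read path.
>   Object getMetadataWriter() {
>     tableConfig.put("metadata.enabled", "true"); // racy write on each call
>     return new Object(); // stand-in for the real metadata writer
>   }
> }
> {code}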
> I'm not sure how to fix this on the Spark side; I have two possible fixes in mind:
>  # make the table config concurrency-safe (not suggested, because it is too heavyweight for a config)
>  # make sure the metadata cleaning happens only once for the whole job lifetime (still slightly risky because there may be multiple jobs, but with very small probability); see the sketch below. I would suggest this approach.
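> A minimal sketch of option 2, using hypothetical names (this is not the actual fix in the PR above): guard the metadata cleaning with an atomic flag so it runs at most once per job lifetime.
> {code:java}
> import java.util.concurrent.atomic.AtomicBoolean;
>
> // Hypothetical guard class: runs the cleanup at most once per JVM.
> class MetadataCleanupGuard {
>   private static final AtomicBoolean CLEANED = new AtomicBoolean(false);
>
>   static void cleanIfNeeded(Runnable cleanup) {
>     // compareAndSet lets exactly one caller win the race, so the
>     // cleanup runs once even when several table services race here.
>     if (CLEANED.compareAndSet(false, true)) {
>       cleanup.run();
>     }
>   }
> }
> {code}
> Within one JVM the compare-and-set guarantees a single execution; separate jobs each carry their own flag, which is the small residual risk noted in option 2.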


