You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "chenshzh (via GitHub)" <gi...@apache.org> on 2023/04/04 10:50:33 UTC

[GitHub] [hudi] chenshzh opened a new pull request, #8379: Fix async compact/clustering serdes thread unsafe problems in multi versions Flink

chenshzh opened a new pull request, #8379:
URL: https://github.com/apache/hudi/pull/8379

   ### Change Logs
   
   Fix remained ser/des concurrency conflicts when aysnc compaction/clustering enabled.
   
   #### 1.  fix conflicts caused by ```WatermarkStatus```
   
   Conflicts will result from ```Watermark```, ```LatencyMarker``` and ```WatermarkStatus```, the community has fix other issues except ```WatermarkStatus```. We will fix it in this PR.
       
   flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/AbstractStreamTaskNetworkInput.java
   
   ```
       private void processElement(StreamElement recordOrMark, DataOutput<T> output) throws Exception {
           if (recordOrMark.isRecord()) {
               output.emitRecord(recordOrMark.asRecord());
           } else if (recordOrMark.isWatermark()) {
               statusWatermarkValve.inputWatermark(
                       recordOrMark.asWatermark(), flattenedChannelIndices.get(lastChannel), output);
           } else if (recordOrMark.isLatencyMarker()) {
               output.emitLatencyMarker(recordOrMark.asLatencyMarker());
           } else if (recordOrMark.isWatermarkStatus()) {
               statusWatermarkValve.inputWatermarkStatus(
                       recordOrMark.asWatermarkStatus(),
                       flattenedChannelIndices.get(lastChannel),
                       output);
           } else {
               throw new UnsupportedOperationException("Unknown type of StreamElement");
           }
       }
   ```
   
   ####  2. Since that ```WatermarkStatus```  is introduced since Flink1.14, this PR will add ```SafeAsyncOutputAdapter``` in multi modules, to coordinate Flink1.13 and Flink1.14 above
   
   The ```Output```  is actually the root cause of concurrency conflicts, so let's just refactor it but not the compact/cluster operator to minimize the impacts.
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   fix bugs in multi Flink versions.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8379: Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8379:
URL: https://github.com/apache/hudi/pull/8379#issuecomment-1495780163

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124",
       "triggerID" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ec51fe8c615982b08d19a8d52c392db42e239494 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8379: [HUDI-6038] Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8379:
URL: https://github.com/apache/hudi/pull/8379#issuecomment-1498495997

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124",
       "triggerID" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bd1756075ddbd3ef0a8121c24579fb098b509217",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bd1756075ddbd3ef0a8121c24579fb098b509217",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ec51fe8c615982b08d19a8d52c392db42e239494 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124) 
   * bd1756075ddbd3ef0a8121c24579fb098b509217 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8379: [HUDI-6038] Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8379:
URL: https://github.com/apache/hudi/pull/8379#issuecomment-1498500995

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124",
       "triggerID" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bd1756075ddbd3ef0a8121c24579fb098b509217",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16160",
       "triggerID" : "bd1756075ddbd3ef0a8121c24579fb098b509217",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ec51fe8c615982b08d19a8d52c392db42e239494 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124) 
   * bd1756075ddbd3ef0a8121c24579fb098b509217 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16160) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8379: Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8379:
URL: https://github.com/apache/hudi/pull/8379#discussion_r1158052521


##########
hudi-flink-datasource/hudi-flink1.13.x/src/main/java/org/apache/hudi/adapter/SafeAsyncOutputAdapter.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.adapter;
+
+import org.apache.flink.streaming.api.operators.Output;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.streamrecord.LatencyMarker;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.util.OutputTag;
+
+/** Adapter class for {@code Output} to handle async compaction/clustering service thread safe issues */
+public class SafeAsyncOutputAdapter<OUT> implements Output<StreamRecord<OUT>> {
+

Review Comment:
   `SafeAsyncOutputAdapter` -> `MaskingOutputAdapter` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8379: Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8379:
URL: https://github.com/apache/hudi/pull/8379#issuecomment-1496649173

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124",
       "triggerID" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ec51fe8c615982b08d19a8d52c392db42e239494 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 merged pull request #8379: [HUDI-6038] Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 merged PR #8379:
URL: https://github.com/apache/hudi/pull/8379


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8379: Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8379:
URL: https://github.com/apache/hudi/pull/8379#issuecomment-1495771010

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ec51fe8c615982b08d19a8d52c392db42e239494 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #8379: [HUDI-6038] Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "chenshzh (via GitHub)" <gi...@apache.org>.
chenshzh commented on code in PR #8379:
URL: https://github.com/apache/hudi/pull/8379#discussion_r1159274010


##########
hudi-flink-datasource/hudi-flink1.13.x/src/main/java/org/apache/hudi/adapter/SafeAsyncOutputAdapter.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.adapter;
+
+import org.apache.flink.streaming.api.operators.Output;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.streamrecord.LatencyMarker;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.util.OutputTag;
+
+/** Adapter class for {@code Output} to handle async compaction/clustering service thread safe issues */
+public class SafeAsyncOutputAdapter<OUT> implements Output<StreamRecord<OUT>> {
+

Review Comment:
   @danny0405 renamed to ```MaskingOutputAdapter```, pls review it once more.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on pull request #8379: Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "chenshzh (via GitHub)" <gi...@apache.org>.
chenshzh commented on PR #8379:
URL: https://github.com/apache/hudi/pull/8379#issuecomment-1495841429

   @danny0405 would you pls take a review for this PR? 
   
   It resolves the remained cause ```WatermarkStatus``` of async compaction/clustering concurrency conflicts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8379: [HUDI-6038] Fix async compact/clustering serdes conflicts caused by WatermarkStatus in multi versions Flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8379:
URL: https://github.com/apache/hudi/pull/8379#issuecomment-1498795668

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16124",
       "triggerID" : "ec51fe8c615982b08d19a8d52c392db42e239494",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bd1756075ddbd3ef0a8121c24579fb098b509217",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16160",
       "triggerID" : "bd1756075ddbd3ef0a8121c24579fb098b509217",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bd1756075ddbd3ef0a8121c24579fb098b509217 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16160) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org