You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/13 16:26:41 UTC

[GitHub] [hudi] yihua opened a new pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

yihua opened a new pull request #3472:
URL: https://github.com/apache/hudi/pull/3472


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727) 
   * 220c4dd017bdb5bc214af6eb883163369e5d194a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731) 
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 78195e83630f886fd005af73643dd02addb6fdd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718) 
   * 2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#discussion_r688834527



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/SparkMarkerBasedRollbackStrategy.java
##########
@@ -91,4 +97,37 @@ public SparkMarkerBasedRollbackStrategy(HoodieTable<T, JavaRDD<HoodieRecord<T>>,
         FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
         .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
   }
+
+  /**
+   * Gets all marker paths.
+   *
+   * @param instant     instant of interest to rollback
+   * @param parallelism parallelism to use
+   * @return a list of all markers
+   * @throws IOException
+   */
+  private List<String> getAllMarkerPaths(String instant, int parallelism) throws IOException {
+    String markerDir = table.getMetaClient().getMarkerFolderPath(instant);
+    FileSystem fileSystem = FSUtils.getFs(markerDir, context.getHadoopConf().newCopy());

Review comment:
       you can always do table.getMetaClient().getFs() to get this object. its more standard across teh code.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
##########
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.marker.MarkerType;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.util.FileIOUtils.closeQuietly;
+
+/**
+ * A utility class for marker related operations.
+ */
+public class MarkerUtils {

Review comment:
       why is this class needed in `hudi-common`?  markers are strictly used by writing correct




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733",
       "triggerID" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e72e261a2222faff8936581d749c6ca559b162d9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e72e261a2222faff8936581d749c6ca559b162d9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "105b56118726d4cf97622b6996c6030d2984aa5f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1744",
       "triggerID" : "105b56118726d4cf97622b6996c6030d2984aa5f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   * e72e261a2222faff8936581d749c6ca559b162d9 UNKNOWN
   * 105b56118726d4cf97622b6996c6030d2984aa5f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1744) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733",
       "triggerID" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e72e261a2222faff8936581d749c6ca559b162d9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e72e261a2222faff8936581d749c6ca559b162d9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "105b56118726d4cf97622b6996c6030d2984aa5f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1744",
       "triggerID" : "105b56118726d4cf97622b6996c6030d2984aa5f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   * 0e0fb964cc77bad99fddc1b96dda9e0c645c9908 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733) 
   * e72e261a2222faff8936581d749c6ca559b162d9 UNKNOWN
   * 105b56118726d4cf97622b6996c6030d2984aa5f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1744) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#discussion_r688694788



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/DirectWriteMarkers.java
##########
@@ -173,6 +180,7 @@ private String translateMarkerToDataPath(String markerPath) {
     try {
       if (!fs.exists(dirPath)) {
         fs.mkdirs(dirPath); // create a new partition as needed.
+        MarkerUtils.writeMarkerTypeToFile(MarkerType.DIRECT, fs, markerDirPath.toString());

Review comment:
       same here.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/DirectWriteMarkers.java
##########
@@ -158,7 +162,10 @@ private String translateMarkerToDataPath(String markerPath) {
     Set<String> markerFiles = new HashSet<>();
     if (doesMarkerDirExist()) {
       FSUtils.processFiles(fs, markerDirPath.toString(), fileStatus -> {
-        markerFiles.add(stripMarkerFolderPrefix(fileStatus.getPath().toString()));
+        String filePathStr = fileStatus.getPath().toString();
+        if (!filePathStr.contains(MARKER_TYPE_FILENAME)) {

Review comment:
       Since we agreed that we don't need marker type file with Direct, do we still need this check ?

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/SparkMarkerBasedRollbackStrategy.java
##########
@@ -91,4 +96,40 @@ public SparkMarkerBasedRollbackStrategy(HoodieTable<T, JavaRDD<HoodieRecord<T>>,
         FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
         .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
   }
+
+  /**
+   * Gets all marker file paths
+   *
+   * @param instant
+   * @param parallelism
+   * @return
+   * @throws IOException
+   */
+  private List<String> getAllMarkerFilePaths(String instant, int parallelism) throws IOException {
+    String markerDir = table.getMetaClient().getMarkerFolderPath(instant);
+    FileSystem fileSystem = FSUtils.getFs(markerDir, context.getHadoopConf().newCopy());
+    Option<MarkerType> markerTypeOption = MarkerUtils.readMarkerType(fileSystem, markerDir);
+
+    if (!markerTypeOption.isPresent()) {
+      WriteMarkers writeMarkers = WriteMarkersFactory.get(MarkerType.DIRECT, table, instant);
+      return new ArrayList<>(writeMarkers.allMarkerFilePaths());
+    }
+
+    switch (markerTypeOption.get()) {
+      case DIRECT:
+        WriteMarkers writeMarkers = WriteMarkersFactory.get(MarkerType.DIRECT, table, instant);
+        return new ArrayList<>(writeMarkers.allMarkerFilePaths());
+      case TIMELINE_SERVER_BASED:
+        // Reads all markers written by the timeline server
+        Map<String, Set<String>> markersMap =
+            MarkerUtils.readTimelineServerBasedMarkersFromFileSystem(
+                markerDir, fileSystem, context, parallelism);
+        List<String> markers = new ArrayList<>();
+        markersMap.forEach((key, value) -> markers.addAll(value));

Review comment:
       can we do this in one line? 
   markersMap.values(). .. collect() or something. 

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/SparkMarkerBasedRollbackStrategy.java
##########
@@ -91,4 +96,40 @@ public SparkMarkerBasedRollbackStrategy(HoodieTable<T, JavaRDD<HoodieRecord<T>>,
         FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
         .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
   }
+
+  /**
+   * Gets all marker file paths
+   *
+   * @param instant
+   * @param parallelism
+   * @return
+   * @throws IOException
+   */
+  private List<String> getAllMarkerFilePaths(String instant, int parallelism) throws IOException {
+    String markerDir = table.getMetaClient().getMarkerFolderPath(instant);
+    FileSystem fileSystem = FSUtils.getFs(markerDir, context.getHadoopConf().newCopy());
+    Option<MarkerType> markerTypeOption = MarkerUtils.readMarkerType(fileSystem, markerDir);
+
+    if (!markerTypeOption.isPresent()) {
+      WriteMarkers writeMarkers = WriteMarkersFactory.get(MarkerType.DIRECT, table, instant);
+      return new ArrayList<>(writeMarkers.allMarkerFilePaths());
+    }
+
+    switch (markerTypeOption.get()) {
+      case DIRECT:

Review comment:
       guess we need to fix this as well

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
##########
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.marker.MarkerType;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.util.FileIOUtils.closeQuietly;
+
+/**
+ * A utility class for marker related operations.
+ */
+public class MarkerUtils {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  public static final String MARKER_TYPE_FILENAME = MARKERS_FILENAME_PREFIX + ".type";
+  private static final Logger LOG = LogManager.getLogger(MarkerUtils.class);
+
+  /**
+   * Reads the marker type from `MARKERS.type` file.
+   *
+   * @param fileSystem file system to use.
+   * @param markerDir  marker directory.
+   * @return the marker type, or empty if the marker type file does not exist.
+   */
+  public static Option<MarkerType> readMarkerType(FileSystem fileSystem, String markerDir) {
+    Path markerTypeFilePath = new Path(markerDir, MARKER_TYPE_FILENAME);
+    FSDataInputStream fsDataInputStream = null;
+    String content = null;

Review comment:
       we can do Option<String> content = Option.empty();
   overide in line 78. 
   and so we can remove lines 86 to 88.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/WriteMarkersFactory.java
##########
@@ -20,6 +20,7 @@
 
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.fs.StorageSchemes;
+import org.apache.hudi.common.table.marker.MarkerType;

Review comment:
       guess this import is unused




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733",
       "triggerID" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 220c4dd017bdb5bc214af6eb883163369e5d194a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731) 
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   * 0e0fb964cc77bad99fddc1b96dda9e0c645c9908 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 78195e83630f886fd005af73643dd02addb6fdd1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 78195e83630f886fd005af73643dd02addb6fdd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733",
       "triggerID" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e72e261a2222faff8936581d749c6ca559b162d9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e72e261a2222faff8936581d749c6ca559b162d9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   * 0e0fb964cc77bad99fddc1b96dda9e0c645c9908 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733) 
   * e72e261a2222faff8936581d749c6ca559b162d9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727) 
   * 220c4dd017bdb5bc214af6eb883163369e5d194a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 220c4dd017bdb5bc214af6eb883163369e5d194a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731) 
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 220c4dd017bdb5bc214af6eb883163369e5d194a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731) 
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   * 0e0fb964cc77bad99fddc1b96dda9e0c645c9908 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#discussion_r688844805



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
##########
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.marker.MarkerType;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.util.FileIOUtils.closeQuietly;
+
+/**
+ * A utility class for marker related operations.
+ */
+public class MarkerUtils {

Review comment:
       The utility methods are shared across `WriteMarkers` classes in `hudi-client-common` and `MarkerDirState` in `hudi-timeline-service` so I keep the `MarkerUtils` in `hudi-common` for now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733",
       "triggerID" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   * 0e0fb964cc77bad99fddc1b96dda9e0c645c9908 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#discussion_r688844613



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/SparkMarkerBasedRollbackStrategy.java
##########
@@ -91,4 +97,37 @@ public SparkMarkerBasedRollbackStrategy(HoodieTable<T, JavaRDD<HoodieRecord<T>>,
         FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
         .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
   }
+
+  /**
+   * Gets all marker paths.
+   *
+   * @param instant     instant of interest to rollback
+   * @param parallelism parallelism to use
+   * @return a list of all markers
+   * @throws IOException
+   */
+  private List<String> getAllMarkerPaths(String instant, int parallelism) throws IOException {
+    String markerDir = table.getMetaClient().getMarkerFolderPath(instant);
+    FileSystem fileSystem = FSUtils.getFs(markerDir, context.getHadoopConf().newCopy());

Review comment:
       Right.  Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#discussion_r688825961



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/DirectWriteMarkers.java
##########
@@ -173,6 +180,7 @@ private String translateMarkerToDataPath(String markerPath) {
     try {
       if (!fs.exists(dirPath)) {
         fs.mkdirs(dirPath); // create a new partition as needed.
+        MarkerUtils.writeMarkerTypeToFile(MarkerType.DIRECT, fs, markerDirPath.toString());

Review comment:
       Removed.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/DirectWriteMarkers.java
##########
@@ -158,7 +162,10 @@ private String translateMarkerToDataPath(String markerPath) {
     Set<String> markerFiles = new HashSet<>();
     if (doesMarkerDirExist()) {
       FSUtils.processFiles(fs, markerDirPath.toString(), fileStatus -> {
-        markerFiles.add(stripMarkerFolderPrefix(fileStatus.getPath().toString()));
+        String filePathStr = fileStatus.getPath().toString();
+        if (!filePathStr.contains(MARKER_TYPE_FILENAME)) {

Review comment:
       This is no longer needed.  Removed.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/WriteMarkersFactory.java
##########
@@ -20,6 +20,7 @@
 
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.fs.StorageSchemes;
+import org.apache.hudi.common.table.marker.MarkerType;

Review comment:
       Previously `MarkerType` is under the same package as `WriteMarkersFactory` so there is no import here.  As I changed the package for `MarkerType`, we need the import here.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
##########
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.marker.MarkerType;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.util.FileIOUtils.closeQuietly;
+
+/**
+ * A utility class for marker related operations.
+ */
+public class MarkerUtils {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  public static final String MARKER_TYPE_FILENAME = MARKERS_FILENAME_PREFIX + ".type";
+  private static final Logger LOG = LogManager.getLogger(MarkerUtils.class);
+
+  /**
+   * Reads the marker type from `MARKERS.type` file.
+   *
+   * @param fileSystem file system to use.
+   * @param markerDir  marker directory.
+   * @return the marker type, or empty if the marker type file does not exist.
+   */
+  public static Option<MarkerType> readMarkerType(FileSystem fileSystem, String markerDir) {
+    Path markerTypeFilePath = new Path(markerDir, MARKER_TYPE_FILENAME);
+    FSDataInputStream fsDataInputStream = null;
+    String content = null;

Review comment:
       Fixed.

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/SparkMarkerBasedRollbackStrategy.java
##########
@@ -91,4 +96,40 @@ public SparkMarkerBasedRollbackStrategy(HoodieTable<T, JavaRDD<HoodieRecord<T>>,
         FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
         .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
   }
+
+  /**
+   * Gets all marker file paths
+   *
+   * @param instant
+   * @param parallelism
+   * @return
+   * @throws IOException
+   */
+  private List<String> getAllMarkerFilePaths(String instant, int parallelism) throws IOException {
+    String markerDir = table.getMetaClient().getMarkerFolderPath(instant);
+    FileSystem fileSystem = FSUtils.getFs(markerDir, context.getHadoopConf().newCopy());
+    Option<MarkerType> markerTypeOption = MarkerUtils.readMarkerType(fileSystem, markerDir);
+
+    if (!markerTypeOption.isPresent()) {
+      WriteMarkers writeMarkers = WriteMarkersFactory.get(MarkerType.DIRECT, table, instant);
+      return new ArrayList<>(writeMarkers.allMarkerFilePaths());
+    }
+
+    switch (markerTypeOption.get()) {
+      case DIRECT:

Review comment:
       Removed.

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/SparkMarkerBasedRollbackStrategy.java
##########
@@ -91,4 +96,40 @@ public SparkMarkerBasedRollbackStrategy(HoodieTable<T, JavaRDD<HoodieRecord<T>>,
         FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
         .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
   }
+
+  /**
+   * Gets all marker file paths
+   *
+   * @param instant
+   * @param parallelism
+   * @return
+   * @throws IOException
+   */
+  private List<String> getAllMarkerFilePaths(String instant, int parallelism) throws IOException {
+    String markerDir = table.getMetaClient().getMarkerFolderPath(instant);
+    FileSystem fileSystem = FSUtils.getFs(markerDir, context.getHadoopConf().newCopy());
+    Option<MarkerType> markerTypeOption = MarkerUtils.readMarkerType(fileSystem, markerDir);
+
+    if (!markerTypeOption.isPresent()) {
+      WriteMarkers writeMarkers = WriteMarkersFactory.get(MarkerType.DIRECT, table, instant);
+      return new ArrayList<>(writeMarkers.allMarkerFilePaths());
+    }
+
+    switch (markerTypeOption.get()) {
+      case DIRECT:
+        WriteMarkers writeMarkers = WriteMarkersFactory.get(MarkerType.DIRECT, table, instant);
+        return new ArrayList<>(writeMarkers.allMarkerFilePaths());
+      case TIMELINE_SERVER_BASED:
+        // Reads all markers written by the timeline server
+        Map<String, Set<String>> markersMap =
+            MarkerUtils.readTimelineServerBasedMarkersFromFileSystem(
+                markerDir, fileSystem, context, parallelism);
+        List<String> markers = new ArrayList<>();
+        markersMap.forEach((key, value) -> markers.addAll(value));

Review comment:
       Changed to this:
   ```
   return markersMap.values().stream().flatMap(Collection::stream)
               .collect(Collectors.toCollection(ArrayList::new));
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727) 
   * 220c4dd017bdb5bc214af6eb883163369e5d194a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 78195e83630f886fd005af73643dd02addb6fdd1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1731",
       "triggerID" : "220c4dd017bdb5bc214af6eb883163369e5d194a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de8e0b42af8530b5f3b13dfaf303c6be6f542330",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733",
       "triggerID" : "0e0fb964cc77bad99fddc1b96dda9e0c645c9908",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e72e261a2222faff8936581d749c6ca559b162d9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e72e261a2222faff8936581d749c6ca559b162d9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "105b56118726d4cf97622b6996c6030d2984aa5f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "105b56118726d4cf97622b6996c6030d2984aa5f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de8e0b42af8530b5f3b13dfaf303c6be6f542330 UNKNOWN
   * 0e0fb964cc77bad99fddc1b96dda9e0c645c9908 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1733) 
   * e72e261a2222faff8936581d749c6ca559b162d9 UNKNOWN
   * 105b56118726d4cf97622b6996c6030d2984aa5f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#issuecomment-898582490


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718",
       "triggerID" : "78195e83630f886fd005af73643dd02addb6fdd1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727",
       "triggerID" : "2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 78195e83630f886fd005af73643dd02addb6fdd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1718) 
   * 2b92dc9f42cf8c6fbb18fa0ad68ff420f718e61f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1727) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#discussion_r688834040



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
##########
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.marker.MarkerType;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.util.FileIOUtils.closeQuietly;
+
+/**
+ * A utility class for marker related operations.
+ */
+public class MarkerUtils {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  public static final String MARKER_TYPE_FILENAME = MARKERS_FILENAME_PREFIX + ".type";
+  private static final Logger LOG = LogManager.getLogger(MarkerUtils.class);
+
+  /**
+   * Reads the marker type from `MARKERS.type` file.
+   *
+   * @param fileSystem file system to use.
+   * @param markerDir  marker directory.
+   * @return the marker type, or empty if the marker type file does not exist.
+   */
+  public static Option<MarkerType> readMarkerType(FileSystem fileSystem, String markerDir) {
+    Path markerTypeFilePath = new Path(markerDir, MARKER_TYPE_FILENAME);
+    FSDataInputStream fsDataInputStream = null;
+    Option<MarkerType> content = Option.empty();
+    try {
+      if (!fileSystem.exists(markerTypeFilePath)) {
+        return Option.empty();
+      }
+      fsDataInputStream = fileSystem.open(markerTypeFilePath);
+      content = Option.of(MarkerType.valueOf(FileIOUtils.readAsUTFString(fsDataInputStream)));
+    } catch (IOException e) {
+      throw new HoodieIOException("Cannot read marker type file " + markerTypeFilePath.toString()
+          + "; " + e.getMessage(), e);
+    } finally {
+      closeQuietly(fsDataInputStream);
+    }
+    return content;
+  }
+
+  /**
+   * Writes the marker type to the file `MARKERS.type`.
+   *
+   * @param markerType marker type.
+   * @param fileSystem file system to use.
+   * @param markerDir  marker directory.
+   */
+  public static void writeMarkerTypeToFile(MarkerType markerType, FileSystem fileSystem, String markerDir) {
+    Path markerTypeFilePath = new Path(markerDir, MARKER_TYPE_FILENAME);
+    FSDataOutputStream fsDataOutputStream = null;
+    BufferedWriter bufferedWriter = null;
+    try {
+      fsDataOutputStream = fileSystem.create(markerTypeFilePath, false);
+      bufferedWriter = new BufferedWriter(new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8));
+      bufferedWriter.write(markerType.toString());
+    } catch (IOException e) {
+      throw new HoodieException("Failed to create marker type file " + markerTypeFilePath.toString()
+          + "; " + e.getMessage(), e);
+    } finally {
+      closeQuietly(bufferedWriter);
+      closeQuietly(fsDataOutputStream);
+    }
+  }
+
+  /**
+   * Deletes `MARKERS.type` file.
+   *
+   * @param fileSystem file system to use.
+   * @param markerDir  marker directory.
+   */
+  public static void deleteMarkerTypeFile(FileSystem fileSystem, String markerDir) {
+    Path markerTypeFilePath = new Path(markerDir, MARKER_TYPE_FILENAME);
+    try {
+      fileSystem.delete(markerTypeFilePath, false);
+    } catch (IOException e) {
+      throw new HoodieIOException("Cannot delete marker type file " + markerTypeFilePath.toString()
+          + "; " + e.getMessage(), e);
+    }
+  }
+
+  /**
+   * Reads files containing the markers written by timeline-server-based marker mechanism.
+   *
+   * @param markerDir   marker directory.
+   * @param fileSystem  file system to use.
+   * @param context     instance of {@link HoodieEngineContext} to use
+   * @param parallelism parallelism to use
+   * @return A {@code Map} of file name to the set of markers stored in the file.
+   */
+  public static Map<String, Set<String>> readTimelineServerBasedMarkersFromFileSystem(
+      String markerDir, FileSystem fileSystem, HoodieEngineContext context, int parallelism) {
+    Path dirPath = new Path(markerDir);
+    try {
+      if (fileSystem.exists(dirPath)) {
+        FileStatus[] fileStatuses = fileSystem.listStatus(dirPath);
+        Predicate<String> prefixFilter = pathStr -> pathStr.contains(MARKERS_FILENAME_PREFIX);
+        Predicate<String> markerTypeFilter = pathStr -> !pathStr.equals(MARKER_TYPE_FILENAME);

Review comment:
       I thought you wanted to change this to contains.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan merged pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan merged pull request #3472:
URL: https://github.com/apache/hudi/pull/3472


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #3472: [HUDI-2305] Add MARKERS.type and fix marker-based rollback

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #3472:
URL: https://github.com/apache/hudi/pull/3472#discussion_r688844885



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
##########
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.marker.MarkerType;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.util.FileIOUtils.closeQuietly;
+
+/**
+ * A utility class for marker related operations.
+ */
+public class MarkerUtils {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  public static final String MARKER_TYPE_FILENAME = MARKERS_FILENAME_PREFIX + ".type";
+  private static final Logger LOG = LogManager.getLogger(MarkerUtils.class);
+
+  /**
+   * Reads the marker type from `MARKERS.type` file.
+   *
+   * @param fileSystem file system to use.
+   * @param markerDir  marker directory.
+   * @return the marker type, or empty if the marker type file does not exist.
+   */
+  public static Option<MarkerType> readMarkerType(FileSystem fileSystem, String markerDir) {
+    Path markerTypeFilePath = new Path(markerDir, MARKER_TYPE_FILENAME);
+    FSDataInputStream fsDataInputStream = null;
+    Option<MarkerType> content = Option.empty();
+    try {
+      if (!fileSystem.exists(markerTypeFilePath)) {
+        return Option.empty();
+      }
+      fsDataInputStream = fileSystem.open(markerTypeFilePath);
+      content = Option.of(MarkerType.valueOf(FileIOUtils.readAsUTFString(fsDataInputStream)));
+    } catch (IOException e) {
+      throw new HoodieIOException("Cannot read marker type file " + markerTypeFilePath.toString()
+          + "; " + e.getMessage(), e);
+    } finally {
+      closeQuietly(fsDataInputStream);
+    }
+    return content;
+  }
+
+  /**
+   * Writes the marker type to the file `MARKERS.type`.
+   *
+   * @param markerType marker type.
+   * @param fileSystem file system to use.
+   * @param markerDir  marker directory.
+   */
+  public static void writeMarkerTypeToFile(MarkerType markerType, FileSystem fileSystem, String markerDir) {
+    Path markerTypeFilePath = new Path(markerDir, MARKER_TYPE_FILENAME);
+    FSDataOutputStream fsDataOutputStream = null;
+    BufferedWriter bufferedWriter = null;
+    try {
+      fsDataOutputStream = fileSystem.create(markerTypeFilePath, false);
+      bufferedWriter = new BufferedWriter(new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8));
+      bufferedWriter.write(markerType.toString());
+    } catch (IOException e) {
+      throw new HoodieException("Failed to create marker type file " + markerTypeFilePath.toString()
+          + "; " + e.getMessage(), e);
+    } finally {
+      closeQuietly(bufferedWriter);
+      closeQuietly(fsDataOutputStream);
+    }
+  }
+
+  /**
+   * Deletes `MARKERS.type` file.
+   *
+   * @param fileSystem file system to use.
+   * @param markerDir  marker directory.
+   */
+  public static void deleteMarkerTypeFile(FileSystem fileSystem, String markerDir) {
+    Path markerTypeFilePath = new Path(markerDir, MARKER_TYPE_FILENAME);
+    try {
+      fileSystem.delete(markerTypeFilePath, false);
+    } catch (IOException e) {
+      throw new HoodieIOException("Cannot delete marker type file " + markerTypeFilePath.toString()
+          + "; " + e.getMessage(), e);
+    }
+  }
+
+  /**
+   * Reads files containing the markers written by timeline-server-based marker mechanism.
+   *
+   * @param markerDir   marker directory.
+   * @param fileSystem  file system to use.
+   * @param context     instance of {@link HoodieEngineContext} to use
+   * @param parallelism parallelism to use
+   * @return A {@code Map} of file name to the set of markers stored in the file.
+   */
+  public static Map<String, Set<String>> readTimelineServerBasedMarkersFromFileSystem(
+      String markerDir, FileSystem fileSystem, HoodieEngineContext context, int parallelism) {
+    Path dirPath = new Path(markerDir);
+    try {
+      if (fileSystem.exists(dirPath)) {
+        FileStatus[] fileStatuses = fileSystem.listStatus(dirPath);
+        Predicate<String> prefixFilter = pathStr -> pathStr.contains(MARKERS_FILENAME_PREFIX);
+        Predicate<String> markerTypeFilter = pathStr -> !pathStr.equals(MARKER_TYPE_FILENAME);

Review comment:
       Right.  I added a more strict check to prevent accidental matching:
   ```
   Predicate<String> markerTypeFilter =
               pathStr -> !stripMarkerFolderPrefix(pathStr, markerDir).equals(MARKER_TYPE_FILENAME);
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org