You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/21 20:23:48 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

nsivabalan opened a new pull request #1858:
URL: https://github.com/apache/hudi/pull/1858


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467619791



##########
File path: hudi-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
##########
@@ -0,0 +1,405 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.model.FileSlice;
+import org.apache.hudi.common.model.HoodieFileGroup;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.SyncableFileSystemView;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.testutils.Assertions;
+import org.apache.hudi.testutils.HoodieClientTestBase;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.table.HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit tests {@link UpgradeDowngrade}.
+ */
+public class TestUpgradeDowngrade extends HoodieClientTestBase {
+
+  private static final String TEST_NAME_WITH_PARAMS = "[{index}] Test with induceResiduesFromPrevUpgrade={0}, deletePartialMarkerFiles={1} and TableType = {2}";
+
+  public static Stream<Arguments> configParams() {
+    Object[][] data = new Object[][] {
+            {true, HoodieTableType.COPY_ON_WRITE}, {false, HoodieTableType.COPY_ON_WRITE},
+            {true, HoodieTableType.MERGE_ON_READ}, {false, HoodieTableType.MERGE_ON_READ}
+    };
+    return Stream.of(data).map(Arguments::of);
+  }
+
+  @Test
+  public void testLeftOverUpdatedPropFileCleanup() throws IOException {
+    testUpgradeInternal(true, true, HoodieTableType.MERGE_ON_READ);
+  }
+
+  @ParameterizedTest(name = TEST_NAME_WITH_PARAMS)
+  @MethodSource("configParams")
+  public void testUpgrade(boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    testUpgradeInternal(false, deletePartialMarkerFiles, tableType);
+  }
+
+  public void testUpgradeInternal(boolean induceResiduesFromPrevUpgrade, boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    // init config, table and client.
+    Map<String, String> params = new HashMap<>();
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      params.put(HOODIE_TABLE_TYPE_PROP_NAME, HoodieTableType.MERGE_ON_READ.name());
+      metaClient = HoodieTestUtils.init(hadoopConf, basePath, HoodieTableType.MERGE_ON_READ);
+    }
+    HoodieWriteConfig cfg = getConfigBuilder().withAutoCommit(false).withRollbackUsingMarkers(false).withProps(params).build();
+    HoodieWriteClient client = getHoodieWriteClient(cfg);
+
+    // prepare data. Make 2 commits, in which 2nd is not committed.
+    List<FileSlice> firstPartitionCommit2FileSlices = new ArrayList<>();
+    List<FileSlice> secondPartitionCommit2FileSlices = new ArrayList<>();
+    Pair<List<HoodieRecord>, List<HoodieRecord>> inputRecords = twoUpsertCommitDataWithTwoPartitions(firstPartitionCommit2FileSlices, secondPartitionCommit2FileSlices, cfg, client, false);

Review comment:
       This is what I was trying to convey. As part of this call, (2 commits), already upgrade step would have been executed. But, we also reset hoodie table version and call upgrade explicitly after this. So, should be fine. Not sure if you are aware of this. Will let you take a call. If you are ok, patch is good to be merged.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467608824



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -96,6 +97,8 @@ public HoodieTableConfig(FileSystem fs, String metaPath, String payloadClassName
       throw new HoodieIOException("Could not load Hoodie properties from " + propertyPath, e);
     }
     this.props = props;
+    ValidationUtils.checkArgument(props.containsKey(HOODIE_TABLE_TYPE_PROP_NAME) && props.containsKey(HOODIE_TABLE_NAME_PROP_NAME),

Review comment:
       these properties are written always. so this indicates a corrupted file (with high probability) . Actually what we discussed is that there wont be a corrupted/partial hoodie.properties file given even s3 (or gcs) guarantees partial writes won't be visible. So this is just additional safety




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r461265990



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/CopyOnWriteRollbackActionExecutor.java
##########
@@ -100,8 +108,13 @@ public CopyOnWriteRollbackActionExecutor(JavaSparkContext jsc,
   }
 
   @Override
-  protected List<HoodieRollbackStat> executeRollbackUsingFileListing(HoodieInstant instantToRollback) {
+  protected List<HoodieRollbackStat> executeRollbackUsingFileListing(HoodieInstant instantToRollback, boolean doDelete) {
     List<ListingBasedRollbackRequest> rollbackRequests = generateRollbackRequestsByListing();
-    return new ListingBasedRollbackHelper(table.getMetaClient(), config).performRollback(jsc, instantToRollback, rollbackRequests);
+    ListingBasedRollbackHelper listingBasedRollbackHelper = new ListingBasedRollbackHelper(table.getMetaClient(), config);
+    if(doDelete) {
+      return listingBasedRollbackHelper.performRollback(jsc, instantToRollback, rollbackRequests);

Review comment:
       as discussed, we can just call the `collectRollbackStats` directly, assuming listing based rollback strategy.

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -68,34 +69,38 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
    * Performs all rollback actions that we have collected in parallel.
    */
   public List<HoodieRollbackStat> performRollback(JavaSparkContext jsc, HoodieInstant instantToRollback, List<ListingBasedRollbackRequest> rollbackRequests) {
-    SerializablePathFilter filter = (path) -> {

Review comment:
       ack

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -130,39 +137,55 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
               1L
           );
           return new Tuple2<>(rollbackRequest.getPartitionPath(),
-                  HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
-                          .withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
+              HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
+                  .withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
         }
         default:
           throw new IllegalStateException("Unknown Rollback action " + rollbackRequest);
       }
-    }).reduceByKey(RollbackUtils::mergeRollbackStat).map(Tuple2::_2).collect();
+    });
   }
 
 
-
   /**
    * Common method used for cleaning out base files under a partition path during rollback of a set of commits.
    */
-  private Map<FileStatus, Boolean> deleteCleanedFiles(HoodieTableMetaClient metaClient, HoodieWriteConfig config,
-                                                      String partitionPath, PathFilter filter) throws IOException {
+  private Map<FileStatus, Boolean> deleteBaseAndLogFiles(HoodieTableMetaClient metaClient, HoodieWriteConfig config,

Review comment:
       its hard. but would nt MERGE work for both in terms of actually performing a correct rollback? 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -130,39 +137,55 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
               1L
           );
           return new Tuple2<>(rollbackRequest.getPartitionPath(),
-                  HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
-                          .withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
+              HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
+                  .withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
         }
         default:
           throw new IllegalStateException("Unknown Rollback action " + rollbackRequest);
       }
-    }).reduceByKey(RollbackUtils::mergeRollbackStat).map(Tuple2::_2).collect();
+    });
   }
 
 
-
   /**
    * Common method used for cleaning out base files under a partition path during rollback of a set of commits.
    */
-  private Map<FileStatus, Boolean> deleteCleanedFiles(HoodieTableMetaClient metaClient, HoodieWriteConfig config,
-                                                      String partitionPath, PathFilter filter) throws IOException {
+  private Map<FileStatus, Boolean> deleteBaseAndLogFiles(HoodieTableMetaClient metaClient, HoodieWriteConfig config,
+      String commit, String partitionPath, boolean doDelete) throws IOException {
     LOG.info("Cleaning path " + partitionPath);
+    String basefileExtension = metaClient.getTableConfig().getBaseFileFormat().getFileExtension();
+    SerializablePathFilter filter = (path) -> {
+      if (path.toString().endsWith(basefileExtension)) {
+        String fileCommitTime = FSUtils.getCommitTime(path.getName());
+        return commit.equals(fileCommitTime);
+      } else if (FSUtils.isLogFile(path)) {
+        // Since the baseCommitTime is the only commit for new log files, it's okay here
+        String fileCommitTime = FSUtils.getBaseCommitTimeFromLogPath(path);
+        return commit.equals(fileCommitTime);
+      }
+      return false;
+    };
+
     final Map<FileStatus, Boolean> results = new HashMap<>();
     FileSystem fs = metaClient.getFs();
     FileStatus[] toBeDeleted = fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
     for (FileStatus file : toBeDeleted) {
-      boolean success = fs.delete(file.getPath(), false);
-      results.put(file, success);
-      LOG.info("Delete file " + file.getPath() + "\t" + success);
+      if(doDelete) {
+        boolean success = fs.delete(file.getPath(), false);
+        results.put(file, success);
+        LOG.info("Delete file " + file.getPath() + "\t" + success);
+      } else{
+        results.put(file, true);

Review comment:
       lets add a test , that has inflight commit, with few marker files deleted. and then we show that the code can correctly upgrade and perform a rollback using marker based strategy 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-671023129


   @nsivabalan updated the PR with changes. 
   
   See `UpgradeDowngrade.java` for the new protocol. I added code to actually write the table version into hoodie.properties for new tables. Other than that, its mostly code cleanup. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467608104



##########
File path: hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.cli.testutils.AbstractShellIntegrationTest;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Properties;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests {@link UpgradeOrDowngradeCommand}.
+ */
+public class TestUpgradeDowngradeCommand extends AbstractShellIntegrationTest {
+
+  private String tablePath;
+
+  @BeforeEach
+  public void init() throws IOException {
+    String tableName = "test_table";
+    tablePath = basePath + File.separator + tableName;
+    new TableCommand().createTable(
+        tablePath, tableName, HoodieTableType.COPY_ON_WRITE.name(),
+        "", TimelineLayoutVersion.VERSION_1, "org.apache.hudi.common.model.HoodieAvroPayload");
+
+    //Create some commits files and parquet files
+    String commitTime1 = "100";
+    String commitTime2 = "101";
+    HoodieTestDataGenerator.writePartitionMetadata(fs, HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS, tablePath);

Review comment:
       not following. In general, this test can be much simplified. see `TestMarkerBasedRollbackStrategy` . I did not have time to fix this. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-671080326


   @nsivabalan the concerns you raised around the tests . Are these tests in current form not testing anything? is that what you are pointing out? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467613417



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.util.FileIOUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngrade {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngrade.class);
+  public static final String HOODIE_UPDATED_PROPERTY_FILE = "hoodie.properties.updated";
+
+  private HoodieTableMetaClient metaClient;
+  private HoodieWriteConfig config;
+  private JavaSparkContext jsc;
+  private transient FileSystem fs;
+  private Path updatedPropsFilePath;
+  private Path propsFilePath;
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   *
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie version was upgraded to 0.6.0,
+   * Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded
+   * to pre 0.6.0 or to Hoodie table version 0, then some downgrade steps need to be executed before proceeding w/ any writes.
+   *
+   * On a high level, these are the steps performed
+   *
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Delete any left over .upgraded from previous upgrade/downgrade
+   * Step3 : If version are different, perform upgrade/downgrade.
+   * Step4 : Copy hoodie.properties -> hoodie.properties.upgraded with the version updated
+   * Step6 : Rename hoodie.properties.updated to hoodie.properties
+   * </p>
+   *
+   * @param metaClient instance of {@link HoodieTableMetaClient} to use
+   * @param toVersion version to which upgrade or downgrade has to be done.
+   * @param config instance of {@link HoodieWriteConfig} to use.
+   * @param jsc instance of {@link JavaSparkContext} to use.
+   * @param instantTime current instant time that should not be touched.
+   */
+  public static void run(HoodieTableMetaClient metaClient, HoodieTableVersion toVersion, HoodieWriteConfig config,

Review comment:
       you could use. but at first sight, I got confused that we can calling the same method within run(). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r465844781



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngradeUtil.java
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngradeUtil {

Review comment:
       why feel apprehensive about upgrading? I mean, why do you want to upgrade to 0.6.0 but want to avoid the upgrade step? IMO, we should not have such knobs. But we do have knobs though, the marker based rollbacks. If you disable marker based rollback, this upgrade step may not take place. But this part of code is yet to be reviewed which I coded it up. Wanted to see if @bvaradar and other reviewers agrees that the upgrade step should be guarded by the marker based rollback config. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r460343368



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -151,6 +154,27 @@ public HoodieTableType getTableType() {
         : Option.empty();
   }
 
+  /**
+   * @return the table version from .hoodie properties file.
+   */
+  public HoodieTableVersion getHoodieTableVersionFromPropertyFile() {
+    if (props.contains(HOODIE_TABLE_VERSION_PROP_NAME)) {
+      String propValue = props.getProperty(HOODIE_TABLE_VERSION_PROP_NAME);
+      if (propValue.equals(HoodieTableVersion.ZERO_SIX_ZERO.version)) {
+        return HoodieTableVersion.ZERO_SIX_ZERO;
+      }
+    }
+    return DEFAULT_TABLE_VERSION;
+  }
+
+  /**
+   * @return the current hoodie table version.
+   */
+  public HoodieTableVersion getCurrentHoodieTableVersion() {
+    // TODO: fetch current version dynamically

Review comment:
       `HoodieTableVersion` or someplace we need to ahve a `CURR_VERSION` variable that gets bumped to 0.6.1 . 
   
   More I think about this. I think its better to name the versions 0,1,2... and so on, instead of release numbers. we may not bump this up every release . only when upgrade/downgrade is necessary. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-668286733


   @nsivabalan  can you squash and rebase against master? Also please add the great description you have on the PR body into the commit message itself. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r464714682



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -186,10 +188,14 @@ public HoodieMetrics getMetrics() {
    * Get HoodieTable and init {@link Timer.Context}.
    *
    * @param operationType write operation type
+   * @param instantTime current inflight instant time
    * @return HoodieTable
    */
-  protected HoodieTable getTableAndInitCtx(WriteOperationType operationType) {
+  protected HoodieTable getTableAndInitCtx(WriteOperationType operationType, String instantTime) {
     HoodieTableMetaClient metaClient = createMetaClient(true);
+    if (config.shouldRollbackUsingMarkers()) {

Review comment:
       we should do this no matter, whether rollback using markers is on /off

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/RollbackUtils.java
##########
@@ -63,4 +84,156 @@ static HoodieRollbackStat mergeRollbackStat(HoodieRollbackStat stat1, HoodieRoll
     return new HoodieRollbackStat(stat1.getPartitionPath(), successDeleteFiles, failedDeleteFiles, commandBlocksCount);
   }
 
+  /**

Review comment:
       is this just moving code in bulk?

##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
##########
@@ -329,9 +341,34 @@ private static int deleteSavepoint(JavaSparkContext jsc, String savepointTime, S
     }
   }
 
+  /**
+   * Upgrade or downgrade hoodie table.
+   * @param jsc instance of {@link JavaSparkContext} to use.
+   * @param basePath base path of the dataset.
+   * @param toVersion version to which upgrade/downgrade to be done.
+   * @return 0 if success, else -1.
+   * @throws Exception
+   */
+  protected static int upgradeOrDowngradeHoodieDataset(JavaSparkContext jsc, String basePath, String toVersion) throws Exception {
+    HoodieWriteConfig config = getWriteConfig(basePath);
+    HoodieTableMetaClient metaClient = ClientUtils.createMetaClient(jsc.hadoopConfiguration(), config, false);
+    try {
+      UpgradeDowngradeUtil.doUpgradeOrDowngrade(metaClient, HoodieTableVersion.valueOf(toVersion), config, jsc, null);

Review comment:
       rename: `UpgradeDowngradeUtil.migrate(..)`

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java
##########
@@ -279,6 +287,23 @@ public static String createDataFile(String basePath, String partitionPath, Strin
     return fileID;
   }
 
+  public static void createMarkerFile(String basePath, String partitionPath, String instantTime, String dataFileName) throws IOException {

Review comment:
       we should keep these to just HoodieClientTestUtils. since markers are just an artifact of the client




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1858: [1014] Adding Upgrade or downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-666064032


   @n3nash @vinothchandar : you folks can review the patch. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467617490



##########
File path: hudi-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
##########
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.model.FileSlice;
+import org.apache.hudi.common.model.HoodieFileGroup;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.SyncableFileSystemView;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.testutils.Assertions;
+import org.apache.hudi.testutils.HoodieClientTestBase;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.table.HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit tests {@link UpgradeDowngrade}.
+ */
+public class TestUpgradeDowngrade extends HoodieClientTestBase {
+
+  private static final String TEST_NAME_WITH_PARAMS = "[{index}] Test with induceResiduesFromPrevUpgrade={0}, deletePartialMarkerFiles={1} and TableType = {2}";

Review comment:
       minor: there are only 2 args. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar merged pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar merged pull request #1858:
URL: https://github.com/apache/hudi/pull/1858


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467603967



##########
File path: hudi-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
##########
@@ -0,0 +1,405 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.model.FileSlice;
+import org.apache.hudi.common.model.HoodieFileGroup;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.SyncableFileSystemView;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.testutils.Assertions;
+import org.apache.hudi.testutils.HoodieClientTestBase;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.table.HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit tests {@link UpgradeDowngrade}.
+ */
+public class TestUpgradeDowngrade extends HoodieClientTestBase {
+
+  private static final String TEST_NAME_WITH_PARAMS = "[{index}] Test with induceResiduesFromPrevUpgrade={0}, deletePartialMarkerFiles={1} and TableType = {2}";
+
+  public static Stream<Arguments> configParams() {
+    Object[][] data = new Object[][] {
+            {true, HoodieTableType.COPY_ON_WRITE}, {false, HoodieTableType.COPY_ON_WRITE},
+            {true, HoodieTableType.MERGE_ON_READ}, {false, HoodieTableType.MERGE_ON_READ}
+    };
+    return Stream.of(data).map(Arguments::of);
+  }
+
+  @Test
+  public void testLeftOverUpdatedPropFileCleanup() throws IOException {
+    testUpgradeInternal(true, true, HoodieTableType.MERGE_ON_READ);
+  }
+
+  @ParameterizedTest(name = TEST_NAME_WITH_PARAMS)
+  @MethodSource("configParams")
+  public void testUpgrade(boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    testUpgradeInternal(false, deletePartialMarkerFiles, tableType);
+  }
+
+  public void testUpgradeInternal(boolean induceResiduesFromPrevUpgrade, boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    // init config, table and client.
+    Map<String, String> params = new HashMap<>();
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      params.put(HOODIE_TABLE_TYPE_PROP_NAME, HoodieTableType.MERGE_ON_READ.name());
+      metaClient = HoodieTestUtils.init(hadoopConf, basePath, HoodieTableType.MERGE_ON_READ);
+    }
+    HoodieWriteConfig cfg = getConfigBuilder().withAutoCommit(false).withRollbackUsingMarkers(false).withProps(params).build();
+    HoodieWriteClient client = getHoodieWriteClient(cfg);
+
+    // prepare data. Make 2 commits, in which 2nd is not committed.
+    List<FileSlice> firstPartitionCommit2FileSlices = new ArrayList<>();
+    List<FileSlice> secondPartitionCommit2FileSlices = new ArrayList<>();
+    Pair<List<HoodieRecord>, List<HoodieRecord>> inputRecords = twoUpsertCommitDataWithTwoPartitions(firstPartitionCommit2FileSlices, secondPartitionCommit2FileSlices, cfg, client, false);
+
+    HoodieTable<?> table = this.getHoodieTable(metaClient, cfg);
+    HoodieInstant commitInstant = table.getPendingCommitTimeline().lastInstant().get();
+
+    // delete one of the marker files in 2nd commit if need be.
+    MarkerFiles markerFiles = new MarkerFiles(table, commitInstant.getTimestamp());
+    List<String> markerPaths = markerFiles.allMarkerFilePaths();
+    if (deletePartialMarkerFiles) {
+      String toDeleteMarkerFile = markerPaths.get(0);
+      table.getMetaClient().getFs().delete(new Path(table.getMetaClient().getTempFolderPath() + "/" + commitInstant.getTimestamp() + "/" + toDeleteMarkerFile));
+      markerPaths.remove(toDeleteMarkerFile);
+    }
+
+    // set hoodie.table.version to 0 in hoodie.properties file
+    metaClient.getTableConfig().setTableVersion(HoodieTableVersion.ZERO);
+
+    // if induce residues are set, copy property file to orig file.
+    if (induceResiduesFromPrevUpgrade) {
+      createResidualFile();
+    }
+
+    // should re-create marker files for 2nd commit since its pending. If there was any residues, no upgrade steps should happen except for updating the hoodie.table.version

Review comment:
       comments need fixing. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.util.FileIOUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngrade {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngrade.class);
+  public static final String HOODIE_UPDATED_PROPERTY_FILE = "hoodie.properties.updated";
+
+  private HoodieTableMetaClient metaClient;
+  private HoodieWriteConfig config;
+  private JavaSparkContext jsc;
+  private transient FileSystem fs;
+  private Path updatedPropsFilePath;
+  private Path propsFilePath;
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   *
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie version was upgraded to 0.6.0,
+   * Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded
+   * to pre 0.6.0 or to Hoodie table version 0, then some downgrade steps need to be executed before proceeding w/ any writes.
+   *
+   * On a high level, these are the steps performed
+   *
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Delete any left over .upgraded from previous upgrade/downgrade
+   * Step3 : If version are different, perform upgrade/downgrade.
+   * Step4 : Copy hoodie.properties -> hoodie.properties.upgraded with the version updated
+   * Step6 : Rename hoodie.properties.updated to hoodie.properties
+   * </p>
+   *
+   * @param metaClient instance of {@link HoodieTableMetaClient} to use
+   * @param toVersion version to which upgrade or downgrade has to be done.
+   * @param config instance of {@link HoodieWriteConfig} to use.
+   * @param jsc instance of {@link JavaSparkContext} to use.
+   * @param instantTime current instant time that should not be touched.
+   */
+  public static void run(HoodieTableMetaClient metaClient, HoodieTableVersion toVersion, HoodieWriteConfig config,

Review comment:
       why use same name for this method and for the other method too? 

##########
File path: hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.cli.testutils.AbstractShellIntegrationTest;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Properties;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Tests {@link UpgradeOrDowngradeCommand}.
+ */
+public class TestUpgradeDowngradeCommand extends AbstractShellIntegrationTest {
+
+  private String tablePath;
+
+  @BeforeEach
+  public void init() throws IOException {
+    String tableName = "test_table";
+    tablePath = basePath + File.separator + tableName;
+    new TableCommand().createTable(
+        tablePath, tableName, HoodieTableType.COPY_ON_WRITE.name(),
+        "", TimelineLayoutVersion.VERSION_1, "org.apache.hudi.common.model.HoodieAvroPayload");
+
+    //Create some commits files and parquet files
+    String commitTime1 = "100";
+    String commitTime2 = "101";
+    HoodieTestDataGenerator.writePartitionMetadata(fs, HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS, tablePath);

Review comment:
       probably you might have to follow something like this in TestUpgradeDowngrade. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.util.FileIOUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngrade {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngrade.class);
+  public static final String HOODIE_UPDATED_PROPERTY_FILE = "hoodie.properties.updated";
+
+  private HoodieTableMetaClient metaClient;
+  private HoodieWriteConfig config;
+  private JavaSparkContext jsc;
+  private transient FileSystem fs;
+  private Path updatedPropsFilePath;
+  private Path propsFilePath;
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   *
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie version was upgraded to 0.6.0,
+   * Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded
+   * to pre 0.6.0 or to Hoodie table version 0, then some downgrade steps need to be executed before proceeding w/ any writes.
+   *
+   * On a high level, these are the steps performed
+   *
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Delete any left over .upgraded from previous upgrade/downgrade

Review comment:
       updated/upgraded. Lets use same terminology everywhere. Ignore addressing renaming/java docs/refactoring comments for now. Let's get the patch in for now. But leaving comments so that I can take it up after 0.6.0 release. 

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -96,6 +97,8 @@ public HoodieTableConfig(FileSystem fs, String metaPath, String payloadClassName
       throw new HoodieIOException("Could not load Hoodie properties from " + propertyPath, e);
     }
     this.props = props;
+    ValidationUtils.checkArgument(props.containsKey(HOODIE_TABLE_TYPE_PROP_NAME) && props.containsKey(HOODIE_TABLE_NAME_PROP_NAME),

Review comment:
       sorry, I don't get why we need this here. If properties contain table type and table name, why bail out? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467485368



##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
##########
@@ -329,9 +341,34 @@ private static int deleteSavepoint(JavaSparkContext jsc, String savepointTime, S
     }
   }
 
+  /**
+   * Upgrade or downgrade hoodie table.
+   * @param jsc instance of {@link JavaSparkContext} to use.
+   * @param basePath base path of the dataset.
+   * @param toVersion version to which upgrade/downgrade to be done.
+   * @return 0 if success, else -1.
+   * @throws Exception
+   */
+  protected static int upgradeOrDowngradeHoodieDataset(JavaSparkContext jsc, String basePath, String toVersion) throws Exception {
+    HoodieWriteConfig config = getWriteConfig(basePath);
+    HoodieTableMetaClient metaClient = ClientUtils.createMetaClient(jsc.hadoopConfiguration(), config, false);
+    try {
+      UpgradeDowngradeUtil.doUpgradeOrDowngrade(metaClient, HoodieTableVersion.valueOf(toVersion), config, jsc, null);

Review comment:
       not really. migrate is a general term. :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467543150



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -186,10 +188,14 @@ public HoodieMetrics getMetrics() {
    * Get HoodieTable and init {@link Timer.Context}.
    *
    * @param operationType write operation type
+   * @param instantTime current inflight instant time
    * @return HoodieTable
    */
-  protected HoodieTable getTableAndInitCtx(WriteOperationType operationType) {
+  protected HoodieTable getTableAndInitCtx(WriteOperationType operationType, String instantTime) {
     HoodieTableMetaClient metaClient = createMetaClient(true);
+    if (config.shouldRollbackUsingMarkers()) {

Review comment:
       tests actually pass. I have changes where this solely gated by the table version and not any other config. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467485250



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngradeUtil.java
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngradeUtil {

Review comment:
       We should upgrade no matter what - regardless of marker based rollback being on/off, regardless of timeline layout version. as of now, timeline layout version is a config that can be controlled manually. in a future version, say 1->2 we can force migrate timeline line layout if need be. 
   
   Meta point is: making the decision to upgrade/downgrade based on write config is a problematic thing IMO. It makes reasoning with actions done during upgrade be based on configs, which can be changed in subsequent writes. We need to be able to reason about state of dataset after upgrade, and convince ourselves that any config change after that point would work well. 
   i.e even if markerBasedRollback=disabled during upgrade, we prepare the dataset such that it can be turned on at any given point from there on. 
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458365915



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -190,6 +192,7 @@ public HoodieMetrics getMetrics() {
    */
   protected HoodieTable getTableAndInitCtx(WriteOperationType operationType) {
     HoodieTableMetaClient metaClient = createMetaClient(true);
+    mayBeUpradeOrDowngrade(metaClient);

Review comment:
       @vinothchandar : is this the right place to call upgrade/downgrade. If not, please advise. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/UpgradeDowngradeHelper.java
##########
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.exception.HoodieException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngradeHelper {
+
+  public static final String HOODIE_ORIG_PROPERTY_FILE = "hoodie.properties.orig";
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie verion was upgraded to 0.6.0, there are some upgrade steps need
+   * to be executed before doing any writes.
+   * Similarly, if a dataset was created using 0.6.0 and then hoodie was downgraded, some downgrade steps need to be executed before proceeding w/ any writes.
+   * On a high level, these are the steps performed
+   * Step1 : Understand current hoodie version and table version from hoodie.properties file
+   * Step2 : Fix any residues from previous upgrade/downgrade
+   * Step3 : Check for version upgrade/downgrade.
+   * Step4 : If upgrade/downgrade is required, perform the steps required for the same.
+   * Step5 : Copy hoodie.properties -> hoodie.properties.orig
+   * Step6 : Update hoodie.properties file with current table version
+   * Step7 : Delete hoodie.properties.orig
+   * </p>
+   * @param metaClient instance of {@link HoodieTableMetaClient} to use
+   * @throws IOException
+   */
+  public static void doUpgradeOrDowngrade(HoodieTableMetaClient metaClient) throws IOException {
+    // Fetch version from property file and current version
+    HoodieTableVersion versionFromPropertyFile = metaClient.getTableConfig().getHoodieTableVersionFromPropertyFile();
+    HoodieTableVersion currentVersion = metaClient.getTableConfig().getCurrentHoodieTableVersion();
+
+    Path metaPath = new Path(metaClient.getMetaPath());
+    Path originalHoodiePropertyPath = getOrigHoodiePropertyFilePath(metaPath.toString());
+
+    boolean updateTableVersionInPropertyFile = false;
+
+    if (metaClient.getFs().exists(originalHoodiePropertyPath)) {
+      // if hoodie.properties.orig exists, rename to hoodie.properties and skip upgrade/downgrade step
+      metaClient.getFs().rename(originalHoodiePropertyPath, getHoodiePropertyFilePath(metaPath.toString()));
+      updateTableVersionInPropertyFile = true;
+    } else {
+      // upgrade or downgrade if there is a version mismatch
+      if (versionFromPropertyFile != currentVersion) {
+        updateTableVersionInPropertyFile = true;
+        if (versionFromPropertyFile == HoodieTableVersion.PRE_ZERO_SIZE_ZERO && currentVersion == HoodieTableVersion.ZERO_SIX_ZERO) {
+          upgradeFromOlderToZeroSixZero();
+        } else if (versionFromPropertyFile == HoodieTableVersion.ZERO_SIX_ZERO && currentVersion == HoodieTableVersion.PRE_ZERO_SIZE_ZERO) {
+          downgradeFromZeroSixZeroToPreZeroSixZero();
+        } else {
+          throw new HoodieException("Illegal state wrt table versions. Version from proerpty file " + versionFromPropertyFile + " and current version " + currentVersion);
+        }
+      }
+    }
+
+    /**
+     * If table version needs to be updated in hoodie.properties file.
+     * Step1: Copy hoodie.properties to hoodie.properties.orig
+     * Step2: add table.version to hoodie.properties
+     * Step3: delete hoodie.properties.orig
+     */
+    if (updateTableVersionInPropertyFile) {
+      updateTableVersionInMetaPath(metaClient);

Review comment:
       @vinothchandar : after updating the hoodie.properties file, I haven't reloaded the meta client as of now. It is just the table version in memory that has changed and no other code blocks should access table.version. Do you think is reload of metaclient mandatory if we update hoodie.properties w/ new table version?  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r466989376



##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
##########
@@ -329,9 +341,34 @@ private static int deleteSavepoint(JavaSparkContext jsc, String savepointTime, S
     }
   }
 
+  /**
+   * Upgrade or downgrade hoodie table.
+   * @param jsc instance of {@link JavaSparkContext} to use.
+   * @param basePath base path of the dataset.
+   * @param toVersion version to which upgrade/downgrade to be done.
+   * @return 0 if success, else -1.
+   * @throws Exception
+   */
+  protected static int upgradeOrDowngradeHoodieDataset(JavaSparkContext jsc, String basePath, String toVersion) throws Exception {
+    HoodieWriteConfig config = getWriteConfig(basePath);
+    HoodieTableMetaClient metaClient = ClientUtils.createMetaClient(jsc.hadoopConfiguration(), config, false);
+    try {
+      UpgradeDowngradeUtil.doUpgradeOrDowngrade(metaClient, HoodieTableVersion.valueOf(toVersion), config, jsc, null);

Review comment:
       I am not sure if migrate will be the right terminology to use here. Isn't migrate used to move from one system to another? This is more of an upgrade version or downgrade version right within the same system(hudi).  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r465874204



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngradeUtil.java
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngradeUtil {

Review comment:
       @nsivabalan Can you point me to the part of the code that you are referring to ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467607914



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.util.FileIOUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngrade {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngrade.class);
+  public static final String HOODIE_UPDATED_PROPERTY_FILE = "hoodie.properties.updated";
+
+  private HoodieTableMetaClient metaClient;
+  private HoodieWriteConfig config;
+  private JavaSparkContext jsc;
+  private transient FileSystem fs;
+  private Path updatedPropsFilePath;
+  private Path propsFilePath;
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   *
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie version was upgraded to 0.6.0,
+   * Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded
+   * to pre 0.6.0 or to Hoodie table version 0, then some downgrade steps need to be executed before proceeding w/ any writes.
+   *
+   * On a high level, these are the steps performed
+   *
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Delete any left over .upgraded from previous upgrade/downgrade
+   * Step3 : If version are different, perform upgrade/downgrade.
+   * Step4 : Copy hoodie.properties -> hoodie.properties.upgraded with the version updated
+   * Step6 : Rename hoodie.properties.updated to hoodie.properties
+   * </p>
+   *
+   * @param metaClient instance of {@link HoodieTableMetaClient} to use
+   * @param toVersion version to which upgrade or downgrade has to be done.
+   * @param config instance of {@link HoodieWriteConfig} to use.
+   * @param jsc instance of {@link JavaSparkContext} to use.
+   * @param instantTime current instant time that should not be touched.
+   */
+  public static void run(HoodieTableMetaClient metaClient, HoodieTableVersion toVersion, HoodieWriteConfig config,

Review comment:
       why not? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-671083262


   ```
     // set hoodie.table.version to 0 in hoodie.properties file
       metaClient.getTableConfig().setTableVersion(HoodieTableVersion.ZERO);
   
       if (induceResiduesFromPrevUpgrade) {
         createResidualFile();
       }
   
       // should re-create marker files for 2nd commit since its pending. If there was any residues, no upgrade steps should happen except for updating the hoodie.table.version
       UpgradeDowngrade.run(metaClient, HoodieTableVersion.ONE, cfg, jsc, null);
   ```
   
   @nsivabalan this won't be a no-op for sure IMO. `setTableVersion()` no longer persists the version to dfs. just in memory. but for upgrade it does not matter, since default is ZERO. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r461213552



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java
##########
@@ -159,24 +161,32 @@ private void rollBackIndex() {
     LOG.info("Index rolled back for commits " + instantToRollback);
   }
 
-  public List<HoodieRollbackStat> doRollbackAndGetStats() {
-    final String instantTimeToRollback = instantToRollback.getTimestamp();
-    final boolean isPendingCompaction = Objects.equals(HoodieTimeline.COMPACTION_ACTION, instantToRollback.getAction())
-        && !instantToRollback.isCompleted();
-    validateSavepointRollbacks();
-    if (!isPendingCompaction) {
-      validateRollbackCommitSequence();
-    }
-
-    try {
-      List<HoodieRollbackStat> stats = executeRollback();
-      LOG.info("Rolled back inflight instant " + instantTimeToRollback);
+  public List<HoodieRollbackStat> mayBeRollbackAndGetStats(boolean doDelete) {
+    if(doDelete) {
+      final String instantTimeToRollback = instantToRollback.getTimestamp();
+      final boolean isPendingCompaction = Objects.equals(HoodieTimeline.COMPACTION_ACTION, instantToRollback.getAction())
+          && !instantToRollback.isCompleted();
+      validateSavepointRollbacks();
       if (!isPendingCompaction) {
-        rollBackIndex();
+        validateRollbackCommitSequence();
+      }
+
+      try {
+        List<HoodieRollbackStat> stats = executeRollback(doDelete);
+        LOG.info("Rolled back inflight instant " + instantTimeToRollback);
+        if (!isPendingCompaction) {
+          rollBackIndex();
+        }
+        return stats;
+      } catch (IOException e) {
+        throw new HoodieIOException("Unable to execute rollback ", e);
+      }
+    } else{
+      try {
+        return executeRollback(doDelete);

Review comment:
       this is the else part where we just collect stats. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java
##########
@@ -159,24 +161,32 @@ private void rollBackIndex() {
     LOG.info("Index rolled back for commits " + instantToRollback);
   }
 
-  public List<HoodieRollbackStat> doRollbackAndGetStats() {
-    final String instantTimeToRollback = instantToRollback.getTimestamp();
-    final boolean isPendingCompaction = Objects.equals(HoodieTimeline.COMPACTION_ACTION, instantToRollback.getAction())
-        && !instantToRollback.isCompleted();
-    validateSavepointRollbacks();
-    if (!isPendingCompaction) {
-      validateRollbackCommitSequence();
-    }
-
-    try {
-      List<HoodieRollbackStat> stats = executeRollback();
-      LOG.info("Rolled back inflight instant " + instantTimeToRollback);
+  public List<HoodieRollbackStat> mayBeRollbackAndGetStats(boolean doDelete) {

Review comment:
       @vinothchandar : I have added a flag here to say where delete has to be done or just stats need to be collected. Since I don't want to duplicate code, tried my best to re-use. If you can think of any other ways, lmk. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/UpgradeDowngradeHelper.java
##########
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table;
+
+import org.apache.hudi.common.HoodieRollbackStat;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieRollbackException;
+import org.apache.hudi.io.IOType;
+import org.apache.hudi.table.action.rollback.CopyOnWriteRollbackActionExecutor;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngradeHelper {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngradeHelper.class);
+  public static final String HOODIE_ORIG_PROPERTY_FILE = "hoodie.properties.orig";
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths. Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3),
+   * and Hoodie version was upgraded to 0.6.0, Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded to pre 0.6.0 or to Hoodie table version 0,
+   * then some downgrade steps need to be executed before proceeding w/ any writes.
+   * On a high level, these are the steps performed
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Fix any residues from previous upgrade/downgrade
+   * Step3 : If there are no residues, Check for version upgrade/downgrade. If version mismatch, perform upgrade/downgrade.
+   * Step4 : If there are residues, clean them up and skip upgrade/downgrade since those steps would have been completed last time.
+   * Step5 : Copy hoodie.properties -> hoodie.properties.orig
+   * Step6 : Update hoodie.properties file with current table version
+   * Step7 : Delete hoodie.properties.orig
+   * </p>
+   *
+   * @param metaClient instance of {@link HoodieTableMetaClient} to use
+   * @param toVersion version to which upgrade or downgrade has to be done.
+   */
+  public static void doUpgradeOrDowngrade(HoodieTableMetaClient metaClient, HoodieTableVersion toVersion, HoodieWriteConfig config, JavaSparkContext jsc) throws IOException {
+    // Fetch version from property file and current version
+    HoodieTableVersion versionFromPropertyFile = metaClient.getTableConfig().getHoodieTableVersionFromPropertyFile();
+
+    Path metaPath = new Path(metaClient.getMetaPath());
+    Path originalHoodiePropertyFile = getOrigHoodiePropertyFilePath(metaPath.toString());
+
+    boolean updateTableVersionInPropertyFile = false;
+
+    if (metaClient.getFs().exists(originalHoodiePropertyFile)) {
+      // if hoodie.properties.orig exists, rename to hoodie.properties and skip upgrade/downgrade step
+      metaClient.getFs().rename(originalHoodiePropertyFile, getHoodiePropertyFilePath(metaPath.toString()));
+      updateTableVersionInPropertyFile = true;
+    } else {
+      // upgrade or downgrade if there is a version mismatch
+      if (versionFromPropertyFile != toVersion) {
+        updateTableVersionInPropertyFile = true;
+        if (versionFromPropertyFile == HoodieTableVersion.ONE && toVersion == HoodieTableVersion.ZERO) {
+          upgradeFromZeroToOne(config, jsc.hadoopConfiguration(), jsc);
+        } else if (versionFromPropertyFile == HoodieTableVersion.ZERO && toVersion == HoodieTableVersion.ONE) {
+          downgradeFromOneToZero();
+        } else {
+          throw new HoodieException("Illegal state wrt table versions. Version from proerpty file " + versionFromPropertyFile + " and current version " + toVersion);
+        }
+      }
+    }
+
+    /**
+     * If table version needs to be updated in hoodie.properties file.
+     * Step1: Copy hoodie.properties to hoodie.properties.orig
+     * Step2: add table.version to hoodie.properties
+     * Step3: delete hoodie.properties.orig
+     */
+    if (updateTableVersionInPropertyFile) {
+      updateTableVersionInHoodiePropertyFile(metaClient, toVersion);
+    }
+  }
+
+  /**
+   * Upgrade steps to be done to upgrade from hoodie table version 0 to 1.
+   */
+  private static void upgradeFromZeroToOne(HoodieWriteConfig config, Configuration hadoopConf, JavaSparkContext jsc) {
+    // fetch pending commit info
+    HoodieTable table = HoodieTable.create(config, hadoopConf);
+    HoodieTimeline inflightTimeline = table.getMetaClient().getCommitsTimeline().filterPendingExcludingCompaction();
+    List<String> commits = inflightTimeline.getReverseOrderedInstants().map(HoodieInstant::getTimestamp)
+        .collect(Collectors.toList());
+    for (String commit : commits) {
+      // for every pending commit, delete old marker files and re-SparkMaincreate marker files in new format
+      recreateMarkerFiles(commit, table, jsc);
+    }
+  }
+
+  public static void recreateMarkerFiles(final String commitInstantTime, HoodieTable table, JavaSparkContext jsc) throws HoodieRollbackException {
+    try {
+      Option<HoodieInstant> commitInstantOpt = Option.fromJavaOptional(table.getActiveTimeline().getCommitsTimeline().getInstants()
+          .filter(instant -> HoodieActiveTimeline.EQUALS.test(instant.getTimestamp(), commitInstantTime))
+          .findFirst());
+      if (commitInstantOpt.isPresent()) {
+        MarkerFiles markerFiles = new MarkerFiles(table, commitInstantTime);
+        markerFiles.quietDeleteMarkerDir();
+
+        List<HoodieRollbackStat> rollbackStats = new CopyOnWriteRollbackActionExecutor(jsc, table.getConfig(), table, "", commitInstantOpt.get(), false).mayBeRollbackAndGetStats(false);
+
+        for (HoodieRollbackStat rollbackStat : rollbackStats) {
+          for (FileStatus fileStatus : rollbackStat.getFilesToRollback()) {
+            String path = fileStatus.getPath().toString();
+            String dataFileName = path.substring(path.lastIndexOf("/") + 1);
+            markerFiles.create(rollbackStat.getPartitionPath(), dataFileName, path.endsWith(table.getBaseFileExtension()) ? IOType.CREATE : IOType.MERGE);

Review comment:
       yet to figure out how to differentiate CREATE and MERGE from fileStatus

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/HoodieRollbackStat.java
##########
@@ -39,12 +40,15 @@
   // Count of HoodieLogFile to commandBlocks written for a particular rollback
   private final Map<FileStatus, Long> commandBlocksCount;
 
+  private final List<FileStatus> filesToRollback;

Review comment:
       have added this to hold file status fully to be used for upgrade 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/UpgradeDowngradeHelper.java
##########
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table;
+
+import org.apache.hudi.common.HoodieRollbackStat;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieRollbackException;
+import org.apache.hudi.io.IOType;
+import org.apache.hudi.table.action.rollback.CopyOnWriteRollbackActionExecutor;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngradeHelper {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngradeHelper.class);
+  public static final String HOODIE_ORIG_PROPERTY_FILE = "hoodie.properties.orig";
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths. Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3),
+   * and Hoodie version was upgraded to 0.6.0, Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded to pre 0.6.0 or to Hoodie table version 0,
+   * then some downgrade steps need to be executed before proceeding w/ any writes.
+   * On a high level, these are the steps performed
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Fix any residues from previous upgrade/downgrade
+   * Step3 : If there are no residues, Check for version upgrade/downgrade. If version mismatch, perform upgrade/downgrade.
+   * Step4 : If there are residues, clean them up and skip upgrade/downgrade since those steps would have been completed last time.
+   * Step5 : Copy hoodie.properties -> hoodie.properties.orig
+   * Step6 : Update hoodie.properties file with current table version
+   * Step7 : Delete hoodie.properties.orig
+   * </p>
+   *
+   * @param metaClient instance of {@link HoodieTableMetaClient} to use
+   * @param toVersion version to which upgrade or downgrade has to be done.
+   */
+  public static void doUpgradeOrDowngrade(HoodieTableMetaClient metaClient, HoodieTableVersion toVersion, HoodieWriteConfig config, JavaSparkContext jsc) throws IOException {
+    // Fetch version from property file and current version
+    HoodieTableVersion versionFromPropertyFile = metaClient.getTableConfig().getHoodieTableVersionFromPropertyFile();
+
+    Path metaPath = new Path(metaClient.getMetaPath());
+    Path originalHoodiePropertyFile = getOrigHoodiePropertyFilePath(metaPath.toString());
+
+    boolean updateTableVersionInPropertyFile = false;
+
+    if (metaClient.getFs().exists(originalHoodiePropertyFile)) {
+      // if hoodie.properties.orig exists, rename to hoodie.properties and skip upgrade/downgrade step
+      metaClient.getFs().rename(originalHoodiePropertyFile, getHoodiePropertyFilePath(metaPath.toString()));
+      updateTableVersionInPropertyFile = true;
+    } else {
+      // upgrade or downgrade if there is a version mismatch
+      if (versionFromPropertyFile != toVersion) {
+        updateTableVersionInPropertyFile = true;
+        if (versionFromPropertyFile == HoodieTableVersion.ONE && toVersion == HoodieTableVersion.ZERO) {
+          upgradeFromZeroToOne(config, jsc.hadoopConfiguration(), jsc);
+        } else if (versionFromPropertyFile == HoodieTableVersion.ZERO && toVersion == HoodieTableVersion.ONE) {
+          downgradeFromOneToZero();
+        } else {
+          throw new HoodieException("Illegal state wrt table versions. Version from proerpty file " + versionFromPropertyFile + " and current version " + toVersion);
+        }
+      }
+    }
+
+    /**
+     * If table version needs to be updated in hoodie.properties file.
+     * Step1: Copy hoodie.properties to hoodie.properties.orig
+     * Step2: add table.version to hoodie.properties
+     * Step3: delete hoodie.properties.orig
+     */
+    if (updateTableVersionInPropertyFile) {
+      updateTableVersionInHoodiePropertyFile(metaClient, toVersion);
+    }
+  }
+
+  /**
+   * Upgrade steps to be done to upgrade from hoodie table version 0 to 1.
+   */
+  private static void upgradeFromZeroToOne(HoodieWriteConfig config, Configuration hadoopConf, JavaSparkContext jsc) {
+    // fetch pending commit info
+    HoodieTable table = HoodieTable.create(config, hadoopConf);
+    HoodieTimeline inflightTimeline = table.getMetaClient().getCommitsTimeline().filterPendingExcludingCompaction();
+    List<String> commits = inflightTimeline.getReverseOrderedInstants().map(HoodieInstant::getTimestamp)
+        .collect(Collectors.toList());
+    for (String commit : commits) {
+      // for every pending commit, delete old marker files and re-SparkMaincreate marker files in new format
+      recreateMarkerFiles(commit, table, jsc);
+    }
+  }
+
+  public static void recreateMarkerFiles(final String commitInstantTime, HoodieTable table, JavaSparkContext jsc) throws HoodieRollbackException {
+    try {
+      Option<HoodieInstant> commitInstantOpt = Option.fromJavaOptional(table.getActiveTimeline().getCommitsTimeline().getInstants()
+          .filter(instant -> HoodieActiveTimeline.EQUALS.test(instant.getTimestamp(), commitInstantTime))
+          .findFirst());
+      if (commitInstantOpt.isPresent()) {
+        MarkerFiles markerFiles = new MarkerFiles(table, commitInstantTime);
+        markerFiles.quietDeleteMarkerDir();
+
+        List<HoodieRollbackStat> rollbackStats = new CopyOnWriteRollbackActionExecutor(jsc, table.getConfig(), table, "", commitInstantOpt.get(), false).mayBeRollbackAndGetStats(false);
+
+        for (HoodieRollbackStat rollbackStat : rollbackStats) {
+          for (FileStatus fileStatus : rollbackStat.getFilesToRollback()) {
+            String path = fileStatus.getPath().toString();
+            String dataFileName = path.substring(path.lastIndexOf("/") + 1);
+            markerFiles.create(rollbackStat.getPartitionPath(), dataFileName, path.endsWith(table.getBaseFileExtension()) ? IOType.CREATE : IOType.MERGE);
+          }
+          for (FileStatus fileStatus : rollbackStat.getCommandBlocksCount().keySet()) {

Review comment:
       I assume every entry in commandBlocks will be an APPEND. Correct me if I am wrong. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -130,39 +137,55 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
               1L
           );
           return new Tuple2<>(rollbackRequest.getPartitionPath(),
-                  HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
-                          .withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
+              HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
+                  .withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
         }
         default:
           throw new IllegalStateException("Unknown Rollback action " + rollbackRequest);
       }
-    }).reduceByKey(RollbackUtils::mergeRollbackStat).map(Tuple2::_2).collect();
+    });
   }
 
 
-
   /**
    * Common method used for cleaning out base files under a partition path during rollback of a set of commits.
    */
-  private Map<FileStatus, Boolean> deleteCleanedFiles(HoodieTableMetaClient metaClient, HoodieWriteConfig config,
-                                                      String partitionPath, PathFilter filter) throws IOException {
+  private Map<FileStatus, Boolean> deleteBaseAndLogFiles(HoodieTableMetaClient metaClient, HoodieWriteConfig config,
+      String commit, String partitionPath, boolean doDelete) throws IOException {
     LOG.info("Cleaning path " + partitionPath);
+    String basefileExtension = metaClient.getTableConfig().getBaseFileFormat().getFileExtension();
+    SerializablePathFilter filter = (path) -> {
+      if (path.toString().endsWith(basefileExtension)) {
+        String fileCommitTime = FSUtils.getCommitTime(path.getName());
+        return commit.equals(fileCommitTime);
+      } else if (FSUtils.isLogFile(path)) {
+        // Since the baseCommitTime is the only commit for new log files, it's okay here
+        String fileCommitTime = FSUtils.getBaseCommitTimeFromLogPath(path);
+        return commit.equals(fileCommitTime);
+      }
+      return false;
+    };
+
     final Map<FileStatus, Boolean> results = new HashMap<>();
     FileSystem fs = metaClient.getFs();
     FileStatus[] toBeDeleted = fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
     for (FileStatus file : toBeDeleted) {
-      boolean success = fs.delete(file.getPath(), false);
-      results.put(file, success);
-      LOG.info("Delete file " + file.getPath() + "\t" + success);
+      if(doDelete) {
+        boolean success = fs.delete(file.getPath(), false);
+        results.put(file, success);
+        LOG.info("Delete file " + file.getPath() + "\t" + success);
+      } else{
+        results.put(file, true);

Review comment:
       incase of just collecting stats, all files are added to success list.

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -68,34 +69,38 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
    * Performs all rollback actions that we have collected in parallel.
    */
   public List<HoodieRollbackStat> performRollback(JavaSparkContext jsc, HoodieInstant instantToRollback, List<ListingBasedRollbackRequest> rollbackRequests) {
-    SerializablePathFilter filter = (path) -> {

Review comment:
       this filter was used only within one method and hence moved it within the resp method.

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/CopyOnWriteRollbackActionExecutor.java
##########
@@ -100,8 +108,13 @@ public CopyOnWriteRollbackActionExecutor(JavaSparkContext jsc,
   }
 
   @Override
-  protected List<HoodieRollbackStat> executeRollbackUsingFileListing(HoodieInstant instantToRollback) {
+  protected List<HoodieRollbackStat> executeRollbackUsingFileListing(HoodieInstant instantToRollback, boolean doDelete) {
     List<ListingBasedRollbackRequest> rollbackRequests = generateRollbackRequestsByListing();
-    return new ListingBasedRollbackHelper(table.getMetaClient(), config).performRollback(jsc, instantToRollback, rollbackRequests);
+    ListingBasedRollbackHelper listingBasedRollbackHelper = new ListingBasedRollbackHelper(table.getMetaClient(), config);
+    if(doDelete) {
+      return listingBasedRollbackHelper.performRollback(jsc, instantToRollback, rollbackRequests);

Review comment:
       disintegrated ListingBasedRollbackHelper into two apis, performRollback and collectRollbackStats where first calls into 2nd. Incase of actual rollback, we call performRollback and incase of collecting stats for upgrade, we call into collectRollbackStats. I am repurposing HoodieRollbackStat to hold the info on file path to be rolledback. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -130,39 +137,55 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
               1L
           );
           return new Tuple2<>(rollbackRequest.getPartitionPath(),
-                  HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
-                          .withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
+              HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
+                  .withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
         }
         default:
           throw new IllegalStateException("Unknown Rollback action " + rollbackRequest);
       }
-    }).reduceByKey(RollbackUtils::mergeRollbackStat).map(Tuple2::_2).collect();
+    });
   }
 
 
-
   /**
    * Common method used for cleaning out base files under a partition path during rollback of a set of commits.
    */
-  private Map<FileStatus, Boolean> deleteCleanedFiles(HoodieTableMetaClient metaClient, HoodieWriteConfig config,
-                                                      String partitionPath, PathFilter filter) throws IOException {
+  private Map<FileStatus, Boolean> deleteBaseAndLogFiles(HoodieTableMetaClient metaClient, HoodieWriteConfig config,

Review comment:
       can you help me understand how to differentiate between CREATE and MERGE in these code blocks. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/CopyOnWriteRollbackActionExecutor.java
##########
@@ -80,10 +82,16 @@ public CopyOnWriteRollbackActionExecutor(JavaSparkContext jsc,
     if (!resolvedInstant.isRequested()) {
       // delete all the data files for this commit
       LOG.info("Clean out all base files generated for commit: " + resolvedInstant);
-      stats = getRollbackStrategy().execute(resolvedInstant);
+      if(doDelete) {
+        stats = getRollbackStrategy().execute(resolvedInstant);
+      } else{
+        stats = executeRollbackUsingFileListing(resolvedInstant, false);

Review comment:
       and if doDelete is false, we call into executeRollbackUsingFileListing directly.

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java
##########
@@ -159,24 +161,32 @@ private void rollBackIndex() {
     LOG.info("Index rolled back for commits " + instantToRollback);
   }
 
-  public List<HoodieRollbackStat> doRollbackAndGetStats() {
-    final String instantTimeToRollback = instantToRollback.getTimestamp();
-    final boolean isPendingCompaction = Objects.equals(HoodieTimeline.COMPACTION_ACTION, instantToRollback.getAction())
-        && !instantToRollback.isCompleted();
-    validateSavepointRollbacks();
-    if (!isPendingCompaction) {
-      validateRollbackCommitSequence();
-    }
-
-    try {
-      List<HoodieRollbackStat> stats = executeRollback();
-      LOG.info("Rolled back inflight instant " + instantTimeToRollback);
+  public List<HoodieRollbackStat> mayBeRollbackAndGetStats(boolean doDelete) {
+    if(doDelete) {
+      final String instantTimeToRollback = instantToRollback.getTimestamp();
+      final boolean isPendingCompaction = Objects.equals(HoodieTimeline.COMPACTION_ACTION, instantToRollback.getAction())

Review comment:
       @vinothchandar : forgot to remind you yesterday when we discussed to move the collectStats method to a separate class and call directly for upgrade. These validations steps (validateSavepointRollbacks, validateRollbackCommitSequence) might be required as well right ? So, few bits and pieces in this class is required in upgrade step as well. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458509862



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -151,6 +154,27 @@ public HoodieTableType getTableType() {
         : Option.empty();
   }
 
+  /**
+   * @return the table version from .hoodie properties file.
+   */
+  public HoodieTableVersion getHoodieTableVersionFromPropertyFile() {
+    if (props.contains(HOODIE_TABLE_VERSION_PROP_NAME)) {
+      String propValue = props.getProperty(HOODIE_TABLE_VERSION_PROP_NAME);
+      if (propValue.equals(HoodieTableVersion.ZERO_SIX_ZERO.version)) {
+        return HoodieTableVersion.ZERO_SIX_ZERO;
+      }
+    }
+    return DEFAULT_TABLE_VERSION;
+  }
+
+  /**
+   * @return the current hoodie table version.
+   */
+  public HoodieTableVersion getCurrentHoodieTableVersion() {
+    // TODO: fetch current version dynamically

Review comment:
       @vinothchandar : sorry forgot to ask this question earlier. May I know how to fetch current hoodie version in use? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458731792



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -151,6 +154,27 @@ public HoodieTableType getTableType() {
         : Option.empty();
   }
 
+  /**
+   * @return the table version from .hoodie properties file.
+   */
+  public HoodieTableVersion getHoodieTableVersionFromPropertyFile() {
+    if (props.contains(HOODIE_TABLE_VERSION_PROP_NAME)) {
+      String propValue = props.getProperty(HOODIE_TABLE_VERSION_PROP_NAME);
+      if (propValue.equals(HoodieTableVersion.ZERO_SIX_ZERO.version)) {
+        return HoodieTableVersion.ZERO_SIX_ZERO;
+      }
+    }
+    return DEFAULT_TABLE_VERSION;
+  }
+
+  /**
+   * @return the current hoodie table version.
+   */
+  public HoodieTableVersion getCurrentHoodieTableVersion() {
+    // TODO: fetch current version dynamically

Review comment:
       let try to rephrase. Lets say a dataset was created in 0.6.0 and then user moves to hoodie version 0.6.1 and when he launches hoodie for first time, how in this code we will get to know that this is using version 0.6.1? hoodie.properties will be having 0.6.0 only right. 
   

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -151,6 +154,27 @@ public HoodieTableType getTableType() {
         : Option.empty();
   }
 
+  /**
+   * @return the table version from .hoodie properties file.
+   */
+  public HoodieTableVersion getHoodieTableVersionFromPropertyFile() {
+    if (props.contains(HOODIE_TABLE_VERSION_PROP_NAME)) {
+      String propValue = props.getProperty(HOODIE_TABLE_VERSION_PROP_NAME);
+      if (propValue.equals(HoodieTableVersion.ZERO_SIX_ZERO.version)) {
+        return HoodieTableVersion.ZERO_SIX_ZERO;
+      }
+    }
+    return DEFAULT_TABLE_VERSION;
+  }
+
+  /**
+   * @return the current hoodie table version.
+   */
+  public HoodieTableVersion getCurrentHoodieTableVersion() {
+    // TODO: fetch current version dynamically

Review comment:
       let me try to rephrase. Lets say a dataset was created in 0.6.0 and then user moves to hoodie version 0.6.1 and when he launches hoodie for first time, how in this code we will get to know that this is using version 0.6.1? hoodie.properties will be having 0.6.0 only right. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-671083934


   Pushed some fixes to tests. I think the tests should be testing. lmk if you think otherwise. 
   lets iterate quickly and get this in. :) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r465405658



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngradeUtil.java
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngradeUtil {

Review comment:
       @nsivabalan Can a user control the hoodie layout version manually from the HoodieWriteConfig. Say, choose the older timeline layout for 0.6.0 in which case there is no need to upgrade ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458411863



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -190,6 +192,7 @@ public HoodieMetrics getMetrics() {
    */
   protected HoodieTable getTableAndInitCtx(WriteOperationType operationType) {
     HoodieTableMetaClient metaClient = createMetaClient(true);
+    mayBeUpradeOrDowngrade(metaClient);

Review comment:
       looks ok. Mostly sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-668535467


   sure. I was planning to do it this mrng. anyways. LMK if you want me to fix the commit msg or you gonna take care of it as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467588697



##########
File path: hudi-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
##########
@@ -0,0 +1,405 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.model.FileSlice;
+import org.apache.hudi.common.model.HoodieFileGroup;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.SyncableFileSystemView;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.testutils.Assertions;
+import org.apache.hudi.testutils.HoodieClientTestBase;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.table.HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit tests {@link UpgradeDowngrade}.
+ */
+public class TestUpgradeDowngrade extends HoodieClientTestBase {
+
+  private static final String TEST_NAME_WITH_PARAMS = "[{index}] Test with induceResiduesFromPrevUpgrade={0}, deletePartialMarkerFiles={1} and TableType = {2}";
+
+  public static Stream<Arguments> configParams() {
+    Object[][] data = new Object[][] {
+            {true, HoodieTableType.COPY_ON_WRITE}, {false, HoodieTableType.COPY_ON_WRITE},
+            {true, HoodieTableType.MERGE_ON_READ}, {false, HoodieTableType.MERGE_ON_READ}
+    };
+    return Stream.of(data).map(Arguments::of);
+  }
+
+  @Test
+  public void testLeftOverUpdatedPropFileCleanup() throws IOException {
+    testUpgradeInternal(true, true, HoodieTableType.MERGE_ON_READ);
+  }
+
+  @ParameterizedTest(name = TEST_NAME_WITH_PARAMS)
+  @MethodSource("configParams")
+  public void testUpgrade(boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    testUpgradeInternal(false, deletePartialMarkerFiles, tableType);
+  }
+
+  public void testUpgradeInternal(boolean induceResiduesFromPrevUpgrade, boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    // init config, table and client.
+    Map<String, String> params = new HashMap<>();
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      params.put(HOODIE_TABLE_TYPE_PROP_NAME, HoodieTableType.MERGE_ON_READ.name());
+      metaClient = HoodieTestUtils.init(hadoopConf, basePath, HoodieTableType.MERGE_ON_READ);
+    }
+    HoodieWriteConfig cfg = getConfigBuilder().withAutoCommit(false).withRollbackUsingMarkers(false).withProps(params).build();
+    HoodieWriteClient client = getHoodieWriteClient(cfg);
+
+    // prepare data. Make 2 commits, in which 2nd is not committed.
+    List<FileSlice> firstPartitionCommit2FileSlices = new ArrayList<>();
+    List<FileSlice> secondPartitionCommit2FileSlices = new ArrayList<>();
+    Pair<List<HoodieRecord>, List<HoodieRecord>> inputRecords = twoUpsertCommitDataWithTwoPartitions(firstPartitionCommit2FileSlices, secondPartitionCommit2FileSlices, cfg, client, false);

Review comment:
       this is what I mentioned earlier that tests need quite a bit of fixing if you remove that guard :) 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467607963



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.util.FileIOUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngrade {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngrade.class);
+  public static final String HOODIE_UPDATED_PROPERTY_FILE = "hoodie.properties.updated";
+
+  private HoodieTableMetaClient metaClient;
+  private HoodieWriteConfig config;
+  private JavaSparkContext jsc;
+  private transient FileSystem fs;
+  private Path updatedPropsFilePath;
+  private Path propsFilePath;
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   *
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie version was upgraded to 0.6.0,
+   * Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded
+   * to pre 0.6.0 or to Hoodie table version 0, then some downgrade steps need to be executed before proceeding w/ any writes.
+   *
+   * On a high level, these are the steps performed
+   *
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Delete any left over .upgraded from previous upgrade/downgrade

Review comment:
       thats a typo. let me fix the comments. Code does have the same terminology




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458729215



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -151,6 +154,27 @@ public HoodieTableType getTableType() {
         : Option.empty();
   }
 
+  /**
+   * @return the table version from .hoodie properties file.
+   */
+  public HoodieTableVersion getHoodieTableVersionFromPropertyFile() {
+    if (props.contains(HOODIE_TABLE_VERSION_PROP_NAME)) {
+      String propValue = props.getProperty(HOODIE_TABLE_VERSION_PROP_NAME);
+      if (propValue.equals(HoodieTableVersion.ZERO_SIX_ZERO.version)) {
+        return HoodieTableVersion.ZERO_SIX_ZERO;
+      }
+    }
+    return DEFAULT_TABLE_VERSION;
+  }
+
+  /**
+   * @return the current hoodie table version.
+   */
+  public HoodieTableVersion getCurrentHoodieTableVersion() {
+    // TODO: fetch current version dynamically

Review comment:
       By reading `hoodie.properties` we need to treat everything below 0.6.0 as V_PRE_0.6.0 . Cannot deduce the actual jars per se. 
   
   We can also make these version numbers 0,1,2 instead of PRE_0.6.0, 0.6.0, 0.7.0 and so on. ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467588637



##########
File path: hudi-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
##########
@@ -0,0 +1,405 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.model.FileSlice;
+import org.apache.hudi.common.model.HoodieFileGroup;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.SyncableFileSystemView;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.testutils.Assertions;
+import org.apache.hudi.testutils.HoodieClientTestBase;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.table.HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit tests {@link UpgradeDowngrade}.
+ */
+public class TestUpgradeDowngrade extends HoodieClientTestBase {
+
+  private static final String TEST_NAME_WITH_PARAMS = "[{index}] Test with induceResiduesFromPrevUpgrade={0}, deletePartialMarkerFiles={1} and TableType = {2}";
+
+  public static Stream<Arguments> configParams() {
+    Object[][] data = new Object[][] {
+            {true, HoodieTableType.COPY_ON_WRITE}, {false, HoodieTableType.COPY_ON_WRITE},
+            {true, HoodieTableType.MERGE_ON_READ}, {false, HoodieTableType.MERGE_ON_READ}
+    };
+    return Stream.of(data).map(Arguments::of);
+  }
+
+  @Test
+  public void testLeftOverUpdatedPropFileCleanup() throws IOException {
+    testUpgradeInternal(true, true, HoodieTableType.MERGE_ON_READ);
+  }
+
+  @ParameterizedTest(name = TEST_NAME_WITH_PARAMS)
+  @MethodSource("configParams")
+  public void testUpgrade(boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    testUpgradeInternal(false, deletePartialMarkerFiles, tableType);
+  }
+
+  public void testUpgradeInternal(boolean induceResiduesFromPrevUpgrade, boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    // init config, table and client.
+    Map<String, String> params = new HashMap<>();
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      params.put(HOODIE_TABLE_TYPE_PROP_NAME, HoodieTableType.MERGE_ON_READ.name());
+      metaClient = HoodieTestUtils.init(hadoopConf, basePath, HoodieTableType.MERGE_ON_READ);
+    }
+    HoodieWriteConfig cfg = getConfigBuilder().withAutoCommit(false).withRollbackUsingMarkers(false).withProps(params).build();
+    HoodieWriteClient client = getHoodieWriteClient(cfg);
+
+    // prepare data. Make 2 commits, in which 2nd is not committed.
+    List<FileSlice> firstPartitionCommit2FileSlices = new ArrayList<>();
+    List<FileSlice> secondPartitionCommit2FileSlices = new ArrayList<>();
+    Pair<List<HoodieRecord>, List<HoodieRecord>> inputRecords = twoUpsertCommitDataWithTwoPartitions(firstPartitionCommit2FileSlices, secondPartitionCommit2FileSlices, cfg, client, false);

Review comment:
       this needs fixing. This will issue commit via client and since you have removed the guard to execute upgrade (even if marker based is disabled), upgrade would have already been executed. if I am not wrong, below command of 
   ```
   UpgradeDowngrade.run(metaClient, HoodieTableVersion.ONE, cfg, jsc, null); 
   ```
   is a no op since already the table version is upgraded.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-671021840


   Local test 
   
   ```
   vmacs:HUDIDATA vs$ cat /tmp/hudi_trips_cow/.hoodie/hoodie.properties
   #Properties saved on Sun Aug 09 01:02:02 PDT 2020
   #Sun Aug 09 01:02:02 PDT 2020
   hoodie.table.name=hudi_trips_cow
   hoodie.archivelog.folder=archived
   hoodie.table.type=COPY_ON_WRITE
   hoodie.timeline.layout.version=1
   vmacs:HUDIDATA vs$ cat /tmp/hudi_trips_cow/.hoodie/hoodie.properties
   #Properties saved on Sun Aug 09 01:04:53 PDT 2020
   #Sun Aug 09 01:04:53 PDT 2020
   hoodie.table.name=hudi_trips_cow
   hoodie.archivelog.folder=archived
   hoodie.table.type=COPY_ON_WRITE
   hoodie.table.version=1
   hoodie.timeline.layout.version=1
   
   vmacs:HUDIDATA vs$ ls -R  /tmp/hudi_trips_cow/.hoodie/
   20200809010202.commit            20200809010452.commit.requested  archived
   20200809010202.commit.requested  20200809010452.inflight          hoodie.properties
   20200809010202.inflight          20200809010453.rollback
   20200809010452.commit            20200809010453.rollback.inflight
   
   
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-668816966


   @nsivabalan I already did that as well. Will review and push some changes to this PR 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r462460480



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java
##########
@@ -59,31 +61,31 @@
   protected final boolean useMarkerBasedStrategy;
 
   public BaseRollbackActionExecutor(JavaSparkContext jsc,
-                                    HoodieWriteConfig config,
-                                    HoodieTable<?> table,
-                                    String instantTime,
-                                    HoodieInstant instantToRollback,
-                                    boolean deleteInstants) {
+      HoodieWriteConfig config,

Review comment:
       no changes in this file. just formatting changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467619791



##########
File path: hudi-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
##########
@@ -0,0 +1,405 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.model.FileSlice;
+import org.apache.hudi.common.model.HoodieFileGroup;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.SyncableFileSystemView;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.testutils.Assertions;
+import org.apache.hudi.testutils.HoodieClientTestBase;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.table.HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit tests {@link UpgradeDowngrade}.
+ */
+public class TestUpgradeDowngrade extends HoodieClientTestBase {
+
+  private static final String TEST_NAME_WITH_PARAMS = "[{index}] Test with induceResiduesFromPrevUpgrade={0}, deletePartialMarkerFiles={1} and TableType = {2}";
+
+  public static Stream<Arguments> configParams() {
+    Object[][] data = new Object[][] {
+            {true, HoodieTableType.COPY_ON_WRITE}, {false, HoodieTableType.COPY_ON_WRITE},
+            {true, HoodieTableType.MERGE_ON_READ}, {false, HoodieTableType.MERGE_ON_READ}
+    };
+    return Stream.of(data).map(Arguments::of);
+  }
+
+  @Test
+  public void testLeftOverUpdatedPropFileCleanup() throws IOException {
+    testUpgradeInternal(true, true, HoodieTableType.MERGE_ON_READ);
+  }
+
+  @ParameterizedTest(name = TEST_NAME_WITH_PARAMS)
+  @MethodSource("configParams")
+  public void testUpgrade(boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    testUpgradeInternal(false, deletePartialMarkerFiles, tableType);
+  }
+
+  public void testUpgradeInternal(boolean induceResiduesFromPrevUpgrade, boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    // init config, table and client.
+    Map<String, String> params = new HashMap<>();
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      params.put(HOODIE_TABLE_TYPE_PROP_NAME, HoodieTableType.MERGE_ON_READ.name());
+      metaClient = HoodieTestUtils.init(hadoopConf, basePath, HoodieTableType.MERGE_ON_READ);
+    }
+    HoodieWriteConfig cfg = getConfigBuilder().withAutoCommit(false).withRollbackUsingMarkers(false).withProps(params).build();
+    HoodieWriteClient client = getHoodieWriteClient(cfg);
+
+    // prepare data. Make 2 commits, in which 2nd is not committed.
+    List<FileSlice> firstPartitionCommit2FileSlices = new ArrayList<>();
+    List<FileSlice> secondPartitionCommit2FileSlices = new ArrayList<>();
+    Pair<List<HoodieRecord>, List<HoodieRecord>> inputRecords = twoUpsertCommitDataWithTwoPartitions(firstPartitionCommit2FileSlices, secondPartitionCommit2FileSlices, cfg, client, false);

Review comment:
       This is what I was trying to convey. As part of this call, (2 commits), already upgrade step would have been executed. But, we also reset hoodie table version and call upgrade explicitly after this. So, should be fine. Not sure if you are aware of this. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r466991187



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -186,10 +188,14 @@ public HoodieMetrics getMetrics() {
    * Get HoodieTable and init {@link Timer.Context}.
    *
    * @param operationType write operation type
+   * @param instantTime current inflight instant time
    * @return HoodieTable
    */
-  protected HoodieTable getTableAndInitCtx(WriteOperationType operationType) {
+  protected HoodieTable getTableAndInitCtx(WriteOperationType operationType, String instantTime) {
     HoodieTableMetaClient metaClient = createMetaClient(true);
+    if (config.shouldRollbackUsingMarkers()) {

Review comment:
       this was my thinking behind this guard. If someone wishes to stay with list based rollback, why execute this upgrade this which specifically does some work to assist in marker based rollback which will never be used since marked based rollback is not going to be used at all. I am not very strong on this though. But tests need some fixes though as I rely on creating commits and marker files using client at first, by disabling marker based rollback. If we remove this guard, then tests need to manually create all data files and marker files. I am not saying that as a reason to keep this guard, just saying we have some extra work to be done. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/RollbackUtils.java
##########
@@ -63,4 +84,156 @@ static HoodieRollbackStat mergeRollbackStat(HoodieRollbackStat stat1, HoodieRoll
     return new HoodieRollbackStat(stat1.getPartitionPath(), successDeleteFiles, failedDeleteFiles, commandBlocksCount);
   }
 
+  /**

Review comment:
       yes




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467613338



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -96,6 +97,8 @@ public HoodieTableConfig(FileSystem fs, String metaPath, String payloadClassName
       throw new HoodieIOException("Could not load Hoodie properties from " + propertyPath, e);
     }
     this.props = props;
+    ValidationUtils.checkArgument(props.containsKey(HOODIE_TABLE_TYPE_PROP_NAME) && props.containsKey(HOODIE_TABLE_NAME_PROP_NAME),

Review comment:
       ok, was confused as to how this related to this patch of upgrade/downgrade. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-671022017


   New tables have version written now 
   
   ```
   vmacs:HUDIDATA vs$ cat  /tmp/hudi_trips_cow_new/.hoodie/hoodie.properties
   #Properties saved on Sun Aug 09 01:08:18 PDT 2020
   #Sun Aug 09 01:08:18 PDT 2020
   hoodie.table.name=hudi_trips_cow
   hoodie.archivelog.folder=archived
   hoodie.table.type=COPY_ON_WRITE
   hoodie.table.version=1
   hoodie.timeline.layout.version=1
   vmacs:HUDIDATA vs$
   ```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458412092



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/UpgradeDowngradeHelper.java
##########
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.exception.HoodieException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngradeHelper {
+
+  public static final String HOODIE_ORIG_PROPERTY_FILE = "hoodie.properties.orig";
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie verion was upgraded to 0.6.0, there are some upgrade steps need
+   * to be executed before doing any writes.
+   * Similarly, if a dataset was created using 0.6.0 and then hoodie was downgraded, some downgrade steps need to be executed before proceeding w/ any writes.
+   * On a high level, these are the steps performed
+   * Step1 : Understand current hoodie version and table version from hoodie.properties file
+   * Step2 : Fix any residues from previous upgrade/downgrade
+   * Step3 : Check for version upgrade/downgrade.
+   * Step4 : If upgrade/downgrade is required, perform the steps required for the same.
+   * Step5 : Copy hoodie.properties -> hoodie.properties.orig
+   * Step6 : Update hoodie.properties file with current table version
+   * Step7 : Delete hoodie.properties.orig
+   * </p>
+   * @param metaClient instance of {@link HoodieTableMetaClient} to use
+   * @throws IOException
+   */
+  public static void doUpgradeOrDowngrade(HoodieTableMetaClient metaClient) throws IOException {
+    // Fetch version from property file and current version
+    HoodieTableVersion versionFromPropertyFile = metaClient.getTableConfig().getHoodieTableVersionFromPropertyFile();
+    HoodieTableVersion currentVersion = metaClient.getTableConfig().getCurrentHoodieTableVersion();
+
+    Path metaPath = new Path(metaClient.getMetaPath());
+    Path originalHoodiePropertyPath = getOrigHoodiePropertyFilePath(metaPath.toString());
+
+    boolean updateTableVersionInPropertyFile = false;
+
+    if (metaClient.getFs().exists(originalHoodiePropertyPath)) {
+      // if hoodie.properties.orig exists, rename to hoodie.properties and skip upgrade/downgrade step
+      metaClient.getFs().rename(originalHoodiePropertyPath, getHoodiePropertyFilePath(metaPath.toString()));
+      updateTableVersionInPropertyFile = true;
+    } else {
+      // upgrade or downgrade if there is a version mismatch
+      if (versionFromPropertyFile != currentVersion) {
+        updateTableVersionInPropertyFile = true;
+        if (versionFromPropertyFile == HoodieTableVersion.PRE_ZERO_SIZE_ZERO && currentVersion == HoodieTableVersion.ZERO_SIX_ZERO) {
+          upgradeFromOlderToZeroSixZero();
+        } else if (versionFromPropertyFile == HoodieTableVersion.ZERO_SIX_ZERO && currentVersion == HoodieTableVersion.PRE_ZERO_SIZE_ZERO) {
+          downgradeFromZeroSixZeroToPreZeroSixZero();
+        } else {
+          throw new HoodieException("Illegal state wrt table versions. Version from proerpty file " + versionFromPropertyFile + " and current version " + currentVersion);
+        }
+      }
+    }
+
+    /**
+     * If table version needs to be updated in hoodie.properties file.
+     * Step1: Copy hoodie.properties to hoodie.properties.orig
+     * Step2: add table.version to hoodie.properties
+     * Step3: delete hoodie.properties.orig
+     */
+    if (updateTableVersionInPropertyFile) {
+      updateTableVersionInMetaPath(metaClient);

Review comment:
       yes. we need to reload if upgrade was done. to be safe




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467609575



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -96,6 +97,8 @@ public HoodieTableConfig(FileSystem fs, String metaPath, String payloadClassName
       throw new HoodieIOException("Could not load Hoodie properties from " + propertyPath, e);
     }
     this.props = props;
+    ValidationUtils.checkArgument(props.containsKey(HOODIE_TABLE_TYPE_PROP_NAME) && props.containsKey(HOODIE_TABLE_NAME_PROP_NAME),

Review comment:
       >sorry, I don't get why we need this here. If properties contain table type and table name, why bail out?
   
   we bail out if these props are not present. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467588637



##########
File path: hudi-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
##########
@@ -0,0 +1,405 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.model.FileSlice;
+import org.apache.hudi.common.model.HoodieFileGroup;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.SyncableFileSystemView;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.testutils.Assertions;
+import org.apache.hudi.testutils.HoodieClientTestBase;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.table.HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit tests {@link UpgradeDowngrade}.
+ */
+public class TestUpgradeDowngrade extends HoodieClientTestBase {
+
+  private static final String TEST_NAME_WITH_PARAMS = "[{index}] Test with induceResiduesFromPrevUpgrade={0}, deletePartialMarkerFiles={1} and TableType = {2}";
+
+  public static Stream<Arguments> configParams() {
+    Object[][] data = new Object[][] {
+            {true, HoodieTableType.COPY_ON_WRITE}, {false, HoodieTableType.COPY_ON_WRITE},
+            {true, HoodieTableType.MERGE_ON_READ}, {false, HoodieTableType.MERGE_ON_READ}
+    };
+    return Stream.of(data).map(Arguments::of);
+  }
+
+  @Test
+  public void testLeftOverUpdatedPropFileCleanup() throws IOException {
+    testUpgradeInternal(true, true, HoodieTableType.MERGE_ON_READ);
+  }
+
+  @ParameterizedTest(name = TEST_NAME_WITH_PARAMS)
+  @MethodSource("configParams")
+  public void testUpgrade(boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    testUpgradeInternal(false, deletePartialMarkerFiles, tableType);
+  }
+
+  public void testUpgradeInternal(boolean induceResiduesFromPrevUpgrade, boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    // init config, table and client.
+    Map<String, String> params = new HashMap<>();
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      params.put(HOODIE_TABLE_TYPE_PROP_NAME, HoodieTableType.MERGE_ON_READ.name());
+      metaClient = HoodieTestUtils.init(hadoopConf, basePath, HoodieTableType.MERGE_ON_READ);
+    }
+    HoodieWriteConfig cfg = getConfigBuilder().withAutoCommit(false).withRollbackUsingMarkers(false).withProps(params).build();
+    HoodieWriteClient client = getHoodieWriteClient(cfg);
+
+    // prepare data. Make 2 commits, in which 2nd is not committed.
+    List<FileSlice> firstPartitionCommit2FileSlices = new ArrayList<>();
+    List<FileSlice> secondPartitionCommit2FileSlices = new ArrayList<>();
+    Pair<List<HoodieRecord>, List<HoodieRecord>> inputRecords = twoUpsertCommitDataWithTwoPartitions(firstPartitionCommit2FileSlices, secondPartitionCommit2FileSlices, cfg, client, false);

Review comment:
       this needs fixing. This will issue commit files via client and since you have removed the guard to execute upgrade (even if marker based is disabled), upgrade would have already been executed. if I am not wrong, below command of 
   ```
   UpgradeDowngrade.run(metaClient, HoodieTableVersion.ONE, cfg, jsc, null); 
   ```
   is a no op since already the table version is upgraded.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#issuecomment-668403215


   @nsivabalan I rebased this against master


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467613893



##########
File path: hudi-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
##########
@@ -0,0 +1,405 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.model.FileSlice;
+import org.apache.hudi.common.model.HoodieFileGroup;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.SyncableFileSystemView;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.MarkerFiles;
+import org.apache.hudi.testutils.Assertions;
+import org.apache.hudi.testutils.HoodieClientTestBase;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.table.HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH;
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit tests {@link UpgradeDowngrade}.
+ */
+public class TestUpgradeDowngrade extends HoodieClientTestBase {
+
+  private static final String TEST_NAME_WITH_PARAMS = "[{index}] Test with induceResiduesFromPrevUpgrade={0}, deletePartialMarkerFiles={1} and TableType = {2}";
+
+  public static Stream<Arguments> configParams() {
+    Object[][] data = new Object[][] {
+            {true, HoodieTableType.COPY_ON_WRITE}, {false, HoodieTableType.COPY_ON_WRITE},
+            {true, HoodieTableType.MERGE_ON_READ}, {false, HoodieTableType.MERGE_ON_READ}
+    };
+    return Stream.of(data).map(Arguments::of);
+  }
+
+  @Test
+  public void testLeftOverUpdatedPropFileCleanup() throws IOException {
+    testUpgradeInternal(true, true, HoodieTableType.MERGE_ON_READ);
+  }
+
+  @ParameterizedTest(name = TEST_NAME_WITH_PARAMS)
+  @MethodSource("configParams")
+  public void testUpgrade(boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    testUpgradeInternal(false, deletePartialMarkerFiles, tableType);
+  }
+
+  public void testUpgradeInternal(boolean induceResiduesFromPrevUpgrade, boolean deletePartialMarkerFiles, HoodieTableType tableType) throws IOException {
+    // init config, table and client.
+    Map<String, String> params = new HashMap<>();
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      params.put(HOODIE_TABLE_TYPE_PROP_NAME, HoodieTableType.MERGE_ON_READ.name());
+      metaClient = HoodieTestUtils.init(hadoopConf, basePath, HoodieTableType.MERGE_ON_READ);
+    }
+    HoodieWriteConfig cfg = getConfigBuilder().withAutoCommit(false).withRollbackUsingMarkers(false).withProps(params).build();
+    HoodieWriteClient client = getHoodieWriteClient(cfg);
+
+    // prepare data. Make 2 commits, in which 2nd is not committed.
+    List<FileSlice> firstPartitionCommit2FileSlices = new ArrayList<>();
+    List<FileSlice> secondPartitionCommit2FileSlices = new ArrayList<>();
+    Pair<List<HoodieRecord>, List<HoodieRecord>> inputRecords = twoUpsertCommitDataWithTwoPartitions(firstPartitionCommit2FileSlices, secondPartitionCommit2FileSlices, cfg, client, false);

Review comment:
       nvm. I overlooked this line 
   ```
   metaClient.getTableConfig().setTableVersion(HoodieTableVersion.ZERO);
   ```
   Tests are fine. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r467603220



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.util.FileIOUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngrade {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngrade.class);
+  public static final String HOODIE_UPDATED_PROPERTY_FILE = "hoodie.properties.updated";
+
+  private HoodieTableMetaClient metaClient;
+  private HoodieWriteConfig config;
+  private JavaSparkContext jsc;
+  private transient FileSystem fs;
+  private Path updatedPropsFilePath;
+  private Path propsFilePath;
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * <p>
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   *
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie version was upgraded to 0.6.0,
+   * Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded
+   * to pre 0.6.0 or to Hoodie table version 0, then some downgrade steps need to be executed before proceeding w/ any writes.
+   *
+   * On a high level, these are the steps performed
+   *
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Delete any left over .upgraded from previous upgrade/downgrade

Review comment:
       updated/upgraded(file is named as HOODIE_UPDATED_PROPERTY_FILE). Lets use same terminology everywhere. Ignore addressing renaming/java docs/refactoring comments for now. Let's get the patch in for now. But leaving comments so that I can take it up after 0.6.0 release. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org