You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/23 01:25:54 UTC

[GitHub] [incubator-hudi] satishkotha opened a new pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

satishkotha opened a new pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274
 
 
   
   ## What is the purpose of the pull request
   
   Add command to show archived commits. This is useful for debugging historical timeline. 
   
   ## Brief change log
   - There is already 'show archived commits' command, but the output is not at all useful. Refactor common timeline methods to HoodieDefaultTimeline to reuse between Active and Archived timeline to make archived commits look similar to active commits
   - Modify tests to use archived timeline instead of reading files explicitly
   - Note that there is no pagination/lazy loading support and if a really long time range is specified, this can result in OOM. To keep the diff small, I'm sending this one first. After I get feedback, i can send another diff to support lazy loading (or pagination). 
   
   ## Verify this pull request
   
   
   This pull request is already covered by existing tests, such as:
     - TestHoodieCommitArchiveLog.java
     - Manually verified the change by running CLI locally. Example output:
   `hoodie:$dataset->commits show archived
   
    __________________________________________________________________________________________________________________________________________________________________________
       | CommitTime    | Total Bytes Written| Total Files Added| Total Files Updated| Total Partitions Written| Total Records Written| Total Update Records Written| Total Errors|
       |=========================================================================================================================================================================|
       | 20200113224332| 45.7 GB            | 0                | 86                 | 4                       | 97806475             | 107538                      | 0           |
       | 20200113223441| 840.5 MB           | 0                | 1618               | 258                     | 1214615              | 15512                       | 0           |
       | 20200113223441| 840.5 MB           | 0                | 1618               | 258                     | 1214615              | 15512                       | 0           |`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r372151885
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##########
 @@ -289,11 +288,8 @@ public ConsistencyGuardConfig getConsistencyGuardConfig() {
    *
    * @return Active commit timeline
    */
-  public synchronized HoodieArchivedTimeline getArchivedTimeline() {
-    if (archivedTimeline == null) {
-      archivedTimeline = new HoodieArchivedTimeline(this);
-    }
-    return archivedTimeline;
+  public synchronized HoodieArchivedTimeline getArchivedTimeline(String startTs, String endTs) {
 
 Review comment:
   After talking in person, changed behavior to create single instance of "HoodieArchivedTimeline" and load metadata for all  commits.  
   
   Note that this means we read all archived files first. Then do a second pass for details of commits in specific time range. This increases overall time taken by first command. In the example dataset, it took ~20 minutes for initial metadata to load. 
   Then subsequent commands are few seconds each. 
   
   With previous approach we only do one pass on files. All commands are few seconds each. 
   
   So I think we need to improve metadata to reduce time taken by first step with new approach.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#issuecomment-582041849
 
 
   @satishkotha one conflict, please rebase and then I can merge this

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash merged pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash merged pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r371422362
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
 ##########
 @@ -49,34 +60,76 @@
  * This class can be serialized and de-serialized and on de-serialization the FileSystem is re-initialized.
  */
 public class HoodieArchivedTimeline extends HoodieDefaultTimeline {
+  private static final Pattern ARCHIVE_FILE_PATTERN =
+          Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");
 
   private static final String HOODIE_COMMIT_ARCHIVE_LOG_FILE = "commits";
 
 Review comment:
   I'm referring to this variable `HOODIE_COMMIT_ARCHIVE_LOG_FILE` that stores just the string `commits`, regex is a different variable ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370493642
 
 

 ##########
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
 ##########
 @@ -62,26 +110,33 @@ public String showCommits(
       throws IOException {
 
     HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-    HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants();
-    List<HoodieInstant> commits = timeline.getReverseOrderedInstants().collect(Collectors.toList());
-    List<Comparable[]> rows = new ArrayList<>();
-    for (HoodieInstant commit : commits) {
-      HoodieCommitMetadata commitMetadata =
-          HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(), HoodieCommitMetadata.class);
-      rows.add(new Comparable[] {commit.getTimestamp(), commitMetadata.fetchTotalBytesWritten(),
-          commitMetadata.fetchTotalFilesInsert(), commitMetadata.fetchTotalFilesUpdated(),
-          commitMetadata.fetchTotalPartitionsWritten(), commitMetadata.fetchTotalRecordsWritten(),
-          commitMetadata.fetchTotalUpdateRecordsWritten(), commitMetadata.fetchTotalWriteErrors()});
-    }
+    return printCommits(activeTimeline, limit, sortByField, descending, headerOnly);
+  }
 
-    Map<String, Function<Object, String>> fieldNameToConverterMap = new HashMap<>();
-    fieldNameToConverterMap.put("Total Bytes Written", entry -> NumericUtils.humanReadableByteCount((Double.parseDouble(entry.toString()))));
+  @CliCommand(value = "commits show archived", help = "Show the archived commits")
+  public String showArchivedCommits(
+          @CliOption(key = {"startTs"},  mandatory = false, help = "start time for commits, default: now - 10 days")
+          String startTs,
+          @CliOption(key = {"endTs"},  mandatory = false, help = "end time for commits, default: now - 1 day")
+          String endTs,
+          @CliOption(key = {"limit"}, mandatory = false, help = "Limit commits", unspecifiedDefaultValue = "-1")
+          final Integer limit,
+          @CliOption(key = {"sortBy"}, help = "Sorting Field", unspecifiedDefaultValue = "")
+          final String sortByField,
+          @CliOption(key = {"desc"}, help = "Ordering", unspecifiedDefaultValue = "false")
+          final boolean descending,
+          @CliOption(key = {"headeronly"}, help = "Print Header Only", unspecifiedDefaultValue = "false")
+          final boolean headerOnly)
+          throws IOException {
 
-    TableHeader header = new TableHeader().addTableHeaderField("CommitTime").addTableHeaderField("Total Bytes Written")
-        .addTableHeaderField("Total Files Added").addTableHeaderField("Total Files Updated")
-        .addTableHeaderField("Total Partitions Written").addTableHeaderField("Total Records Written")
-        .addTableHeaderField("Total Update Records Written").addTableHeaderField("Total Errors");
-    return HoodiePrintHelper.print(header, fieldNameToConverterMap, sortByField, descending, limit, headerOnly, rows);
+    if (startTs == null || startTs.isEmpty()) {
 
 Review comment:
   Can we use StringUtils.isNullOrEmpty (in all other places as well) ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r372577201
 
 

 ##########
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
 ##########
 @@ -62,26 +111,38 @@ public String showCommits(
       throws IOException {
 
     HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-    HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants();
-    List<HoodieInstant> commits = timeline.getReverseOrderedInstants().collect(Collectors.toList());
-    List<Comparable[]> rows = new ArrayList<>();
-    for (HoodieInstant commit : commits) {
-      HoodieCommitMetadata commitMetadata =
-          HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(), HoodieCommitMetadata.class);
-      rows.add(new Comparable[] {commit.getTimestamp(), commitMetadata.fetchTotalBytesWritten(),
-          commitMetadata.fetchTotalFilesInsert(), commitMetadata.fetchTotalFilesUpdated(),
-          commitMetadata.fetchTotalPartitionsWritten(), commitMetadata.fetchTotalRecordsWritten(),
-          commitMetadata.fetchTotalUpdateRecordsWritten(), commitMetadata.fetchTotalWriteErrors()});
-    }
-
-    Map<String, Function<Object, String>> fieldNameToConverterMap = new HashMap<>();
-    fieldNameToConverterMap.put("Total Bytes Written", entry -> NumericUtils.humanReadableByteCount((Double.parseDouble(entry.toString()))));
+    return printCommits(activeTimeline, limit, sortByField, descending, headerOnly);
+  }
 
-    TableHeader header = new TableHeader().addTableHeaderField("CommitTime").addTableHeaderField("Total Bytes Written")
-        .addTableHeaderField("Total Files Added").addTableHeaderField("Total Files Updated")
-        .addTableHeaderField("Total Partitions Written").addTableHeaderField("Total Records Written")
-        .addTableHeaderField("Total Update Records Written").addTableHeaderField("Total Errors");
-    return HoodiePrintHelper.print(header, fieldNameToConverterMap, sortByField, descending, limit, headerOnly, rows);
+  @CliCommand(value = "commits show archived", help = "Show the archived commits")
+  public String showArchivedCommits(
+          @CliOption(key = {"startTs"},  mandatory = false, help = "start time for commits, default: now - 10 days")
+          String startTs,
+          @CliOption(key = {"endTs"},  mandatory = false, help = "end time for commits, default: now - 1 day")
+          String endTs,
+          @CliOption(key = {"limit"}, mandatory = false, help = "Limit commits", unspecifiedDefaultValue = "-1")
+          final Integer limit,
+          @CliOption(key = {"sortBy"}, help = "Sorting Field", unspecifiedDefaultValue = "")
+          final String sortByField,
+          @CliOption(key = {"desc"}, help = "Ordering", unspecifiedDefaultValue = "false")
+          final boolean descending,
+          @CliOption(key = {"headeronly"}, help = "Print Header Only", unspecifiedDefaultValue = "false")
+          final boolean headerOnly)
+          throws IOException {
+    if (StringUtils.isNullOrEmpty(startTs)) {
 
 Review comment:
   CLIOption attribute can only be assigned a static value. i think default static value is not that friendly. I chose to pick '10 days before today' which is likely to be commonly used. Let me know if you prefer default constant date instead

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370894209
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
 ##########
 @@ -49,34 +60,76 @@
  * This class can be serialized and de-serialized and on de-serialization the FileSystem is re-initialized.
  */
 public class HoodieArchivedTimeline extends HoodieDefaultTimeline {
+  private static final Pattern ARCHIVE_FILE_PATTERN =
+          Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");
 
   private static final String HOODIE_COMMIT_ARCHIVE_LOG_FILE = "commits";
 
 Review comment:
   regex captures everything [0-9]* at the end gives the suffix and is used for sorting files by 'version'

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r372601793
 
 

 ##########
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
 ##########
 @@ -62,26 +111,38 @@ public String showCommits(
       throws IOException {
 
     HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-    HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants();
-    List<HoodieInstant> commits = timeline.getReverseOrderedInstants().collect(Collectors.toList());
-    List<Comparable[]> rows = new ArrayList<>();
-    for (HoodieInstant commit : commits) {
-      HoodieCommitMetadata commitMetadata =
-          HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(), HoodieCommitMetadata.class);
-      rows.add(new Comparable[] {commit.getTimestamp(), commitMetadata.fetchTotalBytesWritten(),
-          commitMetadata.fetchTotalFilesInsert(), commitMetadata.fetchTotalFilesUpdated(),
-          commitMetadata.fetchTotalPartitionsWritten(), commitMetadata.fetchTotalRecordsWritten(),
-          commitMetadata.fetchTotalUpdateRecordsWritten(), commitMetadata.fetchTotalWriteErrors()});
-    }
-
-    Map<String, Function<Object, String>> fieldNameToConverterMap = new HashMap<>();
-    fieldNameToConverterMap.put("Total Bytes Written", entry -> NumericUtils.humanReadableByteCount((Double.parseDouble(entry.toString()))));
+    return printCommits(activeTimeline, limit, sortByField, descending, headerOnly);
+  }
 
-    TableHeader header = new TableHeader().addTableHeaderField("CommitTime").addTableHeaderField("Total Bytes Written")
-        .addTableHeaderField("Total Files Added").addTableHeaderField("Total Files Updated")
-        .addTableHeaderField("Total Partitions Written").addTableHeaderField("Total Records Written")
-        .addTableHeaderField("Total Update Records Written").addTableHeaderField("Total Errors");
-    return HoodiePrintHelper.print(header, fieldNameToConverterMap, sortByField, descending, limit, headerOnly, rows);
+  @CliCommand(value = "commits show archived", help = "Show the archived commits")
+  public String showArchivedCommits(
+          @CliOption(key = {"startTs"},  mandatory = false, help = "start time for commits, default: now - 10 days")
+          String startTs,
+          @CliOption(key = {"endTs"},  mandatory = false, help = "end time for commits, default: now - 1 day")
+          String endTs,
+          @CliOption(key = {"limit"}, mandatory = false, help = "Limit commits", unspecifiedDefaultValue = "-1")
+          final Integer limit,
+          @CliOption(key = {"sortBy"}, help = "Sorting Field", unspecifiedDefaultValue = "")
+          final String sortByField,
+          @CliOption(key = {"desc"}, help = "Ordering", unspecifiedDefaultValue = "false")
+          final boolean descending,
+          @CliOption(key = {"headeronly"}, help = "Print Header Only", unspecifiedDefaultValue = "false")
+          final boolean headerOnly)
+          throws IOException {
+    if (StringUtils.isNullOrEmpty(startTs)) {
 
 Review comment:
   okay, that makes sense

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370862971
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java
 ##########
 @@ -230,6 +230,14 @@ static boolean compareTimestamps(String commit1, String commit2, BiPredicate<Str
     return predicateToApply.test(commit1, commit2);
   }
 
+  /**
+   * Return true if specified timestamp is in range (startTs, endTs].
+   */
+  static boolean isInRange(String timestamp, String startTs, String endTs) {
 
 Review comment:
   The HoodieDefaultTimeline has implementation -> `public HoodieDefaultTimeline findInstantsInRange(String startTs, String endTs)` already, can we try to reuse that ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r375016293
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
 ##########
 @@ -290,7 +291,7 @@ public long getTotalRollbackBlocks() {
     return totalRollbackBlocks;
   }
 
-  public void setTotalRollbackBlocks(Long totalRollbackBlocks) {
 
 Review comment:
   same here, we can address this as part of some other refactoring diff

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370893348
 
 

 ##########
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
 ##########
 @@ -62,26 +110,33 @@ public String showCommits(
       throws IOException {
 
     HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-    HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants();
-    List<HoodieInstant> commits = timeline.getReverseOrderedInstants().collect(Collectors.toList());
-    List<Comparable[]> rows = new ArrayList<>();
-    for (HoodieInstant commit : commits) {
-      HoodieCommitMetadata commitMetadata =
-          HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(), HoodieCommitMetadata.class);
-      rows.add(new Comparable[] {commit.getTimestamp(), commitMetadata.fetchTotalBytesWritten(),
-          commitMetadata.fetchTotalFilesInsert(), commitMetadata.fetchTotalFilesUpdated(),
-          commitMetadata.fetchTotalPartitionsWritten(), commitMetadata.fetchTotalRecordsWritten(),
-          commitMetadata.fetchTotalUpdateRecordsWritten(), commitMetadata.fetchTotalWriteErrors()});
-    }
+    return printCommits(activeTimeline, limit, sortByField, descending, headerOnly);
+  }
 
-    Map<String, Function<Object, String>> fieldNameToConverterMap = new HashMap<>();
-    fieldNameToConverterMap.put("Total Bytes Written", entry -> NumericUtils.humanReadableByteCount((Double.parseDouble(entry.toString()))));
+  @CliCommand(value = "commits show archived", help = "Show the archived commits")
+  public String showArchivedCommits(
+          @CliOption(key = {"startTs"},  mandatory = false, help = "start time for commits, default: now - 10 days")
+          String startTs,
+          @CliOption(key = {"endTs"},  mandatory = false, help = "end time for commits, default: now - 1 day")
+          String endTs,
+          @CliOption(key = {"limit"}, mandatory = false, help = "Limit commits", unspecifiedDefaultValue = "-1")
+          final Integer limit,
+          @CliOption(key = {"sortBy"}, help = "Sorting Field", unspecifiedDefaultValue = "")
+          final String sortByField,
+          @CliOption(key = {"desc"}, help = "Ordering", unspecifiedDefaultValue = "false")
+          final boolean descending,
+          @CliOption(key = {"headeronly"}, help = "Print Header Only", unspecifiedDefaultValue = "false")
+          final boolean headerOnly)
+          throws IOException {
 
-    TableHeader header = new TableHeader().addTableHeaderField("CommitTime").addTableHeaderField("Total Bytes Written")
-        .addTableHeaderField("Total Files Added").addTableHeaderField("Total Files Updated")
-        .addTableHeaderField("Total Partitions Written").addTableHeaderField("Total Records Written")
-        .addTableHeaderField("Total Update Records Written").addTableHeaderField("Total Errors");
-    return HoodiePrintHelper.print(header, fieldNameToConverterMap, sortByField, descending, limit, headerOnly, rows);
+    if (startTs == null || startTs.isEmpty()) {
 
 Review comment:
   sounds good. I'm sending an update

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r372574591
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
 ##########
 @@ -143,6 +143,83 @@ public HoodieTimeline filter(Predicate<HoodieInstant> filter) {
     return new HoodieDefaultTimeline(instants.stream().filter(filter), details);
   }
 
+  /**
+   * Get all instants (commits, delta commits) that produce new data, in the active timeline.
+   */
+  public HoodieTimeline getCommitsTimeline() {
 
 Review comment:
   @vinothchandar can you ack that this is okay to move to this class ? This way the ArchivedTimeline can also use these methods (and is aligned with our thoughts on having the same operations on archived and active timeline..)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r375016233
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
 ##########
 @@ -135,6 +135,7 @@
   /**
    * Total number of rollback blocks seen in a compaction operation.
    */
+  @Nullable
 
 Review comment:
   Please avoid this change as part of this diff

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370863148
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
 ##########
 @@ -49,34 +60,76 @@
  * This class can be serialized and de-serialized and on de-serialization the FileSystem is re-initialized.
  */
 public class HoodieArchivedTimeline extends HoodieDefaultTimeline {
+  private static final Pattern ARCHIVE_FILE_PATTERN =
+          Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");
 
   private static final String HOODIE_COMMIT_ARCHIVE_LOG_FILE = "commits";
 
 Review comment:
   This is just the prefix right ? May be rename is HOODIE_COMMIT_ARCHIVE_LOG_FILE_NAME_PREFIX ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370894046
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java
 ##########
 @@ -230,6 +230,14 @@ static boolean compareTimestamps(String commit1, String commit2, BiPredicate<Str
     return predicateToApply.test(commit1, commit2);
   }
 
+  /**
+   * Return true if specified timestamp is in range (startTs, endTs].
+   */
+  static boolean isInRange(String timestamp, String startTs, String endTs) {
 
 Review comment:
   HoodieDefaultTimeline has changes to use this method. I refactored to make this part reusable
   
   `  @Override
     public HoodieDefaultTimeline findInstantsInRange(String startTs, String endTs) {
       return new HoodieDefaultTimeline(
           instants.stream().filter(s -> HoodieTimeline.isInRange(s.getTimestamp(), startTs, endTs)),
           details);
     }
   ` 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r375075546
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
 ##########
 @@ -135,6 +135,7 @@
   /**
    * Total number of rollback blocks seen in a compaction operation.
    */
+  @Nullable
 
 Review comment:
   Discussed offline. Without this we are not able to read certain archived commits

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370893358
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java
 ##########
 @@ -38,6 +38,7 @@
   public static final String HOODIE_PARTITION_METAFILE = ".hoodie_partition_metadata";
   public static final String PARTITION_DEPTH_KEY = "partitionDepth";
   public static final String COMMIT_TIME_KEY = "commitTime";
+  public static final String ACTION_TYPE_KEY = "actionType";
 
 Review comment:
   sounds good. I'm sending an update

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r372573105
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
 ##########
 @@ -96,7 +101,16 @@ private void readObject(java.io.ObjectInputStream in) throws IOException, ClassN
   }
 
   public static Path getArchiveLogPath(String archiveFolder) {
-    return new Path(archiveFolder, HOODIE_COMMIT_ARCHIVE_LOG_FILE);
+    return new Path(archiveFolder, HOODIE_COMMIT_ARCHIVE_LOG_FILE_PREFIX);
+  }
+
+  public void loadInstantDetailsInMemory(String startTs, String endTs) {
+    loadInstants(startTs, endTs);
+  }
+
+  public void removeInstantDetailsFromMemory(String startTs, String endTs) {
 
 Review comment:
   nit : s/removeInstantDetailsFromMemory/clearInstantDetailsFromMemory

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r374439658
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
 ##########
 @@ -143,6 +143,83 @@ public HoodieTimeline filter(Predicate<HoodieInstant> filter) {
     return new HoodieDefaultTimeline(instants.stream().filter(filter), details);
   }
 
+  /**
+   * Get all instants (commits, delta commits) that produce new data, in the active timeline.
+   */
+  public HoodieTimeline getCommitsTimeline() {
 
 Review comment:
   @vinothchandar just pinging to resurface this since it might be lost in the large number of notifications :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r372572403
 
 

 ##########
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
 ##########
 @@ -62,26 +111,38 @@ public String showCommits(
       throws IOException {
 
     HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-    HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants();
-    List<HoodieInstant> commits = timeline.getReverseOrderedInstants().collect(Collectors.toList());
-    List<Comparable[]> rows = new ArrayList<>();
-    for (HoodieInstant commit : commits) {
-      HoodieCommitMetadata commitMetadata =
-          HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(), HoodieCommitMetadata.class);
-      rows.add(new Comparable[] {commit.getTimestamp(), commitMetadata.fetchTotalBytesWritten(),
-          commitMetadata.fetchTotalFilesInsert(), commitMetadata.fetchTotalFilesUpdated(),
-          commitMetadata.fetchTotalPartitionsWritten(), commitMetadata.fetchTotalRecordsWritten(),
-          commitMetadata.fetchTotalUpdateRecordsWritten(), commitMetadata.fetchTotalWriteErrors()});
-    }
-
-    Map<String, Function<Object, String>> fieldNameToConverterMap = new HashMap<>();
-    fieldNameToConverterMap.put("Total Bytes Written", entry -> NumericUtils.humanReadableByteCount((Double.parseDouble(entry.toString()))));
+    return printCommits(activeTimeline, limit, sortByField, descending, headerOnly);
+  }
 
-    TableHeader header = new TableHeader().addTableHeaderField("CommitTime").addTableHeaderField("Total Bytes Written")
-        .addTableHeaderField("Total Files Added").addTableHeaderField("Total Files Updated")
-        .addTableHeaderField("Total Partitions Written").addTableHeaderField("Total Records Written")
-        .addTableHeaderField("Total Update Records Written").addTableHeaderField("Total Errors");
-    return HoodiePrintHelper.print(header, fieldNameToConverterMap, sortByField, descending, limit, headerOnly, rows);
+  @CliCommand(value = "commits show archived", help = "Show the archived commits")
+  public String showArchivedCommits(
+          @CliOption(key = {"startTs"},  mandatory = false, help = "start time for commits, default: now - 10 days")
+          String startTs,
+          @CliOption(key = {"endTs"},  mandatory = false, help = "end time for commits, default: now - 1 day")
+          String endTs,
+          @CliOption(key = {"limit"}, mandatory = false, help = "Limit commits", unspecifiedDefaultValue = "-1")
+          final Integer limit,
+          @CliOption(key = {"sortBy"}, help = "Sorting Field", unspecifiedDefaultValue = "")
+          final String sortByField,
+          @CliOption(key = {"desc"}, help = "Ordering", unspecifiedDefaultValue = "false")
+          final boolean descending,
+          @CliOption(key = {"headeronly"}, help = "Print Header Only", unspecifiedDefaultValue = "false")
+          final boolean headerOnly)
+          throws IOException {
+    if (StringUtils.isNullOrEmpty(startTs)) {
 
 Review comment:
   You can use a defaultValue during the cliOption itself so you don't have to use these checks ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r371580262
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##########
 @@ -289,11 +288,8 @@ public ConsistencyGuardConfig getConsistencyGuardConfig() {
    *
    * @return Active commit timeline
    */
-  public synchronized HoodieArchivedTimeline getArchivedTimeline() {
-    if (archivedTimeline == null) {
-      archivedTimeline = new HoodieArchivedTimeline(this);
-    }
-    return archivedTimeline;
+  public synchronized HoodieArchivedTimeline getArchivedTimeline(String startTs, String endTs) {
 
 Review comment:
   Isn't the instance variable loaded lazily only when getArchivedTimeline() is called in which case subsequent cases can use that instance variable. I think the concern here is that the only method we expose now is getArchivedTimeline(String startTs, String endTs) and caching the instance variable from that doesn't make sense as you are saying as well. I'm wondering if we should keep an instance variable and keep the same method as before getArchivedTimeline() and then the constructor below is also light. Finally, we expose a method on the archived timeline to filter(startTs, endTs). This way we don't have to create multiple objects for each window invocation. WDYT ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r371422362
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
 ##########
 @@ -49,34 +60,76 @@
  * This class can be serialized and de-serialized and on de-serialization the FileSystem is re-initialized.
  */
 public class HoodieArchivedTimeline extends HoodieDefaultTimeline {
+  private static final Pattern ARCHIVE_FILE_PATTERN =
+          Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");
 
   private static final String HOODIE_COMMIT_ARCHIVE_LOG_FILE = "commits";
 
 Review comment:
   I'm referring to this variable `HOODIE_COMMIT_ARCHIVE_LOG_FILE` that stores just the string `commits`, how is that related to the regex ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#issuecomment-579935361
 
 
   Addressed comments and made some more improvements. Please take a look

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r372577365
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
 ##########
 @@ -96,7 +101,16 @@ private void readObject(java.io.ObjectInputStream in) throws IOException, ClassN
   }
 
   public static Path getArchiveLogPath(String archiveFolder) {
-    return new Path(archiveFolder, HOODIE_COMMIT_ARCHIVE_LOG_FILE);
+    return new Path(archiveFolder, HOODIE_COMMIT_ARCHIVE_LOG_FILE_PREFIX);
+  }
+
+  public void loadInstantDetailsInMemory(String startTs, String endTs) {
+    loadInstants(startTs, endTs);
+  }
+
+  public void removeInstantDetailsFromMemory(String startTs, String endTs) {
 
 Review comment:
   ok, renamed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370493663
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java
 ##########
 @@ -38,6 +38,7 @@
   public static final String HOODIE_PARTITION_METAFILE = ".hoodie_partition_metadata";
   public static final String PARTITION_DEPTH_KEY = "partitionDepth";
   public static final String COMMIT_TIME_KEY = "commitTime";
+  public static final String ACTION_TYPE_KEY = "actionType";
 
 Review comment:
   Can you explain why we need this change ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370862504
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java
 ##########
 @@ -38,6 +38,7 @@
   public static final String HOODIE_PARTITION_METAFILE = ".hoodie_partition_metadata";
   public static final String PARTITION_DEPTH_KEY = "partitionDepth";
   public static final String COMMIT_TIME_KEY = "commitTime";
+  public static final String ACTION_TYPE_KEY = "actionType";
 
 Review comment:
   Please move it to the ArchivedTimeline since it makes more senes there

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370818776
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java
 ##########
 @@ -38,6 +38,7 @@
   public static final String HOODIE_PARTITION_METAFILE = ".hoodie_partition_metadata";
   public static final String PARTITION_DEPTH_KEY = "partitionDepth";
   public static final String COMMIT_TIME_KEY = "commitTime";
+  public static final String ACTION_TYPE_KEY = "actionType";
 
 Review comment:
    I thought having all metadata constants in one place would make it simpler. This is used in reading archived commit. I can move the constant to ArchivedTimeline if you think thats a better place.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370893819
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##########
 @@ -289,11 +288,8 @@ public ConsistencyGuardConfig getConsistencyGuardConfig() {
    *
    * @return Active commit timeline
    */
-  public synchronized HoodieArchivedTimeline getArchivedTimeline() {
-    if (archivedTimeline == null) {
-      archivedTimeline = new HoodieArchivedTimeline(this);
-    }
-    return archivedTimeline;
+  public synchronized HoodieArchivedTimeline getArchivedTimeline(String startTs, String endTs) {
 
 Review comment:
   archivedTimeline as instance variable does not make sense because we are creating archive timeline for small time window (depending on command line arguments). So we will create mutiple archivedTimelines when someone is paginating through archived commit files. null check prevents us from doing that. 
   
   In the long term, when we have lazy loading, we can consider creating one instance of ArchivedTimeline that stores all metadata. we can bring back instance variable at that time. Until we have that, this does not make sense.
   
    Let me know if you think there is a better way to organize this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r371584059
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##########
 @@ -289,11 +288,8 @@ public ConsistencyGuardConfig getConsistencyGuardConfig() {
    *
    * @return Active commit timeline
    */
-  public synchronized HoodieArchivedTimeline getArchivedTimeline() {
-    if (archivedTimeline == null) {
-      archivedTimeline = new HoodieArchivedTimeline(this);
-    }
-    return archivedTimeline;
+  public synchronized HoodieArchivedTimeline getArchivedTimeline(String startTs, String endTs) {
 
 Review comment:
   I initially wanted to do that. But HoodieDefaultTimeline(base class) keeps track of instants in private class variable. And we do all filtering operations on top of that. So, if we reuse same HoodieArchivedTimeline object, we will likely have to change structure of how instants are kept in memory for base class.  I can talk to you more in person tomorrow. But right now, i prefer creating multiple instances of archived timeline because its less error prone IMO.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r374532498
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
 ##########
 @@ -143,6 +143,83 @@ public HoodieTimeline filter(Predicate<HoodieInstant> filter) {
     return new HoodieDefaultTimeline(instants.stream().filter(filter), details);
   }
 
+  /**
+   * Get all instants (commits, delta commits) that produce new data, in the active timeline.
+   */
+  public HoodieTimeline getCommitsTimeline() {
 
 Review comment:
   @n3nash can't think of anything top of my mind.. Should be fine.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#issuecomment-582105870
 
 
   @n3nash Fixed conflict. Note that test has been moved from hudi-common to hudi-client because sequence file based writing for archive log is no longer supported. Please take a look after Travis completes building

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#issuecomment-577736949
 
 
   > @satishkotha can you please look into the failing build ?
   
   @n3nash Fixed now, please take a look

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [hudi] zhangyue19921010 commented on pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on pull request #1274:
URL: https://github.com/apache/hudi/pull/1274#issuecomment-1006465368


   Hi guys, it seems that there 's a little problem with the regex pattern `  private static final Pattern ARCHIVE_FILE_PATTERN =
             Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");` 
         
   Just raise a PR https://github.com/apache/hudi/pull/4521 trying to fix it. Wish you're interested and help me review? Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r375075573
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
 ##########
 @@ -290,7 +291,7 @@ public long getTotalRollbackBlocks() {
     return totalRollbackBlocks;
   }
 
-  public void setTotalRollbackBlocks(Long totalRollbackBlocks) {
 
 Review comment:
   Discussed offline. Without this we are not able to read certain archived commits

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r371580428
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java
 ##########
 @@ -230,6 +230,14 @@ static boolean compareTimestamps(String commit1, String commit2, BiPredicate<Str
     return predicateToApply.test(commit1, commit2);
   }
 
+  /**
+   * Return true if specified timestamp is in range (startTs, endTs].
+   */
+  static boolean isInRange(String timestamp, String startTs, String endTs) {
 
 Review comment:
   makes sense

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#issuecomment-577507036
 
 
   @satishkotha can you please look into the failing build ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370864693
 
 

 ##########
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
 ##########
 @@ -62,26 +110,33 @@ public String showCommits(
       throws IOException {
 
     HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-    HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants();
-    List<HoodieInstant> commits = timeline.getReverseOrderedInstants().collect(Collectors.toList());
-    List<Comparable[]> rows = new ArrayList<>();
-    for (HoodieInstant commit : commits) {
-      HoodieCommitMetadata commitMetadata =
-          HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(commit).get(), HoodieCommitMetadata.class);
-      rows.add(new Comparable[] {commit.getTimestamp(), commitMetadata.fetchTotalBytesWritten(),
-          commitMetadata.fetchTotalFilesInsert(), commitMetadata.fetchTotalFilesUpdated(),
-          commitMetadata.fetchTotalPartitionsWritten(), commitMetadata.fetchTotalRecordsWritten(),
-          commitMetadata.fetchTotalUpdateRecordsWritten(), commitMetadata.fetchTotalWriteErrors()});
-    }
+    return printCommits(activeTimeline, limit, sortByField, descending, headerOnly);
+  }
 
-    Map<String, Function<Object, String>> fieldNameToConverterMap = new HashMap<>();
-    fieldNameToConverterMap.put("Total Bytes Written", entry -> NumericUtils.humanReadableByteCount((Double.parseDouble(entry.toString()))));
+  @CliCommand(value = "commits show archived", help = "Show the archived commits")
+  public String showArchivedCommits(
+          @CliOption(key = {"startTs"},  mandatory = false, help = "start time for commits, default: now - 10 days")
+          String startTs,
+          @CliOption(key = {"endTs"},  mandatory = false, help = "end time for commits, default: now - 1 day")
+          String endTs,
+          @CliOption(key = {"limit"}, mandatory = false, help = "Limit commits", unspecifiedDefaultValue = "-1")
+          final Integer limit,
+          @CliOption(key = {"sortBy"}, help = "Sorting Field", unspecifiedDefaultValue = "")
+          final String sortByField,
+          @CliOption(key = {"desc"}, help = "Ordering", unspecifiedDefaultValue = "false")
+          final boolean descending,
+          @CliOption(key = {"headeronly"}, help = "Print Header Only", unspecifiedDefaultValue = "false")
+          final boolean headerOnly)
+          throws IOException {
 
-    TableHeader header = new TableHeader().addTableHeaderField("CommitTime").addTableHeaderField("Total Bytes Written")
-        .addTableHeaderField("Total Files Added").addTableHeaderField("Total Files Updated")
-        .addTableHeaderField("Total Partitions Written").addTableHeaderField("Total Records Written")
-        .addTableHeaderField("Total Update Records Written").addTableHeaderField("Total Errors");
-    return HoodiePrintHelper.print(header, fieldNameToConverterMap, sortByField, descending, limit, headerOnly, rows);
+    if (startTs == null || startTs.isEmpty()) {
 
 Review comment:
   @satishkotha

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r371424644
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
 ##########
 @@ -49,34 +60,76 @@
  * This class can be serialized and de-serialized and on de-serialization the FileSystem is re-initialized.
  */
 public class HoodieArchivedTimeline extends HoodieDefaultTimeline {
+  private static final Pattern ARCHIVE_FILE_PATTERN =
+          Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");
 
   private static final String HOODIE_COMMIT_ARCHIVE_LOG_FILE = "commits";
 
 Review comment:
   oh, my bad. I didnt change that line, so i thought you were referring to the line i changed. I can change HOODIE_COMMIT_ARCHIVE_LOG_FILE to include its prefix

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r370862726
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##########
 @@ -289,11 +288,8 @@ public ConsistencyGuardConfig getConsistencyGuardConfig() {
    *
    * @return Active commit timeline
    */
-  public synchronized HoodieArchivedTimeline getArchivedTimeline() {
-    if (archivedTimeline == null) {
-      archivedTimeline = new HoodieArchivedTimeline(this);
-    }
-    return archivedTimeline;
+  public synchronized HoodieArchivedTimeline getArchivedTimeline(String startTs, String endTs) {
 
 Review comment:
   Any reason to remove `if (archivedTimeline == null)` check ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services