Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:40:08 UTC

[GitHub] [incubator-gobblin] ZihanLi58 opened a new pull request #3071: [GOBBLIN-1223] Change the criteria for re-compaction, limit the time for re-compaction

ZihanLi58 opened a new pull request #3071:
URL: https://github.com/apache/incubator-gobblin/pull/3071


   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-1223
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if applicable):
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   Unit test
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] asfgit closed pull request #3071: [GOBBLIN-1223] Change the criteria for re-compaction, limit the time for re-compaction

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3071:
URL: https://github.com/apache/incubator-gobblin/pull/3071


   





[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #3071: [GOBBLIN-1223] Change the criteria for re-compaction, limit the time for re-compaction

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #3071:
URL: https://github.com/apache/incubator-gobblin/pull/3071#discussion_r462667342



##########
File path: gobblin-compaction/src/test/java/org/apache/gobblin/compaction/mapreduce/AvroCompactionTaskTest.java
##########
@@ -17,6 +17,7 @@
 
 package org.apache.gobblin.compaction.mapreduce;
 
+import com.sun.corba.se.spi.orbutil.fsm.InputImpl;

Review comment:
       Wrong import?

##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/verify/InputRecordCountHelper.java
##########
@@ -106,7 +107,8 @@ public State loadState (Path dir) throws IOException {
     return loadState(this.fs, dir);
   }
 
-  private static State loadState (FileSystem fs, Path dir) throws IOException {
+  @VisibleForTesting
+  public static State loadState (FileSystem fs, Path dir) throws IOException {

Review comment:
       There is a blank space after the method name. Could you try the IDE's auto-formatter?

##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/verify/InputRecordCountHelper.java
##########
@@ -123,7 +125,8 @@ public void saveState (Path dir, State state) throws IOException {
     saveState(this.fs, dir, state);
   }
 
-  private static void saveState (FileSystem fs, Path dir, State state) throws IOException {
+  @VisibleForTesting

Review comment:
       Let's make this package-private static instead of public static if it is only visible for testing purposes?

##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/verify/CompactionTimeRangeVerifier.java
##########
@@ -73,10 +75,28 @@ public Result verify (FileSystemDataset dataset) {
       Period minTimeAgo = formatter.parsePeriod(minTimeAgoStr);
       latest = compactionStartTime.minus(minTimeAgo);
 
+      // get latest last run start time, we want to limit the duration between two compaction for the same dataset
+      if (state.contains(TimeBasedSubDirDatasetsFinder.MIN_RECOMPACTION_DURATION)) {
+        String minDurationStrList = this.state.getProp(TimeBasedSubDirDatasetsFinder.MIN_RECOMPACTION_DURATION);
+        String minDurationStr = getMachedLookbackTime(datasetName, minDurationStrList, TimeBasedSubDirDatasetsFinder.DEFAULT_MIN_RECOMPACTION_DURATION);
+        Period minDurationTime = formatter.parsePeriod(minDurationStr);
+        DateTime latestLastRunTime = compactionStartTime.minus(minDurationTime);
+        InputRecordCountHelper helper = new InputRecordCountHelper(state);
+        State compactState = helper.loadState(new Path(result.getDstAbsoluteDir()));
+        if (compactState.contains(CompactionSlaEventHelper.LAST_RUN_START_TIME)
+            && compactState.getPropAsLong(CompactionSlaEventHelper.LAST_RUN_START_TIME) > latestLastRunTime.getMillis()) {
+          log.warn("Last compaction for {} is {}, not before {}", dataset.datasetRoot(), new DateTime(compactState.getPropAsLong(CompactionSlaEventHelper.LAST_RUN_START_TIME), timeZone), latestLastRunTime);
+          return new Result(false, "Last compaction for " + dataset.datasetRoot() + " is not before" + latestLastRunTime);
+        }
+
+      }
+
       if (earliest.isBefore(folderTime) && latest.isAfter(folderTime)) {
         log.debug("{} falls in the user defined time range", dataset.datasetRoot());
         return new Result(true, "");
       }
+    } catch (RuntimeException e) {

Review comment:
       Why are we catching an unchecked exception and just rethrowing it without doing anything here?

##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/verify/CompactionTimeRangeVerifier.java
##########
@@ -73,10 +75,28 @@ public Result verify (FileSystemDataset dataset) {
       Period minTimeAgo = formatter.parsePeriod(minTimeAgoStr);
       latest = compactionStartTime.minus(minTimeAgo);
 
+      // get latest last run start time, we want to limit the duration between two compaction for the same dataset
+      if (state.contains(TimeBasedSubDirDatasetsFinder.MIN_RECOMPACTION_DURATION)) {
+        String minDurationStrList = this.state.getProp(TimeBasedSubDirDatasetsFinder.MIN_RECOMPACTION_DURATION);
+        String minDurationStr = getMachedLookbackTime(datasetName, minDurationStrList, TimeBasedSubDirDatasetsFinder.DEFAULT_MIN_RECOMPACTION_DURATION);
+        Period minDurationTime = formatter.parsePeriod(minDurationStr);
+        DateTime latestLastRunTime = compactionStartTime.minus(minDurationTime);

Review comment:
       Rename it to something like `latestEligibleCompactTime`?

##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/verify/CompactionTimeRangeVerifier.java
##########
@@ -73,10 +75,28 @@ public Result verify (FileSystemDataset dataset) {
       Period minTimeAgo = formatter.parsePeriod(minTimeAgoStr);
       latest = compactionStartTime.minus(minTimeAgo);
 
+      // get latest last run start time, we want to limit the duration between two compaction for the same dataset
+      if (state.contains(TimeBasedSubDirDatasetsFinder.MIN_RECOMPACTION_DURATION)) {
+        String minDurationStrList = this.state.getProp(TimeBasedSubDirDatasetsFinder.MIN_RECOMPACTION_DURATION);
+        String minDurationStr = getMachedLookbackTime(datasetName, minDurationStrList, TimeBasedSubDirDatasetsFinder.DEFAULT_MIN_RECOMPACTION_DURATION);
+        Period minDurationTime = formatter.parsePeriod(minDurationStr);
+        DateTime latestLastRunTime = compactionStartTime.minus(minDurationTime);
+        InputRecordCountHelper helper = new InputRecordCountHelper(state);
+        State compactState = helper.loadState(new Path(result.getDstAbsoluteDir()));
+        if (compactState.contains(CompactionSlaEventHelper.LAST_RUN_START_TIME)
+            && compactState.getPropAsLong(CompactionSlaEventHelper.LAST_RUN_START_TIME) > latestLastRunTime.getMillis()) {
+          log.warn("Last compaction for {} is {}, not before {}", dataset.datasetRoot(), new DateTime(compactState.getPropAsLong(CompactionSlaEventHelper.LAST_RUN_START_TIME), timeZone), latestLastRunTime);
+          return new Result(false, "Last compaction for " + dataset.datasetRoot() + " is not before" + latestLastRunTime);
+        }
+

Review comment:
       Extra blank line here.
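
The re-compaction gate in the hunk above can be read in isolation as: skip a dataset when its last recorded run started more recently than `compactionStartTime` minus the configured minimum duration. The following is a minimal sketch of that comparison only, not the PR's code. `RecompactionGate` and `isEligible` are hypothetical names, and `java.time` stands in for the Joda-Time `Period`/`DateTime` types used in the diff. The variable name follows the rename suggested in the review.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.OptionalLong;

/** Hypothetical helper for illustration; not part of Gobblin. */
public final class RecompactionGate {
  private RecompactionGate() {}

  /**
   * Returns true when the dataset may be compacted again.
   *
   * @param compactionStartTime start time of the current compaction job
   * @param minRecompactionDuration minimum gap required between two runs (assumed to be a fixed Duration)
   * @param lastRunStartMillis start time of the previous run in epoch millis, empty if none was recorded
   */
  public static boolean isEligible(Instant compactionStartTime, Duration minRecompactionDuration,
      OptionalLong lastRunStartMillis) {
    if (!lastRunStartMillis.isPresent()) {
      // No previous run recorded, so nothing limits this run.
      return true;
    }
    Instant latestEligibleCompactTime = compactionStartTime.minus(minRecompactionDuration);
    // Mirrors the diff: the run is rejected when the last start time is after the threshold.
    return lastRunStartMillis.getAsLong() <= latestEligibleCompactTime.toEpochMilli();
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2020-07-29T12:00:00Z");
    Duration minGap = Duration.ofHours(6);
    long twoHoursAgo = now.minus(Duration.ofHours(2)).toEpochMilli();
    long eightHoursAgo = now.minus(Duration.ofHours(8)).toEpochMilli();

    System.out.println(isEligible(now, minGap, OptionalLong.of(twoHoursAgo)));   // false
    System.out.println(isEligible(now, minGap, OptionalLong.of(eightHoursAgo))); // true
    System.out.println(isEligible(now, minGap, OptionalLong.empty()));           // true
  }
}
```

Keeping the comparison in a small pure function like this would also make the boundary case (last run exactly at the threshold) easy to unit test without touching the file system.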







[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #3071: [GOBBLIN-1223] Change the criteria for re-compaction, limit the time for re-compaction

Posted by GitBox <gi...@apache.org>.
ZihanLi58 commented on a change in pull request #3071:
URL: https://github.com/apache/incubator-gobblin/pull/3071#discussion_r462690694



##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/verify/InputRecordCountHelper.java
##########
@@ -123,7 +125,8 @@ public void saveState (Path dir, State state) throws IOException {
     saveState(this.fs, dir, state);
   }
 
-  private static void saveState (FileSystem fs, Path dir, State state) throws IOException {
+  @VisibleForTesting

Review comment:
       That's because the test class and InputRecordCountHelper are not in the same package.

##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/verify/CompactionTimeRangeVerifier.java
##########
@@ -73,10 +75,28 @@ public Result verify (FileSystemDataset dataset) {
       Period minTimeAgo = formatter.parsePeriod(minTimeAgoStr);
       latest = compactionStartTime.minus(minTimeAgo);
 
+      // get latest last run start time, we want to limit the duration between two compaction for the same dataset
+      if (state.contains(TimeBasedSubDirDatasetsFinder.MIN_RECOMPACTION_DURATION)) {
+        String minDurationStrList = this.state.getProp(TimeBasedSubDirDatasetsFinder.MIN_RECOMPACTION_DURATION);
+        String minDurationStr = getMachedLookbackTime(datasetName, minDurationStrList, TimeBasedSubDirDatasetsFinder.DEFAULT_MIN_RECOMPACTION_DURATION);
+        Period minDurationTime = formatter.parsePeriod(minDurationStr);
+        DateTime latestLastRunTime = compactionStartTime.minus(minDurationTime);
+        InputRecordCountHelper helper = new InputRecordCountHelper(state);
+        State compactState = helper.loadState(new Path(result.getDstAbsoluteDir()));
+        if (compactState.contains(CompactionSlaEventHelper.LAST_RUN_START_TIME)
+            && compactState.getPropAsLong(CompactionSlaEventHelper.LAST_RUN_START_TIME) > latestLastRunTime.getMillis()) {
+          log.warn("Last compaction for {} is {}, not before {}", dataset.datasetRoot(), new DateTime(compactState.getPropAsLong(CompactionSlaEventHelper.LAST_RUN_START_TIME), timeZone), latestLastRunTime);
+          return new Result(false, "Last compaction for " + dataset.datasetRoot() + " is not before" + latestLastRunTime);
+        }
+
+      }
+
       if (earliest.isBefore(folderTime) && latest.isAfter(folderTime)) {
         log.debug("{} falls in the user defined time range", dataset.datasetRoot());
         return new Result(true, "");
       }
+    } catch (RuntimeException e) {

Review comment:
       It's due to findBugsMain. It seems we intentionally catch the exception and do not throw it, so I will change the rule to make the check pass.
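
For context on the exchange above: a common reason to write `catch (RuntimeException e) { throw e; }` is to place it before a broad `catch (Exception e)`, so that unchecked exceptions propagate while checked ones are handled deliberately. Static analyzers such as FindBugs can flag a broad `catch (Exception e)` on its own. Whether that is the exact rule hit in this PR is an assumption, since the thread does not name it. A minimal sketch of the pattern, with the hypothetical names `RethrowUncheckedSketch` and `callOrDefault`:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical illustration of the catch ordering; not taken from the PR. */
public final class RethrowUncheckedSketch {
  private RethrowUncheckedSketch() {}

  /** Runs the task, returning the fallback only when a checked exception is thrown. */
  public static <T> T callOrDefault(Callable<T> task, T fallback) {
    try {
      return task.call();
    } catch (RuntimeException e) {
      // Unchecked exceptions usually indicate a bug, so let them propagate unchanged.
      throw e;
    } catch (Exception e) {
      // Checked exceptions (for example IOException) are swallowed on purpose here.
      return fallback;
    }
  }

  public static void main(String[] args) {
    int value = callOrDefault(() -> { throw new IOException("disk unavailable"); }, -1);
    System.out.println(value); // -1
  }
}
```

If the method never intends to handle checked exceptions differently, the alternative is to drop the extra clause and suppress the analyzer rule instead, which is the direction the author proposes.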







[GitHub] [incubator-gobblin] ZihanLi58 commented on pull request #3071: [GOBBLIN-1223] Change the criteria for re-compaction, limit the time for re-compaction

Posted by GitBox <gi...@apache.org>.
ZihanLi58 commented on pull request #3071:
URL: https://github.com/apache/incubator-gobblin/pull/3071#issuecomment-665985806


   @autumnust Can you help take a look at this when you have time? Thanks!

