You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2020/08/13 20:59:51 UTC

[GitHub] [incubator-gobblin] zxcware opened a new pull request #3078: [GOBBLIN-1233] Add case-aware support in WhitelistBlacklist and other small fixes

zxcware opened a new pull request #3078:
URL: https://github.com/apache/incubator-gobblin/pull/3078


   … small fixes
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. 
       - https://issues.apache.org/jira/browse/GOBBLIN-1233
   
   
   ### Description
   - [x] Here are some details about my PR:
    - `WhitelistBlacklist` today is case agnostic. The PR adds an option to make it case sensitive
    - Fix Bug: GOBBLIN-831. On zero work unit, `KafkaWorkUnitPacker` will trigger NPE
    - Fix Bug: `DatasetCleanerTask` doesn't set working state on task failure or succeess
    - Improvement: Allow `AsyncHttpWriter` to retry on any transmission exception besides IOException
   
   ### Tests
   - [x] My PR adds the following unit tests:
    - `WhitelistBlacklistTest.testCaseAware()` TBD.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] zxcware commented on a change in pull request #3078: [GOBBLIN-1233] Add case-aware support in WhitelistBlacklist and other small fixes

Posted by GitBox <gi...@apache.org>.
zxcware commented on a change in pull request #3078:
URL: https://github.com/apache/incubator-gobblin/pull/3078#discussion_r470789301



##########
File path: gobblin-data-management/src/main/java/org/apache/gobblin/runtime/retention/DatasetCleanerTask.java
##########
@@ -47,8 +51,15 @@ public void run() {
       DatasetCleaner datasetCleaner = new DatasetCleaner(FileSystem.get(new Configuration()),
           this.taskContext.getTaskState().getProperties());
       datasetCleaner.clean();
+      this.workingState = WorkUnitState.WorkingState.SUCCESSFUL;
     } catch (IOException e) {
+      this.workingState = WorkUnitState.WorkingState.FAILED;
       throw new RuntimeException(e);
     }
   }
+
+  @Override
+  public void commit() {
+    log.info("task {} commits with state {}", this.taskContext.getTaskState().getTaskId(), this.workingState);

Review comment:
       regardless where it runs, a task will have a task id, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] zxcware commented on a change in pull request #3078: [GOBBLIN-1233] Add case-aware support in WhitelistBlacklist and other small fixes

Posted by GitBox <gi...@apache.org>.
zxcware commented on a change in pull request #3078:
URL: https://github.com/apache/incubator-gobblin/pull/3078#discussion_r470765932



##########
File path: gobblin-core/src/main/java/org/apache/gobblin/writer/partitioner/TimeBasedWriterPartitioner.java
##########
@@ -74,6 +76,7 @@
   private final String writerPartitionSuffix;
   private final DatePartitionType granularity;
   private final DateTimeZone timeZone;
+  @Getter

Review comment:
       Yes. We have a row check policy that requires time unit info from partitioner to compare record time with current time.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #3078: [GOBBLIN-1233] Add case-aware support in WhitelistBlacklist and other small fixes

Posted by GitBox <gi...@apache.org>.
ZihanLi58 commented on a change in pull request #3078:
URL: https://github.com/apache/incubator-gobblin/pull/3078#discussion_r470772296



##########
File path: gobblin-modules/gobblin-http/src/main/java/org/apache/gobblin/writer/AsyncHttpWriter.java
##########
@@ -83,7 +83,7 @@ protected void dispatch(Queue<BufferedRecord<D>> buffer) throws DispatchExceptio
     while (attempt < maxAttempts) {
       try {
           response = httpClient.sendRequest(rawRequest);
-      } catch (IOException e) {
+      } catch (Exception e) {

Review comment:
       Why we want change IOException to a more general Exception here?

##########
File path: gobblin-data-management/src/main/java/org/apache/gobblin/runtime/retention/DatasetCleanerTask.java
##########
@@ -47,8 +51,15 @@ public void run() {
       DatasetCleaner datasetCleaner = new DatasetCleaner(FileSystem.get(new Configuration()),
           this.taskContext.getTaskState().getProperties());
       datasetCleaner.clean();
+      this.workingState = WorkUnitState.WorkingState.SUCCESSFUL;
     } catch (IOException e) {
+      this.workingState = WorkUnitState.WorkingState.FAILED;
       throw new RuntimeException(e);
     }
   }
+
+  @Override
+  public void commit() {
+    log.info("task {} commits with state {}", this.taskContext.getTaskState().getTaskId(), this.workingState);

Review comment:
       Just curious, if this task runs in mapper or driver? I think most gobblin retention job runs on driver, if that's the case, what's the taskID here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] asfgit closed pull request #3078: [GOBBLIN-1233] Add case-aware support in WhitelistBlacklist and other small fixes

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3078:
URL: https://github.com/apache/incubator-gobblin/pull/3078


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] zxcware commented on a change in pull request #3078: [GOBBLIN-1233] Add case-aware support in WhitelistBlacklist and other small fixes

Posted by GitBox <gi...@apache.org>.
zxcware commented on a change in pull request #3078:
URL: https://github.com/apache/incubator-gobblin/pull/3078#discussion_r470788527



##########
File path: gobblin-utility/src/main/java/org/apache/gobblin/data/management/copy/hive/WhitelistBlacklist.java
##########
@@ -53,23 +53,36 @@
 
   public static final String WHITELIST = "whitelist";
   public static final String BLACKLIST = "blacklist";
+  public static final String IGNORE_CASE = "whitelistBlacklist.ignoreCase";

Review comment:
       fixed

##########
File path: gobblin-utility/src/main/java/org/apache/gobblin/data/management/copy/hive/WhitelistBlacklist.java
##########
@@ -53,23 +53,36 @@
 
   public static final String WHITELIST = "whitelist";
   public static final String BLACKLIST = "blacklist";
+  public static final String IGNORE_CASE = "whitelistBlacklist.ignoreCase";
 
   private static final Pattern ALL_TABLES = Pattern.compile(".*");
 
   private final SetMultimap<Pattern, Pattern> whitelistMultimap;
   private final SetMultimap<Pattern, Pattern> blacklistMultimap;
+  private final boolean ignoreCase;
 
   public WhitelistBlacklist(Config config) throws IOException {
-    this(config.hasPath(WHITELIST) ? config.getString(WHITELIST).toLowerCase() : "",
-        config.hasPath(BLACKLIST) ? config.getString(BLACKLIST).toLowerCase() : "");
+    this(config.hasPath(WHITELIST) ? config.getString(WHITELIST) : "",
+        config.hasPath(BLACKLIST) ? config.getString(BLACKLIST) : "",
+        !config.hasPath(IGNORE_CASE) || config.getBoolean(IGNORE_CASE));
   }
 
   public WhitelistBlacklist(String whitelist, String blacklist) throws IOException {
+    this(whitelist, blacklist, true);
+  }
+
+  public WhitelistBlacklist(String whitelist, String blacklist, boolean ignoreCase) throws IOException {
     this.whitelistMultimap = HashMultimap.create();
     this.blacklistMultimap = HashMultimap.create();
 
-    populateMultimap(this.whitelistMultimap, whitelist.toLowerCase());
-    populateMultimap(this.blacklistMultimap, blacklist.toLowerCase());
+    this.ignoreCase = ignoreCase;

Review comment:
       Fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #3078: [GOBBLIN-1233] Add case-aware support in WhitelistBlacklist and other small fixes

Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #3078:
URL: https://github.com/apache/incubator-gobblin/pull/3078#discussion_r470398611



##########
File path: gobblin-utility/src/main/java/org/apache/gobblin/data/management/copy/hive/WhitelistBlacklist.java
##########
@@ -53,23 +53,36 @@
 
   public static final String WHITELIST = "whitelist";
   public static final String BLACKLIST = "blacklist";
+  public static final String IGNORE_CASE = "whitelistBlacklist.ignoreCase";

Review comment:
       Please update the javadoc explaining IGNORE_CASE

##########
File path: gobblin-utility/src/main/java/org/apache/gobblin/data/management/copy/hive/WhitelistBlacklist.java
##########
@@ -53,23 +53,36 @@
 
   public static final String WHITELIST = "whitelist";
   public static final String BLACKLIST = "blacklist";
+  public static final String IGNORE_CASE = "whitelistBlacklist.ignoreCase";
 
   private static final Pattern ALL_TABLES = Pattern.compile(".*");
 
   private final SetMultimap<Pattern, Pattern> whitelistMultimap;
   private final SetMultimap<Pattern, Pattern> blacklistMultimap;
+  private final boolean ignoreCase;
 
   public WhitelistBlacklist(Config config) throws IOException {
-    this(config.hasPath(WHITELIST) ? config.getString(WHITELIST).toLowerCase() : "",
-        config.hasPath(BLACKLIST) ? config.getString(BLACKLIST).toLowerCase() : "");
+    this(config.hasPath(WHITELIST) ? config.getString(WHITELIST) : "",
+        config.hasPath(BLACKLIST) ? config.getString(BLACKLIST) : "",
+        !config.hasPath(IGNORE_CASE) || config.getBoolean(IGNORE_CASE));
   }
 
   public WhitelistBlacklist(String whitelist, String blacklist) throws IOException {
+    this(whitelist, blacklist, true);
+  }
+
+  public WhitelistBlacklist(String whitelist, String blacklist, boolean ignoreCase) throws IOException {
     this.whitelistMultimap = HashMultimap.create();
     this.blacklistMultimap = HashMultimap.create();
 
-    populateMultimap(this.whitelistMultimap, whitelist.toLowerCase());
-    populateMultimap(this.blacklistMultimap, blacklist.toLowerCase());
+    this.ignoreCase = ignoreCase;

Review comment:
       Nit a line break could be after setting ignoreCase and not before it.

##########
File path: gobblin-core/src/main/java/org/apache/gobblin/writer/partitioner/TimeBasedWriterPartitioner.java
##########
@@ -74,6 +76,7 @@
   private final String writerPartitionSuffix;
   private final DatePartitionType granularity;
   private final DateTimeZone timeZone;
+  @Getter

Review comment:
       required?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] zxcware commented on a change in pull request #3078: [GOBBLIN-1233] Add case-aware support in WhitelistBlacklist and other small fixes

Posted by GitBox <gi...@apache.org>.
zxcware commented on a change in pull request #3078:
URL: https://github.com/apache/incubator-gobblin/pull/3078#discussion_r470788906



##########
File path: gobblin-modules/gobblin-http/src/main/java/org/apache/gobblin/writer/AsyncHttpWriter.java
##########
@@ -83,7 +83,7 @@ protected void dispatch(Queue<BufferedRecord<D>> buffer) throws DispatchExceptio
     while (attempt < maxAttempts) {
       try {
           response = httpClient.sendRequest(rawRequest);
-      } catch (IOException e) {
+      } catch (Exception e) {

Review comment:
       To allow retry on any exception.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org