You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2020/01/29 17:39:17 UTC

[GitHub] [incubator-gobblin] arjun4084346 opened a new pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

arjun4084346 opened a new pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! @jack-moseley  @sv2000  please review
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin-1035](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. 
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if applicable):
   It relaxes criteria for HiveDatasetDescriptor to accept wildcard chars in database name and table name
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   updated HiveDatasetDescriptorTest
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r372757716
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -69,12 +70,21 @@ public HiveDatasetDescriptor(Config config) throws IOException {
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createWhitelistedTables())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
+  String createWhitelistedTables() {
 
 Review comment:
   createWhitelistDbTablePattern()? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373916400
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
       partitionFormat = "";
       conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
     }
+
+    whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+        ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
     this.setRawConfig(this.getRawConfig()
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created as <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
+  String createHiveDatasetWhitelist() {
+    if (new GlobPattern(this.databaseName).hasWildcard()) {
 
 Review comment:
   Yeah - would be good to add that as a comment. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
   > Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/853cb344485db795b3e698d05da467cf9bbfc6f5?src=pr&el=desc) will **increase** coverage by `<.01%`.
   > The diff coverage is `64.28%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2877      +/-   ##
   ============================================
   + Coverage     45.71%   45.72%   +<.01%     
   - Complexity     9108     9110       +2     
   ============================================
     Files          1921     1921              
     Lines         72391    72391              
     Branches       7968     7967       -1     
   ============================================
   + Hits          33094    33099       +5     
   + Misses        36273    36269       -4     
   + Partials       3024     3023       -1
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/gobblin/metrics/event/lineage/LineageInfo.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9ldmVudC9saW5lYWdlL0xpbmVhZ2VJbmZvLmphdmE=) | `67.7% <100%> (ø)` | `18 <0> (ø)` | :arrow_down: |
   | [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `54.76% <61.53%> (+14.28%)` | `6 <3> (+2)` | :arrow_up: |
   | [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `70% <0%> (-2.23%)` | `13% <0%> (ø)` | |
   | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
   | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   | [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0%> (+7.14%)` | `3% <0%> (ø)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [853cb34...4536035](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r372757547
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -69,12 +70,21 @@ public HiveDatasetDescriptor(Config config) throws IOException {
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createWhitelistedTables())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
 
 Review comment:
   Your javadoc is a little confusing to read. Do you mean to say that for an input with db = ei_tracking and table = zephyr*,abook*, the method will return the whitelist pattern ei_tracking.zephyr*|abook*?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
   > Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/cd53dcfa2c046f53db29689e1d687bf4412565aa?src=pr&el=desc) will **decrease** coverage by `41.71%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #2877       +/-   ##
   ============================================
   - Coverage      45.8%   4.09%   -41.72%     
   + Complexity     9111     747     -8364     
   ============================================
     Files          1915    1921        +6     
     Lines         72285   72351       +66     
     Branches       7972    7963        -9     
   ============================================
   - Hits          33109    2961    -30148     
   - Misses        36150   69071    +32921     
   + Partials       3026     319     -2707
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `0% <0%> (-40.48%)` | `0 <0> (-4)` | |
   | [...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | [...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-5%)` | |
   | [...n/mapreduce/avro/AvroKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vQXZyb0tleUNvbXBhY3Rvck91dHB1dEZvcm1hdC5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | [...apache/gobblin/fork/CopyNotSupportedException.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZm9yay9Db3B5Tm90U3VwcG9ydGVkRXhjZXB0aW9uLmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-1%)` | |
   | [.../gobblin/kafka/writer/KafkaWriterCommonConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL3dyaXRlci9LYWZrYVdyaXRlckNvbW1vbkNvbmZpZy5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-7%)` | |
   | [...ker/task/TaskLevelPolicyCheckerBuilderFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3F1YWxpdHljaGVja2VyL3Rhc2svVGFza0xldmVsUG9saWN5Q2hlY2tlckJ1aWxkZXJGYWN0b3J5LmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | [...bblin/data/management/copy/AllEqualComparator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvQWxsRXF1YWxDb21wYXJhdG9yLmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | [...blin/converter/string/ObjectToStringConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9zdHJpbmcvT2JqZWN0VG9TdHJpbmdDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | [...rquet/JsonIntermediateToParquetGroupConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tcGFycXVldC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9jb252ZXJ0ZXIvcGFycXVldC9Kc29uSW50ZXJtZWRpYXRlVG9QYXJxdWV0R3JvdXBDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | ... and [1122 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [cd53dcf...1f04b0b](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373125899
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -69,12 +70,21 @@ public HiveDatasetDescriptor(Config config) throws IOException {
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createWhitelistedTables())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
+  String createWhitelistedTables() {
 
 Review comment:
   naming it by the name of property HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] asfgit closed pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373886146
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
       partitionFormat = "";
       conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
     }
+
+    whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+        ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
     this.setRawConfig(this.getRawConfig()
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created as <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
+  String createHiveDatasetWhitelist() {
+    if (new GlobPattern(this.databaseName).hasWildcard()) {
 
 Review comment:
   Maybe add a comment why we are using Hadoop's GlobPattern instead of java.util.regex? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373885976
 
 

 ##########
 File path: gobblin-service/src/test/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptorTest.java
 ##########
 @@ -30,16 +30,44 @@
 
 
 public class HiveDatasetDescriptorTest {
+  Config baseConfig = ConfigFactory.empty().withValue(DatasetDescriptorConfigKeys.PLATFORM_KEY, ConfigValueFactory.fromAnyRef("hive"))
+      .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, ConfigValueFactory.fromAnyRef("testDb_Db1"))
+      .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, ConfigValueFactory.fromAnyRef("testTable_Table1"));;
 
   @Test
   public void objectCreation() throws IOException {
-    Config baseConfig = ConfigFactory.empty().withValue(DatasetDescriptorConfigKeys.PLATFORM_KEY, ConfigValueFactory.fromAnyRef("hive"))
-        .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, ConfigValueFactory.fromAnyRef("testDb_Db1"))
-        .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, ConfigValueFactory.fromAnyRef("testTable_Table1"));;
-
     SqlDatasetDescriptor descriptor1 = new HiveDatasetDescriptor(baseConfig.withValue(HiveDatasetDescriptor.IS_PARTITIONED_KEY, ConfigValueFactory.fromAnyRef(true)));
     SqlDatasetDescriptor descriptor2 = new HiveDatasetDescriptor(baseConfig.withValue(HiveDatasetDescriptor.IS_PARTITIONED_KEY, ConfigValueFactory.fromAnyRef(false)));
 
     Assert.assertNotEquals(descriptor1, descriptor2);
   }
+
+  @Test
+  public void testWhitelist() throws IOException {
+    HiveDatasetDescriptor descriptor = new HiveDatasetDescriptor(baseConfig
+        .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, ConfigValueFactory.fromAnyRef("ei_*"))
+        .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, ConfigValueFactory.fromAnyRef("abc,def,fgh"))
+    );
+    Assert.assertTrue(descriptor.whitelistBlacklist.acceptTable("ei_tracking", "abc"));
 
 Review comment:
   Can we use use a db name that is different from the one being used currently?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
   > Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/853cb344485db795b3e698d05da467cf9bbfc6f5?src=pr&el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `64.28%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2877      +/-   ##
   ============================================
   + Coverage     45.71%   45.73%   +0.01%     
   - Complexity     9108     9112       +4     
   ============================================
     Files          1921     1921              
     Lines         72391    72391              
     Branches       7968     7967       -1     
   ============================================
   + Hits          33094    33105      +11     
   + Misses        36273    36264       -9     
   + Partials       3024     3022       -2
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/gobblin/metrics/event/lineage/LineageInfo.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9ldmVudC9saW5lYWdlL0xpbmVhZ2VJbmZvLmphdmE=) | `67.7% <100%> (ø)` | `18 <0> (ø)` | :arrow_down: |
   | [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `54.76% <61.53%> (+14.28%)` | `6 <3> (+2)` | :arrow_up: |
   | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.7% <0%> (+1.35%)` | `29% <0%> (+1%)` | :arrow_up: |
   | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   | [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0%> (+7.14%)` | `3% <0%> (ø)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [853cb34...2663800](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373284155
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -66,20 +67,23 @@ public HiveDatasetDescriptor(Config config) throws IOException {
       partitionFormat = "";
       conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
     }
+
+    whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+        ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
     this.setRawConfig(this.getRawConfig()
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
         .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
-            ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+            ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
         ));
   }
 
   // If the db name contains wildcards, whitelist is created <regex_db>.*
   // Otherwise, whitelist is created as <db>.tables.
   // This is the format which HiveDatasetFinder understands.
   // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
 
 Review comment:
   Minor typo: "whitelist will be"

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373284045
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -66,20 +67,23 @@ public HiveDatasetDescriptor(Config config) throws IOException {
       partitionFormat = "";
       conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
     }
+
+    whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+        ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
     this.setRawConfig(this.getRawConfig()
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
         .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
-            ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+            ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
         ));
   }
 
   // If the db name contains wildcards, whitelist is created <regex_db>.*
 
 Review comment:
   Minor typo: "whilelist is created as"

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io commented on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
   > Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/cd53dcfa2c046f53db29689e1d687bf4412565aa?src=pr&el=desc) will **decrease** coverage by `<.01%`.
   > The diff coverage is `42.85%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##             master   #2877      +/-   ##
   ===========================================
   - Coverage      45.8%   45.8%   -0.01%     
   - Complexity     9111    9114       +3     
   ===========================================
     Files          1915    1915              
     Lines         72285   72301      +16     
     Branches       7972    7975       +3     
   ===========================================
   + Hits          33109   33115       +6     
   - Misses        36150   36160      +10     
     Partials       3026    3026
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `42% <42.85%> (+1.52%)` | `6 <3> (+2)` | :arrow_up: |
   | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `35.51% <0%> (-3.74%)` | `12% <0%> (-1%)` | |
   | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
   | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.22% <0%> (-1.06%)` | `28% <0%> (ø)` | |
   | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
   | [...org/apache/gobblin/yarn/GobblinYarnTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5UYXNrUnVubmVyLmphdmE=) | `0% <0%> (ø)` | `0% <0%> (ø)` | :arrow_down: |
   | [...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==) | `15.32% <0%> (+0.15%)` | `4% <0%> (ø)` | :arrow_down: |
   | [.../org/apache/gobblin/metrics/RootMetricContext.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9Sb290TWV0cmljQ29udGV4dC5qYXZh) | `79.68% <0%> (+7.81%)` | `16% <0%> (+1%)` | :arrow_up: |
   | [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `100% <0%> (+14.28%)` | `4% <0%> (+1%)` | :arrow_up: |
   | [...a/org/apache/gobblin/util/limiter/NoopLimiter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvbGltaXRlci9Ob29wTGltaXRlci5qYXZh) | `60% <0%> (+20%)` | `3% <0%> (+1%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [cd53dcf...d02e4ec](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373125173
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -69,12 +70,21 @@ public HiveDatasetDescriptor(Config config) throws IOException {
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createWhitelistedTables())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
 
 Review comment:
   yes, that's what i mean to say.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373223062
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -102,25 +112,33 @@ protected boolean isPathContaining(DatasetDescriptor other) {
     String otherDbName = parts.get(0);
     String otherTableNames = parts.get(1);
 
-    if(!Pattern.compile(this.databaseName).matcher(otherDbName).matches()) {
+    if (!isDbAccepted(otherDbName)) {
       return false;
     }
 
     List<String> tables = Splitter.on(",").splitToList(otherTableNames);
     for (String table : tables) {
-      if (!isPathContaining(table)) {
+      if (!isTableAccepted(table)) {
         return false;
       }
     }
     return true;
   }
 
-  private boolean isPathContaining(String tableName) {
-    if (tableName == null) {
-      return false;
+  private boolean isDbAccepted(String otherDbName) {
+    List<String> dbs = Splitter.on(",").splitToList(this.databaseName);
+
+    for (String db : dbs) {
+      if (new GlobPattern(db).matches(otherDbName)) {
 
 Review comment:
   replaced it with with WhitelistBlacklist, which internally uses java.util.regex patterns

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
   > Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/cd53dcfa2c046f53db29689e1d687bf4412565aa?src=pr&el=desc) will **decrease** coverage by `0.03%`.
   > The diff coverage is `61.53%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2877      +/-   ##
   ============================================
   - Coverage      45.8%   45.77%   -0.04%     
   - Complexity     9111     9112       +1     
   ============================================
     Files          1915     1921       +6     
     Lines         72285    72351      +66     
     Branches       7972     7963       -9     
   ============================================
   + Hits          33109    33116       +7     
   - Misses        36150    36210      +60     
   + Partials       3026     3025       -1
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `54.76% <61.53%> (+14.28%)` | `6 <3> (+2)` | :arrow_up: |
   | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `32.71% <0%> (-6.55%)` | `11% <0%> (-2%)` | |
   | [...obblin/service/monitoring/FlowStatusGenerator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9uaXRvcmluZy9GbG93U3RhdHVzR2VuZXJhdG9yLmphdmE=) | `69.69% <0%> (-4.5%)` | `11% <0%> (ø)` | |
   | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
   | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.22% <0%> (-1.06%)` | `28% <0%> (ø)` | |
   | [...blin/service/modules/orchestration/DagManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9vcmNoZXN0cmF0aW9uL0RhZ01hbmFnZXIuamF2YQ==) | `76.54% <0%> (-1.04%)` | `13% <0%> (+1%)` | |
   | [...in/service/modules/core/GobblinServiceManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9jb3JlL0dvYmJsaW5TZXJ2aWNlTWFuYWdlci5qYXZh) | `54.71% <0%> (-0.04%)` | `25% <0%> (ø)` | |
   | [...a/org/apache/gobblin/service/FlowStatusClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi1mbG93LWNvbmZpZy1zZXJ2aWNlL2dvYmJsaW4tZmxvdy1jb25maWctc2VydmljZS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc2VydmljZS9GbG93U3RhdHVzQ2xpZW50LmphdmE=) | `72.34% <0%> (ø)` | `7% <0%> (ø)` | :arrow_down: |
   | [...org/apache/gobblin/service/FlowStatusResource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi1mbG93LWNvbmZpZy1zZXJ2aWNlL2dvYmJsaW4tZmxvdy1jb25maWctc2VydmljZS1zZXJ2ZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc2VydmljZS9GbG93U3RhdHVzUmVzb3VyY2UuamF2YQ==) | `0% <0%> (ø)` | `0% <0%> (ø)` | :arrow_down: |
   | [...apache/gobblin/salesforce/SalesforceExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0% <0%> (ø)` | :arrow_down: |
   | ... and [13 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [cd53dcf...3926e31](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373897715
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
       partitionFormat = "";
       conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
     }
+
+    whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+        ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
     this.setRawConfig(this.getRawConfig()
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created as <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
+  String createHiveDatasetWhitelist() {
+    if (new GlobPattern(this.databaseName).hasWildcard()) {
 
 Review comment:
   I am using Hadoop's GlobPattern instead of java.util.regex, because I could not find any API in java.util.regex which tells if the string is a plain string or contains special characters. Do you know any API in java.util.regex? or should I put this reason in comment?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373885919
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
       partitionFormat = "";
       conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
     }
+
+    whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+        ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
     this.setRawConfig(this.getRawConfig()
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created as <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
 
 Review comment:
   Can we remove these references?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
   > Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/853cb344485db795b3e698d05da467cf9bbfc6f5?src=pr&el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `64.28%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2877      +/-   ##
   ============================================
   + Coverage     45.71%   45.72%   +0.01%     
   - Complexity     9108     9112       +4     
   ============================================
     Files          1921     1921              
     Lines         72391    72391              
     Branches       7968     7967       -1     
   ============================================
   + Hits          33094    33104      +10     
   + Misses        36273    36264       -9     
   + Partials       3024     3023       -1
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/gobblin/metrics/event/lineage/LineageInfo.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9ldmVudC9saW5lYWdlL0xpbmVhZ2VJbmZvLmphdmE=) | `67.7% <100%> (ø)` | `18 <0> (ø)` | :arrow_down: |
   | [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `54.76% <61.53%> (+14.28%)` | `6 <3> (+2)` | :arrow_up: |
   | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
   | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.7% <0%> (+1.35%)` | `29% <0%> (+1%)` | :arrow_up: |
   | [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
   | [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0%> (+7.14%)` | `3% <0%> (ø)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [853cb34...377478a](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373897715
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
       partitionFormat = "";
       conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
     }
+
+    whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+        ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
     this.setRawConfig(this.getRawConfig()
         .withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
         .withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
         .withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
-        .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+        .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+            ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
         ));
   }
 
-  private String createWhitelistedTables() {
-    return this.tableName.replace(',', '|');
+  // If the db name contains wildcards, whitelist is created as <regex_db>.*
+  // Otherwise, whitelist is created as <db>.tables.
+  // This is the format which HiveDatasetFinder understands.
+  // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
+  String createHiveDatasetWhitelist() {
+    if (new GlobPattern(this.databaseName).hasWildcard()) {
 
 Review comment:
   I am using Hadoop's GlobPattern instead of java.util.regex, because I could not find any API java.util.regex which tells if the string is a plain string or contains special characters. Do you know any API in java.util.regex? or should I put this reason in comment?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r372757288
 
 

 ##########
 File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
 ##########
 @@ -102,25 +112,33 @@ protected boolean isPathContaining(DatasetDescriptor other) {
     String otherDbName = parts.get(0);
     String otherTableNames = parts.get(1);
 
-    if(!Pattern.compile(this.databaseName).matcher(otherDbName).matches()) {
+    if (!isDbAccepted(otherDbName)) {
       return false;
     }
 
     List<String> tables = Splitter.on(",").splitToList(otherTableNames);
     for (String table : tables) {
-      if (!isPathContaining(table)) {
+      if (!isTableAccepted(table)) {
         return false;
       }
     }
     return true;
   }
 
-  private boolean isPathContaining(String tableName) {
-    if (tableName == null) {
-      return false;
+  private boolean isDbAccepted(String otherDbName) {
+    List<String> dbs = Splitter.on(",").splitToList(this.databaseName);
+
+    for (String db : dbs) {
+      if (new GlobPattern(db).matches(otherDbName)) {
 
 Review comment:
   Should we use java.util.regex patterns for contains() checks, instead of Hadoop GlobPattern which is more restrictive.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services