You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2020/01/29 17:39:17 UTC
[GitHub] [incubator-gobblin] arjun4084346 opened a new pull request #2877:
[GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
arjun4084346 opened a new pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! @jack-moseley @sv2000 please review
### JIRA
- [x] My PR addresses the following [Gobblin-1035](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title.
### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
It relaxes criteria for HiveDatasetDescriptor to accept wildcard chars in database name and table name
### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
updated HiveDatasetDescriptorTest
### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r372757716
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -69,12 +70,21 @@ public HiveDatasetDescriptor(Config config) throws IOException {
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createWhitelistedTables())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
+ String createWhitelistedTables() {
Review comment:
createWhitelistDbTablePattern()?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373916400
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
partitionFormat = "";
conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
}
+
+ whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
this.setRawConfig(this.getRawConfig()
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created as <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
+ String createHiveDatasetWhitelist() {
+ if (new GlobPattern(this.databaseName).hasWildcard()) {
Review comment:
Yeah - would be good to add that as a comment.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877:
[GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
> Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/853cb344485db795b3e698d05da467cf9bbfc6f5?src=pr&el=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `64.28%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2877 +/- ##
============================================
+ Coverage 45.71% 45.72% +<.01%
- Complexity 9108 9110 +2
============================================
Files 1921 1921
Lines 72391 72391
Branches 7968 7967 -1
============================================
+ Hits 33094 33099 +5
+ Misses 36273 36269 -4
+ Partials 3024 3023 -1
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/metrics/event/lineage/LineageInfo.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9ldmVudC9saW5lYWdlL0xpbmVhZ2VJbmZvLmphdmE=) | `67.7% <100%> (ø)` | `18 <0> (ø)` | :arrow_down: |
| [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `54.76% <61.53%> (+14.28%)` | `6 <3> (+2)` | :arrow_up: |
| [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `70% <0%> (-2.23%)` | `13% <0%> (ø)` | |
| [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
| [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
| [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0%> (+7.14%)` | `3% <0%> (ø)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [853cb34...4536035](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r372757547
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -69,12 +70,21 @@ public HiveDatasetDescriptor(Config config) throws IOException {
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createWhitelistedTables())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
Review comment:
Your javadoc is a little confusing to read. Do you mean to say that for an input with db = ei_tracking and table = zephyr*,abook*, the method will return the whitelist pattern ei_tracking.zephyr*|abook*?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877:
[GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
> Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/cd53dcfa2c046f53db29689e1d687bf4412565aa?src=pr&el=desc) will **decrease** coverage by `41.71%`.
> The diff coverage is `0%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2877 +/- ##
============================================
- Coverage 45.8% 4.09% -41.72%
+ Complexity 9111 747 -8364
============================================
Files 1915 1921 +6
Lines 72285 72351 +66
Branches 7972 7963 -9
============================================
- Hits 33109 2961 -30148
- Misses 36150 69071 +32921
+ Partials 3026 319 -2707
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `0% <0%> (-40.48%)` | `0 <0> (-4)` | |
| [...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
| [...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-5%)` | |
| [...n/mapreduce/avro/AvroKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vQXZyb0tleUNvbXBhY3Rvck91dHB1dEZvcm1hdC5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
| [...apache/gobblin/fork/CopyNotSupportedException.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZm9yay9Db3B5Tm90U3VwcG9ydGVkRXhjZXB0aW9uLmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-1%)` | |
| [.../gobblin/kafka/writer/KafkaWriterCommonConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL3dyaXRlci9LYWZrYVdyaXRlckNvbW1vbkNvbmZpZy5qYXZh) | `0% <0%> (-100%)` | `0% <0%> (-7%)` | |
| [...ker/task/TaskLevelPolicyCheckerBuilderFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3F1YWxpdHljaGVja2VyL3Rhc2svVGFza0xldmVsUG9saWN5Q2hlY2tlckJ1aWxkZXJGYWN0b3J5LmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
| [...bblin/data/management/copy/AllEqualComparator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvQWxsRXF1YWxDb21wYXJhdG9yLmphdmE=) | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
| [...blin/converter/string/ObjectToStringConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9zdHJpbmcvT2JqZWN0VG9TdHJpbmdDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
| [...rquet/JsonIntermediateToParquetGroupConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tcGFycXVldC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9jb252ZXJ0ZXIvcGFycXVldC9Kc29uSW50ZXJtZWRpYXRlVG9QYXJxdWV0R3JvdXBDb252ZXJ0ZXIuamF2YQ==) | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
| ... and [1122 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [cd53dcf...1f04b0b](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull
request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed
db and tables
Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373125899
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -69,12 +70,21 @@ public HiveDatasetDescriptor(Config config) throws IOException {
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createWhitelistedTables())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
+ String createWhitelistedTables() {
Review comment:
naming it by the name of property HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] asfgit closed pull request #2877:
[GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373886146
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
partitionFormat = "";
conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
}
+
+ whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
this.setRawConfig(this.getRawConfig()
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created as <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
+ String createHiveDatasetWhitelist() {
+ if (new GlobPattern(this.databaseName).hasWildcard()) {
Review comment:
Maybe add a comment why we are using Hadoop's GlobPattern instead of java.util.regex?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373885976
##########
File path: gobblin-service/src/test/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptorTest.java
##########
@@ -30,16 +30,44 @@
public class HiveDatasetDescriptorTest {
+ Config baseConfig = ConfigFactory.empty().withValue(DatasetDescriptorConfigKeys.PLATFORM_KEY, ConfigValueFactory.fromAnyRef("hive"))
+ .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, ConfigValueFactory.fromAnyRef("testDb_Db1"))
+ .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, ConfigValueFactory.fromAnyRef("testTable_Table1"));;
@Test
public void objectCreation() throws IOException {
- Config baseConfig = ConfigFactory.empty().withValue(DatasetDescriptorConfigKeys.PLATFORM_KEY, ConfigValueFactory.fromAnyRef("hive"))
- .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, ConfigValueFactory.fromAnyRef("testDb_Db1"))
- .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, ConfigValueFactory.fromAnyRef("testTable_Table1"));;
-
SqlDatasetDescriptor descriptor1 = new HiveDatasetDescriptor(baseConfig.withValue(HiveDatasetDescriptor.IS_PARTITIONED_KEY, ConfigValueFactory.fromAnyRef(true)));
SqlDatasetDescriptor descriptor2 = new HiveDatasetDescriptor(baseConfig.withValue(HiveDatasetDescriptor.IS_PARTITIONED_KEY, ConfigValueFactory.fromAnyRef(false)));
Assert.assertNotEquals(descriptor1, descriptor2);
}
+
+ @Test
+ public void testWhitelist() throws IOException {
+ HiveDatasetDescriptor descriptor = new HiveDatasetDescriptor(baseConfig
+ .withValue(DatasetDescriptorConfigKeys.DATABASE_KEY, ConfigValueFactory.fromAnyRef("ei_*"))
+ .withValue(DatasetDescriptorConfigKeys.TABLE_KEY, ConfigValueFactory.fromAnyRef("abc,def,fgh"))
+ );
+ Assert.assertTrue(descriptor.whitelistBlacklist.acceptTable("ei_tracking", "abc"));
Review comment:
Can we use use a db name that is different from the one being used currently?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877:
[GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
> Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/853cb344485db795b3e698d05da467cf9bbfc6f5?src=pr&el=desc) will **increase** coverage by `0.01%`.
> The diff coverage is `64.28%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2877 +/- ##
============================================
+ Coverage 45.71% 45.73% +0.01%
- Complexity 9108 9112 +4
============================================
Files 1921 1921
Lines 72391 72391
Branches 7968 7967 -1
============================================
+ Hits 33094 33105 +11
+ Misses 36273 36264 -9
+ Partials 3024 3022 -2
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/metrics/event/lineage/LineageInfo.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9ldmVudC9saW5lYWdlL0xpbmVhZ2VJbmZvLmphdmE=) | `67.7% <100%> (ø)` | `18 <0> (ø)` | :arrow_down: |
| [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `54.76% <61.53%> (+14.28%)` | `6 <3> (+2)` | :arrow_up: |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.7% <0%> (+1.35%)` | `29% <0%> (+1%)` | :arrow_up: |
| [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
| [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0%> (+7.14%)` | `3% <0%> (ø)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [853cb34...2663800](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373284155
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -66,20 +67,23 @@ public HiveDatasetDescriptor(Config config) throws IOException {
partitionFormat = "";
conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
}
+
+ whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
this.setRawConfig(this.getRawConfig()
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
.withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
- ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
));
}
// If the db name contains wildcards, whitelist is created <regex_db>.*
// Otherwise, whitelist is created as <db>.tables.
// This is the format which HiveDatasetFinder understands.
// e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
Review comment:
Minor typo: "whitelist will be"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373284045
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -66,20 +67,23 @@ public HiveDatasetDescriptor(Config config) throws IOException {
partitionFormat = "";
conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
}
+
+ whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
this.setRawConfig(this.getRawConfig()
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
.withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
- ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
));
}
// If the db name contains wildcards, whitelist is created <regex_db>.*
Review comment:
Minor typo: "whilelist is created as"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io commented on issue #2877:
[GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
> Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/cd53dcfa2c046f53db29689e1d687bf4412565aa?src=pr&el=desc) will **decrease** coverage by `<.01%`.
> The diff coverage is `42.85%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2877 +/- ##
===========================================
- Coverage 45.8% 45.8% -0.01%
- Complexity 9111 9114 +3
===========================================
Files 1915 1915
Lines 72285 72301 +16
Branches 7972 7975 +3
===========================================
+ Hits 33109 33115 +6
- Misses 36150 36160 +10
Partials 3026 3026
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `42% <42.85%> (+1.52%)` | `6 <3> (+2)` | :arrow_up: |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `35.51% <0%> (-3.74%)` | `12% <0%> (-1%)` | |
| [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.22% <0%> (-1.06%)` | `28% <0%> (ø)` | |
| [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
| [...org/apache/gobblin/yarn/GobblinYarnTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5UYXNrUnVubmVyLmphdmE=) | `0% <0%> (ø)` | `0% <0%> (ø)` | :arrow_down: |
| [...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==) | `15.32% <0%> (+0.15%)` | `4% <0%> (ø)` | :arrow_down: |
| [.../org/apache/gobblin/metrics/RootMetricContext.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9Sb290TWV0cmljQ29udGV4dC5qYXZh) | `79.68% <0%> (+7.81%)` | `16% <0%> (+1%)` | :arrow_up: |
| [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `100% <0%> (+14.28%)` | `4% <0%> (+1%)` | :arrow_up: |
| [...a/org/apache/gobblin/util/limiter/NoopLimiter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvbGltaXRlci9Ob29wTGltaXRlci5qYXZh) | `60% <0%> (+20%)` | `3% <0%> (+1%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [cd53dcf...d02e4ec](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull
request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed
db and tables
Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373125173
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -69,12 +70,21 @@ public HiveDatasetDescriptor(Config config) throws IOException {
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createWhitelistedTables())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will ei_tracking.zephyr*|abook*
Review comment:
yes, that's what i mean to say.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull
request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed
db and tables
Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373223062
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -102,25 +112,33 @@ protected boolean isPathContaining(DatasetDescriptor other) {
String otherDbName = parts.get(0);
String otherTableNames = parts.get(1);
- if(!Pattern.compile(this.databaseName).matcher(otherDbName).matches()) {
+ if (!isDbAccepted(otherDbName)) {
return false;
}
List<String> tables = Splitter.on(",").splitToList(otherTableNames);
for (String table : tables) {
- if (!isPathContaining(table)) {
+ if (!isTableAccepted(table)) {
return false;
}
}
return true;
}
- private boolean isPathContaining(String tableName) {
- if (tableName == null) {
- return false;
+ private boolean isDbAccepted(String otherDbName) {
+ List<String> dbs = Splitter.on(",").splitToList(this.databaseName);
+
+ for (String db : dbs) {
+ if (new GlobPattern(db).matches(otherDbName)) {
Review comment:
replaced it with with WhitelistBlacklist, which internally uses java.util.regex patterns
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877:
[GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
> Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/cd53dcfa2c046f53db29689e1d687bf4412565aa?src=pr&el=desc) will **decrease** coverage by `0.03%`.
> The diff coverage is `61.53%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2877 +/- ##
============================================
- Coverage 45.8% 45.77% -0.04%
- Complexity 9111 9112 +1
============================================
Files 1915 1921 +6
Lines 72285 72351 +66
Branches 7972 7963 -9
============================================
+ Hits 33109 33116 +7
- Misses 36150 36210 +60
+ Partials 3026 3025 -1
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `54.76% <61.53%> (+14.28%)` | `6 <3> (+2)` | :arrow_up: |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `32.71% <0%> (-6.55%)` | `11% <0%> (-2%)` | |
| [...obblin/service/monitoring/FlowStatusGenerator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9uaXRvcmluZy9GbG93U3RhdHVzR2VuZXJhdG9yLmphdmE=) | `69.69% <0%> (-4.5%)` | `11% <0%> (ø)` | |
| [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `63.33% <0%> (-1.12%)` | `15% <0%> (-1%)` | |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.22% <0%> (-1.06%)` | `28% <0%> (ø)` | |
| [...blin/service/modules/orchestration/DagManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9vcmNoZXN0cmF0aW9uL0RhZ01hbmFnZXIuamF2YQ==) | `76.54% <0%> (-1.04%)` | `13% <0%> (+1%)` | |
| [...in/service/modules/core/GobblinServiceManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9jb3JlL0dvYmJsaW5TZXJ2aWNlTWFuYWdlci5qYXZh) | `54.71% <0%> (-0.04%)` | `25% <0%> (ø)` | |
| [...a/org/apache/gobblin/service/FlowStatusClient.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi1mbG93LWNvbmZpZy1zZXJ2aWNlL2dvYmJsaW4tZmxvdy1jb25maWctc2VydmljZS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc2VydmljZS9GbG93U3RhdHVzQ2xpZW50LmphdmE=) | `72.34% <0%> (ø)` | `7% <0%> (ø)` | :arrow_down: |
| [...org/apache/gobblin/service/FlowStatusResource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi1mbG93LWNvbmZpZy1zZXJ2aWNlL2dvYmJsaW4tZmxvdy1jb25maWctc2VydmljZS1zZXJ2ZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc2VydmljZS9GbG93U3RhdHVzUmVzb3VyY2UuamF2YQ==) | `0% <0%> (ø)` | `0% <0%> (ø)` | :arrow_down: |
| [...apache/gobblin/salesforce/SalesforceExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0% <0%> (ø)` | :arrow_down: |
| ... and [13 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [cd53dcf...3926e31](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull
request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed
db and tables
Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373897715
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
partitionFormat = "";
conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
}
+
+ whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
this.setRawConfig(this.getRawConfig()
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created as <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
+ String createHiveDatasetWhitelist() {
+ if (new GlobPattern(this.databaseName).hasWildcard()) {
Review comment:
I am using Hadoop's GlobPattern instead of java.util.regex, because I could not find any API in java.util.regex which tells if the string is a plain string or contains special characters. Do you know any API in java.util.regex? or should I put this reason in comment?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373885919
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
partitionFormat = "";
conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
}
+
+ whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
this.setRawConfig(this.getRawConfig()
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created as <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
Review comment:
Can we remove these references?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2877:
[GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#issuecomment-579991934
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=h1) Report
> Merging [#2877](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/853cb344485db795b3e698d05da467cf9bbfc6f5?src=pr&el=desc) will **increase** coverage by `0.01%`.
> The diff coverage is `64.28%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2877 +/- ##
============================================
+ Coverage 45.71% 45.72% +0.01%
- Complexity 9108 9112 +4
============================================
Files 1921 1921
Lines 72391 72391
Branches 7968 7967 -1
============================================
+ Hits 33094 33104 +10
+ Misses 36273 36264 -9
+ Partials 3024 3023 -1
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/metrics/event/lineage/LineageInfo.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9ldmVudC9saW5lYWdlL0xpbmVhZ2VJbmZvLmphdmE=) | `67.7% <100%> (ø)` | `18 <0> (ø)` | :arrow_down: |
| [...service/modules/dataset/HiveDatasetDescriptor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9kYXRhc2V0L0hpdmVEYXRhc2V0RGVzY3JpcHRvci5qYXZh) | `54.76% <61.53%> (+14.28%)` | `6 <3> (+2)` | :arrow_up: |
| [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `79.43% <0%> (-0.94%)` | `24% <0%> (ø)` | |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `64.7% <0%> (+1.35%)` | `29% <0%> (+1%)` | :arrow_up: |
| [...lin/elasticsearch/writer/FutureCallbackHolder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tZWxhc3RpY3NlYXJjaC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9lbGFzdGljc2VhcmNoL3dyaXRlci9GdXR1cmVDYWxsYmFja0hvbGRlci5qYXZh) | `62.85% <0%> (+1.42%)` | `4% <0%> (ø)` | :arrow_down: |
| [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2877/diff?src=pr&el=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0%> (+7.14%)` | `3% <0%> (ø)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=footer). Last update [853cb34...377478a](https://codecov.io/gh/apache/incubator-gobblin/pull/2877?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull
request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed
db and tables
Posted by GitBox <gi...@apache.org>.
arjun4084346 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r373897715
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -65,16 +67,28 @@ public HiveDatasetDescriptor(Config config) throws IOException {
partitionFormat = "";
conflictPolicy = HiveCopyEntityHelper.ExistingEntityPolicy.REPLACE_TABLE.name();
}
+
+ whitelistBlacklist = new WhitelistBlacklist(config.withValue(WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())));
this.setRawConfig(this.getRawConfig()
.withValue(CONFLICT_POLICY, ConfigValueFactory.fromAnyRef(conflictPolicy))
.withValue(PARTITION_COLUMN, ConfigValueFactory.fromAnyRef(partitionColumn))
.withValue(PARTITION_FORMAT, ConfigValueFactory.fromAnyRef(partitionFormat))
- .withValue(WHITELIST_TABLES, ConfigValueFactory.fromAnyRef(createWhitelistedTables())
+ .withValue(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST,
+ ConfigValueFactory.fromAnyRef(createHiveDatasetWhitelist())
));
}
- private String createWhitelistedTables() {
- return this.tableName.replace(',', '|');
+ // If the db name contains wildcards, whitelist is created as <regex_db>.*
+ // Otherwise, whitelist is created as <db>.tables.
+ // This is the format which HiveDatasetFinder understands.
+ // e.g. db=ei_tracking, table=zephyr*,abook*, whitelist will be ei_tracking.zephyr*|abook*
+ String createHiveDatasetWhitelist() {
+ if (new GlobPattern(this.databaseName).hasWildcard()) {
Review comment:
I am using Hadoop's GlobPattern instead of java.util.regex, because I could not find any API java.util.regex which tells if the string is a plain string or contains special characters. Do you know any API in java.util.regex? or should I put this reason in comment?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and
tables
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2877: [GOBBLIN-1035] make hive dataset descriptor accepts regexed db and tables
URL: https://github.com/apache/incubator-gobblin/pull/2877#discussion_r372757288
##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/dataset/HiveDatasetDescriptor.java
##########
@@ -102,25 +112,33 @@ protected boolean isPathContaining(DatasetDescriptor other) {
String otherDbName = parts.get(0);
String otherTableNames = parts.get(1);
- if(!Pattern.compile(this.databaseName).matcher(otherDbName).matches()) {
+ if (!isDbAccepted(otherDbName)) {
return false;
}
List<String> tables = Splitter.on(",").splitToList(otherTableNames);
for (String table : tables) {
- if (!isPathContaining(table)) {
+ if (!isTableAccepted(table)) {
return false;
}
}
return true;
}
- private boolean isPathContaining(String tableName) {
- if (tableName == null) {
- return false;
+ private boolean isDbAccepted(String otherDbName) {
+ List<String> dbs = Splitter.on(",").splitToList(this.databaseName);
+
+ for (String db : dbs) {
+ if (new GlobPattern(db).matches(otherDbName)) {
Review comment:
Should we use java.util.regex patterns for contains() checks, instead of Hadoop GlobPattern which is more restrictive.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services