You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2020/04/11 04:43:39 UTC
[GitHub] [incubator-gobblin] autumnust opened a new pull request #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
### JIRA
- https://issues.apache.org/jira/browse/GOBBLIN-1114
### Description
Make the up-conversion of record recursive.
### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
> Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **increase** coverage by `0.02%`.
> The diff coverage is `73.97%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2954 +/- ##
============================================
+ Coverage 45.54% 45.56% +0.02%
- Complexity 9160 9168 +8
============================================
Files 1937 1937
Lines 73362 73395 +33
Branches 8099 8105 +6
============================================
+ Hits 33414 33444 +30
Misses 36855 36855
- Partials 3093 3096 +3
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.42% <73.68%> (+2.55%)` | `16.00 <8.00> (ø)` | |
| [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `72.91% <74.28%> (+8.63%)` | `12.00 <8.00> (+8.00)` | |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `34.92% <0.00%> (-2.39%)` | `14.00% <0.00%> (-1.00%)` | |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `67.53% <0.00%> (-0.38%)` | `33.00% <0.00%> (ø%)` | |
| [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `64.44% <0.00%> (+1.11%)` | `16.00% <0.00%> (+1.00%)` | |
| [...e/gobblin/runtime/app/ServiceBasedAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBwL1NlcnZpY2VCYXNlZEFwcExhdW5jaGVyLmphdmE=) | `49.51% <0.00%> (+5.82%)` | `12.00% <0.00%> (ø%)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...ff06d35](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407133543
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
##########
@@ -91,7 +113,9 @@ OrcStruct upConvertOrcStruct(OrcStruct orcStruct, Context context) {
TypeDescription readerType = newSchemaTypes.get(indexInNewSchema);
if (isEvolutionValid(fileType, readerType)) {
Review comment:
Should isEvolutionValid() be inside a utils class?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io commented on issue #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
> Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **increase** coverage by `0.02%`.
> The diff coverage is `70.73%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2954 +/- ##
============================================
+ Coverage 45.54% 45.57% +0.02%
- Complexity 9160 9168 +8
============================================
Files 1937 1937
Lines 73362 73396 +34
Branches 8099 8105 +6
============================================
+ Hits 33414 33450 +36
+ Misses 36855 36849 -6
- Partials 3093 3097 +4
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `56.25% <0.00%> (-8.04%)` | `4.00 <0.00> (ø)` | |
| [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.55% <76.31%> (+2.68%)` | `24.00 <8.00> (+8.00)` | |
| [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `70.00% <0.00%> (-2.23%)` | `13.00% <0.00%> (ø%)` | |
| [...he/gobblin/metrics/reporter/ScheduledReporter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9yZXBvcnRlci9TY2hlZHVsZWRSZXBvcnRlci5qYXZh) | `59.09% <0.00%> (-1.52%)` | `14.00% <0.00%> (-1.00%)` | |
| [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `80.37% <0.00%> (+0.93%)` | `24.00% <0.00%> (ø%)` | |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `40.47% <0.00%> (+3.17%)` | `16.00% <0.00%> (+1.00%)` | |
| [...e/gobblin/runtime/app/ServiceBasedAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBwL1NlcnZpY2VCYXNlZEFwcExhdW5jaGVyLmphdmE=) | `49.51% <0.00%> (+5.82%)` | `12.00% <0.00%> (ø%)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...7050e0b](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] autumnust closed pull request #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
autumnust closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] autumnust opened a new pull request #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
### JIRA
- https://issues.apache.org/jira/browse/GOBBLIN-1114
### Description
Make the up-conversion of record recursive.
### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] autumnust closed pull request #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
autumnust closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
> Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **decrease** coverage by `0.90%`.
> The diff coverage is `70.73%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2954 +/- ##
============================================
- Coverage 45.54% 44.64% -0.91%
+ Complexity 9160 9000 -160
============================================
Files 1937 1937
Lines 73362 73396 +34
Branches 8099 8105 +6
============================================
- Hits 33414 32765 -649
- Misses 36855 37576 +721
+ Partials 3093 3055 -38
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `56.25% <0.00%> (-8.04%)` | `4.00 <0.00> (ø)` | |
| [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.55% <76.31%> (+2.68%)` | `24.00 <8.00> (+8.00)` | |
| [...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
| [...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0YXRlQ29sbGVjdG9yU2VydmljZUhpdmVSZWdIYW5kbGVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
| [...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
| [...in/runtime/mapreduce/CustomizedProgresserBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0N1c3RvbWl6ZWRQcm9ncmVzc2VyQmFzZS5qYXZh) | `0.00% <0.00%> (-83.34%)` | `0.00% <0.00%> (-1.00%)` | |
| [...rg/apache/gobblin/runtime/ZkDatasetStateStore.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlLmphdmE=) | `0.00% <0.00%> (-80.77%)` | `0.00% <0.00%> (-7.00%)` | |
| [...lin/runtime/locks/LegacyJobLockFactoryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvTGVnYWN5Sm9iTG9ja0ZhY3RvcnlNYW5hZ2VyLmphdmE=) | `0.00% <0.00%> (-78.58%)` | `0.00% <0.00%> (-2.00%)` | |
| [.../apache/gobblin/metastore/ZkStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0YXN0b3JlL1prU3RhdGVTdG9yZUZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-71.43%)` | `0.00% <0.00%> (-2.00%)` | |
| [...he/gobblin/runtime/ZkDatasetStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-71.43%)` | `0.00% <0.00%> (-2.00%)` | |
| ... and [42 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...ff06d35](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132255
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcUtils.java
##########
@@ -67,6 +71,6 @@ public static TypeDescription getNewestSchemaFromSource(Job job, FileSystem fs)
}
throw new IllegalStateException(
- String.format("There's no file carrying orc file schema in %s list", sourceDirs));
+ String.format("There's no file carrying orc file schema in the list of directories: %s", Arrays.asList(sourceDirs)));
Review comment:
Arrays.toString(sourceDirs)?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132409
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
##########
@@ -24,58 +24,80 @@
import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.impl.ConvertTreeReaderFactory;
import org.apache.orc.impl.SchemaEvolution;
import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
import org.apache.orc.mapred.OrcValue;
import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
/**
* To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
* {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
* read in the record as the input value.
*/
+@Slf4j
public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
private OrcValue outValue;
private TypeDescription mapperSchema;
+ // This is added mostly for debuggability.
+ private static int writeCount = 0;
+
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
this.outValue = new OrcValue();
- this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+ this.mapperSchema =
+ TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
}
@Override
protected void map(NullWritable key, OrcStruct orcStruct, Context context)
throws IOException, InterruptedException {
- OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
- if (context.getNumReduceTasks() == 0) {
- this.outValue.value = upConvertedStruct;
- context.write(NullWritable.get(), this.outValue);
- } else {
- this.outValue.value = upConvertedStruct;
- context.write(getDedupKey(upConvertedStruct), this.outValue);
+ OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+ try {
+ if (context.getNumReduceTasks() == 0) {
+ this.outValue.value = upConvertedStruct;
+ context.write(NullWritable.get(), this.outValue);
+ } else {
+ this.outValue.value = upConvertedStruct;
+ context.write(getDedupKey(upConvertedStruct), this.outValue);
+ }
+ } catch (Exception e) {
+ throw new RuntimeException("Failure in write record no." + writeCount, e);
Review comment:
Can we print the file name too? If the mapper is processing multiple files, will be good to get the file name where the failure occurs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132718
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
##########
@@ -24,58 +24,80 @@
import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.impl.ConvertTreeReaderFactory;
import org.apache.orc.impl.SchemaEvolution;
import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
import org.apache.orc.mapred.OrcValue;
import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
/**
* To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
* {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
* read in the record as the input value.
*/
+@Slf4j
public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
private OrcValue outValue;
private TypeDescription mapperSchema;
+ // This is added mostly for debuggability.
+ private static int writeCount = 0;
+
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
this.outValue = new OrcValue();
- this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+ this.mapperSchema =
+ TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
}
@Override
protected void map(NullWritable key, OrcStruct orcStruct, Context context)
throws IOException, InterruptedException {
- OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
- if (context.getNumReduceTasks() == 0) {
- this.outValue.value = upConvertedStruct;
- context.write(NullWritable.get(), this.outValue);
- } else {
- this.outValue.value = upConvertedStruct;
- context.write(getDedupKey(upConvertedStruct), this.outValue);
+ OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+ try {
+ if (context.getNumReduceTasks() == 0) {
+ this.outValue.value = upConvertedStruct;
+ context.write(NullWritable.get(), this.outValue);
+ } else {
+ this.outValue.value = upConvertedStruct;
+ context.write(getDedupKey(upConvertedStruct), this.outValue);
+ }
+ } catch (Exception e) {
+ throw new RuntimeException("Failure in write record no." + writeCount, e);
}
+ writeCount += 1;
context.getCounter(EVENT_COUNTER.RECORD_COUNT).increment(1);
}
/**
- * If a {@link OrcStruct}'s schema differs from newest schema obtained when creating MR jobs (which is the
- * newest schema seen by the MR job), all the other ORC object will need to be up-converted.
+ * Up convert the {@link OrcStruct} towards {@link #mapperSchema} recursively.
Review comment:
This comment can be improved. Do you mean to say "Recursively up convert the {@link OrcStruct} into {@link mapperSchema}"?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407136221
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
##########
@@ -24,58 +24,80 @@
import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.impl.ConvertTreeReaderFactory;
import org.apache.orc.impl.SchemaEvolution;
import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
import org.apache.orc.mapred.OrcValue;
import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
/**
* To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
* {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
* read in the record as the input value.
*/
+@Slf4j
public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
private OrcValue outValue;
private TypeDescription mapperSchema;
+ // This is added mostly for debuggability.
+ private static int writeCount = 0;
+
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
this.outValue = new OrcValue();
- this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+ this.mapperSchema =
+ TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
}
@Override
protected void map(NullWritable key, OrcStruct orcStruct, Context context)
throws IOException, InterruptedException {
- OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
- if (context.getNumReduceTasks() == 0) {
- this.outValue.value = upConvertedStruct;
- context.write(NullWritable.get(), this.outValue);
- } else {
- this.outValue.value = upConvertedStruct;
- context.write(getDedupKey(upConvertedStruct), this.outValue);
+ OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+ try {
+ if (context.getNumReduceTasks() == 0) {
+ this.outValue.value = upConvertedStruct;
+ context.write(NullWritable.get(), this.outValue);
+ } else {
+ this.outValue.value = upConvertedStruct;
+ context.write(getDedupKey(upConvertedStruct), this.outValue);
+ }
+ } catch (Exception e) {
+ throw new RuntimeException("Failure in write record no." + writeCount, e);
Review comment:
Do you think this would work: ((FileSplit) context.getInputSplit()).getInputPath()?
https://stackoverflow.com/questions/19012482/how-to-get-the-input-file-name-in-the-mapper-in-a-hadoop-program
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
> Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **increase** coverage by `0.02%`.
> The diff coverage is `73.97%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2954 +/- ##
============================================
+ Coverage 45.54% 45.56% +0.02%
- Complexity 9160 9168 +8
============================================
Files 1937 1937
Lines 73362 73395 +33
Branches 8099 8105 +6
============================================
+ Hits 33414 33444 +30
Misses 36855 36855
- Partials 3093 3096 +3
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.42% <73.68%> (+2.55%)` | `16.00 <8.00> (ø)` | |
| [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `72.91% <74.28%> (+8.63%)` | `12.00 <8.00> (+8.00)` | |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `34.92% <0.00%> (-2.39%)` | `14.00% <0.00%> (-1.00%)` | |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `67.53% <0.00%> (-0.38%)` | `33.00% <0.00%> (ø%)` | |
| [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `64.44% <0.00%> (+1.11%)` | `16.00% <0.00%> (+1.00%)` | |
| [...e/gobblin/runtime/app/ServiceBasedAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBwL1NlcnZpY2VCYXNlZEFwcExhdW5jaGVyLmphdmE=) | `49.51% <0.00%> (+5.82%)` | `12.00% <0.00%> (ø%)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...ff06d35](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
> Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **decrease** coverage by `0.92%`.
> The diff coverage is `70.73%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2954 +/- ##
============================================
- Coverage 45.54% 44.62% -0.93%
+ Complexity 9160 8998 -162
============================================
Files 1937 1937
Lines 73362 73396 +34
Branches 8099 8105 +6
============================================
- Hits 33414 32753 -661
- Misses 36855 37587 +732
+ Partials 3093 3056 -37
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `56.25% <0.00%> (-8.04%)` | `4.00 <0.00> (ø)` | |
| [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.55% <76.31%> (+2.68%)` | `24.00 <8.00> (+8.00)` | |
| [...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
| [...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0YXRlQ29sbGVjdG9yU2VydmljZUhpdmVSZWdIYW5kbGVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
| [...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
| [...in/runtime/mapreduce/CustomizedProgresserBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0N1c3RvbWl6ZWRQcm9ncmVzc2VyQmFzZS5qYXZh) | `0.00% <0.00%> (-83.34%)` | `0.00% <0.00%> (-1.00%)` | |
| [...rg/apache/gobblin/runtime/ZkDatasetStateStore.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlLmphdmE=) | `0.00% <0.00%> (-80.77%)` | `0.00% <0.00%> (-7.00%)` | |
| [...lin/runtime/locks/LegacyJobLockFactoryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvTGVnYWN5Sm9iTG9ja0ZhY3RvcnlNYW5hZ2VyLmphdmE=) | `0.00% <0.00%> (-78.58%)` | `0.00% <0.00%> (-2.00%)` | |
| [.../apache/gobblin/metastore/ZkStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0YXN0b3JlL1prU3RhdGVTdG9yZUZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-71.43%)` | `0.00% <0.00%> (-2.00%)` | |
| [...he/gobblin/runtime/ZkDatasetStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-71.43%)` | `0.00% <0.00%> (-2.00%)` | |
| ... and [41 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...7050e0b](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] autumnust opened a new pull request #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
### JIRA
- https://issues.apache.org/jira/browse/GOBBLIN-1114
### Description
Make the up-conversion of record recursive.
### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
> Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **increase** coverage by `0.02%`.
> The diff coverage is `70.73%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #2954 +/- ##
============================================
+ Coverage 45.54% 45.57% +0.02%
- Complexity 9160 9168 +8
============================================
Files 1937 1937
Lines 73362 73396 +34
Branches 8099 8105 +6
============================================
+ Hits 33414 33450 +36
+ Misses 36855 36849 -6
- Partials 3093 3097 +4
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `56.25% <0.00%> (-8.04%)` | `4.00 <0.00> (ø)` | |
| [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.55% <76.31%> (+2.68%)` | `24.00 <8.00> (+8.00)` | |
| [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `70.00% <0.00%> (-2.23%)` | `13.00% <0.00%> (ø%)` | |
| [...he/gobblin/metrics/reporter/ScheduledReporter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9yZXBvcnRlci9TY2hlZHVsZWRSZXBvcnRlci5qYXZh) | `59.09% <0.00%> (-1.52%)` | `14.00% <0.00%> (-1.00%)` | |
| [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `80.37% <0.00%> (+0.93%)` | `24.00% <0.00%> (ø%)` | |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `40.47% <0.00%> (+3.17%)` | `16.00% <0.00%> (+1.00%)` | |
| [...e/gobblin/runtime/app/ServiceBasedAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBwL1NlcnZpY2VCYXNlZEFwcExhdW5jaGVyLmphdmE=) | `49.51% <0.00%> (+5.82%)` | `12.00% <0.00%> (ø%)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...7050e0b](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132844
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
##########
@@ -24,58 +24,80 @@
import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.impl.ConvertTreeReaderFactory;
import org.apache.orc.impl.SchemaEvolution;
import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
import org.apache.orc.mapred.OrcValue;
import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
/**
* To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
* {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
* read in the record as the input value.
*/
+@Slf4j
public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
private OrcValue outValue;
private TypeDescription mapperSchema;
+ // This is added mostly for debuggability.
+ private static int writeCount = 0;
+
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
this.outValue = new OrcValue();
- this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+ this.mapperSchema =
+ TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
}
@Override
protected void map(NullWritable key, OrcStruct orcStruct, Context context)
throws IOException, InterruptedException {
- OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
- if (context.getNumReduceTasks() == 0) {
- this.outValue.value = upConvertedStruct;
- context.write(NullWritable.get(), this.outValue);
- } else {
- this.outValue.value = upConvertedStruct;
- context.write(getDedupKey(upConvertedStruct), this.outValue);
+ OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+ try {
+ if (context.getNumReduceTasks() == 0) {
+ this.outValue.value = upConvertedStruct;
+ context.write(NullWritable.get(), this.outValue);
+ } else {
+ this.outValue.value = upConvertedStruct;
+ context.write(getDedupKey(upConvertedStruct), this.outValue);
+ }
+ } catch (Exception e) {
+ throw new RuntimeException("Failure in write record no." + writeCount, e);
}
+ writeCount += 1;
context.getCounter(EVENT_COUNTER.RECORD_COUNT).increment(1);
}
/**
- * If a {@link OrcStruct}'s schema differs from newest schema obtained when creating MR jobs (which is the
- * newest schema seen by the MR job), all the other ORC object will need to be up-converted.
+ * Up convert the {@link OrcStruct} towards {@link #mapperSchema} recursively.
+ * Limitation:
+ * 1. Do not support up-convert key type in map.
+ * 2. Conversion only happens if that comply with org.apache.gobblin.compaction.mapreduce.orc.OrcValueMapper#isEvolutionValid return true.
Review comment:
"if that comply with" -> "if".
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] asfgit closed pull request #2954:
[GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132289
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
##########
@@ -24,58 +24,80 @@
import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.impl.ConvertTreeReaderFactory;
import org.apache.orc.impl.SchemaEvolution;
import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
import org.apache.orc.mapred.OrcValue;
import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
/**
* To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
* {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
* read in the record as the input value.
*/
+@Slf4j
public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
private OrcValue outValue;
private TypeDescription mapperSchema;
+ // This is added mostly for debuggability.
+ private static int writeCount = 0;
+
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
this.outValue = new OrcValue();
- this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+ this.mapperSchema =
+ TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
}
@Override
protected void map(NullWritable key, OrcStruct orcStruct, Context context)
throws IOException, InterruptedException {
- OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
- if (context.getNumReduceTasks() == 0) {
- this.outValue.value = upConvertedStruct;
- context.write(NullWritable.get(), this.outValue);
- } else {
- this.outValue.value = upConvertedStruct;
- context.write(getDedupKey(upConvertedStruct), this.outValue);
+ OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+ try {
+ if (context.getNumReduceTasks() == 0) {
+ this.outValue.value = upConvertedStruct;
Review comment:
move this line outside the if..else block, since it is both in the if as well as else conditions.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132758
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
##########
@@ -24,58 +24,80 @@
import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.impl.ConvertTreeReaderFactory;
import org.apache.orc.impl.SchemaEvolution;
import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
import org.apache.orc.mapred.OrcValue;
import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
/**
* To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
* {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
* read in the record as the input value.
*/
+@Slf4j
public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
private OrcValue outValue;
private TypeDescription mapperSchema;
+ // This is added mostly for debuggability.
+ private static int writeCount = 0;
+
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
this.outValue = new OrcValue();
- this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+ this.mapperSchema =
+ TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
}
@Override
protected void map(NullWritable key, OrcStruct orcStruct, Context context)
throws IOException, InterruptedException {
- OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
- if (context.getNumReduceTasks() == 0) {
- this.outValue.value = upConvertedStruct;
- context.write(NullWritable.get(), this.outValue);
- } else {
- this.outValue.value = upConvertedStruct;
- context.write(getDedupKey(upConvertedStruct), this.outValue);
+ OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+ try {
+ if (context.getNumReduceTasks() == 0) {
+ this.outValue.value = upConvertedStruct;
+ context.write(NullWritable.get(), this.outValue);
+ } else {
+ this.outValue.value = upConvertedStruct;
+ context.write(getDedupKey(upConvertedStruct), this.outValue);
+ }
+ } catch (Exception e) {
+ throw new RuntimeException("Failure in write record no." + writeCount, e);
}
+ writeCount += 1;
context.getCounter(EVENT_COUNTER.RECORD_COUNT).increment(1);
}
/**
- * If a {@link OrcStruct}'s schema differs from newest schema obtained when creating MR jobs (which is the
- * newest schema seen by the MR job), all the other ORC object will need to be up-converted.
+ * Up convert the {@link OrcStruct} towards {@link #mapperSchema} recursively.
+ * Limitation:
+ * 1. Do not support up-convert key type in map.
Review comment:
"Does not support up-conversion of key types in Maps" ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request
#2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407135539
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
##########
@@ -24,58 +24,80 @@
import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.impl.ConvertTreeReaderFactory;
import org.apache.orc.impl.SchemaEvolution;
import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
import org.apache.orc.mapred.OrcValue;
import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
/**
* To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
* {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
* read in the record as the input value.
*/
+@Slf4j
public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
private OrcValue outValue;
private TypeDescription mapperSchema;
+ // This is added mostly for debuggability.
+ private static int writeCount = 0;
+
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
this.outValue = new OrcValue();
- this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+ this.mapperSchema =
+ TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
}
@Override
protected void map(NullWritable key, OrcStruct orcStruct, Context context)
throws IOException, InterruptedException {
- OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
- if (context.getNumReduceTasks() == 0) {
- this.outValue.value = upConvertedStruct;
- context.write(NullWritable.get(), this.outValue);
- } else {
- this.outValue.value = upConvertedStruct;
- context.write(getDedupKey(upConvertedStruct), this.outValue);
+ OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+ try {
+ if (context.getNumReduceTasks() == 0) {
+ this.outValue.value = upConvertedStruct;
+ context.write(NullWritable.get(), this.outValue);
+ } else {
+ this.outValue.value = upConvertedStruct;
+ context.write(getDedupKey(upConvertedStruct), this.outValue);
+ }
+ } catch (Exception e) {
+ throw new RuntimeException("Failure in write record no." + writeCount, e);
Review comment:
Yes, finding the filename here seems non-trivial since I need the split id to be available in the context object. I have another ticket to track this work.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services