You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2020/04/11 04:43:39 UTC

[GitHub] [incubator-gobblin] autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - https://issues.apache.org/jira/browse/GOBBLIN-1114
   
   
   ### Description
   Make the up-conversion of record recursive. 
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
   > Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `73.97%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2954      +/-   ##
   ============================================
   + Coverage     45.54%   45.56%   +0.02%     
   - Complexity     9160     9168       +8     
   ============================================
     Files          1937     1937              
     Lines         73362    73395      +33     
     Branches       8099     8105       +6     
   ============================================
   + Hits          33414    33444      +30     
     Misses        36855    36855              
   - Partials       3093     3096       +3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.42% <73.68%> (+2.55%)` | `16.00 <8.00> (ø)` | |
   | [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `72.91% <74.28%> (+8.63%)` | `12.00 <8.00> (+8.00)` | |
   | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `34.92% <0.00%> (-2.39%)` | `14.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `67.53% <0.00%> (-0.38%)` | `33.00% <0.00%> (ø%)` | |
   | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `64.44% <0.00%> (+1.11%)` | `16.00% <0.00%> (+1.00%)` | |
   | [...e/gobblin/runtime/app/ServiceBasedAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBwL1NlcnZpY2VCYXNlZEFwcExhdW5jaGVyLmphdmE=) | `49.51% <0.00%> (+5.82%)` | `12.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...ff06d35](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407133543
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
 ##########
 @@ -91,7 +113,9 @@ OrcStruct upConvertOrcStruct(OrcStruct orcStruct, Context context) {
           TypeDescription readerType = newSchemaTypes.get(indexInNewSchema);
 
           if (isEvolutionValid(fileType, readerType)) {
 
 Review comment:
   Should isEvolutionValid() be inside a utils class?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io commented on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
   > Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `70.73%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2954      +/-   ##
   ============================================
   + Coverage     45.54%   45.57%   +0.02%     
   - Complexity     9160     9168       +8     
   ============================================
     Files          1937     1937              
     Lines         73362    73396      +34     
     Branches       8099     8105       +6     
   ============================================
   + Hits          33414    33450      +36     
   + Misses        36855    36849       -6     
   - Partials       3093     3097       +4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `56.25% <0.00%> (-8.04%)` | `4.00 <0.00> (ø)` | |
   | [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.55% <76.31%> (+2.68%)` | `24.00 <8.00> (+8.00)` | |
   | [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `70.00% <0.00%> (-2.23%)` | `13.00% <0.00%> (ø%)` | |
   | [...he/gobblin/metrics/reporter/ScheduledReporter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9yZXBvcnRlci9TY2hlZHVsZWRSZXBvcnRlci5qYXZh) | `59.09% <0.00%> (-1.52%)` | `14.00% <0.00%> (-1.00%)` | |
   | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `80.37% <0.00%> (+0.93%)` | `24.00% <0.00%> (ø%)` | |
   | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `40.47% <0.00%> (+3.17%)` | `16.00% <0.00%> (+1.00%)` | |
   | [...e/gobblin/runtime/app/ServiceBasedAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBwL1NlcnZpY2VCYXNlZEFwcExhdW5jaGVyLmphdmE=) | `49.51% <0.00%> (+5.82%)` | `12.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...7050e0b](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] autumnust closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
autumnust closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - https://issues.apache.org/jira/browse/GOBBLIN-1114
   
   
   ### Description
   Make the up-conversion of record recursive. 
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] autumnust closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
autumnust closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
   > Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **decrease** coverage by `0.90%`.
   > The diff coverage is `70.73%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2954      +/-   ##
   ============================================
   - Coverage     45.54%   44.64%   -0.91%     
   + Complexity     9160     9000     -160     
   ============================================
     Files          1937     1937              
     Lines         73362    73396      +34     
     Branches       8099     8105       +6     
   ============================================
   - Hits          33414    32765     -649     
   - Misses        36855    37576     +721     
   + Partials       3093     3055      -38     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `56.25% <0.00%> (-8.04%)` | `4.00 <0.00> (ø)` | |
   | [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.55% <76.31%> (+2.68%)` | `24.00 <8.00> (+8.00)` | |
   | [...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0YXRlQ29sbGVjdG9yU2VydmljZUhpdmVSZWdIYW5kbGVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...in/runtime/mapreduce/CustomizedProgresserBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0N1c3RvbWl6ZWRQcm9ncmVzc2VyQmFzZS5qYXZh) | `0.00% <0.00%> (-83.34%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/gobblin/runtime/ZkDatasetStateStore.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlLmphdmE=) | `0.00% <0.00%> (-80.77%)` | `0.00% <0.00%> (-7.00%)` | |
   | [...lin/runtime/locks/LegacyJobLockFactoryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvTGVnYWN5Sm9iTG9ja0ZhY3RvcnlNYW5hZ2VyLmphdmE=) | `0.00% <0.00%> (-78.58%)` | `0.00% <0.00%> (-2.00%)` | |
   | [.../apache/gobblin/metastore/ZkStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0YXN0b3JlL1prU3RhdGVTdG9yZUZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-71.43%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...he/gobblin/runtime/ZkDatasetStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-71.43%)` | `0.00% <0.00%> (-2.00%)` | |
   | ... and [42 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...ff06d35](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132255
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcUtils.java
 ##########
 @@ -67,6 +71,6 @@ public static TypeDescription getNewestSchemaFromSource(Job job, FileSystem fs)
     }
 
     throw new IllegalStateException(
-        String.format("There's no file carrying orc file schema in %s list", sourceDirs));
+        String.format("There's no file carrying orc file schema in the list of directories: %s", Arrays.asList(sourceDirs)));
 
 Review comment:
   Arrays.toString(sourceDirs)?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132409
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
 ##########
 @@ -24,58 +24,80 @@
 
 import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
 import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.orc.OrcConf;
 import org.apache.orc.TypeDescription;
 import org.apache.orc.impl.ConvertTreeReaderFactory;
 import org.apache.orc.impl.SchemaEvolution;
 import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
 import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
 import org.apache.orc.mapred.OrcValue;
 import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
 
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
 
 /**
  * To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
  * {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
  * read in the record as the input value.
  */
+@Slf4j
 public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
 
   private OrcValue outValue;
   private TypeDescription mapperSchema;
 
+  // This is added mostly for debuggability.
+  private static int writeCount = 0;
+
   @Override
   protected void setup(Context context)
       throws IOException, InterruptedException {
     super.setup(context);
     this.outValue = new OrcValue();
-    this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+    this.mapperSchema =
+        TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
   }
 
   @Override
   protected void map(NullWritable key, OrcStruct orcStruct, Context context)
       throws IOException, InterruptedException {
-    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
-    if (context.getNumReduceTasks() == 0) {
-      this.outValue.value = upConvertedStruct;
-      context.write(NullWritable.get(), this.outValue);
-    } else {
-      this.outValue.value = upConvertedStruct;
-      context.write(getDedupKey(upConvertedStruct), this.outValue);
+    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+    try {
+      if (context.getNumReduceTasks() == 0) {
+        this.outValue.value = upConvertedStruct;
+        context.write(NullWritable.get(), this.outValue);
+      } else {
+        this.outValue.value = upConvertedStruct;
+        context.write(getDedupKey(upConvertedStruct), this.outValue);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException("Failure in write record no." + writeCount, e);
 
 Review comment:
   Can we print the file name too? If the mapper is processing multiple files, will be good to get the file name where the failure occurs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132718
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
 ##########
 @@ -24,58 +24,80 @@
 
 import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
 import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.orc.OrcConf;
 import org.apache.orc.TypeDescription;
 import org.apache.orc.impl.ConvertTreeReaderFactory;
 import org.apache.orc.impl.SchemaEvolution;
 import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
 import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
 import org.apache.orc.mapred.OrcValue;
 import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
 
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
 
 /**
  * To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
  * {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
  * read in the record as the input value.
  */
+@Slf4j
 public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
 
   private OrcValue outValue;
   private TypeDescription mapperSchema;
 
+  // This is added mostly for debuggability.
+  private static int writeCount = 0;
+
   @Override
   protected void setup(Context context)
       throws IOException, InterruptedException {
     super.setup(context);
     this.outValue = new OrcValue();
-    this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+    this.mapperSchema =
+        TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
   }
 
   @Override
   protected void map(NullWritable key, OrcStruct orcStruct, Context context)
       throws IOException, InterruptedException {
-    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
-    if (context.getNumReduceTasks() == 0) {
-      this.outValue.value = upConvertedStruct;
-      context.write(NullWritable.get(), this.outValue);
-    } else {
-      this.outValue.value = upConvertedStruct;
-      context.write(getDedupKey(upConvertedStruct), this.outValue);
+    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+    try {
+      if (context.getNumReduceTasks() == 0) {
+        this.outValue.value = upConvertedStruct;
+        context.write(NullWritable.get(), this.outValue);
+      } else {
+        this.outValue.value = upConvertedStruct;
+        context.write(getDedupKey(upConvertedStruct), this.outValue);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException("Failure in write record no." + writeCount, e);
     }
+    writeCount += 1;
 
     context.getCounter(EVENT_COUNTER.RECORD_COUNT).increment(1);
   }
 
   /**
-   * If a {@link OrcStruct}'s schema differs from newest schema obtained when creating MR jobs (which is the
-   * newest schema seen by the MR job), all the other ORC object will need to be up-converted.
+   * Up convert the {@link OrcStruct} towards {@link #mapperSchema} recursively.
 
 Review comment:
   This comment can be improved. Do you mean to say "Recursively up convert the {@link OrcStruct} into {@link mapperSchema}"?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407136221
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
 ##########
 @@ -24,58 +24,80 @@
 
 import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
 import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.orc.OrcConf;
 import org.apache.orc.TypeDescription;
 import org.apache.orc.impl.ConvertTreeReaderFactory;
 import org.apache.orc.impl.SchemaEvolution;
 import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
 import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
 import org.apache.orc.mapred.OrcValue;
 import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
 
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
 
 /**
  * To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
  * {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
  * read in the record as the input value.
  */
+@Slf4j
 public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
 
   private OrcValue outValue;
   private TypeDescription mapperSchema;
 
+  // This is added mostly for debuggability.
+  private static int writeCount = 0;
+
   @Override
   protected void setup(Context context)
       throws IOException, InterruptedException {
     super.setup(context);
     this.outValue = new OrcValue();
-    this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+    this.mapperSchema =
+        TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
   }
 
   @Override
   protected void map(NullWritable key, OrcStruct orcStruct, Context context)
       throws IOException, InterruptedException {
-    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
-    if (context.getNumReduceTasks() == 0) {
-      this.outValue.value = upConvertedStruct;
-      context.write(NullWritable.get(), this.outValue);
-    } else {
-      this.outValue.value = upConvertedStruct;
-      context.write(getDedupKey(upConvertedStruct), this.outValue);
+    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+    try {
+      if (context.getNumReduceTasks() == 0) {
+        this.outValue.value = upConvertedStruct;
+        context.write(NullWritable.get(), this.outValue);
+      } else {
+        this.outValue.value = upConvertedStruct;
+        context.write(getDedupKey(upConvertedStruct), this.outValue);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException("Failure in write record no." + writeCount, e);
 
 Review comment:
   Do you think this would work: ((FileSplit) context.getInputSplit()).getInputPath()?
   https://stackoverflow.com/questions/19012482/how-to-get-the-input-file-name-in-the-mapper-in-a-hadoop-program

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
   > Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `73.97%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2954      +/-   ##
   ============================================
   + Coverage     45.54%   45.56%   +0.02%     
   - Complexity     9160     9168       +8     
   ============================================
     Files          1937     1937              
     Lines         73362    73395      +33     
     Branches       8099     8105       +6     
   ============================================
   + Hits          33414    33444      +30     
     Misses        36855    36855              
   - Partials       3093     3096       +3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.42% <73.68%> (+2.55%)` | `16.00 <8.00> (ø)` | |
   | [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `72.91% <74.28%> (+8.63%)` | `12.00 <8.00> (+8.00)` | |
   | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `34.92% <0.00%> (-2.39%)` | `14.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `67.53% <0.00%> (-0.38%)` | `33.00% <0.00%> (ø%)` | |
   | [...e/gobblin/runtime/locks/ZookeeperBasedJobLock.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvWm9va2VlcGVyQmFzZWRKb2JMb2NrLmphdmE=) | `64.44% <0.00%> (+1.11%)` | `16.00% <0.00%> (+1.00%)` | |
   | [...e/gobblin/runtime/app/ServiceBasedAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBwL1NlcnZpY2VCYXNlZEFwcExhdW5jaGVyLmphdmE=) | `49.51% <0.00%> (+5.82%)` | `12.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...ff06d35](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
   > Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **decrease** coverage by `0.92%`.
   > The diff coverage is `70.73%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2954      +/-   ##
   ============================================
   - Coverage     45.54%   44.62%   -0.93%     
   + Complexity     9160     8998     -162     
   ============================================
     Files          1937     1937              
     Lines         73362    73396      +34     
     Branches       8099     8105       +6     
   ============================================
   - Hits          33414    32753     -661     
   - Misses        36855    37587     +732     
   + Partials       3093     3056      -37     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `56.25% <0.00%> (-8.04%)` | `4.00 <0.00> (ø)` | |
   | [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.55% <76.31%> (+2.68%)` | `24.00 <8.00> (+8.00)` | |
   | [...gobblin/runtime/mapreduce/GobblinOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0dvYmJsaW5PdXRwdXRGb3JtYXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...askStateCollectorServiceHiveRegHandlerFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvVGFza1N0YXRlQ29sbGVjdG9yU2VydmljZUhpdmVSZWdIYW5kbGVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...re/filesystem/FsDatasetStateStoreEntryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWV0YXN0b3JlL2ZpbGVzeXN0ZW0vRnNEYXRhc2V0U3RhdGVTdG9yZUVudHJ5TWFuYWdlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...in/runtime/mapreduce/CustomizedProgresserBase.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbWFwcmVkdWNlL0N1c3RvbWl6ZWRQcm9ncmVzc2VyQmFzZS5qYXZh) | `0.00% <0.00%> (-83.34%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/gobblin/runtime/ZkDatasetStateStore.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlLmphdmE=) | `0.00% <0.00%> (-80.77%)` | `0.00% <0.00%> (-7.00%)` | |
   | [...lin/runtime/locks/LegacyJobLockFactoryManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9ja3MvTGVnYWN5Sm9iTG9ja0ZhY3RvcnlNYW5hZ2VyLmphdmE=) | `0.00% <0.00%> (-78.58%)` | `0.00% <0.00%> (-2.00%)` | |
   | [.../apache/gobblin/metastore/ZkStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0YXN0b3JlL1prU3RhdGVTdG9yZUZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-71.43%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...he/gobblin/runtime/ZkDatasetStateStoreFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4taGVsaXgvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vcnVudGltZS9aa0RhdGFzZXRTdGF0ZVN0b3JlRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-71.43%)` | `0.00% <0.00%> (-2.00%)` | |
   | ... and [41 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...7050e0b](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
autumnust opened a new pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - https://issues.apache.org/jira/browse/GOBBLIN-1114
   
   
   ### Description
   Make the up-conversion of record recursive. 
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#issuecomment-612520950
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=h1) Report
   > Merging [#2954](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9879303fc2a1bef21ab394f7fa6910ce0f2ccfbc&el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `70.73%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2954      +/-   ##
   ============================================
   + Coverage     45.54%   45.57%   +0.02%     
   - Complexity     9160     9168       +8     
   ============================================
     Files          1937     1937              
     Lines         73362    73396      +34     
     Branches       8099     8105       +6     
   ============================================
   + Hits          33414    33450      +36     
   + Misses        36855    36849       -6     
   - Partials       3093     3097       +4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/gobblin/compaction/mapreduce/orc/OrcUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNVdGlscy5qYXZh) | `56.25% <0.00%> (-8.04%)` | `4.00 <0.00> (ø)` | |
   | [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `81.55% <76.31%> (+2.68%)` | `24.00 <8.00> (+8.00)` | |
   | [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `70.00% <0.00%> (-2.23%)` | `13.00% <0.00%> (ø%)` | |
   | [...he/gobblin/metrics/reporter/ScheduledReporter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9yZXBvcnRlci9TY2hlZHVsZWRSZXBvcnRlci5qYXZh) | `59.09% <0.00%> (-1.52%)` | `14.00% <0.00%> (-1.00%)` | |
   | [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `80.37% <0.00%> (+0.93%)` | `24.00% <0.00%> (ø%)` | |
   | [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `40.47% <0.00%> (+3.17%)` | `16.00% <0.00%> (+1.00%)` | |
   | [...e/gobblin/runtime/app/ServiceBasedAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2954/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBwL1NlcnZpY2VCYXNlZEFwcExhdW5jaGVyLmphdmE=) | `49.51% <0.00%> (+5.82%)` | `12.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=footer). Last update [9879303...7050e0b](https://codecov.io/gh/apache/incubator-gobblin/pull/2954?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132844
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
 ##########
 @@ -24,58 +24,80 @@
 
 import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
 import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.orc.OrcConf;
 import org.apache.orc.TypeDescription;
 import org.apache.orc.impl.ConvertTreeReaderFactory;
 import org.apache.orc.impl.SchemaEvolution;
 import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
 import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
 import org.apache.orc.mapred.OrcValue;
 import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
 
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
 
 /**
  * To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
  * {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
  * read in the record as the input value.
  */
+@Slf4j
 public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
 
   private OrcValue outValue;
   private TypeDescription mapperSchema;
 
+  // This is added mostly for debuggability.
+  private static int writeCount = 0;
+
   @Override
   protected void setup(Context context)
       throws IOException, InterruptedException {
     super.setup(context);
     this.outValue = new OrcValue();
-    this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+    this.mapperSchema =
+        TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
   }
 
   @Override
   protected void map(NullWritable key, OrcStruct orcStruct, Context context)
       throws IOException, InterruptedException {
-    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
-    if (context.getNumReduceTasks() == 0) {
-      this.outValue.value = upConvertedStruct;
-      context.write(NullWritable.get(), this.outValue);
-    } else {
-      this.outValue.value = upConvertedStruct;
-      context.write(getDedupKey(upConvertedStruct), this.outValue);
+    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+    try {
+      if (context.getNumReduceTasks() == 0) {
+        this.outValue.value = upConvertedStruct;
+        context.write(NullWritable.get(), this.outValue);
+      } else {
+        this.outValue.value = upConvertedStruct;
+        context.write(getDedupKey(upConvertedStruct), this.outValue);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException("Failure in write record no." + writeCount, e);
     }
+    writeCount += 1;
 
     context.getCounter(EVENT_COUNTER.RECORD_COUNT).increment(1);
   }
 
   /**
-   * If a {@link OrcStruct}'s schema differs from newest schema obtained when creating MR jobs (which is the
-   * newest schema seen by the MR job), all the other ORC object will need to be up-converted.
+   * Up convert the {@link OrcStruct} towards {@link #mapperSchema} recursively.
+   * Limitation:
+   * 1. Do not support up-convert key type in map.
+   * 2. Conversion only happens if that comply with org.apache.gobblin.compaction.mapreduce.orc.OrcValueMapper#isEvolutionValid return true.
 
 Review comment:
   "if that comply with" -> "if".

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] asfgit closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132289
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
 ##########
 @@ -24,58 +24,80 @@
 
 import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
 import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.orc.OrcConf;
 import org.apache.orc.TypeDescription;
 import org.apache.orc.impl.ConvertTreeReaderFactory;
 import org.apache.orc.impl.SchemaEvolution;
 import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
 import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
 import org.apache.orc.mapred.OrcValue;
 import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
 
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
 
 /**
  * To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
  * {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
  * read in the record as the input value.
  */
+@Slf4j
 public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
 
   private OrcValue outValue;
   private TypeDescription mapperSchema;
 
+  // This is added mostly for debuggability.
+  private static int writeCount = 0;
+
   @Override
   protected void setup(Context context)
       throws IOException, InterruptedException {
     super.setup(context);
     this.outValue = new OrcValue();
-    this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+    this.mapperSchema =
+        TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
   }
 
   @Override
   protected void map(NullWritable key, OrcStruct orcStruct, Context context)
       throws IOException, InterruptedException {
-    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
-    if (context.getNumReduceTasks() == 0) {
-      this.outValue.value = upConvertedStruct;
-      context.write(NullWritable.get(), this.outValue);
-    } else {
-      this.outValue.value = upConvertedStruct;
-      context.write(getDedupKey(upConvertedStruct), this.outValue);
+    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+    try {
+      if (context.getNumReduceTasks() == 0) {
+        this.outValue.value = upConvertedStruct;
 
 Review comment:
   move this line outside the if..else block, since it is both in the if as well as else conditions.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
sv2000 commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407132758
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
 ##########
 @@ -24,58 +24,80 @@
 
 import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
 import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.orc.OrcConf;
 import org.apache.orc.TypeDescription;
 import org.apache.orc.impl.ConvertTreeReaderFactory;
 import org.apache.orc.impl.SchemaEvolution;
 import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
 import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
 import org.apache.orc.mapred.OrcValue;
 import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
 
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
 
 /**
  * To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
  * {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
  * read in the record as the input value.
  */
+@Slf4j
 public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
 
   private OrcValue outValue;
   private TypeDescription mapperSchema;
 
+  // This is added mostly for debuggability.
+  private static int writeCount = 0;
+
   @Override
   protected void setup(Context context)
       throws IOException, InterruptedException {
     super.setup(context);
     this.outValue = new OrcValue();
-    this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+    this.mapperSchema =
+        TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
   }
 
   @Override
   protected void map(NullWritable key, OrcStruct orcStruct, Context context)
       throws IOException, InterruptedException {
-    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
-    if (context.getNumReduceTasks() == 0) {
-      this.outValue.value = upConvertedStruct;
-      context.write(NullWritable.get(), this.outValue);
-    } else {
-      this.outValue.value = upConvertedStruct;
-      context.write(getDedupKey(upConvertedStruct), this.outValue);
+    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+    try {
+      if (context.getNumReduceTasks() == 0) {
+        this.outValue.value = upConvertedStruct;
+        context.write(NullWritable.get(), this.outValue);
+      } else {
+        this.outValue.value = upConvertedStruct;
+        context.write(getDedupKey(upConvertedStruct), this.outValue);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException("Failure in write record no." + writeCount, e);
     }
+    writeCount += 1;
 
     context.getCounter(EVENT_COUNTER.RECORD_COUNT).increment(1);
   }
 
   /**
-   * If a {@link OrcStruct}'s schema differs from newest schema obtained when creating MR jobs (which is the
-   * newest schema seen by the MR job), all the other ORC object will need to be up-converted.
+   * Up convert the {@link OrcStruct} towards {@link #mapperSchema} recursively.
+   * Limitation:
+   * 1. Do not support up-convert key type in map.
 
 Review comment:
   "Does not support up-conversion of key types in Maps" ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #2954: [GOBBLIN-1114] OrcValueMapper schema evolution up-conversion recursive
URL: https://github.com/apache/incubator-gobblin/pull/2954#discussion_r407135539
 
 

 ##########
 File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcValueMapper.java
 ##########
 @@ -24,58 +24,80 @@
 
 import org.apache.gobblin.compaction.mapreduce.RecordKeyMapperBase;
 import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.orc.OrcConf;
 import org.apache.orc.TypeDescription;
 import org.apache.orc.impl.ConvertTreeReaderFactory;
 import org.apache.orc.impl.SchemaEvolution;
 import org.apache.orc.mapred.OrcKey;
+import org.apache.orc.mapred.OrcList;
+import org.apache.orc.mapred.OrcMap;
 import org.apache.orc.mapred.OrcStruct;
+import org.apache.orc.mapred.OrcUnion;
 import org.apache.orc.mapred.OrcValue;
 import org.apache.orc.mapreduce.OrcMapreduceRecordReader;
 
+import com.google.common.annotations.VisibleForTesting;
+
+import lombok.extern.slf4j.Slf4j;
+
 
 /**
  * To keep consistent with {@link OrcMapreduceRecordReader}'s decision on implementing
  * {@link RecordReader} with {@link NullWritable} as the key and generic type of value, the ORC Mapper will
  * read in the record as the input value.
  */
+@Slf4j
 public class OrcValueMapper extends RecordKeyMapperBase<NullWritable, OrcStruct, Object, OrcValue> {
 
   private OrcValue outValue;
   private TypeDescription mapperSchema;
 
+  // This is added mostly for debuggability.
+  private static int writeCount = 0;
+
   @Override
   protected void setup(Context context)
       throws IOException, InterruptedException {
     super.setup(context);
     this.outValue = new OrcValue();
-    this.mapperSchema = TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
+    this.mapperSchema =
+        TypeDescription.fromString(context.getConfiguration().get(OrcConf.MAPRED_INPUT_SCHEMA.getAttribute()));
   }
 
   @Override
   protected void map(NullWritable key, OrcStruct orcStruct, Context context)
       throws IOException, InterruptedException {
-    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, context);
-    if (context.getNumReduceTasks() == 0) {
-      this.outValue.value = upConvertedStruct;
-      context.write(NullWritable.get(), this.outValue);
-    } else {
-      this.outValue.value = upConvertedStruct;
-      context.write(getDedupKey(upConvertedStruct), this.outValue);
+    OrcStruct upConvertedStruct = upConvertOrcStruct(orcStruct, mapperSchema);
+    try {
+      if (context.getNumReduceTasks() == 0) {
+        this.outValue.value = upConvertedStruct;
+        context.write(NullWritable.get(), this.outValue);
+      } else {
+        this.outValue.value = upConvertedStruct;
+        context.write(getDedupKey(upConvertedStruct), this.outValue);
+      }
+    } catch (Exception e) {
+      throw new RuntimeException("Failure in write record no." + writeCount, e);
 
 Review comment:
   Yes, finding the filename here seems non-trivial since I need the split id to be available in the context object. I have another ticket to track this work. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services