You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2021/04/15 03:34:24 UTC

[GitHub] [gobblin] sv2000 opened a new pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable

sv2000 opened a new pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253


   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-1417
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if applicable):
   Current behavior enforces the default row batch size of 1024. To prevent OOMs during writes, this config may need to be overridden.
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   Tested with a MR job and ensuring the config is overridden.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gobblin] sv2000 closed pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable

Posted by GitBox <gi...@apache.org>.
sv2000 closed pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gobblin] sv2000 commented on pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable

Posted by GitBox <gi...@apache.org>.
sv2000 commented on pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#issuecomment-817149051


   @autumnust Please review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gobblin] autumnust commented on a change in pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#discussion_r613609065



##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcKeyCompactorOutputFormat.java
##########
@@ -32,14 +31,19 @@
 import org.apache.orc.mapreduce.OrcMapreduceRecordWriter;
 import org.apache.orc.mapreduce.OrcOutputFormat;
 
-import static org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter.*;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter;
+import org.apache.gobblin.writer.GobblinOrcWriter;
+
+import static org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter.COMPACTION_OUTPUT_EXTENSION;

Review comment:
       no, explicit import is obviously better here. thanks for fixing this. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gobblin] asfgit closed pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gobblin] autumnust commented on a change in pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#discussion_r613380033



##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcKeyCompactorOutputFormat.java
##########
@@ -32,14 +31,19 @@
 import org.apache.orc.mapreduce.OrcMapreduceRecordWriter;
 import org.apache.orc.mapreduce.OrcOutputFormat;
 
-import static org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter.*;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter;
+import org.apache.gobblin.writer.GobblinOrcWriter;
+
+import static org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter.COMPACTION_OUTPUT_EXTENSION;

Review comment:
       This doesn't seem to be needed. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gobblin] codecov-commenter commented on pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#issuecomment-833960798


   # [Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3253](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bf4bd3f) into [master](https://codecov.io/gh/apache/gobblin/commit/701840858f1323a1f009a44bc70c1e14e7464935?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7018408) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/gobblin/pull/3253/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff            @@
   ##             master   #3253   +/-   ##
   ========================================
     Coverage      8.99%   9.00%           
   - Complexity     1740    1743    +3     
   ========================================
     Files          2037    2037           
     Lines         79237   79244    +7     
     Branches       8838    8838           
   ========================================
   + Hits           7129    7137    +8     
     Misses        71406   71406           
   + Partials        702     701    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/gobblin/writer/GobblinOrcWriter.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tb3JjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3dyaXRlci9Hb2JibGluT3JjV3JpdGVyLmphdmE=) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | [...action/mapreduce/CompactionOrcJobConfigurator.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL0NvbXBhY3Rpb25PcmNKb2JDb25maWd1cmF0b3IuamF2YQ==) | `93.75% <100.00%> (+0.89%)` | `9.00 <2.00> (+2.00)` | |
   | [...ion/mapreduce/orc/OrcKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNLZXlDb21wYWN0b3JPdXRwdXRGb3JtYXQuamF2YQ==) | `100.00% <100.00%> (ø)` | `5.00 <1.00> (+1.00)` | |
   | [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0.00%> (+7.14%)` | `3.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [7018408...bf4bd3f](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gobblin] codecov-io commented on pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#issuecomment-819852218


   # [Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3253](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (45813ae) into [master](https://codecov.io/gh/apache/gobblin/commit/d9ae5353c74fdcd385835fca9b586b3fdb90971b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d9ae535) will **decrease** coverage by `37.41%`.
   > The diff coverage is `100.00%`.
   
   > :exclamation: Current head 45813ae differs from pull request most recent head d92f54a. Consider uploading reports for the commit d92f54a to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/gobblin/pull/3253/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3253       +/-   ##
   ============================================
   - Coverage     46.42%   9.00%   -37.42%     
   + Complexity     9974    1740     -8234     
   ============================================
     Files          2034    2034               
     Lines         79042   79092       +50     
     Branches       8809    8816        +7     
   ============================================
   - Hits          36692    7124    -29568     
   - Misses        38938   71268    +32330     
   + Partials       3412     700     -2712     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/gobblin/writer/GobblinOrcWriter.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tb3JjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3dyaXRlci9Hb2JibGluT3JjV3JpdGVyLmphdmE=) | `0.00% <ø> (-62.04%)` | `0.00 <0.00> (-21.00)` | |
   | [...action/mapreduce/CompactionOrcJobConfigurator.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL0NvbXBhY3Rpb25PcmNKb2JDb25maWd1cmF0b3IuamF2YQ==) | `93.75% <100.00%> (+0.89%)` | `9.00 <2.00> (+2.00)` | |
   | [...ion/mapreduce/orc/OrcKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNLZXlDb21wYWN0b3JPdXRwdXRGb3JtYXQuamF2YQ==) | `100.00% <100.00%> (ø)` | `5.00 <1.00> (+1.00)` | |
   | [...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...va/org/apache/gobblin/converter/EmptyIterable.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9FbXB0eUl0ZXJhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...org/apache/gobblin/ack/BasicAckableForTesting.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vYWNrL0Jhc2ljQWNrYWJsZUZvclRlc3RpbmcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...n/java/org/apache/gobblin/salesforce/SfConfig.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2ZDb25maWcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | ... and [1079 more](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d9ae535...d92f54a](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org