Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2021/04/15 03:34:24 UTC
[GitHub] [gobblin] sv2000 opened a new pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable
sv2000 opened a new pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
### JIRA
- [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
- https://issues.apache.org/jira/browse/GOBBLIN-1417
### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
The ORC MapReduce writer currently enforces the default row batch size of 1024. This PR makes the row batch size configurable, because the default may need to be overridden to prevent OOMs during writes.
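As a rough illustration of the idea (not the code in this PR), the sketch below reads a row batch size from job properties and falls back to 1024 when nothing is set. The class name, the property key `example.orc.writer.rowBatchSize`, and the use of `java.util.Properties` in place of the job configuration are all assumptions made for this example; the actual key and wiring live in the PR diff.

```java
import java.util.Properties;

public class RowBatchSizeConfig {
    // Hypothetical key, for illustration only; not the key introduced by this PR.
    public static final String ROW_BATCH_SIZE_KEY = "example.orc.writer.rowBatchSize";

    // Default row batch size stated in the PR description.
    public static final int DEFAULT_ROW_BATCH_SIZE = 1024;

    /**
     * Returns the configured row batch size, or the default when the key is
     * absent or blank. Non-numeric or non-positive values are rejected with
     * an IllegalArgumentException (NumberFormatException is a subclass).
     */
    public static int resolve(Properties props) {
        String raw = props.getProperty(ROW_BATCH_SIZE_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_ROW_BATCH_SIZE;
        }
        int size = Integer.parseInt(raw.trim());
        if (size <= 0) {
            throw new IllegalArgumentException("Row batch size must be positive, got " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve(props)); // prints 1024 (no override set)

        props.setProperty(ROW_BATCH_SIZE_KEY, "256");
        System.out.println(resolve(props)); // prints 256 (override applied)
    }
}
```

A smaller batch holds fewer rows in memory before each write, which is the kind of override the description has in mind for jobs that hit OOMs.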
### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
Tested with an MR job, verifying that the config override takes effect.
### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [gobblin] sv2000 closed pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable
Posted by GitBox <gi...@apache.org>.
sv2000 closed pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253
[GitHub] [gobblin] sv2000 commented on pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable
Posted by GitBox <gi...@apache.org>.
sv2000 commented on pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#issuecomment-817149051
@autumnust Please review.
[GitHub] [gobblin] autumnust commented on a change in pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable
Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#discussion_r613609065
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcKeyCompactorOutputFormat.java
##########
@@ -32,14 +31,19 @@
import org.apache.orc.mapreduce.OrcMapreduceRecordWriter;
import org.apache.orc.mapreduce.OrcOutputFormat;
-import static org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter.*;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter;
+import org.apache.gobblin.writer.GobblinOrcWriter;
+
+import static org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter.COMPACTION_OUTPUT_EXTENSION;
Review comment:
No, the explicit import is clearly better here. Thanks for fixing this.
[GitHub] [gobblin] asfgit closed pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable
Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253
[GitHub] [gobblin] autumnust commented on a change in pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable
Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#discussion_r613380033
##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/mapreduce/orc/OrcKeyCompactorOutputFormat.java
##########
@@ -32,14 +31,19 @@
import org.apache.orc.mapreduce.OrcMapreduceRecordWriter;
import org.apache.orc.mapreduce.OrcOutputFormat;
-import static org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter.*;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter;
+import org.apache.gobblin.writer.GobblinOrcWriter;
+
+import static org.apache.gobblin.compaction.mapreduce.CompactorOutputCommitter.COMPACTION_OUTPUT_EXTENSION;
Review comment:
This doesn't seem to be needed.
[GitHub] [gobblin] codecov-commenter commented on pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#issuecomment-833960798
# [Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3253](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bf4bd3f) into [master](https://codecov.io/gh/apache/gobblin/commit/701840858f1323a1f009a44bc70c1e14e7464935?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7018408) will **increase** coverage by `0.00%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/gobblin/pull/3253/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@            Coverage Diff            @@
##             master   #3253    +/-   ##
========================================
  Coverage      8.99%    9.00%
- Complexity     1740     1743     +3
========================================
  Files          2037     2037
  Lines         79237    79244     +7
  Branches       8838     8838
========================================
+ Hits           7129     7137     +8
  Misses        71406    71406
+ Partials        702      701     -1
```
| [Impacted Files](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...va/org/apache/gobblin/writer/GobblinOrcWriter.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tb3JjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3dyaXRlci9Hb2JibGluT3JjV3JpdGVyLmphdmE=) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [...action/mapreduce/CompactionOrcJobConfigurator.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL0NvbXBhY3Rpb25PcmNKb2JDb25maWd1cmF0b3IuamF2YQ==) | `93.75% <100.00%> (+0.89%)` | `9.00 <2.00> (+2.00)` | |
| [...ion/mapreduce/orc/OrcKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNLZXlDb21wYWN0b3JPdXRwdXRGb3JtYXQuamF2YQ==) | `100.00% <100.00%> (ø)` | `5.00 <1.00> (+1.00)` | |
| [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `92.85% <0.00%> (+7.14%)` | `3.00% <0.00%> (ø%)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [7018408...bf4bd3f](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [gobblin] codecov-io commented on pull request #3253: GOBBLIN-1417: Make row batch size in ORC Mapreduce writer configurable
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #3253:
URL: https://github.com/apache/gobblin/pull/3253#issuecomment-819852218
# [Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3253](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (45813ae) into [master](https://codecov.io/gh/apache/gobblin/commit/d9ae5353c74fdcd385835fca9b586b3fdb90971b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d9ae535) will **decrease** coverage by `37.41%`.
> The diff coverage is `100.00%`.
> :exclamation: Current head 45813ae differs from pull request most recent head d92f54a. Consider uploading reports for the commit d92f54a to get more accurate results
[![Impacted file tree graph](https://codecov.io/gh/apache/gobblin/pull/3253/graphs/tree.svg?width=650&height=150&src=pr&token=4MgURJ0bGc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@              Coverage Diff              @@
##             master   #3253       +/-   ##
============================================
- Coverage     46.42%    9.00%   -37.42%
+ Complexity     9974     1740     -8234
============================================
  Files          2034     2034
  Lines         79042    79092       +50
  Branches       8809     8816        +7
============================================
- Hits          36692     7124    -29568
- Misses        38938    71268    +32330
+ Partials       3412      700     -2712
```
| [Impacted Files](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...va/org/apache/gobblin/writer/GobblinOrcWriter.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tb3JjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3dyaXRlci9Hb2JibGluT3JjV3JpdGVyLmphdmE=) | `0.00% <ø> (-62.04%)` | `0.00 <0.00> (-21.00)` | |
| [...action/mapreduce/CompactionOrcJobConfigurator.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL0NvbXBhY3Rpb25PcmNKb2JDb25maWd1cmF0b3IuamF2YQ==) | `93.75% <100.00%> (+0.89%)` | `9.00 <2.00> (+2.00)` | |
| [...ion/mapreduce/orc/OrcKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNLZXlDb21wYWN0b3JPdXRwdXRGb3JtYXQuamF2YQ==) | `100.00% <100.00%> (ø)` | `5.00 <1.00> (+1.00)` | |
| [...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
| [...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
| [...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| [...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
| [...va/org/apache/gobblin/converter/EmptyIterable.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9FbXB0eUl0ZXJhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| [...org/apache/gobblin/ack/BasicAckableForTesting.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vYWNrL0Jhc2ljQWNrYWJsZUZvclRlc3RpbmcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
| [...n/java/org/apache/gobblin/salesforce/SfConfig.java](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2ZDb25maWcuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| ... and [1079 more](https://codecov.io/gh/apache/gobblin/pull/3253/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d9ae535...d92f54a](https://codecov.io/gh/apache/gobblin/pull/3253?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).