Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/06 18:18:32 UTC
[GitHub] [beam] TheNeuralBit opened a new pull request, #22174: Parallelizable mean
TheNeuralBit opened a new pull request, #22174:
URL: https://github.com/apache/beam/pull/22174
(Depends on #22173, see 23884a3280663f46523449623e3f6d750ce8a2a3 for changes relevant to this PR)
Fixes #22171
This adds a parallelizable custom implementation of `DeferredSeries.mean`, which uses `sum()/count()`. In addition:
- `DeferredDataFrame` leverages this implementation through `_agg_method`
- Updates tests
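`sum()/count()` parallelizes well for a simple reason. Each partition can be reduced independently to a (sum, count) pair, the pairs can be merged in any order, and a single division happens at the end. Averaging per-partition means instead would give the wrong answer whenever partitions differ in size. Below is a minimal pure-Python sketch of that decomposition. The function names are hypothetical, and this is not the Beam implementation, which operates on deferred pandas objects.

```python
from typing import Iterable, List, Tuple


def partial_sum_count(partition: Iterable[float]) -> Tuple[float, int]:
    """Reduce one partition to a (sum, count) pair; runs independently per worker."""
    total, count = 0.0, 0
    for value in partition:
        total += value
        count += 1
    return total, count


def combine(pairs: Iterable[Tuple[float, int]]) -> Tuple[float, int]:
    """Merge (sum, count) pairs; merge order doesn't matter (up to float rounding)."""
    total, count = 0.0, 0
    for s, c in pairs:
        total += s
        count += c
    return total, count


def parallel_mean(partitions: List[List[float]]) -> float:
    """Mean as sum()/count(): one division after all partials are merged."""
    total, count = combine(partial_sum_count(p) for p in partitions)
    return total / count if count else float("nan")  # empty input -> NaN, as in pandas


partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
print(parallel_mean(partitions))  # 3.5
```

With these partitions the per-partition means are 1.5, 3.0 and 5.0. Their unweighted average (about 3.17) differs from the true mean of 3.5, which is why the partial results carry counts rather than means.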
GitHub Actions Tests Status (on master branch)
------------------------------------------------------------------------------------------------
[![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
[![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
[![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #22174: Parallelizable mean
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on PR #22174:
URL: https://github.com/apache/beam/pull/22174#issuecomment-1176776191
Run Python PreCommit
[GitHub] [beam] TheNeuralBit commented on pull request #22174: Parallelizable DataFrame/Series mean
TheNeuralBit commented on PR #22174:
URL: https://github.com/apache/beam/pull/22174#issuecomment-1176847000
Run Python PreCommit
[GitHub] [beam] github-actions[bot] commented on pull request #22174: Parallelizable mean
github-actions[bot] commented on PR #22174:
URL: https://github.com/apache/beam/pull/22174#issuecomment-1176655642
Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
R: @AnandInguva for label python.
Available commands:
- `stop reviewer notifications` - opt out of the automated review tooling
- `remind me after tests pass` - tag the comment author after tests pass
- `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
[GitHub] [beam] codecov[bot] commented on pull request #22174: Parallelizable mean
codecov[bot] commented on PR #22174:
URL: https://github.com/apache/beam/pull/22174#issuecomment-1176554037
# [Codecov](https://codecov.io/gh/apache/beam/pull/22174?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#22174](https://codecov.io/gh/apache/beam/pull/22174?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (23884a3) into [master](https://codecov.io/gh/apache/beam/commit/31970d73c950573322255d7aca1cb5f87fb5c795?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (31970d7) will **decrease** coverage by `0.00%`.
> The diff coverage is `100.00%`.
```diff
@@            Coverage Diff             @@
##           master   #22174      +/-   ##
==========================================
- Coverage   74.22%   74.22%   -0.01%
==========================================
  Files         702      702
  Lines       92829    92838       +9
==========================================
+ Hits        68904    68910       +6
- Misses      22658    22661       +3
  Partials     1267     1267
```
| Flag | Coverage Δ | |
|---|---|---|
| python | `83.61% <100.00%> (-0.01%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/beam/pull/22174?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/22174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `95.41% <100.00%> (+0.15%)` | :arrow_up: |
| [sdks/python/apache\_beam/io/source\_test\_utils.py](https://codecov.io/gh/apache/beam/pull/22174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vc291cmNlX3Rlc3RfdXRpbHMucHk=) | `88.01% <0.00%> (-1.39%)` | :arrow_down: |
| [sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/22174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5) | `90.97% <0.00%> (-0.76%)` | :arrow_down: |
| [sdks/python/apache\_beam/runners/direct/executor.py](https://codecov.io/gh/apache/beam/pull/22174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvZXhlY3V0b3IucHk=) | `96.46% <0.00%> (-0.55%)` | :arrow_down: |
| [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/22174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.54% <0.00%> (-0.13%)` | :arrow_down: |
| [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/22174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.59% <0.00%> (-0.13%)` | :arrow_down: |
| [sdks/python/apache\_beam/dataframe/transforms.py](https://codecov.io/gh/apache/beam/pull/22174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3RyYW5zZm9ybXMucHk=) | `95.98% <0.00%> (+0.72%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/22174?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/22174?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [31970d7...23884a3](https://codecov.io/gh/apache/beam/pull/22174?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [beam] TheNeuralBit commented on a diff in pull request #22174: Parallelizable DataFrame/Series mean
TheNeuralBit commented on code in PR #22174:
URL: https://github.com/apache/beam/pull/22174#discussion_r917413791
##########
sdks/python/apache_beam/dataframe/frames.py:
##########
@@ -1577,6 +1578,17 @@ def std(self, *args, **kwargs):
     # Compute variance (deferred scalar) with same args, then sqrt it
     return self.var(*args, **kwargs).apply(lambda var: math.sqrt(var))
+  @frame_base.with_docs_from(pd.Series)
+  @frame_base.args_to_kwargs(pd.Series)
+  @frame_base.populate_defaults(pd.Series)
+  def mean(self, skipna, **kwargs):
Review Comment:
That's actually what the decorators do: `populate_defaults` pulls the default values from the pandas implementation of the function, so we don't have to duplicate them here.
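The following is a rough illustration of the idea, not Beam's actual `frame_base.populate_defaults`. A decorator can read the defaults off a reference function's signature with `inspect.signature` and inject any that the caller omitted. In this sketch, `populate_defaults_from`, `reference_mean` and `ToySeries` are hypothetical names. The reference signature is a simplified stand-in rather than the exact pandas one. The sketch also assumes all arguments arrive as keywords; in the real decorator stack, converting positional arguments is presumably the job of `args_to_kwargs`.

```python
import functools
import inspect


def reference_mean(self, axis=None, skipna=True, **kwargs):
    """Simplified stand-in for a reference signature (not the exact pandas one)."""


def populate_defaults_from(reference):
    """Hypothetical decorator: inject defaults declared on `reference` for any
    parameter the wrapped function names explicitly but the caller omitted.
    Assumes all non-self arguments are passed as keywords."""
    ref_params = inspect.signature(reference).parameters

    def decorator(func):
        func_params = inspect.signature(func).parameters

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name, param in ref_params.items():
                if (name in func_params and name not in kwargs
                        and param.default is not inspect.Parameter.empty):
                    kwargs[name] = param.default
            return func(*args, **kwargs)

        return wrapper

    return decorator


class ToySeries:
    @populate_defaults_from(reference_mean)
    def mean(self, skipna, **kwargs):
        # No default written here; it comes from reference_mean's signature.
        return skipna


print(ToySeries().mean())              # True
print(ToySeries().mean(skipna=False))  # False
```

The design keeps `skipna`'s default defined in one place (the reference signature) instead of duplicating it on every wrapper method.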
[GitHub] [beam] TheNeuralBit merged pull request #22174: Parallelizable DataFrame/Series mean
TheNeuralBit merged PR #22174:
URL: https://github.com/apache/beam/pull/22174
[GitHub] [beam] AnandInguva commented on a diff in pull request #22174: Parallelizable DataFrame/Series mean
AnandInguva commented on code in PR #22174:
URL: https://github.com/apache/beam/pull/22174#discussion_r917328278
##########
sdks/python/apache_beam/dataframe/frames.py:
##########
@@ -1577,6 +1578,17 @@ def std(self, *args, **kwargs):
     # Compute variance (deferred scalar) with same args, then sqrt it
     return self.var(*args, **kwargs).apply(lambda var: math.sqrt(var))
+  @frame_base.with_docs_from(pd.Series)
+  @frame_base.args_to_kwargs(pd.Series)
+  @frame_base.populate_defaults(pd.Series)
+  def mean(self, skipna, **kwargs):
Review Comment:
Can we make `skipna` a keyword argument?
```suggestion
def mean(self, skipna=True, **kwargs):
```
##########
sdks/python/apache_beam/dataframe/frames.py:
##########
@@ -1577,6 +1578,17 @@ def std(self, *args, **kwargs):
     # Compute variance (deferred scalar) with same args, then sqrt it
     return self.var(*args, **kwargs).apply(lambda var: math.sqrt(var))
+  @frame_base.with_docs_from(pd.Series)
+  @frame_base.args_to_kwargs(pd.Series)
+  @frame_base.populate_defaults(pd.Series)
+  def mean(self, skipna, **kwargs):
Review Comment:
This would follow the pandas signature: https://pandas.pydata.org/docs/reference/api/pandas.Series.mean.html#pandas.Series.mean
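For reference, the pandas `skipna` semantics carry over naturally to a sum/count decomposition. With `skipna=True`, missing values are dropped from both the sum and the count. With `skipna=False`, a single NaN propagates through the sum to the result. Below is a minimal plain-Python sketch using a hypothetical `sum_count_mean` helper that takes a list of partitions. It illustrates the semantics only and is not the Beam code.

```python
import math
from typing import List, Tuple


def sum_count(partition: List[float], skipna: bool = True) -> Tuple[float, int]:
    """Reduce one partition to (sum, count), honouring skipna."""
    total, count = 0.0, 0
    for value in partition:
        if skipna and math.isnan(value):
            continue  # skipna=True: NaN contributes to neither sum nor count
        total += value  # skipna=False: any NaN turns the running sum into NaN
        count += 1
    return total, count


def sum_count_mean(partitions: List[List[float]], skipna: bool = True) -> float:
    """Hypothetical helper: merge per-partition (sum, count) pairs, divide once."""
    pairs = [sum_count(p, skipna) for p in partitions]
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count if count else math.nan


data = [[1.0, math.nan], [3.0]]
print(sum_count_mean(data))                # 2.0
print(sum_count_mean(data, skipna=False))  # nan
```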