You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/28 17:07:39 UTC

[GitHub] [beam] mosche opened a new pull request, #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

mosche opened a new pull request, #24380:
URL: https://github.com/apache/beam/pull/24380

   Please check `NamedAggregators`, it is not possible to add an entry to its internal map. With that all classes removed in this PR are absolutely useless and fulfill no purpose.
   
   Strictly speaking this is a breaking change. Though, considering these classes are removed from a runner and shouldn't appear on the user's classpath + they are totally useless I consider the chances of breaking any user's code to be extremely low.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
aromanenko-dev commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339032060

   @mosche Could you elaborate more why it should be removed? It's not clear for me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
aromanenko-dev commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339698356

   Ok, I'm convinced, thanks for discussion.
   LGTM.
   
   > The deprecation strategy is a bad trap (unless these deprecations are removed in a disciplined manner) deteriorating quality quickly over time. And Beam is just a perfect example here if you look at all the deprecations in Beam's core.
   
   Though, I just wanted to add on this that it's up to committers to track this and check from time to time and finally remove or, at least, ping people who is in charge of this part of the project. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] echauchot commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
echauchot commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339624740

   > > What I wonder here is that we will drop the whole metric system right ? I remember @mosche you said that it is working only in local mode. Can you elaborate ?
   > 
   > @echauchot This doesn't drop the metrics system. Currently there's two Spark accumulators registered:
   > 
   > * `NamedAggregatorsAccumulator` in `AggregatorsAccumulator`: This one isn't used at all, the accumulator is always empty and doesn't expose any API to change that.
   > * `MetricsContainerStepMapAccumulator` in `MetricsAccumulator`: This is the accumulator that integrates with the `MetricsContainerStepMap` to expose container metrics.
   > 
   Ah yes, I remember this part from when I worked on the MetricsPusher service. I wanted to ensure that it is not dropped. Fair enough.
   > I'm only removing the first obsolete part. The metric system integration is kept. Also it's not related to the bug on the Dataset runner i mentioned.
   ok
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339715833

   > Though, I just wanted to add on this that it's up to committers to track this and check from time to time and finally remove or, at least, ping people who is in charge of this part of the project.
   
   Absolutely, it is! Unfortunately there's also lots of attrition everywhere and new contributors then lack context. It's an unhealthy cycle and Beam's lack of major versions doesn't make it any easier :/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1329457829

   R: @aromanenko-dev 
   R: @echauchot 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339054515

   @aromanenko-dev Please have a look at the `NamedAggregators` class. Spark requires `Accumulators` to be registered ahead of time. `NamedAggregators` is meant to be a more dynamic container that allows dynamic accumulators. However, the whole thing is totally useless. It's not possible to add any such accumulator into the internal state map of `NamedAggregators`, it's always empty!
   
   Here's two things you can check to understand this better:
   - Have a look at `NamedAggregators#mNamedAggregators` to check where entries are added (`put`). There's no such place where this is possible.
   - Check usage of the `NamedAggregators.State` interface, which describes a named aggregator. This is never used.
   
   I have no glue for what reasons that code was ever added, but it's broken and can't be used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
aromanenko-dev commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339097172

   In case if it doesn't cause any known bugs, I'd mark this code as `deprecated` and add the related comments about causes of this. 
   @echauchot What do you think on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339559503

   > What I wonder here is that we will drop the whole metric system right ? I remember @mosche you said that it is working only in local mode. Can you elaborate ?
   
   @echauchot This doesn't drop the metrics system. Currently there's two Spark accumulators registered:
   - `NamedAggregatorsAccumulator` in `AggregatorsAccumulator`: This one isn't used at all, the accumulator is always empty and doesn't expose any API to change that.
   - `MetricsContainerStepMapAccumulator` in `MetricsAccumulator`: This is the accumulator that integrates with the `MetricsContainerStepMap` to expose container metrics.
   
   I'm only removing the first obsolete part. The metric system integration is kept. 
   Also it's not related to the bug on the Dataset runner i mentioned. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] echauchot commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
echauchot commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339546660

   @mosche @aromanenko-dev I would tend to agree with @mosche: deprecation is mostly for user facing code like the SDK. Users will just package the runner code in their uber jar. So deprecation seems useless on such a non-user facing code. If these classes are **really** not used in the runner I go for removing them. 
   
   What I wonder here is that we will drop the whole metric system right ? I remember @mosche you said that it is working only in local mode. Can you elaborate ?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
aromanenko-dev commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1346620677

   > Absolutely, it is! Unfortunately there's also lots of attrition everywhere and new contributors then lack context. It's an unhealthy cycle and Beam's lack of major versions doesn't make it any easier :/
   
   I agree that this is a problem (though, Beam, as open source project, is not an exception here) and I'd suggest to start a discussion about this on dev@beam.apache.org


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
aromanenko-dev commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339165867

   @mosche My general point for such cases in open source projects - we can't just remove a code that potentially could be used even if it's not working or buggy but was sitting there for a long time. 
   
   If it's a bug then it should be fixed or just deprecated (with correspondent warnings) and removed later. Actually, this is one of reasons to minimise the number of new public classes/methods that we introduce and we should pay attention for..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] echauchot commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
echauchot commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339130866

   > Thanks. In case if it doesn't cause any known bugs, I'd mark this code as `deprecated` and add the related comments about causes of this.
   > 
   > @echauchot What do you think on this?
   
   Taking a look
   > 
   > PS: @mosche Please, add a related issue for this (if not yet)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev merged pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
aromanenko-dev merged PR #24380:
URL: https://github.com/apache/beam/pull/24380


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339101883

   I don't understand why even bother keeping this as deprecated code. It simply cannot be used.
   Also, this is a runner, so it's even more unlikely that somebody accidentally even uses these broken / useless classes as import. That's the most you can do with these. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339179680

   @aromanenko-dev Generally I would agree. Though, Beam runners are not the typical library used in user code.
   
   Currently there's no way users can do anything meaningful with NamedAggregators, it's simply a broken piece of code left over for whatever reasons. Why would any user import any such related class? It's not buggy, it's dead... you cannot do anything with it.
   
   Btw, the primary reason not to use open source, is the poor quality of so much open source code out there. The deprecation strategy is a bad trap (unless these deprecations are removed in a disciplined manner) deteriorating quality quickly over time. And Beam is just a perfect example here if you look at all the deprecations in Beam's core.
   
   We have to be very careful about our limited resources and energy. IMHO going through the deprecation detour is wasted energy in this case.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1329450905

   Run Spark ValidatesRunner
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1329459395

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on pull request #24380: [Spark RDD runner] Remove obsolete unusable AggregatorsAccumulator / NamedAggregators

Posted by GitBox <gi...@apache.org>.
mosche commented on PR #24380:
URL: https://github.com/apache/beam/pull/24380#issuecomment-1339102795

   Issue already exists https://github.com/apache/beam/issues/24379


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org