You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2021/03/13 03:22:00 UTC
[jira] [Commented] (BEAM-4518) Errors when running Python Game
stats with a low fixed_window_duration in the DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300695#comment-17300695 ]
Kenneth Knowles commented on BEAM-4518:
---------------------------------------
I am guessing that this is obsolete. Is that right? Can it be closed?
> Errors when running Python Game stats with a low fixed_window_duration in the DirectRunner
> ------------------------------------------------------------------------------------------
>
> Key: BEAM-4518
> URL: https://issues.apache.org/jira/browse/BEAM-4518
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.5.0, 2.6.0
> Reporter: Ahmet Altay
> Priority: P1
>
> Using the injector and the following command to start the DirectRunner pipeline:
> python -m apache_beam.examples.complete.game.game_stats \
> --project=google.com:clouddfe \
> --topic projects/google.com:clouddfe/topics/leader_board-$USER-topic-1 \
> --dataset ${USER}_test --fixed_window_duration 1
> Fails with:
> ValueError: PCollection of size 2 with more than one element accessed as a singleton view. First two elements encountered are "13.93", "11.7777777778". [while running 'CalculateSpammyUsers/ProcessAndFilter']
> Offending code is here:
> global_mean_score = (
> sum_scores
> | beam.Values()
> | beam.CombineGlobally(beam.combiners.MeanCombineFn())\
> .as_singleton_view())
> # Filter the user sums using the global mean.
> filtered = (
> sum_scores
> # Use the derived mean total score (global_mean_score) as a side input.
> | 'ProcessAndFilter' >> beam.Filter(
> lambda key_score, global_mean:\
> key_score[1] > global_mean * self.SCORE_WEIGHT,
> global_mean_score))
> Since global_mean_score is the result of CombineGlobally, this is either an issue with CombineGlobally or side inputs implementation in DirectRunner. The latter is more likely since it works on DataflowRunner.
> cc: [~mariagh]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)