You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Ahmet Altay (JIRA)" <ji...@apache.org> on 2018/10/03 23:43:00 UTC

[jira] [Updated] (BEAM-4518) Errors when running Python Game stats with a low fixed_window_duration in the DirectRunner

     [ https://issues.apache.org/jira/browse/BEAM-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Altay updated BEAM-4518:
------------------------------
    Fix Version/s:     (was: 2.8.0)

> Errors when running Python Game stats with a low fixed_window_duration in the DirectRunner
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-4518
>                 URL: https://issues.apache.org/jira/browse/BEAM-4518
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Ahmet Altay
>            Priority: Major
>
> Using the injector and the following command to start the DirectRunner pipeline:
> python -m apache_beam.examples.complete.game.game_stats \
> --project=google.com:clouddfe \
> --topic projects/google.com:clouddfe/topics/leader_board-$USER-topic-1 \
> --dataset ${USER}_test --fixed_window_duration 1
> Fails with:
> ValueError: PCollection of size 2 with more than one element accessed as a singleton view. First two elements encountered are "13.93", "11.7777777778". [while running 'CalculateSpammyUsers/ProcessAndFilter']
> Offending code is here:
> global_mean_score = (
>  sum_scores
>  | beam.Values()
>  | beam.CombineGlobally(beam.combiners.MeanCombineFn())\
>  .as_singleton_view())
>  # Filter the user sums using the global mean.
>  filtered = (
>  sum_scores
>  # Use the derived mean total score (global_mean_score) as a side input.
>  | 'ProcessAndFilter' >> beam.Filter(
>  lambda key_score, global_mean:\
>  key_score[1] > global_mean * self.SCORE_WEIGHT,
>  global_mean_score))
> Since global_mean_score is the result of CombineGlobally, this is either an issue with CombineGlobally or side inputs implementation in DirectRunner. The latter is more likely since it works on DataflowRunner.
> cc: [~mariagh]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)