Posted to commits@beam.apache.org by "Frank Yellin (JIRA)" <ji...@apache.org> on 2016/08/06 00:06:20 UTC

[jira] [Created] (BEAM-536) Aggregator.py. More misleading documentation. More bad documentation

Frank Yellin created BEAM-536:
---------------------------------

             Summary: Aggregator.py.  More misleading documentation.  More bad documentation
                 Key: BEAM-536
                 URL: https://issues.apache.org/jira/browse/BEAM-536
             Project: Beam
          Issue Type: Bug
            Reporter: Frank Yellin
            Priority: Minor


The last paragraph of the documentation for Aggregator is:

You can also query the combined value(s) of an aggregator by calling
aggregated_value() or aggregated_values() on the result object returned after
running a pipeline.

There are multiple problems in this one sentence!

#1) There is no such method aggregated_value() that I can find anywhere.

#2) DirectRunner implements aggregated_values(), but DirectPipelineRunner does not.  The latter is the far more interesting case.

#3) When I use a BlockingDirectPipelineRunner and ask for its aggregated_values(), I get an error message indicating that this is not implemented in DirectPipelineRunner.  This is very confusing, since I never asked for a DirectPipelineRunner.

It is clear that this happens because BlockingDirectPipelineRunner is a method rather than a class.  Is this really the right design?  Will there be other confusing error messages?
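The confusion described in #3 can be reproduced in miniature.  The classes and the factory below are hypothetical stand-ins modeled on the behavior reported here, not the actual Beam source: when a name that looks like a class is really a function returning an instance of a different class, any error raised from that instance's methods naturally reports the other class's name.

```python
class DirectPipelineRunner:
    """Hypothetical stand-in for the runner class named in the error."""

    def __init__(self, blocking=False):
        self.blocking = blocking

    def aggregated_values(self, aggregator_or_name):
        # Mirrors the reported behavior: not implemented here, and the
        # message names the class of the actual instance.
        raise NotImplementedError(
            '%s does not implement aggregated_values()' % type(self).__name__)


def BlockingDirectPipelineRunner():
    """Hypothetical factory: named like a class, but it is a function."""
    return DirectPipelineRunner(blocking=True)


def describe_error(runner):
    """Return the NotImplementedError message, or None if none is raised."""
    try:
        runner.aggregated_values('my_counter')
    except NotImplementedError as e:
        return str(e)
    return None


runner = BlockingDirectPipelineRunner()
print(type(runner).__name__)   # DirectPipelineRunner
print(describe_error(runner))  # DirectPipelineRunner does not implement aggregated_values()
```

If BlockingDirectPipelineRunner were instead a subclass of DirectPipelineRunner, type(self).__name__ would report the name the user actually asked for.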

#4) The documentation for aggregated_values() says "returns a dict of step names to values of the aggregator."  I have no idea what a "step" means in this context.  In practice, the result seems to be a single-element dictionary whose key is the aggregator name prefixed with 'user--'.
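To make the mismatch in #4 concrete, here is a small sketch of what a caller ends up writing against the observed shape rather than the documented one.  The helper name and the example dict are hypothetical, and the 'user--' prefix is only what was observed in practice, not a documented contract.

```python
# Prefix observed in practice on aggregated_values() keys; an assumption,
# not a documented contract.
USER_PREFIX = 'user--'


def values_by_aggregator_name(aggregated):
    """Hypothetical helper: strip the observed 'user--' prefix from each key.

    `aggregated` is assumed to look like the single-element dict observed
    from aggregated_values(), e.g. {'user--my_counter': 42}.
    Keys without the prefix are passed through unchanged.
    """
    result = {}
    for key, value in aggregated.items():
        name = key[len(USER_PREFIX):] if key.startswith(USER_PREFIX) else key
        result[name] = value
    return result


# Hypothetical example value, shaped like the observed result.
observed = {'user--my_counter': 42}
print(values_by_aggregator_name(observed))  # {'my_counter': 42}
```

Code like this depends on an undocumented key format, which is exactly why the docstring should say what the keys actually are.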



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)