Posted to commits@beam.apache.org by "Frank Yellin (JIRA)" <ji...@apache.org> on 2016/08/06 00:06:20 UTC

[jira] [Created] (BEAM-536) Aggregator.py. More misleading documentation. More bad documentation

Frank Yellin created BEAM-536:
---------------------------------

Summary: Aggregator.py. More misleading documentation. More bad documentation
Key: BEAM-536
URL: https://issues.apache.org/jira/browse/BEAM-536
Project: Beam
Issue Type: Bug
Reporter: Frank Yellin
Priority: Minor

The last paragraph of the documentation for Aggregator is:

You can also query the combined value(s) of an aggregator by calling
aggregated_value() or aggregated_values() on the result object returned after
running a pipeline.

There are multiple problems in this one sentence!

#1) There is no such method aggregated_value() that I can find anywhere.

#2) DirectRunner implements aggregated_values(), but DirectPipelineRunner does not. The latter is the far more interesting case.

#3) When I use a BlockingDirectPipelineRunner and ask for its aggregated_values(), I get an error message indicating that this is not implemented in DirectPipelineRunner. Very confusing since I never asked for a DirectPipelineRunner.

This is clearly because BlockingDirectPipelineRunner is a function that returns a DirectPipelineRunner rather than a class of its own. Is this really the right thing? Will there be other confusing error messages?
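
A minimal sketch of how this kind of message can arise, assuming the error text is built from the class name of the object. The class and function bodies below are hypothetical stand-ins that reuse the names from this report; they are not the actual Beam source, and the exact wording of the real message may differ.

```python
# Hypothetical stand-ins for illustration only; not the real Beam code.
class DirectPipelineRunner(object):
    def __init__(self, blocking=False):
        self.blocking = blocking

    def aggregated_values(self, aggregator):
        # The message is derived from the instance's class name, so it
        # can only ever mention DirectPipelineRunner.
        raise NotImplementedError(
            '%s does not implement aggregated_values'
            % self.__class__.__name__)


def BlockingDirectPipelineRunner():
    # A factory function: the caller writes this name, but the object
    # returned is an ordinary DirectPipelineRunner instance.
    return DirectPipelineRunner(blocking=True)


def error_message_for(runner_factory):
    # Build a runner, call aggregated_values(), and return the error text.
    runner = runner_factory()
    try:
        runner.aggregated_values('my_aggregator')
    except NotImplementedError as exc:
        return str(exc)
    return None


if __name__ == '__main__':
    print(error_message_for(BlockingDirectPipelineRunner))
```

In this sketch the name BlockingDirectPipelineRunner never appears in the message, because no class with that name exists. Making it a subclass of DirectPipelineRunner would be one way to have the message name what the user actually asked for.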

#4) The documentation for aggregated_values() says "returns a dict of step names to values of the aggregator." I have no idea what a "step" means in this context. In practice, it seems to be a single-element dictionary whose key is 'user--' prefixed onto the aggregator name.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)