Posted to commits@beam.apache.org by "Ahmet Altay (JIRA)" <ji...@apache.org> on 2016/08/06 01:02:20 UTC
[jira] [Commented] (BEAM-536) Aggregator.py. More misleading documentation. More bad documentation

    [ https://issues.apache.org/jira/browse/BEAM-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410353#comment-15410353 ] 

Ahmet Altay commented on BEAM-536:
----------------------------------

#1 - The comment should be cleaned up.

#2 - Tracking issue: https://issues.apache.org/jira/browse/BEAM-531

#3 - BlockingDataflowPipelineRunner is being removed for Java (https://github.com/apache/incubator-beam/pull/762). It is being replaced with an optional set of wait...() methods on the result. We should do the same thing in the Python SDK.

The __str__ and __repr__ methods of DataflowPipelineRunner also use the class name (https://github.com/aaltay/incubator-beam/blob/python-sdk/sdks/python/apache_beam/runners/dataflow_runner.py#L651), so printing the BlockingDataflowPipelineRunner object will show the wrong name.
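
A minimal sketch of the naming problem, using hypothetical stand-in names rather than the actual SDK code. It assumes the blocking runner is a factory function that returns an instance of the base runner class (as the issue reports for BlockingDirectPipelineRunner), and that __repr__ is built from the class name:

```python
class PipelineRunnerSketch(object):
    """Hypothetical stand-in for a runner class; not the real SDK class."""

    def __init__(self, blocking=False):
        self.blocking = blocking

    def __repr__(self):
        # Built from the class name, so it cannot reflect which factory
        # function created the object.
        return '%s(blocking=%s)' % (self.__class__.__name__, self.blocking)


def BlockingPipelineRunnerSketch():
    """Hypothetical factory: named like a class, returns the base class."""
    return PipelineRunnerSketch(blocking=True)


runner = BlockingPipelineRunnerSketch()
runner_repr = repr(runner)
print(runner_repr)
```

The printed name is that of the base class, not the name the user called, which is the same kind of mismatch as the confusing error message in the original report.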

#4 - This also needs doc improvements (related javadoc: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/AggregatorPipelineExtractor.html#getAggregatorSteps--). User counters are prefixed with user- by default, and there might be aggregators without the user- prefix once DataflowPipelineRunner implements this.
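
To illustrate why the prefix should be documented, here is a rough sketch of how a caller might separate user aggregators from others. The helper name, the sample keys, and the exact prefix string are assumptions for illustration only; they are not part of the SDK:

```python
USER_PREFIX = 'user-'  # assumed prefix, taken from the comment above


def split_aggregators(values, prefix=USER_PREFIX):
    """Split a {name: value} dict into user and non-user aggregators.

    Hypothetical helper: user aggregator names are returned with the
    prefix stripped; all other names are returned unchanged.
    """
    user, other = {}, {}
    for name, value in values.items():
        if name.startswith(prefix):
            user[name[len(prefix):]] = value
        else:
            other[name] = value
    return user, other


# Made-up sample data, for illustration only.
sample = {'user-empty_lines': 3, 'system_counter': 7}
user_aggs, other_aggs = split_aggregators(sample)
print(user_aggs)
print(other_aggs)
```

Without documentation of the prefix, callers would have to discover this convention by inspecting the returned keys.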

I believe #1 and #4 can be tracked here for documentation changes. #3 requires a new bug of its own.



> Aggregator.py.  More misleading documentation.  More bad documentation
> ----------------------------------------------------------------------
>
>                 Key: BEAM-536
>                 URL: https://issues.apache.org/jira/browse/BEAM-536
>             Project: Beam
>          Issue Type: Bug
>            Reporter: Frank Yellin
>            Priority: Minor
>
> The last paragraph of the documentation for Aggregator is:
> You can also query the combined value(s) of an aggregator by calling
> aggregated_value() or aggregated_values() on the result object returned after
> running a pipeline.
> There are multiple problems in this one sentence!
> #1) There is no such method aggregated_value() that I can find anywhere.
> #2) DirectRunner implements aggregated_values(), but DirectPipelineRunner does not.  The latter is the far more interesting case.
> #3) When I use a BlockingDirectPipelineRunner and ask for its aggregated_values(), I get an error message indicating that this is not implemented in DirectPipelineRunner.  Very confusing since I never asked for a DirectPipelineRunner.
> It is clear that this is because BlockingDirectPipelineRunner is a method rather than a class.  Is this really the right thing?  Will there be other confusing error messages?
> #4) The documentation for aggregated_values() says "returns a dict of step names to values of the aggregator."  I have no idea what a "step" means in this context.  In practice, it seems to be a single-element dictionary whose key is 'user--' prefixed onto the aggregator name.  


