Posted to commits@beam.apache.org by "Frances Perry (JIRA)" <ji...@apache.org> on 2017/08/11 20:35:00 UTC

[jira] [Assigned] (BEAM-2450) Transform names and named applications should not be null or empty

     [ https://issues.apache.org/jira/browse/BEAM-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-2450:
-----------------------------------

    Assignee:     (was: Frances Perry)

> Transform names and named applications should not be null or empty
> ------------------------------------------------------------------
>
>                 Key: BEAM-2450
>                 URL: https://issues.apache.org/jira/browse/BEAM-2450
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model, sdk-java-core, sdk-py
>            Reporter: Scott Wegner
>            Priority: Minor
>
> The Beam SDK allows setting the name of a transform [1] and also naming the transform application [2]. If no name is specified at application, the name of the transform is used. If no name is specified for the transform, the class name is used.
> The application name serves as metadata for the applied PTransforms in the constructed graph. They are effectively extra display data (historically, PTransform names predate display data). The names are used by runners for UI and monitoring applications, such as the displayed pipeline graph in the Dataflow Monitoring UI [3].
> Currently there is no explicit validation on the specified application name. The current behavior seems to be:
> * null application names cause a NullPointerException at construction time.
> * Specifying the empty string compiles and succeeds in the DirectRunner, but causes strange behavior in Dataflow when rendering the graph in the UI. I have not tested the behavior of other runners.
> We should add explicit validation in the model on the specified transform name and application name. I propose that we disallow null and empty names.
> This is technically a breaking change as the SDK currently allows the empty string, but only because it is under-specified. The upgrade path for any pipelines broken by this change is simple: specify a non-empty name or fall back to the default class name.
> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L236
> [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java#L295
> [3] https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf#viewing-a-pipeline
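
For illustration, the validation proposed in the description could be sketched roughly as below. This is only a sketch under stated assumptions, not the Beam SDK's implementation: the class TransformNameValidation and its methods requireValidName and nameOrDefault are hypothetical names made up for this example, and "no name specified" is modeled with java.util.Optional rather than with whatever mechanism the SDK actually uses.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of the Beam SDK. */
final class TransformNameValidation {
  private TransformNameValidation() {}

  /** Returns the name unchanged, or throws if it is null or empty. */
  static String requireValidName(String name) {
    if (name == null) {
      throw new IllegalArgumentException("Transform name must not be null");
    }
    if (name.isEmpty()) {
      throw new IllegalArgumentException("Transform name must not be empty");
    }
    return name;
  }

  /**
   * Resolves the effective name: an explicitly specified name is validated,
   * and an unspecified name (Optional.empty()) falls back to the simple
   * class name of the transform.
   */
  static String nameOrDefault(Optional<String> specifiedName, Class<?> transformClass) {
    if (specifiedName.isPresent()) {
      return requireValidName(specifiedName.get());
    }
    // Assumption: the simple class name is an acceptable default. Anonymous
    // classes have an empty simple name, so the default is validated as well.
    return requireValidName(transformClass.getSimpleName());
  }

  public static void main(String[] args) {
    // Explicit, non-empty name: accepted as-is.
    System.out.println(nameOrDefault(Optional.of("ParseLines"), String.class));
    // No name specified: falls back to the class name.
    System.out.println(nameOrDefault(Optional.empty(), String.class));
  }
}
```

With a check like this, an empty name would fail at pipeline construction time with a clear message instead of surfacing later as odd rendering in a runner's UI, and a null name would produce a descriptive error rather than a bare NullPointerException.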


