Posted to dev@beam.apache.org by Katie Liu <ka...@gmail.com> on 2023/01/11 22:38:00 UTC

Portable vs. non-portable PTransform names

Hi beam-dev,

I have a question regarding the PTransform name formatting.
For the same user-defined function, the generated name differs between modes:
with the Samza portable runner it is "Kati-Step-2-ParMultiDo-Anonymous-", while
in normal (non-portable) mode it is "Kati-Step-2/ParMultiDo(Anonymous)".

Does this problem only exist in Samza? And are there pointers to where the
PTransform name is generated?

Thanks,
Katie

Re: Portable vs. non-portable PTransform names

Posted by Luke Cwik via dev <de...@beam.apache.org>.
The PCOLLECTION value comes from the PCollection's key in the pipeline
proto[1]. That key is populated at pipeline construction time[2] and is based
on the unique name of the PTransform plus the name of the output being used
(i.e. the tag, with .output being the default).
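
As a rough illustration of that naming rule (not Beam's actual code; the
function name and the default tag argument are assumptions made for this
sketch), the PCOLLECTION value seen in this thread can be reproduced by joining
the transform's unique name and the output tag with a dot:

```python
def pcollection_name(transform_unique_name: str, output_tag: str = "output") -> str:
    """Hypothetical helper: build a PCollection name from its producing
    transform's unique name and the output tag ("output" when none is given)."""
    return f"{transform_unique_name}.{output_tag}"


# Unique name of the transform as reported in non-portable mode.
name = pcollection_name("Kati-Step-2/ParMultiDo(Anonymous)")
print(name)
```

The result matches the PCOLLECTION label in the reported monitoring_infos,
which keeps the "/" and "(...)" characters intact.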

It looks like the counter's PTRANSFORM label comes from the metric step name[3].

I would take a look at the pipeline proto[4] that is generated during
pipeline construction and the process bundle descriptors[5] used during
pipeline execution to see where the name is being changed, if at all.

Both modes should be able to generate names in the same style, but tracking
down where the name is being changed is a good first step.
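
When comparing dumps from those stages, it can help to pull the label values
out mechanically rather than by eye. The following is a minimal sketch that
assumes the dump is available as text in the same format as the
monitoring_infos posted in this thread; `extract_labels` and the regular
expression are illustrative, not part of Beam, and the pattern does not handle
escaped quotes inside values.

```python
import re

# Sample copied from the monitoring_infos dump posted in this thread.
DUMP = '''
labels {
  key: "PCOLLECTION"
  value: "Kati-Step-2/ParMultiDo(Anonymous).output"
}
labels {
  key: "PTRANSFORM"
  value: "Kati-Step-2-ParMultiDo-Anonymous-"
}
'''

# Matches a key line followed by a value line inside a labels block.
LABEL = re.compile(r'key:\s*"([^"]*)"\s*value:\s*"([^"]*)"')


def extract_labels(text):
    """Return a list of (key, value) pairs, one per labels entry found."""
    return LABEL.findall(text)


# A dict is enough here because each key appears once in the sample.
labels = dict(extract_labels(DUMP))
for key, value in labels.items():
    print(f"{key}: {value}")
```

Running the same extraction on output captured at each stage makes it easier
to spot the first place where the two naming styles diverge.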

1:
https://github.com/apache/beam/blob/957301519bb76a9647d026885fced1a775a7c9ff/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L68
2:
https://github.com/apache/beam/blob/957301519bb76a9647d026885fced1a775a7c9ff/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionTranslation.java#L33
3:
https://github.com/apache/beam/blob/434427e90b55027c5944fa73de68bff4f9a4e8fe/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerImpl.java#L247
4:
https://github.com/apache/beam/blob/434427e90b55027c5944fa73de68bff4f9a4e8fe/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L91
5:
https://github.com/apache/beam/blob/434427e90b55027c5944fa73de68bff4f9a4e8fe/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L189

Re: Portable vs. non-portable PTransform names

Posted by Katie Liu <ka...@gmail.com>.
Attaching the monitoring_infos received, if helpful.

I observed that the PCOLLECTION name has the same format as in
non-portable mode, but the PTRANSFORM name has dashes instead.

```

monitoring_infos {
  urn: "beam:metric:element_count:v1"
  type: "beam:metrics:sum_int64:v1"
  payload: "\000"
  labels {
    key: "PCOLLECTION"
    value: "Kati-Step-2/ParMultiDo(Anonymous).output"
  }
}
monitoring_infos {
  urn: "beam:metric:user:sum_int64:v1"
  type: "beam:metrics:sum_int64:v1"
  payload: "\n"
  labels {
    key: "NAME"
    value: "count101"
  }
  labels {
    key: "NAMESPACE"
    value: "org.apache.beam.runners.samza.portable.SamzaPortableTest"
  }
  labels {
    key: "PTRANSFORM"
    value: "Kati-Step-2-ParMultiDo-Anonymous-"
  }
}

```
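
One way to narrow this down is to test a guess about how the name is being
rewritten. The sketch below assumes, purely as a hypothesis, that every
character other than an ASCII letter or digit is replaced with a dash;
`dashify` is a hypothetical helper, not a Beam or Samza function.

```python
import re


def dashify(name: str) -> str:
    """Hypothetical sanitizer: replace each character that is not an ASCII
    letter or digit with a dash (existing dashes are left as dashes)."""
    return re.sub(r"[^A-Za-z0-9]", "-", name)


non_portable = "Kati-Step-2/ParMultiDo(Anonymous)"
portable = "Kati-Step-2-ParMultiDo-Anonymous-"

# True if the hypothesis reproduces the portable-mode PTRANSFORM value.
print(dashify(non_portable) == portable)
```

The guess does reproduce the PTRANSFORM value above, which would be consistent
with a name-sanitizing step somewhere in the portable path, though it does not
show where that step lives.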