You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Duy Le (JIRA)" <ji...@apache.org> on 2018/10/08 11:30:00 UTC
[jira] [Created] (BEAM-5674) DataflowRunner.Deduplicate/WithKeys
cannot infer Coder for K when running with experiment
"enable_custom_pubsub_source"
Duy Le created BEAM-5674:
----------------------------
Summary: DataflowRunner.Deduplicate/WithKeys cannot infer Coder for K when running with experiment "enable_custom_pubsub_source"
Key: BEAM-5674
URL: https://issues.apache.org/jira/browse/BEAM-5674
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Affects Versions: 2.7.0
Environment: OS: Linux 64bit,
Platform: Dataflow
Reporter: Duy Le
Assignee: Henning Rohde
*Bug*: DataflowRunner.Deduplicate/WithKeys cannot infer Coder for K when running with experiment "enable_custom_pubsub_source"
*Steps*
# Start a Beam pipeline with DataflowRunner using ExperimentalOptions."enable_custom_pubsub_source"
# Observe the result when the pipeline is being constructed.
*Actual*: An error thrown "Unable to return a default Coder for PubsubIO.Read/PubsubUnboundedSource/Read(PubsubSource)/DataflowRunner.Deduplicate/WithKeys/AddKeys/Map/ParMultiDo(Anonymous).output [PCollection]..."
*Expected*: The pipeline should be constructed successfully.
*Root cause*:
In
{noformat}
DataflowRunner.Deduplicate{noformat}
transform, it applies
{code:java}
WithKeys.of(){code}
transform to an input of *ValueWithRecordId* with a function in the style of Java 8 lambda.
As the Javadoc states that:
{code:java}
If using a lambda in Java 8, {@link #withKeyType(TypeDescriptor)} must be called on the result {@link PTransform}{code}
*Suggested Solution*:
Since the lambda function returns a hashed code (_int_) of the
{code:java}
value.getId(){code}
(_byte[]_), can we just use
{code:java}
withKeyType(TypeDescriptors.integers()){code}
right after the
{code:java}
WithKeys.of(){code}
method?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)