You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/04/21 18:39:00 UTC
[jira] [Commented] (BEAM-14206) Direct Runner Does Not Work As Well As Google Dataflow
[ https://issues.apache.org/jira/browse/BEAM-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525968#comment-17525968 ]
Kenneth Knowles commented on BEAM-14206:
----------------------------------------
[~chamikara] this has to do with the PubsubIO swap out for a native implementation in Dataflow. Can you link up this bug with some main bug? Maybe a hotlist/label would be good to gather up all the many issues related to this.
> Direct Runner Does Not Work As Well As Google Dataflow
> ------------------------------------------------------
>
> Key: BEAM-14206
> URL: https://issues.apache.org/jira/browse/BEAM-14206
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.37.0
> Reporter: Eric Kolotyluk
> Priority: P2
> Labels: GCP
> Attachments: streaming-pubsub-poc.zip
>
>
> *As a Staff Software Engineer* at Forgerock developing Beam applications,
> *I want* Direct Runner to work the same as Google Dataflow with Flex Templates,
> *So that* Beam applications can be developed and tested locally with a good developer experience.
> *_This is a high-level report that may need to be broken down into more specific reports._*
> {code:java}
> var stream = pipeline // Extract - Transform - Load, Lather Rinse Repeat
> //.apply("Extract from Source", PubsubIO.readStrings().fromSubscription(inputSubscription).withTimestampAttribute("date"))
> .apply("Extract from Source", PubsubIO.readStrings().fromSubscription(inputSubscription))
> .apply("Transform to TableRow", ParDo.of(new TransformToTableRow()))
> .apply("Define Window", Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(1))).withAllowedLateness(Duration.standardMinutes(10)).accumulatingFiredPanes())
> .apply("Transform to JSON", MapElements.into(TypeDescriptors.strings()).via(GenericJson::toString));
> stream.apply("Load to PubSub", PubsubIO.writeStrings().to(sinkTopic));
> stream.apply("Load to GCS", TextIO.write().to(sinkUri).withWindowedWrites().withNumShards(1));
> {code}
> This code works as expected running under Google Dataflow as a Flex Template.
> When running locally under Direct Runner there are two problems
> # .withTimestampAttribute("date") fails to find the "date" attribute
> ** {{Exception in thread "main" java.lang.IllegalArgumentException: PubSub message is missing a value for timestamp attribute date}}
> # No results are written to GCS
> ** However, results are written to the local file system as temporary files
> ** Results are written to PubSub correctly
> More context can be provided on request...
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)