You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Eric Kolotyluk (Jira)" <ji...@apache.org> on 2022/03/30 15:50:00 UTC

[jira] [Updated] (BEAM-14206) Direct Runner Does Not Work As Well As Google Dataflow

     [ https://issues.apache.org/jira/browse/BEAM-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Kolotyluk updated BEAM-14206:
----------------------------------
    Description: 
*As a Staff Software Engineer* at Forgerock developing Beam applications,
*I want* Direct Runner to work the same as Google Dataflow with Flex Templates,
*So that* Beam applications can be developed and tested locally.

*_This is a high-level report that may need to be broken down into more specific reports._*
{code:java}
var stream = pipeline // Extract - Transform - Load, Lather Rinse Repeat
    //.apply("Extract from Source",   PubsubIO.readStrings().fromSubscription(inputSubscription).withTimestampAttribute("date"))
    .apply("Extract from Source",   PubsubIO.readStrings().fromSubscription(inputSubscription))
    .apply("Transform to TableRow", ParDo.of(new TransformToTableRow()))
    .apply("Define Window",         Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(1))).withAllowedLateness(Duration.standardMinutes(10)).accumulatingFiredPanes())

    .apply("Transform to JSON",     MapElements.into(TypeDescriptors.strings()).via(GenericJson::toString));

stream.apply("Load to PubSub",      PubsubIO.writeStrings().to(sinkTopic));

stream.apply("Load to GCS",   TextIO.write().to(sinkUri).withWindowedWrites().withNumShards(1));
{code}
This code works as expected running under Google Dataflow as a Flex Template.

When running locally under Direct Runner there are two problems
 # .withTimestampAttribute("date") fails to find the "date" attribute
 # No results are written to GCS
 ** However, results are written to the local file system as temporary files
 ** Results are written to PubSub correctly

More context can be provided on request...

 

  was:
*As a Staff Software Engineer* at Forgerock developing Beam applications,
*I want* Direct Runner to work the same as Google Dataflow with Flex Templates,
*So that* Beam applications can be developed and tested locally.

*_This is a high-level report that may need to be broken down into more specific reports._*
{code:java}
var stream = pipeline // Extract - Transform - Load, Lather Rinse Repeat
    //.apply("Extract from Source",   PubsubIO.readStrings().fromSubscription(inputSubscription).withTimestampAttribute("date"))
    .apply("Extract from Source",   PubsubIO.readStrings().fromSubscription(inputSubscription))
    .apply("Transform to TableRow", ParDo.of(new TransformToTableRow()))
    .apply("Define Window",         Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(1))).withAllowedLateness(Duration.standardMinutes(10)).accumulatingFiredPanes())

    .apply("Transform to JSON",     MapElements.into(TypeDescriptors.strings()).via(GenericJson::toString));

stream.apply("Load to PubSub",      PubsubIO.writeStrings().to(sinkTopic));

stream.apply("Load to GCS",   TextIO.write().to(sinkUri).withWindowedWrites().withNumShards(1));
{code}
This code work as expected running under Google Dataflow as a Flex Template.

When running locally under Direct Runner there are two problems
 # .withTimestampAttribute("date") fails to find the "date" attribute
 # No results are written to GCS
 ** However, results are written to the local file system as temporary files
 ** Results are written to PubSub correctly

 

 


> Direct Runner Does Not Work As Well As Google Dataflow
> ------------------------------------------------------
>
>                 Key: BEAM-14206
>                 URL: https://issues.apache.org/jira/browse/BEAM-14206
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-direct
>    Affects Versions: 2.37.0
>            Reporter: Eric Kolotyluk
>            Priority: P2
>              Labels: GCP
>
> *As a Staff Software Engineer* at Forgerock developing Beam applications,
> *I want* Direct Runner to work the same as Google Dataflow with Flex Templates,
> *So that* Beam applications can be developed and tested locally.
> *_This is a high-level report that may need to be broken down into more specific reports._*
> {code:java}
> var stream = pipeline // Extract - Transform - Load, Lather Rinse Repeat
>     //.apply("Extract from Source",   PubsubIO.readStrings().fromSubscription(inputSubscription).withTimestampAttribute("date"))
>     .apply("Extract from Source",   PubsubIO.readStrings().fromSubscription(inputSubscription))
>     .apply("Transform to TableRow", ParDo.of(new TransformToTableRow()))
>     .apply("Define Window",         Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(1))).withAllowedLateness(Duration.standardMinutes(10)).accumulatingFiredPanes())
>     .apply("Transform to JSON",     MapElements.into(TypeDescriptors.strings()).via(GenericJson::toString));
> stream.apply("Load to PubSub",      PubsubIO.writeStrings().to(sinkTopic));
> stream.apply("Load to GCS",   TextIO.write().to(sinkUri).withWindowedWrites().withNumShards(1));
> {code}
> This code works as expected running under Google Dataflow as a Flex Template.
> When running locally under Direct Runner there are two problems
>  # .withTimestampAttribute("date") fails to find the "date" attribute
>  # No results are written to GCS
>  ** However, results are written to the local file system as temporary files
>  ** Results are written to PubSub correctly
> More context can be provided on request...
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)