Posted to issues@beam.apache.org by "Robert Burke (JIRA)" <ji...@apache.org> on 2019/04/01 18:11:00 UTC

[jira] [Comment Edited] (BEAM-6745) Cannot run pipeline on Dataflow (GO SDK)

    [ https://issues.apache.org/jira/browse/BEAM-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779662#comment-16779662 ] 

Robert Burke edited comment on BEAM-6745 at 4/1/19 6:10 PM:
------------------------------------------------------------

There's no *Dataflow side* documentation of the SDK, nor any statement of support, that I'm aware of. At present, if you use the Go SDK on Dataflow, you do so at your own risk. While Google does fund some work on Apache Beam, the Go SDK is not currently among the things being maintained at production quality on Dataflow. Ergo, things will break and documentation will become stale.

It's a quirk of portability that it enables "unofficial" language SDK support on any compatible runner. However, nothing guarantees this, and there's no effort to maintain anything around it (as demonstrated by this bug).

Official support would include each version of the Dataflow library providing a compatible, versioned SDK container without users ever needing to specify anything, tests for certain versions of the SDK running successfully against the service, and similar.

In short, it *can* run on Dataflow, but that doesn't necessarily mean folks should use it.

I'm hoping to be able to change that, but I can't speak to any timelines at present.

Edit: I think the point I'm trying to make here is that the Go SDK tries to support Dataflow, but that Dataflow, as a paid service, doesn't support the Go SDK, as there are certain expectations once money gets involved. 

 



> Cannot run pipeline on Dataflow (GO SDK)
> ----------------------------------------
>
>                 Key: BEAM-6745
>                 URL: https://issues.apache.org/jira/browse/BEAM-6745
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-go
>            Reporter: Michael Chemani
>            Priority: Major
>
> I got 
> ```
> Failed to retrieve staged files: failed to retrieve worker in 3 attempts: bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want
> ```
>  
> When trying to run 
> ```
> dataflow \
>   --runner dataflow \
>   --index gs://{BUCKET}/data_100k.csv \
>   --output gs://{BUCKET}/ \
>   --project {PROJECT} \
>   --temp_location gs://{BUCKET}/tmp/ \
>   --staging_location gs://{BUCKET}/binaries/ \
>   --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
> ```
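
For anyone debugging the quoted error: the value `d79JZxFttnJG7SPkF30ozA==` looks like a base64-encoded MD5 digest (24 characters decoding to 16 bytes), and the empty text after "want" suggests the expected digest was never supplied to the worker. A minimal sketch, assuming that encoding, for computing the same kind of digest over a local worker binary so it can be compared with the value in the log. The helper name `md5Base64` is made up for illustration and is not part of the Beam SDK.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"fmt"
	"os"
)

// md5Base64 is a hypothetical helper (not a Beam API). It returns the MD5
// digest of data, encoded with the standard base64 alphabet and padding.
func md5Base64(data []byte) string {
	sum := md5.Sum(data)
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: md5b64 <path-to-worker-binary>")
		os.Exit(2)
	}
	// Read the whole file; worker binaries are small enough for this sketch.
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	fmt.Println(md5Base64(data))
}
```

If the printed digest matches the one in the log, the staged binary itself is probably intact and the problem is more likely the missing expected value than a corrupted upload.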


