Posted to user@beam.apache.org by Meriem Sara <Sa...@outlook.fr> on 2020/11/21 23:05:41 UTC

Is the Apache Beam Go SDK supported by the Apache Spark runner?

Hello everyone. I am trying to use Apache Beam with Go to run a data processing workflow on Apache Spark. However, I am unsure whether the Go SDK is supported by the Apache Spark runner. Could you please provide more information?

Thank you

Re: Is the Apache Beam Go SDK supported by the Apache Spark runner?

Posted by Alexey Romanenko <ar...@gmail.com>.
As far as I know, the Spark Runner natively supports only the Java SDK. I'm far from a Go SDK expert, but I think you can run a pipeline written with the Go SDK using the Portable Runner and the Spark Runner Job Server, as is already possible for Python SDK pipelines. I'm not sure whether this is officially supported yet, but I believe the Go SDK developers can provide more details.

I went ahead and tried to run it on my side, but with no success so far.

1) Run a Spark Runner Job Server:

$ docker run --net=host apache/beam_spark_job_server:latest
20/11/23 17:50:04 INFO org.apache.beam.runners.jobsubmission.JobServerDriver: ArtifactStagingService started on localhost:8098
20/11/23 17:50:04 INFO org.apache.beam.runners.jobsubmission.JobServerDriver: Java ExpansionService started on localhost:8097
20/11/23 17:50:04 INFO org.apache.beam.runners.jobsubmission.JobServerDriver: JobService started on localhost:8099

2) Run the stringsplit example on master, which fails with a protobuf error:

$ go run sdks/go/examples/stringsplit/stringsplit.go --runner=universal --endpoint=localhost:8099

panic: proto: file "v1.proto" is already registered
See https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict


goroutine 1 [running]:
google.golang.org/protobuf/reflect/protoregistry.glob..func1(0x1d85000, 0xc00039c700, 0x1d68780, 0xc0003a4430, 0xc00039c700)
	/Users/aromanenko/go/src/google.golang.org/protobuf/reflect/protoregistry/registry.go:38 +0x21f
google.golang.org/protobuf/reflect/protoregistry.(*Files).RegisterFile(0xc000080520, 0x1d87ac0, 0xc00039c700, 0x0, 0x0)
	/Users/aromanenko/go/src/google.golang.org/protobuf/reflect/protoregistry/registry.go:111 +0xb72
google.golang.org/protobuf/internal/filedesc.Builder.Build(0x0, 0x0, 0xc000230a00, 0x12a, 0x200, 0x100000001, 0x0, 0x1d6f3a0, 0xc00003c450, 0x1d79460, ...)
	/Users/aromanenko/go/src/google.golang.org/protobuf/internal/filedesc/build.go:113 +0x1aa
github.com/golang/protobuf/proto.RegisterFile(0x1c60292, 0x8, 0x23708a0, 0xe2, 0xe2)
	/Users/aromanenko/go/src/github.com/golang/protobuf/proto/registry.go:47 +0x147
github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio/v1.init.1()
	/Users/aromanenko/go/src/github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio/v1/v1.pb.go:115 +0x5a
exit status 2

3) My Go version:
$ go version
go version go1.15.5 darwin/amd64

Is this a known issue, or is something wrong with my environment?
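For anyone hitting the same panic: the message means that two generated `.pb.go` files linked into one binary both registered a file under the bare path `v1.proto` in protobuf-go's process-wide registry. The trace only shows the second registration (from `pubsubio/v1/v1.pb.go`); it does not say which package registered `v1.proto` first. The toy model below is a hypothetical simplification of that rule, not the real `protoregistry` API. The `fileRegistry` type and the package names are illustrative assumptions. It shows why a bare file name collides, and why giving each generated file a unique path (for example one that includes its directory, assuming that is the remedy the FAQ linked in the trace intends) avoids the conflict.

```go
package main

import "fmt"

// fileRegistry is a toy stand-in for protobuf-go's global file registry.
// It is NOT the real API; it only models the rule that a given .proto
// path may be registered once per binary.
type fileRegistry struct {
	owners map[string]string // .proto path -> Go package that registered it
}

func newFileRegistry() *fileRegistry {
	return &fileRegistry{owners: make(map[string]string)}
}

// register records path as owned by goPkg. A second registration of the
// same path returns an error, mimicking the "already registered" panic.
func (r *fileRegistry) register(path, goPkg string) error {
	if prev, ok := r.owners[path]; ok {
		return fmt.Errorf("proto: file %q is already registered (first by %s, again by %s)", path, prev, goPkg)
	}
	r.owners[path] = goPkg
	return nil
}

func main() {
	r := newFileRegistry()

	// Hypothetical first owner: some other generated package whose source
	// file was also compiled as plain "v1.proto".
	fmt.Println(r.register("v1.proto", "other/generated/v1"))

	// The Beam package from the trace registers the same bare path: conflict.
	fmt.Println(r.register("v1.proto", "beam/io/pubsubio/v1"))

	// A path that includes the directory is unique, so it registers cleanly.
	fmt.Println(r.register("pubsubio/v1/v1.proto", "beam/io/pubsubio/v1"))
}
```

In this model the first and third calls succeed and only the second fails, which matches the shape of the trace: the package that panics is simply whichever one initializes second.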


> On 22 Nov 2020, at 00:05, Meriem Sara <Sa...@outlook.fr> wrote:
> 
> Hello everyone. I am trying to use Apache Beam with Go to run a data processing workflow on Apache Spark. However, I am unsure whether the Go SDK is supported by the Apache Spark runner. Could you please provide more information?
> 
> Thank you