Posted to user@beam.apache.org by Jeff Klukas <jk...@mozilla.com> on 2019/01/02 17:11:31 UTC

Using gRPC with PubsubIO?

I see that the Beam codebase includes a PubsubGrpcClient, but there doesn't
appear to be any way to configure PubsubIO to use that client over the
PubsubJsonClient.

There's even a PubsubIO.Read#withClientFactory, but it's marked as for
testing only.

Is gRPC support something that's still in development? Or am I missing
something about how to configure this?

I'm particularly interested in using gRPC due to the message size inflation
from the base64 encoding required for JSON transport. My payloads are all
below the 10 MB Pubsub limit, but I need to support some near the top end of
that range, and those are currently causing errors due to base64 inflation.
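
For concreteness, base64 turns every 3 raw bytes into 4 encoded bytes, so a
payload that fits under the limit can overflow it after encoding. A quick
illustration (the 8 MiB payload is a made-up example, not my actual data):

    import java.util.Base64;

    public class Base64Inflation {
      public static void main(String[] args) {
        // Hypothetical payload near the top of the allowed range.
        byte[] payload = new byte[8 * 1024 * 1024];
        int encoded = Base64.getEncoder().encode(payload).length;
        // Base64 maps 3 bytes -> 4 bytes (~33% inflation): 8,388,608 raw
        // bytes become 11,184,812 encoded bytes, pushing a payload that is
        // within the 10 MB limit over it after encoding.
        System.out.printf("raw=%d encoded=%d ratio=%.3f%n",
            payload.length, encoded, (double) encoded / payload.length);
      }
    }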

Re: Using gRPC with PubsubIO?

Posted by Jeff Klukas <jk...@mozilla.com>.
I believe this explains why I have been observing Pubsub write errors (about
messages being too large) in logs for the Dataflow "shuffler" rather than
the workers.

The specific error I saw was about a 7 MB message that, once base64-encoded,
was too large to meet Pubsub requirements (10 MB max message size), which
makes me think the Dataflow Pubsub writer was still using JSON rather than
gRPC. But it sounds like this is not configurable from the client, and
Google has full control over the details of how Pubsub writing and reading
work in Dataflow jobs.


On Wed, Jan 2, 2019 at 1:04 PM Steve Niemitz <sn...@apache.org> wrote:

> Something to consider: if you're running in Dataflow, the entire Pubsub
> read step becomes a noop [1], and the underlying streaming implementation
> itself handles reading from Pubsub (either Windmill or the Streaming
> Engine).
>
> [1]
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L373

Re: Using gRPC with PubsubIO?

Posted by Steve Niemitz <sn...@apache.org>.
Something to consider: if you're running in Dataflow, the entire Pubsub
read step becomes a noop [1], and the underlying streaming implementation
itself handles reading from Pubsub (either Windmill or the Streaming
Engine).

[1]
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L373
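
Roughly, the runner registers a transform override that swaps the SDK's
Pubsub source for a native read handled by the service. Paraphrasing from
memory (the real code is at [1] above; exact class names may differ by
version):

    // Sketch of the override registration inside DataflowRunner, not a
    // verbatim excerpt: replace the SDK-level Pubsub source with the
    // runner-native streaming read.
    overridesBuilder.add(
        PTransformOverride.of(
            PTransformMatchers.classEqualTo(PubsubUnboundedSource.class),
            new StreamingPubsubIOReadOverrideFactory()));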
