Posted to dev@beam.apache.org by Jeff Klukas <jk...@mozilla.com> on 2019/01/02 17:03:01 UTC

Why does Beam not use the google-api-client libraries?

I'm building a high-volume Beam pipeline using PubsubIO and running into
some concerns over performance and delivery semantics, prompting me to want
to better understand the implementation. Reading through the library,
PubsubIO appears to be a completely separate implementation of Pubsub
client behavior from Google's own Java client. As a developer trying to
read and understand the implementation, this is a significant hurdle, since
any previous knowledge of the Google library is not applicable and is
potentially at odds with what's in PubsubIO.

Why doesn't Beam use the Google clients for PubsubIO, BigQueryIO, etc.? Is
it for historical reasons? Is there difficulty in packaging and integrating
the Google clients? Or are Beam's needs just substantially different from
what the Google libraries provide?

Re: Why does Beam not use the google-api-client libraries?

Posted by Reuven Lax <re...@google.com>.
Cham is absolutely correct. The google-cloud-pubsub higher-level library
didn't exist when the Beam connector was written, and nobody has gotten
around to rewriting that connector.

On Wed, Jan 2, 2019 at 11:29 PM Chamikara Jayalath <ch...@google.com>
wrote:

> Thanks Jeff for the interest in this.
>
> I think most of the existing GCP IO connectors use Google API client
> libraries [1] due to historical reasons (these were the libraries that were
> available when these connectors were originally built).
> We should upgrade to the latest Google Cloud client libraries [2] at some
> point, but I don't have an exact ETA for this.
>
> Thanks,
> Cham
>
> [1] https://developers.google.com/api-client-library/
> [2] https://cloud.google.com/apis/docs/cloud-client-libraries

Re: Why does Beam not use the google-api-client libraries?

Posted by Chamikara Jayalath <ch...@google.com>.
Thanks Jeff for the interest in this.

I think most of the existing GCP IO connectors use the Google API client
libraries [1] for historical reasons (these were the libraries that were
available when these connectors were originally built).
We should upgrade to the latest Google Cloud client libraries [2] at some
point, but I don't have an exact ETA for this.

Thanks,
Cham

[1] https://developers.google.com/api-client-library/
[2] https://cloud.google.com/apis/docs/cloud-client-libraries

On Wed, Jan 2, 2019 at 10:38 AM Jeff Klukas <jk...@mozilla.com> wrote:

> My apologies. I got the terminology entirely wrong.
>
> As you say, PubsubIO and other Beam components _do_ use the official
> Google API client library (google-api-client). They do not, however, use
> the higher-level Google Cloud libraries such as google-cloud-pubsub which
> provide abstractions on top of the API client library.
>
> I am wondering whether there are technical reasons not to use the
> higher-level service-specific libraries, or whether this is simply
> historical.

Re: Why does Beam not use the google-api-client libraries?

Posted by Jeff Klukas <jk...@mozilla.com>.
My apologies. I got the terminology entirely wrong.

As you say, PubsubIO and other Beam components _do_ use the official Google
API client library (google-api-client). They do not, however, use the
higher-level Google Cloud libraries such as google-cloud-pubsub, which
provide abstractions on top of the API client library.

I am wondering whether there are technical reasons not to use the
higher-level service-specific libraries, or whether this is simply
historical.

On Wed, Jan 2, 2019 at 12:38 PM Anton Kedin <ke...@google.com> wrote:

> I don't have enough context to answer all of the questions, but looking at
> PubsubIO it seems to use the official libraries, e.g. see Pubsub doc [1]
> vs Pubsub IO GRPC client [2]. Correct me if I misunderstood your question.
>
> [1]
> https://cloud.google.com/pubsub/docs/publisher#pubsub-publish-message-java
> [2]
> https://github.com/apache/beam/blob/2e759fecf63d62d110f29265f9438128e3bdc8ab/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java#L189
>
> The Pubsub IO JSON client seems to use a slightly different approach but
> still relies on a somewhat official path, e.g. Pubsub doc [3] (javadoc [4])
> vs Pubsub IO JSON client [5].
>
> [3] https://developers.google.com/api-client-library/java/apis/pubsub/v1
> [4]
> https://developers.google.com/resources/api-libraries/documentation/pubsub/v1/java/latest/com/google/api/services/pubsub/Pubsub.html
> [5]
> https://github.com/apache/beam/blob/2e759fecf63d62d110f29265f9438128e3bdc8ab/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java#L130
>
> The latter seems to be the older library, so I would assume it's for
> legacy reasons.
>
> Regards,
> Anton

Re: Why does Beam not use the google-api-client libraries?

Posted by Anton Kedin <ke...@google.com>.
I don't have enough context to answer all of the questions, but looking at
PubsubIO, it seems to use the official libraries, e.g. see Pubsub doc [1]
vs Pubsub IO gRPC client [2]. Correct me if I misunderstood your question.

[1]
https://cloud.google.com/pubsub/docs/publisher#pubsub-publish-message-java
[2]
https://github.com/apache/beam/blob/2e759fecf63d62d110f29265f9438128e3bdc8ab/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java#L189

The Pubsub IO JSON client seems to use a slightly different approach but
still relies on a somewhat official path, e.g. Pubsub doc [3] (javadoc [4])
vs Pubsub IO JSON client [5].

[3] https://developers.google.com/api-client-library/java/apis/pubsub/v1
[4]
https://developers.google.com/resources/api-libraries/documentation/pubsub/v1/java/latest/com/google/api/services/pubsub/Pubsub.html
[5]
https://github.com/apache/beam/blob/2e759fecf63d62d110f29265f9438128e3bdc8ab/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java#L130

The latter seems to be the older library, so I would assume it's for legacy
reasons.

Regards,
Anton


On Wed, Jan 2, 2019 at 9:03 AM Jeff Klukas <jk...@mozilla.com> wrote:

> I'm building a high-volume Beam pipeline using PubsubIO and running into
> some concerns over performance and delivery semantics, prompting me to want
> to better understand the implementation. Reading through the library,
> PubsubIO appears to be a completely separate implementation of Pubsub
> client behavior from Google's own Java client. As a developer trying to
> read and understand the implementation, this is a significant hurdle, since
> any previous knowledge of the Google library is not applicable and is
> potentially at odds with what's in PubsubIO.
>
> Why doesn't Beam use the Google clients for PubsubIO, BigQueryIO, etc.? Is
> it for historical reasons? Is there difficulty in packaging and integrating
> the Google clients? Or are Beam's needs just substantially different from
> what the Google libraries provide?
>