Posted to dev@beam.apache.org by David Cavazos <dc...@google.com> on 2019/11/20 01:22:56 UTC

GCP libraries up-to-date versions in Java

Hi Beamers,

I was recently part of a discussion about dependency
incompatibilities in the Java SDK, specifically around gRPC versions when
trying to use one of the Google Cloud client libraries as part of a Beam
pipeline. Their workaround was downgrading to an older version of the
client library to match Beam's version of the gRPC library. However, this
would not have been possible if they *needed* the newer version for any
reason.

I'm aware that Java development environments usually prefer to hardcode
versions to avoid breaking changes, but it would be great to have the
latest versions of dependencies that could be *shared* with other
libraries, like the gRPC libraries.

It looks like the Google Cloud client library team has been aware of this
problem, as well as the tricky interactions between the hundreds of
libraries they offer. They mentioned that they are starting to roll out a
GCP Libraries BOM
<https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM>
to help everyone have up-to-date versions of their libraries, including
*guava*, *protobuf*, *grpc-java*, *google-http-java-client*, and
*google-cloud-java*.

Would everyone feel comfortable using the BOM to manage the Google Cloud
dependency versions? If so, is there anyone comfortable with Gradle willing
to make these changes?

Cheers!
David

Re: GCP libraries up-to-date versions in Java

Posted by Luke Cwik <lc...@google.com>.
I couldn't think of a good flow that didn't lead me to clearing
org.apache.beam artifacts in .m2 before running the analysis.

There might be a way to override the maven local path in Gradle so that it
publishes to a temporary directory, but it wasn't obvious how to do this
from Gradle's maven publishing plugin docs[1].

1: https://docs.gradle.org/current/userguide/publishing_maven.html

On Thu, Nov 21, 2019 at 8:43 PM Kenneth Knowles <ke...@apache.org> wrote:

> If we have a bunch of leftover junk in .m2 will that pollute the analysis?
> Should we rm -rf ~/.m2 first or does it work well anyhow?
>
> On Wed, Nov 20, 2019 at 4:52 PM Luke Cwik <lc...@google.com> wrote:
>
>> I took a look at the linkage checker and have opened up this PR[1] to
>> allow contributors to aid in performing dependency analysis within Apache
>> Beam during upgrades.
>>
>> The current PR works by compiling and publishing all the Java artifacts
>> to your local maven repo and then runs the linkage checker against it with
>> a specified list of artifacts. For example by running:
>> ./gradlew -Ppublishing \
>>     -PjavaLinkageArtifactIds=beam-sdks-java-core,beam-sdks-java-io-jdbc \
>>     :checkJavaLinkage
>>
>> Produces:
>> Class javax.annotation.Nullable is not found;
>>   referenced by 1 class file
>>     org.apache.beam.sdk.schemas.FieldValueTypeInformation (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
>> Class org.brotli.dec.BrotliInputStream is not found;
>>   referenced by 1 class file
>>     org.apache.beam.repackaged.core.org.apache.commons.compress.compressors.brotli.BrotliCompressorInputStream (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
>> Class com.github.luben.zstd.ZstdInputStream is not found;
>>   referenced by 1 class file
>>     org.apache.beam.repackaged.core.org.apache.commons.compress.compressors.zstandard.ZstdCompressorInputStream (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
>> ... (lots more output) ...
>>
>> I haven't tried running the linker analysis for all Apache Beam artifacts
>> yet but for anyone who is interested in doing dependency clean-up or
>> upgrades should be able to use the PR as is.
>>
>> 1: https://github.com/apache/beam/pull/10184
>>
>> On Wed, Nov 20, 2019 at 12:16 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> On Wed, Nov 20, 2019 at 4:05 AM Elliotte Rusty Harold <
>>> elharo@ibiblio.org> wrote:
>>>
>>>> BOM or no BOM is an implementation detail.
>>>
>>>
>>> Agreed for the most part.
>>>
>>>
>>>> Using com.google.cloud:libraries-bom would make dependency management
>>>> simpler for developers, but the real issue is whether Beam can continue to
>>>> work with very old versions of the many libraries it depends on. Even if
>>>> this is acceptable for Beam, it's unlikely to be feasible for anyone who
>>>> needs to mix Beam code with other code.
>>>
>>>
>>> I believe every version of Beam's dependencies has been, and should
>>> continue to be, driven by what is best for Beam's users. That does mean
>>> making it easy for them to use the latest compatible version of their
>>> favorite libraries.
>>>
>>>> There should be no self-incompatibility between Google minor version
>>>> releases. All the Google libraries in question follow semantic versioning.
>>>> E.g. Pubsub 1.43 would be fully API compatible with Pubsub 1.28, though not
>>>> the reverse. However there are likely to be important bug fixes in 1.43 and
>>>> definitely new features that 1.28 would not have. If there are any edge
>>>> cases where this is not true, that's a bug and if you file it against the
>>>> repo we'll try to fix it. We're also installing tooling to make this less
>>>> likely to happen by accident. However, right now any such problem is rare.
>>>>
>>>
>>> I'm glad we share the same ideals. If things were as good as you
>>> described, then we would have two good properties:
>>>
>>> 1. Users would always be able to force a newer minor version to
>>> trivially work around Beam's deps
>>> 2. Beam could always upgrade minor versions with no code change in Beam
>>> and no code change by users
>>>
>>> My experience is that this rarely works so simply. Generally, a user
>>> forces a new version of a library and it turns out that library or its
>>> dependencies has broken compatibility.
>>>
>>> Just reiterating that if semver really holds in these cases, then this
>>> proposal is fine with me. And if semver doesn't hold, I still think we
>>> should try to support the latest, but may also need to maintain a connector
>>> to support older versions that are still in wide use.
>>>
>>>
>>>> Looking at Beam's dependencies, the only case where there are major
>>>> version changes to address is Guava.
>>>>
>>>
>>> Beam has vendored Guava so it is mostly beside the point. Upgrading the
>>> vendored Guava does not interact with any of Beam's dependencies. See
>>> https://lists.apache.org/thread.html/c477d120a4c4626cbe675f8b03d84c6fe7938e36c8e2b55c492224cf@%3Cdev.beam.apache.org%3E
>>>
>>> Only KinesisIO and the ZetaSQL-to-Calcite translator actually have
>>> essential dependencies on Guava. In these cases, the version of Guava must
>>> necessarily be compatible with the Kinesis client and ZetaSQL,
>>> respectively. They may or may not be able to interop, and that is mostly
>>> out of our hands.
>>>
>>>> The remaining issues are pre-1.0 libraries. OpenCensus is a particular
>>>> thorn in my side. Ideally these should not be used, at all. However if we
>>>> must, we should not expose them on the Beam API surface and we need to move
>>>> them forward quickly as they change.
>>>>
>>>
>>> This might deserve its own thread. This sounds like it should be
>>> well-hidden, vendored, or well-marked as "experimental".
>>>
>>> Kenn
>>>
>>

Re: GCP libraries up-to-date versions in Java

Posted by Kenneth Knowles <ke...@apache.org>.
If we have a bunch of leftover junk in .m2 will that pollute the analysis?
Should we rm -rf ~/.m2 first or does it work well anyhow?

On Wed, Nov 20, 2019 at 4:52 PM Luke Cwik <lc...@google.com> wrote:

> I took a look at the linkage checker and have opened up this PR[1] to
> allow contributors to aid in performing dependency analysis within Apache
> Beam during upgrades.
>
> The current PR works by compiling and publishing all the Java artifacts to
> your local maven repo and then runs the linkage checker against it with a
> specified list of artifacts. For example by running:
> ./gradlew -Ppublishing \
>     -PjavaLinkageArtifactIds=beam-sdks-java-core,beam-sdks-java-io-jdbc \
>     :checkJavaLinkage
>
> Produces:
> Class javax.annotation.Nullable is not found;
>   referenced by 1 class file
>     org.apache.beam.sdk.schemas.FieldValueTypeInformation (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
> Class org.brotli.dec.BrotliInputStream is not found;
>   referenced by 1 class file
>     org.apache.beam.repackaged.core.org.apache.commons.compress.compressors.brotli.BrotliCompressorInputStream (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
> Class com.github.luben.zstd.ZstdInputStream is not found;
>   referenced by 1 class file
>     org.apache.beam.repackaged.core.org.apache.commons.compress.compressors.zstandard.ZstdCompressorInputStream (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
> ... (lots more output) ...
>
> I haven't tried running the linker analysis for all Apache Beam artifacts
> yet but for anyone who is interested in doing dependency clean-up or
> upgrades should be able to use the PR as is.
>
> 1: https://github.com/apache/beam/pull/10184
>
> On Wed, Nov 20, 2019 at 12:16 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> On Wed, Nov 20, 2019 at 4:05 AM Elliotte Rusty Harold <el...@ibiblio.org>
>> wrote:
>>
>>> BOM or no BOM is an implementation detail.
>>
>>
>> Agreed for the most part.
>>
>>
>>> Using com.google.cloud:libraries-bom would make dependency management
>>> simpler for developers, but the real issue is whether Beam can continue to
>>> work with very old versions of the many libraries it depends on. Even if
>>> this is acceptable for Beam, it's unlikely to be feasible for anyone who
>>> needs to mix Beam code with other code.
>>
>>
>> I believe every version of Beam's dependencies has been, and should
>> continue to be, driven by what is best for Beam's users. That does mean
>> making it easy for them to use the latest compatible version of their
>> favorite libraries.
>>
>>> There should be no self-incompatibility between Google minor version
>>> releases. All the Google libraries in question follow semantic versioning.
>>> E.g. Pubsub 1.43 would be fully API compatible with Pubsub 1.28, though not
>>> the reverse. However there are likely to be important bug fixes in 1.43 and
>>> definitely new features that 1.28 would not have. If there are any edge
>>> cases where this is not true, that's a bug and if you file it against the
>>> repo we'll try to fix it. We're also installing tooling to make this less
>>> likely to happen by accident. However, right now any such problem is rare.
>>>
>>
>> I'm glad we share the same ideals. If things were as good as you
>> described, then we would have two good properties:
>>
>> 1. Users would always be able to force a newer minor version to trivially
>> work around Beam's deps
>> 2. Beam could always upgrade minor versions with no code change in Beam
>> and no code change by users
>>
>> My experience is that this rarely works so simply. Generally, a user
>> forces a new version of a library and it turns out that library or its
>> dependencies has broken compatibility.
>>
>> Just reiterating that if semver really holds in these cases, then this
>> proposal is fine with me. And if semver doesn't hold, I still think we
>> should try to support the latest, but may also need to maintain a connector
>> to support older versions that are still in wide use.
>>
>>
>>> Looking at Beam's dependencies, the only case where there are major
>>> version changes to address is Guava.
>>>
>>
>> Beam has vendored Guava so it is mostly beside the point. Upgrading the
>> vendored Guava does not interact with any of Beam's dependencies. See
>> https://lists.apache.org/thread.html/c477d120a4c4626cbe675f8b03d84c6fe7938e36c8e2b55c492224cf@%3Cdev.beam.apache.org%3E
>>
>> Only KinesisIO and the ZetaSQL-to-Calcite translator actually have
>> essential dependencies on Guava. In these cases, the version of Guava must
>> necessarily be compatible with the Kinesis client and ZetaSQL,
>> respectively. They may or may not be able to interop, and that is mostly
>> out of our hands.
>>
>>> The remaining issues are pre-1.0 libraries. OpenCensus is a particular
>>> thorn in my side. Ideally these should not be used, at all. However if we
>>> must, we should not expose them on the Beam API surface and we need to move
>>> them forward quickly as they change.
>>>
>>
>> This might deserve its own thread. This sounds like it should be
>> well-hidden, vendored, or well-marked as "experimental".
>>
>> Kenn
>>
>

Re: GCP libraries up-to-date versions in Java

Posted by Luke Cwik <lc...@google.com>.
I took a look at the linkage checker and have opened up this PR[1] to allow
contributors to aid in performing dependency analysis within Apache Beam
during upgrades.

The current PR works by compiling and publishing all the Java artifacts to
your local maven repo and then running the linkage checker against it with a
specified list of artifacts. For example, running:

./gradlew -Ppublishing \
    -PjavaLinkageArtifactIds=beam-sdks-java-core,beam-sdks-java-io-jdbc \
    :checkJavaLinkage

produces:

Class javax.annotation.Nullable is not found;
  referenced by 1 class file
    org.apache.beam.sdk.schemas.FieldValueTypeInformation (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
Class org.brotli.dec.BrotliInputStream is not found;
  referenced by 1 class file
    org.apache.beam.repackaged.core.org.apache.commons.compress.compressors.brotli.BrotliCompressorInputStream (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
Class com.github.luben.zstd.ZstdInputStream is not found;
  referenced by 1 class file
    org.apache.beam.repackaged.core.org.apache.commons.compress.compressors.zstandard.ZstdCompressorInputStream (beam-sdks-java-core-2.18.0-SNAPSHOT.jar)
... (lots more output) ...

I haven't tried running the linkage analysis for all Apache Beam artifacts
yet, but anyone who is interested in doing dependency clean-up or
upgrades should be able to use the PR as is.

1: https://github.com/apache/beam/pull/10184
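
For readers unfamiliar with what a "Class ... is not found" report means,
here is a rough sketch of the underlying idea. It is not how the linkage
checker itself works: as far as I understand, that tool inspects class files
statically and also looks at method and field references. The sketch below
only probes class names at runtime through a class loader, and the class
name MissingClassProbe and the method findMissing are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which class names cannot be loaded. */
final class MissingClassProbe {

    /**
     * Tries to load each name without running static initializers and
     * collects the names that the given class loader cannot resolve.
     */
    static List<String> findMissing(List<String> classNames, ClassLoader loader) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: load and link only, do not run static init.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers NoClassDefFoundError, e.g. a missing superclass.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Names taken from the report above; results depend on your classpath.
        List<String> names = Arrays.asList(
            "java.lang.String",
            "org.brotli.dec.BrotliInputStream",
            "com.github.luben.zstd.ZstdInputStream");
        ClassLoader loader = MissingClassProbe.class.getClassLoader();
        for (String name : findMissing(names, loader)) {
            System.out.println("Class " + name + " is not found;");
        }
    }
}
```

A runtime probe like this only sees what happens to be on the current
classpath, which is one reason a static check against the published
artifacts is the more useful tool for upgrade analysis.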

On Wed, Nov 20, 2019 at 12:16 PM Kenneth Knowles <ke...@apache.org> wrote:

> On Wed, Nov 20, 2019 at 4:05 AM Elliotte Rusty Harold <el...@ibiblio.org>
> wrote:
>
>> BOM or no BOM is an implementation detail.
>
>
> Agreed for the most part.
>
>
>> Using com.google.cloud:libraries-bom would make dependency management
>> simpler for developers, but the real issue is whether Beam can continue to
>> work with very old versions of the many libraries it depends on. Even if
>> this is acceptable for Beam, it's unlikely to be feasible for anyone who
>> needs to mix Beam code with other code.
>
>
> I believe every version of Beam's dependencies has been, and should
> continue to be, driven by what is best for Beam's users. That does mean
> making it easy for them to use the latest compatible version of their
> favorite libraries.
>
>> There should be no self-incompatibility between Google minor version
>> releases. All the Google libraries in question follow semantic versioning.
>> E.g. Pubsub 1.43 would be fully API compatible with Pubsub 1.28, though not
>> the reverse. However there are likely to be important bug fixes in 1.43 and
>> definitely new features that 1.28 would not have. If there are any edge
>> cases where this is not true, that's a bug and if you file it against the
>> repo we'll try to fix it. We're also installing tooling to make this less
>> likely to happen by accident. However, right now any such problem is rare.
>>
>
> I'm glad we share the same ideals. If things were as good as you
> described, then we would have two good properties:
>
> 1. Users would always be able to force a newer minor version to trivially
> work around Beam's deps
> 2. Beam could always upgrade minor versions with no code change in Beam
> and no code change by users
>
> My experience is that this rarely works so simply. Generally, a user
> forces a new version of a library and it turns out that library or its
> dependencies has broken compatibility.
>
> Just reiterating that if semver really holds in these cases, then this
> proposal is fine with me. And if semver doesn't hold, I still think we
> should try to support the latest, but may also need to maintain a connector
> to support older versions that are still in wide use.
>
>
>> Looking at Beam's dependencies, the only case where there are major
>> version changes to address is Guava.
>>
>
> Beam has vendored Guava so it is mostly beside the point. Upgrading the
> vendored Guava does not interact with any of Beam's dependencies. See
> https://lists.apache.org/thread.html/c477d120a4c4626cbe675f8b03d84c6fe7938e36c8e2b55c492224cf@%3Cdev.beam.apache.org%3E
>
> Only KinesisIO and the ZetaSQL-to-Calcite translator actually have
> essential dependencies on Guava. In these cases, the version of Guava must
> necessarily be compatible with the Kinesis client and ZetaSQL,
> respectively. They may or may not be able to interop, and that is mostly
> out of our hands.
>
>> The remaining issues are pre-1.0 libraries. OpenCensus is a particular
>> thorn in my side. Ideally these should not be used, at all. However if we
>> must, we should not expose them on the Beam API surface and we need to move
>> them forward quickly as they change.
>>
>
> This might deserve its own thread. This sounds like it should be
> well-hidden, vendored, or well-marked as "experimental".
>
> Kenn
>

Re: GCP libraries up-to-date versions in Java

Posted by Kenneth Knowles <ke...@apache.org>.
On Wed, Nov 20, 2019 at 4:05 AM Elliotte Rusty Harold <el...@ibiblio.org>
wrote:

> BOM or no BOM is an implementation detail.


Agreed for the most part.


> Using com.google.cloud:libraries-bom would make dependency management
> simpler for developers, but the real issue is whether Beam can continue to
> work with very old versions of the many libraries it depends on. Even if
> this is acceptable for Beam, it's unlikely to be feasible for anyone who
> needs to mix Beam code with other code.


I believe every version of Beam's dependencies has been, and should
continue to be, driven by what is best for Beam's users. That does mean
making it easy for them to use the latest compatible version of their
favorite libraries.

> There should be no self-incompatibility between Google minor version
> releases. All the Google libraries in question follow semantic versioning.
> E.g. Pubsub 1.43 would be fully API compatible with Pubsub 1.28, though not
> the reverse. However there are likely to be important bug fixes in 1.43 and
> definitely new features that 1.28 would not have. If there are any edge
> cases where this is not true, that's a bug and if you file it against the
> repo we'll try to fix it. We're also installing tooling to make this less
> likely to happen by accident. However, right now any such problem is rare.
>

I'm glad we share the same ideals. If things were as good as you described,
then we would have two good properties:

1. Users would always be able to force a newer minor version to trivially
work around Beam's deps
2. Beam could always upgrade minor versions with no code change in Beam and
no code change by users

My experience is that this rarely works so simply. Generally, a user forces
a new version of a library and it turns out that library or its
dependencies has broken compatibility.

Just reiterating that if semver really holds in these cases, then this
proposal is fine with me. And if semver doesn't hold, I still think we
should try to support the latest, but may also need to maintain a connector
to support older versions that are still in wide use.


> Looking at Beam's dependencies, the only case where there are major
> version changes to address is Guava.
>

Beam has vendored Guava so it is mostly beside the point. Upgrading the
vendored Guava does not interact with any of Beam's dependencies. See
https://lists.apache.org/thread.html/c477d120a4c4626cbe675f8b03d84c6fe7938e36c8e2b55c492224cf@%3Cdev.beam.apache.org%3E

Only KinesisIO and the ZetaSQL-to-Calcite translator actually have
essential dependencies on Guava. In these cases, the version of Guava must
necessarily be compatible with the Kinesis client and ZetaSQL,
respectively. They may or may not be able to interop, and that is mostly
out of our hands.

> The remaining issues are pre-1.0 libraries. OpenCensus is a particular
> thorn in my side. Ideally these should not be used, at all. However if we
> must, we should not expose them on the Beam API surface and we need to move
> them forward quickly as they change.
>

This might deserve its own thread. This sounds like it should be
well-hidden, vendored, or well-marked as "experimental".

Kenn

Re: GCP libraries up-to-date versions in Java

Posted by Elliotte Rusty Harold <el...@ibiblio.org>.
On Wed, Nov 20, 2019 at 1:43 PM Luke Cwik <lc...@google.com> wrote:
>
> Minor note that Gradle 5 added support for BOMs[1].
>
> I think attempting to perform the upgrade (whether to use BOM or not) will be a concerted effort every time to minimize the amount of breakage to users while maximizing compatibility with the OSS ecosystem. Unfortunately I'm not aware of any dependency analysis tooling that can perform some validation stating that something is safe or not. If such a tool existed, it would make it much easier for projects to perform upgrades and would also help users as well.

Funny you should ask. We've been working on tooling like that, such as
the linkage monitor and the Maven enforcer rule:

https://github.com/GoogleCloudPlatform/cloud-opensource-java

Mostly Maven based for the time being, but updates are possible.

At the end of the day, though, this is no substitute for extensive
unit and integration test suites. If we don't have those, then
dependencies are the least of our worries. If we do have those, we can
move forward in reasonable confidence that minor version upgrades
won't break anything without causing a test to fail.

-- 
Elliotte Rusty Harold
elharo@ibiblio.org

Re: GCP libraries up-to-date versions in Java

Posted by Luke Cwik <lc...@google.com>.
Minor note that Gradle 5 added support for BOMs[1].

I think attempting to perform the upgrade (whether to use BOM or not) will
be a concerted effort every time to minimize the amount of breakage to
users while maximizing compatibility with the OSS ecosystem. Unfortunately
I'm not aware of any dependency analysis tooling that can perform some
validation stating that something is safe or not. If such a tool existed,
it would make it much easier for projects to perform upgrades and would
also help users as well.

1:
https://dzone.com/articles/gradle-goodness-use-bill-of-materials-bom-as-depen

On Wed, Nov 20, 2019 at 4:05 AM Elliotte Rusty Harold <el...@ibiblio.org>
wrote:

> BOM or no BOM is an implementation detail. Using
> com.google.cloud:libraries-bom would make dependency management simpler
> for developers, but the real issue is whether Beam can continue to
> work with very old versions of the many libraries it depends on. Even
> if this is acceptable for Beam, it's unlikely to be feasible for
> anyone who needs to mix Beam code with other code.
>
> There should be no self-incompatibility between Google minor version
> releases. All the Google libraries in question follow semantic
> versioning. E.g. Pubsub 1.43 would be fully API compatible with Pubsub
> 1.28, though not the reverse. However there are likely to be important
> bug fixes in 1.43 and definitely new features that 1.28 would not
> have. If there are any edge cases where this is not true, that's a bug
> and if you file it against the repo we'll try to fix it. We're also
> installing tooling to make this less likely to happen by accident.
> However, right now any such problem is rare.
>
> Behavior differences are another story. It is entirely possible that
> something like Pubsub 1.28 would simply no longer function due to
> changes at the backend. There's a deprecation cycle, announcements,
> and transition periods in all such cases; but a project like Beam
> can't stay on old versions forever. If they try, the backends will
> shift out from under them.
>
> Looking at Beam's dependencies, the only case where there are major
> version changes to address is Guava.
> This will take some work, but not an excessive amount. We should be
> able to move this up to 28.1-android with few code changes and no
> further API breaking changes in that library are planned for the
> future.
>
> The remaining issues are pre-1.0 libraries. OpenCensus is a particular
> thorn in my side. Ideally these should not be used, at all. However if
> we must, we should not expose them on the Beam API surface and we need
> to move them forward quickly as they change.
>
> --
> Elliotte Rusty Harold
> elharo@ibiblio.org
>

Re: GCP libraries up-to-date versions in Java

Posted by Elliotte Rusty Harold <el...@ibiblio.org>.
BOM or no BOM is an implementation detail. Using
com.google.cloud:libraries-bom would make dependency management simpler
for developers, but the real issue is whether Beam can continue to
work with very old versions of the many libraries it depends on. Even
if this is acceptable for Beam, it's unlikely to be feasible for
anyone who needs to mix Beam code with other code.

There should be no self-incompatibility between Google minor version
releases. All the Google libraries in question follow semantic
versioning. E.g. Pubsub 1.43 would be fully API compatible with Pubsub
1.28, though not the reverse. However there are likely to be important
bug fixes in 1.43 and definitely new features that 1.28 would not
have. If there are any edge cases where this is not true, that's a bug
and if you file it against the repo we'll try to fix it. We're also
installing tooling to make this less likely to happen by accident.
However, right now any such problem is rare.
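
To make the "1.43 is compatible with 1.28, though not the reverse" rule
concrete, here is a minimal sketch of the check that semantic versioning
implies. The class SemverCheck and the method canReplace are illustrative
names, not part of any library. The sketch assumes plain numeric
MAJOR.MINOR[.PATCH] versions, so qualifiers such as "-android" are not
handled, and it treats 0.x versions as compatible only with themselves.

```java
import java.util.Arrays;

/** Hypothetical helper: can `provided` stand in for `required` under semver? */
final class SemverCheck {

    /** Parses "MAJOR.MINOR[.PATCH]" into three ints; a missing patch counts as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR[.PATCH]: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /**
     * True when code built against `required` should keep working with
     * `provided`: same major version, and `provided` is at least as new.
     */
    static boolean canReplace(String required, String provided) {
        int[] r = parse(required);
        int[] p = parse(provided);
        if (r[0] != p[0]) {
            return false; // a major version bump may break the API
        }
        if (r[0] == 0) {
            return Arrays.equals(r, p); // pre-1.0: no compatibility promise
        }
        if (p[1] != r[1]) {
            return p[1] > r[1]; // a newer minor only adds features
        }
        return p[2] >= r[2]; // same minor: a newer patch only fixes bugs
    }

    public static void main(String[] args) {
        System.out.println(canReplace("1.28", "1.43"));     // true
        System.out.println(canReplace("1.43", "1.28"));     // false
        System.out.println(canReplace("0.24.0", "0.26.0")); // false
    }
}
```

This only encodes the promise, of course. Whether a given release actually
keeps it is what the tooling and the test suites are for.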

Behavior differences are another story. It is entirely possible that
something like Pubsub 1.28 would simply no longer function due to
changes at the backend. There's a deprecation cycle, announcements,
and transition periods in all such cases; but a project like Beam
can't stay on old versions forever. If they try, the backends will
shift out from under them.

Looking at Beam's dependencies, the only case where there are major
version changes to address is Guava.
This will take some work, but not an excessive amount. We should be
able to move this up to 28.1-android with few code changes, and no
further API-breaking changes in that library are planned for the
future.

The remaining issues are pre-1.0 libraries. OpenCensus is a particular
thorn in my side. Ideally these should not be used, at all. However if
we must, we should not expose them on the Beam API surface and we need
to move them forward quickly as they change.

-- 
Elliotte Rusty Harold
elharo@ibiblio.org

Re: GCP libraries up-to-date versions in Java

Posted by Kenneth Knowles <ke...@apache.org>.
Hi David,

This requires some thought. Avoiding breaking changes is more than a
preference.

As I understand it, the problem is not dependency incompatibilities in the
Beam Java SDK, but self-incompatibility in Google's libraries across
releases. It makes sense - inside Google's "monorepo" one does not need to
be as careful, so these things are bound to happen.

Nonetheless, in practice, breaking changes == forks. The terms "upgrade" and
"downgrade" are misleading in this case. The situation is more like a
milder version of Python 2 vs 3.

So taking "Pubsub 1.28" and "Pubsub 1.43" which are simply different*,
which one(s) should Beam support? Should we choose to support only the
versions of the libraries that Google suggests via its BOM? Can we manage
to cleverly support them with a single PubsubIO like we do w/ Flink and
ElasticSearch? Given Google's track record, there will surely be more
mutually incompatible versions to come. Any popular storage system that
wants to make breaking changes will need a connector at least for the prior
dominant version and the new ascending version, or else harm users.

If we can pin to a version of the BOM with no breaking changes in Beam or
its dependencies, then this is all a non-issue.

If adopting the BOM is a breaking change, then it would be a radical new
policy. It makes some sense, since Beam users who care about GCP connectors
are probably willing and interested in adhering to Google's recommendations
even if they have to adjust their code one time. But notably, there are
plenty of Beam users who upgrade their Beam jars without recompilation, so
the breakage is more severe. And presumably any change to the BOM version
would include breaking changes - would we establish a policy of changing
it? Or basically never change it once we do the first breaking change to
pin?

I think the scope of this proposal must be severely limited. It should have
zero impact on users outside of the GCP connectors and Dataflow runner -
any other Beam use of Google's OSS utility libraries is out of scope. Even
so, I am not sure this is best for users overall.

There's a lot to consider and flesh out. I'm not even sure how / if we can
get the data needed to guide these decisions.

Kenn

*I just made up the version numbers

On Tue, Nov 19, 2019 at 5:23 PM David Cavazos <dc...@google.com> wrote:

> Hi Beamers,
>
> I recently was a part of a discussion about some dependency
> incompatibilities in the Java SDK. Specifically on the GRPC versions when
> trying to use one of the Google Cloud client libraries as part of a Beam
> pipeline. Their workaround was downgrading to an older version of the
> client library to match Beam's version of the GRPC library. However, this
> could not have been possible if they *needed* the newer version for any
> reason.
>
> I'm aware that Java development environments usually prefer to hardcode
> versions to avoid breaking changes, but it would be great to have the
> latest versions of dependencies that could be *shared* with other
> libraries, like the GRPC libraries.
>
> It looks like the Google Cloud client library team has been aware of this
> problem, as well as the tricky interactions between the hundreds of
> libraries they offer. They mentioned that they are starting to roll out a GCP
> Libraries BOM
> <https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM> to
> help everyone have up-to-date versions of their libraries, including
> *guava*, *protobuf*, *grpc-java*, *google-http-java-client*, and
> *google-cloud-java*.
>
> Would everyone feel comfortable on using the BOM to manage the Google
> Cloud dependency versions? If so, is there anyone comfortable in Gradle
> willing to do these changes?
>
> Cheers!
> David
>