Posted to dev@beam.apache.org by Daniel Collins <dp...@google.com> on 2021/06/17 18:55:08 UTC

Aliasing Pub/Sub Lite IO in external repo

Hello beam developers,

I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get some
feedback on a change to the model for hosting this I/O in beam. Our team
has been frustrated by the fact that we have no way to release features or
fixes for bugs to customers on time scales shorter than the 1-2 months of
the beam release cycle, and that those fixes are necessarily coupled with a
beam version upgrade. To work around this, I forked the I/O in beam to our
own repo about 6 months ago and have been maintaining both copies in
parallel.

I'd like to retain our ability to quickly fix and improve the I/O while
retaining end-user visibility within the beam repo. To do this, I'd like
to remove all the implementation from the beam repo, and leave the I/O
there implemented as:

```
class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO {}
```
, and add a dependency on our beam artifact.

This enables beam users who want to just use the
beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
also track the canonical version separately in our repo to get fixes and
improvements at a faster rate. All static methods from the parent class
would be available on the class in the beam repo.
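
As a minimal sketch of that last point (the class names and method bodies
below are hypothetical stand-ins, not the real sources): Java resolves static
methods through an empty subclass, so calls written against the alias reach
the parent's implementation. This only compiles if the parent class is
non-final and exposes a constructor the subclass can call.

```java
// Hypothetical stand-in for com.google.cloud.pubsublite.beam.PubsubLiteIO.
// The parent must be non-final and have a constructor visible to the
// subclass for `extends` to compile.
class ExternalPubsubLiteIO {
    protected ExternalPubsubLiteIO() {}

    static String read() {
        return "read transform from external artifact";
    }

    static String write() {
        return "write transform from external artifact";
    }
}

// Hypothetical stand-in for the empty alias left in the beam repo.
class BeamPubsubLiteIO extends ExternalPubsubLiteIO {}

class AliasDemo {
    public static void main(String[] args) {
        // Both calls are written against the alias but run the parent's code.
        System.out.println(BeamPubsubLiteIO.read());
        System.out.println(BeamPubsubLiteIO.write());
    }
}
```

Because the alias has no body of its own, whichever version of the parent
artifact ends up on the classpath determines the behavior.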

I'd be interested to hear anyone's thoughts and suggestions surrounding this.

-Daniel

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Ahmet Altay <al...@google.com>.
Thank you. I was just curious. No rush.

On Thu, Aug 19, 2021 at 11:23 AM Manu Menzella <me...@google.com> wrote:

> I took a look at this early on but have not reviewed it recently; I'm
> sorry for the delay. I'll get back to this tomorrow or Monday.
>
> On Thu, Aug 19, 2021 at 2:19 PM Ahmet Altay <al...@google.com> wrote:
>
>> Curious about the status of this. Was there a consensus in the PR or in
>> the doc?
>>
>> On Tue, Jul 6, 2021 at 10:25 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> I don't think that every IO (or transform, etc.) needs to be a part of
>>> Beam (and its release cycle, etc.). There's a point of maturity where
>>> growing the ecosystem makes more sense than growing the repository. This is
>>> one of the reasons we try to build IOs, etc. on the public API rather than
>>> needing some "secret sauce." I don't think this part of the API is fragile
>>> or changing enough to be an issue (though testing against nightlies is a
>>> good idea).
>>>
>>> As for discoverability, it makes sense to have some pointers, but I
>>> don't see stubs from Beam to an external repository as necessary or
>>> desirable. As long as it's well documented, it'll be easy to find for
>>> exactly the set of people who are interested in it.
>>>
>>>
>>>
>>> On Tue, Jul 6, 2021 at 5:53 PM Chamikara Jayalath <ch...@google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jul 6, 2021 at 8:01 AM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> > I think this is premature perhaps: as far as I know, there is no
>>>>> such plan in place for many other I/Os which exist in beam and pull in
>>>>> other maven dependencies which may cause dependency conflicts. This should
>>>>> not make such conflicts any worse.
>>>>>
>>>>> To my knowledge there exists no other IO in Beam that depends on an
>>>>> external library that also depends on Beam. That is where I believe you are
>>>>> going into uncharted territory. I am fine with the plan being a full
>>>>> rollback (or even just a copy of the external repo into Beam) if we hit
>>>>> this problem. It sounds like you have it covered.
>>>>>
>>>>
>>>> I agree with Andrew. Other external I/O connectors do not depend on
>>>> Beam in a circular way AFAIK.
>>>> Additionally, this means that a user that just uses the Beam stub has
>>>> no idea regarding the API surface or backwards compatibility guarantees
>>>> since the API is completely inherited from a class that is in an external
>>>> repo.
>>>>
>>>>  If release velocity is an issue, it might be cleaner to just move
>>>> Pub/Sub Lite I/O completely out of Beam instead of leaving a stub in the
>>>> Beam codebase. Pub/Sub Lite connector is experimental. So backwards
>>>> compatibility when moving the code should not be an issue.
>>>>
>>>> In fact, we have a BigTable connector that is completely external to
>>>> Beam which has worked fine for customers so far I believe.
>>>>
>>>> https://github.com/googleapis/java-bigtable-hbase/tree/master/bigtable-dataflow-parent/bigtable-hbase-beam
>>>>
>>>> Agree that discoverability can be an issue though. Maybe listing
>>>> recommended external I/O connectors here might help:
>>>> https://beam.apache.org/documentation/io/built-in/
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>>
>>>>>
>>>>> On Fri, Jul 2, 2021 at 8:22 PM Daniel Collins <dp...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm on vacation, so sorry if I missed a lot of discussion here. Going
>>>>>> to try to reply to a bunch of the comments in this thread:
>>>>>>
>>>>>> > One more question of my own: Do you expect pubsub lite io to
>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>> pubsub io no longer needs or gets frequent updates.). If not, eventually
>>>>>> keeping the io external might become irrelevant.
>>>>>>
>>>>>> We expect the I/O to receive updates over the short term to handle
>>>>>> the availability of dataflow-runner-v2 for java without an opt-in
>>>>>> allowlist, but in the longer term (1 year +) we don't expect to keep
>>>>>> updating this significantly. However, one of the primary reasons we want to
>>>>>> do this is to have the ability to respond to user requests and bug reports
>>>>>> on shorter timetables- even if we don't plan to make changes, it's hard to
>>>>>> predict either bug reports or feature requests.
>>>>>>
>>>>>> > The real issue is 3rd party dependency convergence and managing a
>>>>>> BOM that works for your users.
>>>>>>
>>>>>> Agreed- however, our library <largely> uses the google cloud BOM
>>>>>> (actually its underlying dependency list in google-shared-dependencies) for
>>>>>> shared dependencies, so this should be mostly a non-issue, even more so if
>>>>>> beam eventually moves to use the cloud BOM.
>>>>>>
>>>>>> > The core SDK does not depend on any IO (and we should keep it this
>>>>>> way, for sure).
>>>>>>
>>>>>> Agreed.
>>>>>>
>>>>>> > I have to also push on whether we can do this the "normal" way:
>>>>>> refer to it in docs, and have examples for users to copy/paste/modify that
>>>>>> already includes the needed deps.
>>>>>>
>>>>>> We can, the primary issue with this is discoverability: the current
>>>>>> expectation of users is that they pull in the google-cloud-platform
>>>>>> artifact and get all I/Os available. Without the alias, we need to tell
>>>>>> users to use the different artifact, and they may not know that they even
>>>>>> need to look for the different artifact; instead assuming it doesn't exist.
>>>>>> This is purely for U/X purposes.
>>>>>>
>>>>>> > 1. There needs to be tests in Beam to ensure the IO isn't broken by
>>>>>> changes to Beam.
>>>>>>
>>>>>> Agreed, and in https://github.com/apache/beam/pull/15076 I've added
>>>>>> one. If you have any more concrete suggestions for testing that would
>>>>>> better ensure compatibility, I'd appreciate them.
>>>>>>
>>>>>> >  I think we also need a plan to back this out if it gets us in a
>>>>>> bad state. For example, there is potentially a state where we need to make
>>>>>> a change to Beam core (such as updating a dependency) but can't make it
>>>>>> because it requires this IO to be recompiled.
>>>>>>
>>>>>> As stated above, nearly all (except flogger in the current state
>>>>>> IIRC) are using the versions from the google-shared-dependencies BOM. A
>>>>>> dependency version bump should not introduce a compatibility issue without
>>>>>> also breaking many other google dependencies. I'd also note: this issue
>>>>>> already exists, since the beam repo already needs to depend on the Pub/Sub
>>>>>> Lite client library, which has 80+% of the code from this repo.
>>>>>>
>>>>>> > I think we also need a plan to back this out if it gets us in a bad
>>>>>> state
>>>>>>
>>>>>> I think this is premature perhaps: as far as I know, there is no such
>>>>>> plan in place for many other I/Os which exist in beam and pull in other
>>>>>> maven dependencies which may cause dependency conflicts. This should not
>>>>>> make such conflicts any worse.
>>>>>>
>>>>>> I don't think there's a good workaround for breaking changes in
>>>>>> public beam API surfaces; if some hypothetical beam API changes the name of
>>>>>> a method, and is used by Pub/Sub Lite, this would break compilation in the
>>>>>> beam presubmit. However... I'm not entirely sure that's a bad thing? It
>>>>>> would seem to be a fairly good change detector for breaking changes in the
>>>>>> public API surfaces it uses. As long as Pub/Sub Lite doesn't use internal
>>>>>> API surfaces, this should be fine. I don't intend to use internal API
>>>>>> surfaces for this: indeed, any registrations that occur that use @Internal
>>>>>> annotated surfaces I plan on leaving in the beam repo.
>>>>>>
>>>>>> I think the worst-case workaround is a clone back from our repo to
>>>>>> the beam repo and acknowledgement that this deployment strategy doesn't
>>>>>> work (full rollback): and I'm willing to take responsibility for doing this
>>>>>> if it becomes necessary.
>>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 2, 2021 at 8:24 PM Tianzi Cai <ti...@google.com> wrote:
>>>>>>
>>>>>>> Just want to let everyone know that I'm drafting a doc for this. It
>>>>>>> will be great to have both teams' reviews+sign-offs on a final decision.
>>>>>>> Thank you all.
>>>>>>>
>>>>>>>
>>>>>>>  PubsubLiteIO Release Strategy
>>>>>>> <https://docs.google.com/document/d/1oTc0m3dQlrSMXGvUEhQ_GUBJM_IxLpf-6HnOLtDtxHw/edit?usp=drive_web>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 2, 2021 at 11:22 AM Andrew Pilloud <ap...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If Beam is dependent on a library that is also dependent on Beam it
>>>>>>>> would be impossible to update dependencies in either. Beam is released as a
>>>>>>>> single atomic unit, we can't decouple beam-sdks-java-core
>>>>>>>> from beam-sdks-java-io-google-cloud-platform in our current release
>>>>>>>> process. (This is different from existing external IOs which only depend on
>>>>>>>> Beam.)
>>>>>>>>
>>>>>>>> On Fri, Jul 2, 2021 at 11:11 AM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The reverse could also happen. If the IO needs a new version of
>>>>>>>>> core GCP libraries, realistically it can't be updated until Beam itself has
>>>>>>>>> updated its dependencies.
>>>>>>>>>
>>>>>>>>> On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <
>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> To clarify my "circular dependency" concern, I may have used poor
>>>>>>>>>> terminology to describe it. We have no tests to ensure we don't break
>>>>>>>>>> binary compatibility between versions of Beam. There is no guarantee that a
>>>>>>>>>> jar compiled against Beam 2.30 will work against Beam 2.31 without being
>>>>>>>>>> recompiled. To mitigate:
>>>>>>>>>>
>>>>>>>>>> 1. There needs to be tests in Beam to ensure the IO isn't broken
>>>>>>>>>> by changes to Beam.
>>>>>>>>>> 2. I think we also need a plan to back this out if it gets us in
>>>>>>>>>> a bad state. For example, there is potentially a state where we need to
>>>>>>>>>> make a change to Beam core (such as updating a dependency) but can't make
>>>>>>>>>> it because it requires this IO to be recompiled. If this IO depends on a
>>>>>>>>>> new Beam release to be recompiled this would be impossible. I don't want to
>>>>>>>>>> push that friction down to Beam core developers.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> To add to Luke's concern - compatibility of GCP libraries has
>>>>>>>>>>> been a huge headache, and keeping GCP modules together helps at least a
>>>>>>>>>>> bit. It has happened not infrequently that users experience incompatibility
>>>>>>>>>>> between proto or grpc versions, because they link a library that wants one
>>>>>>>>>>> version and Beam depends on another version. Moving PubsubLiteIO outside of
>>>>>>>>>>> Beam means that you as the package maintainer will have to deal with these
>>>>>>>>>>> issues.
>>>>>>>>>>>
>>>>>>>>>>> Reuven
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I think the goals are good:
>>>>>>>>>>>>
>>>>>>>>>>>>  - be able to release fixes quicker
>>>>>>>>>>>>  - have users discover PubsubLiteIO
>>>>>>>>>>>>
>>>>>>>>>>>> Just to clarify a little - a user currently has to depend on
>>>>>>>>>>>> (probably) org.apache.beam:beam-sdks-java-core,
>>>>>>>>>>>> org.apache.beam:beam-runners-direct-java,
>>>>>>>>>>>> org.apache.beam:beam-sdks-java-google-cloud-platform, and some production
>>>>>>>>>>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>>>>>>>>>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>>>>>>>>>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>>>>>>>>>>> snapshot.
>>>>>>>>>>>>
>>>>>>>>>>>> As Luke mentioned, IOs outside of the Beam repo already exist
>>>>>>>>>>>> and it is fine. Decoupled releases are the hard part. I've had a few
>>>>>>>>>>>> discussions about decoupled releases within the same repo. It has all the
>>>>>>>>>>>> same problems whether it is in the same repo or not. In some ways it is
>>>>>>>>>>>> easier outside the repo because it removes the temptation to couple things
>>>>>>>>>>>> too much. I think getting a good version compatibility test matrix and
>>>>>>>>>>>> benchmarking might be the big task here. And you'd want to have much more
>>>>>>>>>>>> automation in the release. Incidentally, fixes already do not have to be
>>>>>>>>>>>> coupled with an upgrade of all of Beam. You can have a different version
>>>>>>>>>>>> for an IO. Or you can choose the snapshot just for an IO dep. The missing
>>>>>>>>>>>> piece is just the testing mentioned. You want to be sure your new version
>>>>>>>>>>>> of the IO is going to work with old versions of the core SDK.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding the circular dep; I agree that there should not be
>>>>>>>>>>>> one: in your proposal,
>>>>>>>>>>>> org.apache.beam:beam-sdks-java-io-google-cloud-platform depends on
>>>>>>>>>>>> com.google.pubsublite:google-beam-pubsublite, and both of those modules
>>>>>>>>>>>> depend on org.apache.beam:beam-sdks-java-core. The core SDK does not depend
>>>>>>>>>>>> on any IO (and we should keep it this way, for sure).
>>>>>>>>>>>>
>>>>>>>>>>>> But in addition to Reuven's simple idea, I have to also push on
>>>>>>>>>>>> whether we can do this the "normal" way: refer to it in docs, and have
>>>>>>>>>>>> examples for users to copy/paste/modify that already includes the needed
>>>>>>>>>>>> deps. Our current example pipelines do not serve this purpose because they
>>>>>>>>>>>> are integrated with our build system rather than being standalone, but it
>>>>>>>>>>>> is very easy to make an example "PubsubLite to blobstore" pipeline or
>>>>>>>>>>>> something, including the working pom.xml, and I expect most users would
>>>>>>>>>>>> start from that.
>>>>>>>>>>>>
>>>>>>>>>>>> Kenn
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> There already is a nightly snapshot that users can use.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <
>>>>>>>>>>>>> evan.galpin@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any possibility of changing the build cadence
>>>>>>>>>>>>>> allowing for builds released as alpha versions or similar? It’s not too
>>>>>>>>>>>>>> uncommon for projects to have nightly builds for example. Could that help
>>>>>>>>>>>>>> deliver fixes more quickly to customers, while also avoiding the nuisances
>>>>>>>>>>>>>> mentioned in this thread?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Evan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I wouldn't say this is uncharted territory as there are
>>>>>>>>>>>>>>> Apache Beam IOs[1] that live outside of the Apache Beam git repo.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The most annoying aspects will be the versioning story, i.e.
>>>>>>>>>>>>>>> users will want to use the library with different versions of Apache Beam
>>>>>>>>>>>>>>> since some people won't want to upgrade since they have something working
>>>>>>>>>>>>>>> and others will want it against the latest version since they want some
>>>>>>>>>>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>>>>>>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>>>>>>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>>>>>>>>>>> users.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <
>>>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think you are in a better place to make this decision.
>>>>>>>>>>>>>>>> You are the primary contributor and maintainer for this IO and you clearly
>>>>>>>>>>>>>>>> know the pubsub lite user base as well. If you think this is the best
>>>>>>>>>>>>>>>> course of action I will support that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That said, afaik you are moving into uncharted territory.
>>>>>>>>>>>>>>>> The questions raised here about support, testing, discoverability,
>>>>>>>>>>>>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>>>>>>>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>>>>>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>>>>>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>>>>>>>>>>> that happens.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I like that this model still allows discoverability through
>>>>>>>>>>>>>>>> Beam and by default supports an out of the box tested version already. I
>>>>>>>>>>>>>>>> guess that will be good enough for most beam + pubsub lite users.  And I
>>>>>>>>>>>>>>>> hope the model will, as you predict, give you a quick way to address user
>>>>>>>>>>>>>>>> requests.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One more question of my own: Do you expect pubsub lite io
>>>>>>>>>>>>>>>> to continue to receive frequent updates in the long term? (For example,
>>>>>>>>>>>>>>>> afaik pubsub io no longer needs or gets frequent updates.). If not,
>>>>>>>>>>>>>>>> eventually keeping the io external might become irrelevant.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you need from this community to make progress on
>>>>>>>>>>>>>>>> this question?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ahmet
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > How will this be communicated to the user?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The docstring on PubsubLiteIO in beam will mention this.
>>>>>>>>>>>>>>>>> If they get the one subject to the long release cycle, that's usually okay,
>>>>>>>>>>>>>>>>> unless they need recently added features/fixes. Pub/Sub Lite's
>>>>>>>>>>>>>>>>> documentation will state to prefer the one from our artifact, but the
>>>>>>>>>>>>>>>>> expectation is the one in beam will work fine in recent releases.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > Will it just be documented somewhere that users should
>>>>>>>>>>>>>>>>> prefer com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent
>>>>>>>>>>>>>>>>> fix they need?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, both in our public docs and the docstring for the
>>>>>>>>>>>>>>>>> beam PubsubLiteIO.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> An interesting side effect of subclassing in this way is
>>>>>>>>>>>>>>>>> that if the user adds a newer version of the PubsubLiteIO
>>>>>>>>>>>>>>>>> implementation-specific artifact in their pom, they won't actually need to
>>>>>>>>>>>>>>>>> make any code changes: the beam PubsubLiteIO will transparently refer to
>>>>>>>>>>>>>>>>> the new implementation version.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <
>>>>>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How will this be communicated to the user? The idea is
>>>>>>>>>>>>>>>>>> that they will discover PubsubLiteIO through their IDE as you described,
>>>>>>>>>>>>>>>>>> but that will get them to the Beam one that's subject to the long release
>>>>>>>>>>>>>>>>>> cycle. Will it just be documented somewhere that users should prefer
>>>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>>>>>>>>> need?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I wonder if a similar result could be achieved just by
>>>>>>>>>>>>>>>>>> making Beam's PubsubLiteIO a stub with no implementation that directs users
>>>>>>>>>>>>>>>>>> to the com.google.cloud one somehow?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> junit's matcher interface comes to mind as a precedent
>>>>>>>>>>>>>>>>>> here. I have been warned many times by
>>>>>>>>>>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not
>>>>>>>>>>>>>>>>>>> trigger Beam repo's CI. You want to deliver things to your customers after
>>>>>>>>>>>>>>>>>>> they are tested as much as possible.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd like to run the integration tests in both locations.
>>>>>>>>>>>>>>>>>>> They would only be meaningful in the beam setup when we want to validate a
>>>>>>>>>>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > Question 2: In the code below, what is the purpose of
>>>>>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Visibility and autocomplete. It means the core class
>>>>>>>>>>>>>>>>>>> will be in the beam javadoc and if you type `import
>>>>>>>>>>>>>>>>>>> org.apache.beam.sdk.io.gcp.pubsu` in an IDE you'll see pubsublite and
>>>>>>>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <
>>>>>>>>>>>>>>>>>>> suztomo@google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>>>>>> (You helped me apply some change to this strange setup
>>>>>>>>>>>>>>>>>>>> a few months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not
>>>>>>>>>>>>>>>>>>>> trigger Beam repo's CI. You want to deliver things to your customers after
>>>>>>>>>>>>>>>>>>>> they are tested as much as possible.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Question 2: In the code below, what is the purpose of
>>>>>>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The backward compatibility came to my mind but I
>>>>>>>>>>>>>>>>>>>> thought you may have more reasons.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My memo:
>>>>>>>>>>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>>>>>>>>>>> beam repo has:
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries
>>>>>>>>>>>>>>>>>>>> BOM (yet) because of its pre-1.0 status.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I don't know that the cycle would cause a problem-
>>>>>>>>>>>>>>>>>>>>> wouldn't it override and cause it to use beam-sdks-java-core:2.30.0 (at
>>>>>>>>>>>>>>>>>>>>> least until beam goes to 3.X.X)?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> How do you plan to address the circular dependency?
>>>>>>>>>>>>>>>>>>>>>> Won't this end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and
>>>>>>>>>>>>>>>>>>>>>>> I'd like to get some feedback on a change to the model for hosting this I/O
>>>>>>>>>>>>>>>>>>>>>>> in beam. Our team has been frustrated by the fact that we have no way to
>>>>>>>>>>>>>>>>>>>>>>> release features or fixes for bugs to customers on time scales shorter than
>>>>>>>>>>>>>>>>>>>>>>> the 1-2 months of the beam release cycle, and that those fixes are
>>>>>>>>>>>>>>>>>>>>>>> necessarily coupled with a beam version upgrade. To work around this, I
>>>>>>>>>>>>>>>>>>>>>>> forked the I/O in beam to our own repo about 6 months ago and have been
>>>>>>>>>>>>>>>>>>>>>>> maintaining both copies in parallel.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and
>>>>>>>>>>>>>>>>>>>>>>> improve the I/O while retaining end-user visibility within the beam
>>>>>>>>>>>>>>>>>>>>>>> repo. To do this, I'd like to remove all the implementation from the beam
>>>>>>>>>>>>>>>>>>>>>>> repo, and leave the I/O there implemented as:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and
>>>>>>>>>>>>>>>>>>>>>>> suggestions surrounding this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>> Tomo
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Ahmet Altay <al...@google.com>.
Curious about the status of this. Was there a consensus in the PR or in the
doc?

On Tue, Jul 6, 2021 at 10:25 PM Robert Bradshaw <ro...@google.com> wrote:

> I don't think that every IO (or transform, etc.) needs to be a part of
> Beam (and its release cycle, etc.). There's a point of maturity where
> growing the ecosystem makes more sense than growing the repository. This is
> one of the reasons we try to build IOs, etc. on the public API rather than
> needing some "secret sauce." I don't think this part of the API is fragile
> or changing enough to be an issue (though testing against nightlies is a
> good idea).
>
> As for discoverability, it makes sense to have some pointers, but I don't
> see stubs from Beam to an external repository as necessary or desirable.
> As long as it's well documented, it'll be easy to find for exactly the
> set of people who are interested in it.
>
>
>
> On Tue, Jul 6, 2021 at 5:53 PM Chamikara Jayalath <ch...@google.com>
> wrote:
>
>>
>>
>> On Tue, Jul 6, 2021 at 8:01 AM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> > I think this is premature perhaps: as far as I know, there is no such
>>> plan in place for many other I/Os which exist in beam and pull in other
>>> maven dependencies which may cause dependency conflicts. This should not
>>> make such conflicts any worse.
>>>
>>> To my knowledge there exists no other IO in Beam that depends on an
>>> external library that also depends on Beam. That is where I believe you are
>>> going into uncharted territory. I am fine with the plan being a full
>>> rollback (or even just a copy of the external repo into Beam) if we hit
>>> this problem. It sounds like you have it covered.
>>>
>>
>> I agree with Andrew. Other external I/O connectors do not depend on Beam
>> in a circular way AFAIK.
>> Additionally, this means that a user that just uses the Beam stub has no
>> idea regarding the API surface or backwards compatibility guarantees since
>> the API is completely inherited from a class that is in an external repo.
>>
>>  If release velocity is an issue, it might be cleaner to just move
>> Pub/Sub Lite I/O completely out of Beam instead of leaving a stub in the
>> Beam codebase. Pub/Sub Lite connector is experimental. So backwards
>> compatibility when moving the code should not be an issue.
>>
>> In fact, we have a BigTable connector that is completely external to Beam
>> which has worked fine for customers so far I believe.
>>
>> https://github.com/googleapis/java-bigtable-hbase/tree/master/bigtable-dataflow-parent/bigtable-hbase-beam
>>
>> Agree that discoverability can be an issue though. Maybe listing
>> recommended external I/O connectors here might help:
>> https://beam.apache.org/documentation/io/built-in/
>>
>> Thanks,
>> Cham
>>
>>
>>>
>>> On Fri, Jul 2, 2021 at 8:22 PM Daniel Collins <dp...@google.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I'm on vacation, so sorry if I missed a lot of discussion here. Going
>>>> to try to reply to a bunch of the comments in this thread:
>>>>
>>>> > One more question of my own: Do you expect pubsub lite io to continue
>>>> to receive frequent updates in the long term? (For example, afaik pubsub io
>>>> no longer needs or gets frequent updates.). If not, eventually keeping the
>>>> io external might become irrelevant.
>>>>
>>>> We expect the I/O to receive updates over the short term to handle the
>>>> availability of dataflow-runner-v2 for java without an opt-in allowlist,
>>>> but in the longer term (1 year +) we don't expect to keep updating this
>>>> significantly. However, one of the primary reasons we want to do this is to
>>>> have the ability to respond to user requests and bug reports on shorter
>>>> timetables- even if we don't plan to make changes, it's hard to predict
>>>> either bug reports or feature requests.
>>>>
>>>> > The real issue is 3rd party dependency convergence and managing a BOM
>>>> that works for your users.
>>>>
>>>> Agreed- however, our library <largely> uses the google cloud BOM
>>>> (actually its underlying dependency list in google-shared-dependencies) for
>>>> shared dependencies, so this should be mostly a non-issue, even more so if
>>>> beam eventually moves to use the cloud BOM.
>>>>
>>>> > The core SDK does not depend on any IO (and we should keep it this
>>>> way, for sure).
>>>>
>>>> Agreed.
>>>>
>>>> > I have to also push on whether we can do this the "normal" way: refer
>>>> to it in docs, and have examples for users to copy/paste/modify that
>>>> already includes the needed deps.
>>>>
>>>> We can, the primary issue with this is discoverability: the current
>>>> expectation of users is that they pull in the google-cloud-platform
>>>> artifact and get all I/Os available. Without the alias, we need to tell
>>>> users to use the different artifact, and they may not know that they even
>>>> need to look for the different artifact; instead assuming it doesn't exist.
>>>> This is purely for U/X purposes.
>>>>
>>>> > 1. There needs to be tests in Beam to ensure the IO isn't broken by
>>>> changes to Beam.
>>>>
>>>> Agreed, and in https://github.com/apache/beam/pull/15076 I've added
>>>> one. If you have any more concrete suggestions for testing that would
>>>> better ensure compatibility, I'd appreciate them.
>>>>
>>>> >  I think we also need a plan to back this out if it gets us in a bad
>>>> state. For example, there is potentially a state where we need to make a
>>>> change to Beam core (such as updating a dependency) but can't make it
>>>> because it requires this IO to be recompiled.
>>>>
>>>> As stated above, nearly all (except flogger in the current state IIRC)
>>>> are using the versions from the google-shared-dependencies BOM. A
>>>> dependency version bump should not introduce a compatibility issue without
>>>> also breaking many other google dependencies. I'd also note: this issue
>>>> already exists, since the beam repo already needs to depend on the Pub/Sub
>>>> Lite client library, which has 80+% of the code from this repo.
>>>>
>>>> > I think we also need a plan to back this out if it gets us in a bad
>>>> state
>>>>
>>>> I think this is premature perhaps: as far as I know, there is no such
>>>> plan in place for many other I/Os which exist in beam and pull in other
>>>> maven dependencies which may cause dependency conflicts. This should not
>>>> make such conflicts any worse.
>>>>
>>>> I don't think there's a good workaround for breaking changes in public
>>>> beam API surfaces; if some hypothetical beam API changes the name of a
>>>> method, and is used by Pub/Sub Lite, this would break compilation in the
>>>> beam presubmit. However... I'm not entirely sure that's a bad thing? It
>>>> would seem to be a fairly good change detector for breaking changes in the
>>>> public API surfaces it uses. As long as Pub/Sub Lite doesn't use internal
>>>> API surfaces, this should be fine. I don't intend to use internal API
>>>> surfaces for this: indeed, any registrations that occur that use @Internal
>>>> annotated surfaces I plan on leaving in the beam repo.
>>>>
>>>> I think the worst-case workaround is a clone back from our repo to the
>>>> beam repo and acknowledgement that this deployment strategy doesn't work
>>>> (full rollback): and I'm willing to take responsibility for doing this if
>>>> it becomes necessary.
>>>>
>>>> -Daniel
>>>>
>>>>
>>>> On Fri, Jul 2, 2021 at 8:24 PM Tianzi Cai <ti...@google.com> wrote:
>>>>
>>>>> Just want to let everyone know that I'm drafting a doc for this. It
>>>>> will be great to have both teams' reviews+sign-offs on a final decision.
>>>>> Thank you all.
>>>>>
>>>>>
>>>>>  PubsubLiteIO Release Strategy
>>>>> <https://docs.google.com/document/d/1oTc0m3dQlrSMXGvUEhQ_GUBJM_IxLpf-6HnOLtDtxHw/edit?usp=drive_web>
>>>>>
>>>>>
>>>>> On Fri, Jul 2, 2021 at 11:22 AM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> If Beam is dependent on a library that is also dependent on Beam it
>>>>>> would be impossible to update dependencies in either. Beam is released as a
>>>>>> single atomic unit, we can't decouple beam-sdks-java-core
>>>>>> from beam-sdks-java-io-google-cloud-platform in our current release
>>>>>> process. (This is different from existing external IOs which only depend on
>>>>>> Beam.)
>>>>>>
>>>>>> On Fri, Jul 2, 2021 at 11:11 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> The reverse could also happen. If the IO needs a new version of core
>>>>>>> GCP libraries, realistically it can't be updated until Beam itself has
>>>>>>> updated its dependencies.
>>>>>>>
>>>>>>> On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <ap...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> To clarify my "circular dependency" concern, I may have used poor
>>>>>>>> terminology to describe it. We have no tests to ensure we don't break
>>>>>>>> binary compatibility between versions of Beam. There is no guarantee that a
>>>>>>>> jar compiled against Beam 2.30 will work against Beam 2.31 without being
>>>>>>>> recompiled. To mitigate:
>>>>>>>>
>>>>>>>> 1. There needs to be tests in Beam to ensure the IO isn't broken by
>>>>>>>> changes to Beam.
>>>>>>>> 2. I think we also need a plan to back this out if it gets us in a
>>>>>>>> bad state. For example, there is potentially a state where we need to make
>>>>>>>> a change to Beam core (such as updating a dependency) but can't make it
>>>>>>>> because it requires this IO to be recompiled. If this IO depends on a new
>>>>>>>> Beam release to be recompiled this would be impossible. I don't want to
>>>>>>>> push that friction down to Beam core developers.
>>>>>>>>
>>>>>>>> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> To add to Luke's concern - compatibility of GCP libraries has been
>>>>>>>>> a huge headache, and keeping GCP modules together helps at least a bit. It
>>>>>>>>> has happened not infrequently that users experience incompatibility between
>>>>>>>>> proto or grpc versions, because they link a library that wants one version
>>>>>>>>> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
>>>>>>>>> means that you as the package maintainer will have to deal with these
>>>>>>>>> issues.
>>>>>>>>>
>>>>>>>>> Reuven
>>>>>>>>>
>>>>>>>>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I think the goals are good:
>>>>>>>>>>
>>>>>>>>>>  - be able to release fixes quicker
>>>>>>>>>>  - have users discover PubsubLiteIO
>>>>>>>>>>
>>>>>>>>>> Just to clarify a little - a user currently has to depend on
>>>>>>>>>> (probably) org.apache.beam:beam-sdks-java-core,
>>>>>>>>>> org.apache.beam:beam-runners-direct-java,
>>>>>>>>>> org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
>>>>>>>>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>>>>>>>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>>>>>>>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>>>>>>>>> snapshot.
>>>>>>>>>>
>>>>>>>>>> As Luke mentioned, IOs outside of the Beam repo already exist and
>>>>>>>>>> it is fine. Decoupled releases are the hard part. I've had a few
>>>>>>>>>> discussions about decoupled releases within the same repo. It has all the
>>>>>>>>>> same problems whether it is in the same repo or not. In some ways it is
>>>>>>>>>> easier outside the repo because it removes the temptation to couple things
>>>>>>>>>> too much. I think getting a good version compatibility test matrix and
>>>>>>>>>> benchmarking might be the big task here. And you'd want to have much more
>>>>>>>>>> automation in the release. Incidentally, fixes already do not have to be
>>>>>>>>>> coupled with an upgrade of all of Beam. You can have a different version
>>>>>>>>>> for an IO. Or you can choose the snapshot just for an IO dep. The missing
>>>>>>>>>> piece is just the testing mentioned. You want to be sure your new version
>>>>>>>>>> of the IO is going to work with old versions of the core SDK.
>>>>>>>>>>
>>>>>>>>>> Regarding the circular dep; I agree that there should not be one:
>>>>>>>>>> in your proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform
>>>>>>>>>> depends on com.google.pubsublite:google-beam-pubsublite, and both of those
>>>>>>>>>> modules depend on org.apache.beam:beam-sdks-java-core. The core SDK does
>>>>>>>>>> not depend on any IO (and we should keep it this way, for sure).
>>>>>>>>>>
>>>>>>>>>> But in addition to Reuven's simple idea, I have to also push on
>>>>>>>>>> whether we can do this the "normal" way: refer to it in docs, and have
>>>>>>>>>> examples for users to copy/paste/modify that already includes the needed
>>>>>>>>>> deps. Our current example pipelines do not serve this purpose because they
>>>>>>>>>> are integrated with our build system rather than being standalone, but it
>>>>>>>>>> is very easy to make an example "PubsubLite to blobstore" pipeline or
>>>>>>>>>> something, including the working pom.xml, and I expect most users would
>>>>>>>>>> start from that.
>>>>>>>>>>
>>>>>>>>>> Kenn
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> There already is a nightly snapshot that users can use.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <
>>>>>>>>>>> evan.galpin@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Is there any possibility of changing the build cadence allowing
>>>>>>>>>>>> for builds released as alpha versions or similar? It’s not too uncommon for
>>>>>>>>>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>>>>>>>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>>>>>>>>>> this thread?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Evan
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I wouldn't say this is uncharted territory as there are Apache
>>>>>>>>>>>>> Beam IOs[1] that live outside of the Apache Beam git repo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The most annoying aspects will be the versioning story, i.e.
>>>>>>>>>>>>> users will want to use the library with different versions of Apache Beam
>>>>>>>>>>>>> since some people won't want to upgrade since they have something working
>>>>>>>>>>>>> and others will want it against the latest version since they want some
>>>>>>>>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>>>>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>>>>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>>>>>>>>> users.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think you are in a better place to make this decision. You
>>>>>>>>>>>>>> are the primary contributor and maintainer for this IO and you clearly know
>>>>>>>>>>>>>> the pubsub lite user base as well. If you think this is the best course of
>>>>>>>>>>>>>> action I will support that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>>>>>>>>>> questions raised here about support, testing, discoverability,
>>>>>>>>>>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>>>>>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>>>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>>>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>>>>>>>>> that happens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I like that this model still allows discoverability through
>>>>>>>>>>>>>> Beam and by default supports an out of the box tested version already. I
>>>>>>>>>>>>>> guess that will be good enough for most beam + pubsub lite users.  And I
>>>>>>>>>>>>>> hope the model will, as you predict, give you a quick way to address user
>>>>>>>>>>>>>> requests.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One more question of my own: Do you expect pubsub lite io to
>>>>>>>>>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>>>>>>>>>> pubsub io no longer needs or gets frequent updates.). If not, eventually
>>>>>>>>>>>>>> keeping the io external might become irrelevant.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you need from this community to make progress on this
>>>>>>>>>>>>>> question?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ahmet
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > How will this be communicated to the user?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If
>>>>>>>>>>>>>>> they get the one subject to the long release cycle, that's usually okay,
>>>>>>>>>>>>>>> unless they need recently added features/fixes. Pub/Sub Lite's
>>>>>>>>>>>>>>> documentation will state to prefer the one from our artifact, but the
>>>>>>>>>>>>>>> expectation is the one in beam will work fine in recent releases.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Will it just be documented somewhere that users should
>>>>>>>>>>>>>>> prefer com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent
>>>>>>>>>>>>>>> fix they need?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> An interesting side effect of subclassing in this way is
>>>>>>>>>>>>>>> that if the user adds a newer version of the PubsubLiteIO
>>>>>>>>>>>>>>> implementation-specific artifact in their pom, they won't actually need to
>>>>>>>>>>>>>>> make any code changes: the beam PubsubLiteIO will transparently refer to
>>>>>>>>>>>>>>> the new implementation version.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <
>>>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How will this be communicated to the user? The idea is that
>>>>>>>>>>>>>>>> they will discover PubsubLiteIO through their IDE as you described, but
>>>>>>>>>>>>>>>> that will get them to the Beam one that's subject to the long release
>>>>>>>>>>>>>>>> cycle. Will it just be documented somewhere that users should prefer
>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>>>>>>> need?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I wonder if a similar result could be achieved just by
>>>>>>>>>>>>>>>> making Beam's PubsubLiteIO a stub with no implementation that directs users
>>>>>>>>>>>>>>>> to the com.google.cloud one somehow?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> junit's matcher interface comes to mind as a precedent
>>>>>>>>>>>>>>>> here. I have been warned many times by
>>>>>>>>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not
>>>>>>>>>>>>>>>>> trigger Beam repo's CI. You want to deliver things to your customers after
>>>>>>>>>>>>>>>>> they are tested as much as possible.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to run the integration tests in both locations.
>>>>>>>>>>>>>>>>> They would only be meaningful in the beam setup when we want to validate a
>>>>>>>>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > Question2 : in the code below, what is the purpose of
>>>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Visibility and autocomplete. It means the core class will
>>>>>>>>>>>>>>>>> be in the beam javadoc and if you type `import
>>>>>>>>>>>>>>>>> org.apache.beam.sdk.io.gcp.pubsu` in an IDE you'll see pubsublite and
>>>>>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <
>>>>>>>>>>>>>>>>> suztomo@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>>>> (You helped me apply some change to this strange setup a
>>>>>>>>>>>>>>>>>> few months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not
>>>>>>>>>>>>>>>>>> trigger Beam repo's CI. You want to deliver things to your customers after
>>>>>>>>>>>>>>>>>> they are tested as much as possible.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Question2 : in the code below, what is the purpose of
>>>>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The backward compatibility came to my mind but I thought
>>>>>>>>>>>>>>>>>> you may have more reasons.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My memo:
>>>>>>>>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>>>>>>>>> beam repo has:
>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM
>>>>>>>>>>>>>>>>>> (yet) because of its pre-1.0 status.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I don't know that the cycle would cause a problem-
>>>>>>>>>>>>>>>>>>> wouldn't it override and cause it to use beam-sdks-java-core:2.30.0 (at
>>>>>>>>>>>>>>>>>>> least until beam goes to 3.X.X)?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How do you plan to address the circular dependency?
>>>>>>>>>>>>>>>>>>>> Won't this end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and
>>>>>>>>>>>>>>>>>>>>> I'd like to get some feedback on a change to the model for hosting this I/O
>>>>>>>>>>>>>>>>>>>>> in beam. Our team has been frustrated by the fact that we have no way to
>>>>>>>>>>>>>>>>>>>>> release features or fixes for bugs to customers on time scales shorter than
>>>>>>>>>>>>>>>>>>>>> the 1-2 months of the beam release cycle, and that those fixes are
>>>>>>>>>>>>>>>>>>>>> necessarily coupled with a beam version upgrade. To work around this, I
>>>>>>>>>>>>>>>>>>>>> forked the I/O in beam to our own repo about 6 months ago and have been
>>>>>>>>>>>>>>>>>>>>> maintaining both copies in parallel.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and
>>>>>>>>>>>>>>>>>>>>> improve the I/O while retaining end-user visibility within the beam
>>>>>>>>>>>>>>>>>>>>> repo. To do this, I'd like to remove all the implementation from the beam
>>>>>>>>>>>>>>>>>>>>> repo, and leave the I/O there implemented as:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and
>>>>>>>>>>>>>>>>>>>>> suggestions surrounding this.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Tomo
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Robert Bradshaw <ro...@google.com>.
I don't think that every IO (or transform, etc.) needs to be a part of Beam
(and its release cycle, etc.). There's a point of maturity where growing
the ecosystem makes more sense than growing the repository. This is one of
the reasons we try to build IOs, etc. on the public API rather than needing
some "secret sauce." I don't think this part of the API is fragile or
changing enough to be an issue (though testing against nightlies is a good
idea).

As for discoverability, it makes sense to have some pointers, but I don't
see stubs from Beam to an external repository as necessary or desirable.
As long as it's well documented it'll be easy to find for exactly the
set of people that are interested in it.
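
For reference, the stub pattern being discussed relies on ordinary Java
static-member inheritance. A minimal sketch, using made-up stand-in classes
rather than the real PubsubLiteIO types (ExternalPubsubLiteIO,
BeamPubsubLiteIO and read() are all hypothetical names):

```java
public class AliasSketch {
    // Hypothetical stand-in for the class released from the external repo.
    static class ExternalPubsubLiteIO {
        static String read() {
            return "read from external implementation";
        }
    }

    // Hypothetical stand-in for the empty alias that would stay in Beam.
    // It declares nothing; static methods are inherited from the parent.
    static class BeamPubsubLiteIO extends ExternalPubsubLiteIO {}

    public static void main(String[] args) {
        // Callers use the alias name, but the parent's method runs.
        System.out.println(BeamPubsubLiteIO.read());
    }
}
```

Note that an empty subclass like this only compiles if the parent class is
non-final and has a constructor the subclass can reach.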



On Tue, Jul 6, 2021 at 5:53 PM Chamikara Jayalath <ch...@google.com>
wrote:

>
>
> On Tue, Jul 6, 2021 at 8:01 AM Andrew Pilloud <ap...@google.com> wrote:
>
>> > I think this is premature perhaps: as far as I know, there is no such
>> plan in place for many other I/Os which exist in beam and pull in other
>> maven dependencies which may cause dependency conflicts. This should not
>> make such conflicts any worse.
>>
>> To my knowledge there exists no other IO in Beam that depends on an
>> external library that also depends on Beam. That is where I believe you are
>> going into uncharted territory. I am fine with the plan being a full
>> rollback (or even just a copy of the external repo into Beam) if we hit
>> this problem. It sounds like you have it covered.
>>
>
> I agree with Andrew. Other external I/O connectors do not depend on Beam
> in a circular way AFAIK.
> Additionally, this means that a user that just uses the Beam stub has no
> idea regarding the API surface or backwards compatibility guarantees since
> the API is completely inherited from a class that is in an external repo.
>
>  If release velocity is an issue, it might be cleaner to just move Pub/Sub
> Lite I/O completely out of Beam instead of leaving a stub in the Beam
> codebase. Pub/Sub Lite connector is experimental. So backwards
> compatibility when moving the code should not be an issue.
>
> In fact, we have a BigTable connector that is completely external to Beam
> which has worked fine for customers so far I believe.
>
> https://github.com/googleapis/java-bigtable-hbase/tree/master/bigtable-dataflow-parent/bigtable-hbase-beam
>
> Agree that discoverability can be an issue though. Maybe listing
> recommended external I/O connectors here might help:
> https://beam.apache.org/documentation/io/built-in/
>
> Thanks,
> Cham
>
>
>>
>> On Fri, Jul 2, 2021 at 8:22 PM Daniel Collins <dp...@google.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I'm on vacation, so sorry if I missed a lot of discussion here. Going to
>>> try to reply to a bunch of the comments in this thread:
>>>
>>> > One more question of my own: Do you expect pubsub lite io to continue
>>> to receive frequent updates in the long term? (For example, afaik pubsub io
>>> no longer needs or gets frequent updates.). If not, eventually keeping the
>>> io external might become irrelevant.
>>>
>>> We expect the I/O to receive updates over the short term to handle the
>>> availability of dataflow-runner-v2 for java without an opt-in allowlist,
>>> but in the longer term (1 year +) we don't expect to keep updating this
>>> significantly. However, one of the primary reasons we want to do this is to
>>> have the ability to respond to user requests and bug reports on shorter
>>> timetables- even if we don't plan to make changes, it's hard to predict
>>> either bug reports or feature requests.
>>>
>>> > The real issue is 3rd party dependency convergence and managing a BOM
>>> that works for your users.
>>>
>>> Agreed- however, our library <largely> uses the google cloud BOM
>>> (actually its underlying dependency list in google-shared-dependencies) for
>>> shared dependencies, so this should be mostly a non-issue, even more so if
>>> beam eventually moves to use the cloud BOM.
>>>
>>> > The core SDK does not depend on any IO (and we should keep it this
>>> way, for sure).
>>>
>>> Agreed.
>>>
>>> > I have to also push on whether we can do this the "normal" way: refer
>>> to it in docs, and have examples for users to copy/paste/modify that
>>> already includes the needed deps.
>>>
>>> We can, the primary issue with this is discoverability: the current
>>> expectation of users is that they pull in the google-cloud-platform
>>> artifact and get all I/Os available. Without the alias, we need to tell
>>> users to use the different artifact, and they may not know that they even
>>> need to look for the different artifact; instead assuming it doesn't exist.
>>> This is purely for U/X purposes.
>>>
>>> > 1. There needs to be tests in Beam to ensure the IO isn't broken by
>>> changes to Beam.
>>>
>>> Agreed, and in https://github.com/apache/beam/pull/15076 I've added
>>> one. If you have any more concrete suggestions for testing that would
>>> better ensure compatibility, I'd appreciate them.
>>>
>>> >  I think we also need a plan to back this out if it gets us in a bad
>>> state. For example, there is potentially a state where we need to make a
>>> change to Beam core (such as updating a dependency) but can't make it
>>> because it requires this IO to be recompiled.
>>>
>>> As stated above, nearly all (except flogger in the current state IIRC)
>>> are using the versions from the google-shared-dependencies BOM. A
>>> dependency version bump should not introduce a compatibility issue without
>>> also breaking many other google dependencies. I'd also note: this issue
>>> already exists, since the beam repo already needs to depend on the Pub/Sub
>>> Lite client library, which has 80+% of the code from this repo.
>>>
>>> > I think we also need a plan to back this out if it gets us in a bad
>>> state
>>>
>>> I think this is premature perhaps: as far as I know, there is no such
>>> plan in place for many other I/Os which exist in beam and pull in other
>>> maven dependencies which may cause dependency conflicts. This should not
>>> make such conflicts any worse.
>>>
>>> I don't think there's a good workaround for breaking changes in public
>>> beam API surfaces; if some hypothetical beam API changes the name of a
>>> method, and is used by Pub/Sub Lite, this would break compilation in the
>>> beam presubmit. However... I'm not entirely sure that's a bad thing? It
>>> would seem to be a fairly good change detector for breaking changes in the
>>> public API surfaces it uses. As long as Pub/Sub Lite doesn't use internal
>>> API surfaces, this should be fine. I don't intend to use internal API
>>> surfaces for this: indeed, any registrations that occur that use @Internal
>>> annotated surfaces I plan on leaving in the beam repo.
>>>
>>> I think the worst-case workaround is a clone back from our repo to the
>>> beam repo and acknowledgement that this deployment strategy doesn't work
>>> (full rollback): and I'm willing to take responsibility for doing this if
>>> it becomes necessary.
>>>
>>> -Daniel
>>>
>>>
>>> On Fri, Jul 2, 2021 at 8:24 PM Tianzi Cai <ti...@google.com> wrote:
>>>
>>>> Just want to let everyone know that I'm drafting a doc for this. It
>>>> will be great to have both teams' reviews+sign-offs on a final decision.
>>>> Thank you all.
>>>>
>>>>
>>>>  PubsubLiteIO Release Strategy
>>>> <https://docs.google.com/document/d/1oTc0m3dQlrSMXGvUEhQ_GUBJM_IxLpf-6HnOLtDtxHw/edit?usp=drive_web>
>>>>
>>>>
>>>> On Fri, Jul 2, 2021 at 11:22 AM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> If Beam is dependent on a library that is also dependent on Beam it
>>>>> would be impossible to update dependencies in either. Beam is released as a
>>>>> single atomic unit, we can't decouple beam-sdks-java-core
>>>>> from beam-sdks-java-io-google-cloud-platform in our current release
>>>>> process. (This is different from existing external IOs which only depend on
>>>>> Beam.)
>>>>>
>>>>> On Fri, Jul 2, 2021 at 11:11 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> The reverse could also happen. If the IO needs a new version of core
>>>>>> GCP libraries, realistically it can't be updated until Beam itself has
>>>>>> updated its dependencies.
>>>>>>
>>>>>> On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <ap...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> To clarify my "circular dependency" concern, I may have used poor
>>>>>>> terminology to describe it. We have no tests to ensure we don't break
>>>>>>> binary compatibility between versions of Beam. There is no guarantee that a
>>>>>>> jar compiled against Beam 2.30 will work against Beam 2.31 without being
>>>>>>> recompiled. To mitigate:
>>>>>>>
>>>>>>> 1. There needs to be tests in Beam to ensure the IO isn't broken by
>>>>>>> changes to Beam.
>>>>>>> 2. I think we also need a plan to back this out if it gets us in a
>>>>>>> bad state. For example, there is potentially a state where we need to make
>>>>>>> a change to Beam core (such as updating a dependency) but can't make it
>>>>>>> because it requires this IO to be recompiled. If this IO depends on a new
>>>>>>> Beam release to be recompiled this would be impossible. I don't want to
>>>>>>> push that friction down to Beam core developers.
>>>>>>>
>>>>>>> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com> wrote:
>>>>>>>
>>>>>>>> To add to Luke's concern - compatibility of GCP libraries has been
>>>>>>>> a huge headache, and keeping GCP modules together helps at least a bit. It
>>>>>>>> has happened not infrequently that users experience incompatibility between
>>>>>>>> proto or grpc versions, because they link a library that wants one version
>>>>>>>> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
>>>>>>>> means that you as the package maintainer will have to deal with these
>>>>>>>> issues.
>>>>>>>>
>>>>>>>> Reuven
>>>>>>>>
>>>>>>>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I think the goals are good:
>>>>>>>>>
>>>>>>>>>  - be able to release fixes quicker
>>>>>>>>>  - have users discover PubsubLiteIO
>>>>>>>>>
>>>>>>>>> Just to clarify a little - a user currently has to depend on
>>>>>>>>> (probably) org.apache.beam:beam-sdks-java-core,
>>>>>>>>> org.apache.beam:beam-runners-direct-java,
>>>>>>>>> org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
>>>>>>>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>>>>>>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>>>>>>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>>>>>>>> snapshot.
>>>>>>>>>
>>>>>>>>> As Luke mentioned, IOs outside of the Beam repo already exist and
>>>>>>>>> it is fine. Decoupled releases are the hard part. I've had a few
>>>>>>>>> discussions about decoupled releases within the same repo. It has all the
>>>>>>>>> same problems whether it is in the same repo or not. In some ways it is
>>>>>>>>> easier outside the repo because it removes the temptation to couple things
>>>>>>>>> too much. I think getting a good version compatibility test matrix and
>>>>>>>>> benchmarking might be the big task here. And you'd want to have much more
>>>>>>>>> automation in the release. Incidentally, fixes already do not have to be
>>>>>>>>> coupled with an upgrade of all of Beam. You can have a different version
>>>>>>>>> for an IO. Or you can choose the snapshot just for an IO dep. The missing
>>>>>>>>> piece is just the testing mentioned. You want to be sure your new version
>>>>>>>>> of the IO is going to work with old versions of the core SDK.
>>>>>>>>>
>>>>>>>>> Regarding the circular dep; I agree that there should not be one:
>>>>>>>>> in your proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform
>>>>>>>>> depends on com.google.pubsublite:google-beam-pubsublite, and both of those
>>>>>>>>> modules depend on org.apache.beam:beam-sdks-java-core. The core SDK does
>>>>>>>>> not depend on any IO (and we should keep it this way, for sure).
>>>>>>>>>
>>>>>>>>> But in addition to Reuven's simple idea, I have to also push on
>>>>>>>>> whether we can do this the "normal" way: refer to it in docs, and have
>>>>>>>>> examples for users to copy/paste/modify that already include the needed
>>>>>>>>> deps. Our current example pipelines do not serve this purpose because they
>>>>>>>>> are integrated with our build system rather than being standalone, but it
>>>>>>>>> is very easy to make an example "PubsubLite to blobstore" pipeline or
>>>>>>>>> something, including the working pom.xml, and I expect most users would
>>>>>>>>> start from that.
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> There already is a nightly snapshot that users can use.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Is there any possibility of changing the build cadence allowing
>>>>>>>>>>> for builds released as alpha versions or similar? It’s not too uncommon for
>>>>>>>>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>>>>>>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>>>>>>>>> this thread?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Evan
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I wouldn't say this is uncharted territory as there are Apache
>>>>>>>>>>>> Beam IOs[1] that live outside of the Apache Beam git repo.
>>>>>>>>>>>>
>>>>>>>>>>>> The most annoying aspects will be the versioning story, i.e.
>>>>>>>>>>>> users will want to use the library with different versions of Apache Beam
>>>>>>>>>>>> since some people won't want to upgrade since they have something working
>>>>>>>>>>>> and others will want it against the latest version since they want some
>>>>>>>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>>>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>>>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>>>>>>>> users.
>>>>>>>>>>>>
>>>>>>>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you are in a better place to make this decision. You
>>>>>>>>>>>>> are the primary contributor and maintainer for this IO and you clearly know
>>>>>>>>>>>>> the pubsub lite user base as well. If you think this is the best course of
>>>>>>>>>>>>> action I will support that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>>>>>>>>> questions raised here about support, testing, discoverability,
>>>>>>>>>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>>>>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>>>>>>>> that happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I like that this model still allows discoverability through
>>>>>>>>>>>>> Beam and by default supports an out of the box tested version already. I
>>>>>>>>>>>>> guess that will be good enough for most beam + pubsub lite users.  And I
>>>>>>>>>>>>> hope the model will, as you predict, give you a quick way to address user
>>>>>>>>>>>>> requests.
>>>>>>>>>>>>>
>>>>>>>>>>>>> One more question of my own: Do you expect pubsub lite io to
>>>>>>>>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>>>>>>>>> pubsub io no longer needs or gets frequent updates.). If not, eventually
>>>>>>>>>>>>> keeping the io external might become irrelevant.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you need from this community to make progress on this
>>>>>>>>>>>>> question?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ahmet
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> > How will this be communicated to the user?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If
>>>>>>>>>>>>>> they get the one subject to the long release cycle, that's usually okay,
>>>>>>>>>>>>>> unless they need recently added features/fixes. Pub/Sub Lite's
>>>>>>>>>>>>>> documentation will state to prefer the one from our artifact, but the
>>>>>>>>>>>>>> expectation is the one in beam will work fine in recent releases.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Will it just be documented somewhere that users should
>>>>>>>>>>>>>> prefer com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent
>>>>>>>>>>>>>> fix they need?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> An interesting side effect of subclassing in this way is that
>>>>>>>>>>>>>> if the user adds a newer version of the PubsubLiteIO
>>>>>>>>>>>>>> implementation-specific artifact in their pom, they won't actually need to
>>>>>>>>>>>>>> make any code changes: the beam PubsubLiteIO will transparently refer to
>>>>>>>>>>>>>> the new implementation version.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <
>>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How will this be communicated to the user? The idea is that
>>>>>>>>>>>>>>> they will discover PubsubLiteIO through their IDE as you described, but
>>>>>>>>>>>>>>> that will get them to the Beam one that's subject to the long release
>>>>>>>>>>>>>>> cycle. Will it just be documented somewhere that users should prefer
>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>>>>>> need?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I wonder if a similar result could be achieved just by
>>>>>>>>>>>>>>> making Beam's PubsubLiteIO a stub with no implementation that directs users
>>>>>>>>>>>>>>> to the com.google.cloud one somehow?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> junit's matcher interface comes to mind as a precedent here.
>>>>>>>>>>>>>>> I have been warned many times by
>>>>>>>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not
>>>>>>>>>>>>>>>> trigger Beam repo's CI. You want to deliver things to your customers after
>>>>>>>>>>>>>>>> they are tested as much as possible.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to run the integration tests in both locations.
>>>>>>>>>>>>>>>> They would only be meaningful in the beam setup when we want to validate a
>>>>>>>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Question 2: In the code below, what is the purpose of
>>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Visibility and autocomplete. It means the core class will
>>>>>>>>>>>>>>>> be in the beam javadoc and if you type `import
>>>>>>>>>>>>>>>> org.apache.beam.sdk.io.gcp.pubsu` in an IDE you'll see pubsublite and
>>>>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <
>>>>>>>>>>>>>>>> suztomo@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>>> (You helped me apply some change to this strange setup a
>>>>>>>>>>>>>>>>> few months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not
>>>>>>>>>>>>>>>>> trigger Beam repo's CI. You want to deliver things to your customers after
>>>>>>>>>>>>>>>>> they are tested as much as possible.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Question 2: In the code below, what is the purpose of
>>>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>> class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The backward compatibility came to my mind but I thought
>>>>>>>>>>>>>>>>> you may have more reasons.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My memo:
>>>>>>>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>>>>>>>> beam repo has:
>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM
>>>>>>>>>>>>>>>>> (yet) because of its pre-1.0 status.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I don't know that the cycle would cause a problem-
>>>>>>>>>>>>>>>>>> wouldn't it override and cause it to use beam-sdks-java-core:2.30.0 (at
>>>>>>>>>>>>>>>>>> least until beam goes to 3.X.X)?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How do you plan to address the circular dependency?
>>>>>>>>>>>>>>>>>>> Won't this end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd
>>>>>>>>>>>>>>>>>>>> like to get some feedback on a change to the model for hosting this I/O in
>>>>>>>>>>>>>>>>>>>> beam. Our team has been frustrated by the fact that we have no way to
>>>>>>>>>>>>>>>>>>>> release features or fixes for bugs to customers on time scales shorter than
>>>>>>>>>>>>>>>>>>>> the 1-2 months of the beam release cycle, and that those fixes are
>>>>>>>>>>>>>>>>>>>> necessarily coupled with a beam version upgrade. To work around this, I
>>>>>>>>>>>>>>>>>>>> forked the I/O in beam to our own repo about 6 months ago and have been
>>>>>>>>>>>>>>>>>>>> maintaining both copies in parallel.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and
>>>>>>>>>>>>>>>>>>>> improve the I/O while retaining end-user visibility within the beam
>>>>>>>>>>>>>>>>>>>> repo. To do this, I'd like to remove all the implementation from the beam
>>>>>>>>>>>>>>>>>>>> repo, and leave the I/O there implemented as:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>> class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and
>>>>>>>>>>>>>>>>>>>> suggestions surrounding this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Tomo
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Chamikara Jayalath <ch...@google.com>.
On Tue, Jul 6, 2021 at 8:01 AM Andrew Pilloud <ap...@google.com> wrote:

> > I think this is premature perhaps: as far as I know, there is no such
> plan in place for many other I/Os which exist in beam and pull in other
> maven dependencies which may cause dependency conflicts. This should not
> make such conflicts any worse.
>
> To my knowledge there exists no other IO in Beam that depends on an
> external library that also depends on Beam. That is where I believe you are
> going into uncharted territory. I am fine with the plan being a full
> rollback (or even just a copy of the external repo into Beam) if we hit
> this problem. It sounds like you have it covered.
>

I agree with Andrew. Other external I/O connectors do not depend on Beam in
a circular way AFAIK.
Additionally, this means that a user that just uses the Beam stub has no
idea regarding the API surface or backwards compatibility guarantees since
the API is completely inherited from a class that is in an external repo.
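
To make the inherited-API point concrete, here is a minimal, self-contained Java sketch of the aliasing pattern under discussion. The class names `ExternalIO` and `BeamStubIO` and the method `read()` are hypothetical stand-ins, not the real Pub/Sub Lite or Beam classes. It only illustrates that an empty subclass exposes the parent's static methods while declaring no API of its own.

```java
public class AliasSketch {

  // Hypothetical stand-in for the implementation class in the external repo.
  public static class ExternalIO {
    public static String read() {
      return "read transform defined in the external artifact";
    }
  }

  // Hypothetical stand-in for the stub left in the Beam repo: no members of its own.
  public static class BeamStubIO extends ExternalIO {}

  public static void main(String[] args) {
    // The static method is reachable through the stub, although the stub does not declare it.
    System.out.println(BeamStubIO.read());

    // Reflection on the stub alone shows which methods it declares itself.
    System.out.println(
        "methods declared by the stub: " + BeamStubIO.class.getDeclaredMethods().length);
  }
}
```

Because the stub declares nothing, whatever the parent class exposes at the version on the classpath is what a caller of the stub gets, which is both the convenience and the compatibility concern raised in this thread.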

If release velocity is an issue, it might be cleaner to just move Pub/Sub
Lite I/O completely out of Beam instead of leaving a stub in the Beam
codebase. Pub/Sub Lite connector is experimental. So backwards
compatibility when moving the code should not be an issue.

In fact we have a BigTable connector that is completely external to Beam
which has worked fine for customers so far I believe.
https://github.com/googleapis/java-bigtable-hbase/tree/master/bigtable-dataflow-parent/bigtable-hbase-beam

Agree that discoverability can be an issue though. Maybe listing
recommended external I/O connectors here might help:
https://beam.apache.org/documentation/io/built-in/

Thanks,
Cham


>
> On Fri, Jul 2, 2021 at 8:22 PM Daniel Collins <dp...@google.com>
> wrote:
>
>> Hi All,
>>
>> I'm on vacation, so sorry if I missed a lot of discussion here. Going to
>> try to reply to a bunch of the comments in this thread:
>>
>> > One more question of my own: Do you expect pubsub lite io to continue
>> to receive frequent updates in the long term? (For example, afaik pubsub io
>> no longer needs or gets frequent updates.). If not, eventually keeping the
>> io external might become irrelevant.
>>
>> We expect the I/O to receive updates over the short term to handle the
>> availability of dataflow-runner-v2 for java without an opt-in allowlist,
>> but in the longer term (1 year +) we don't expect to keep updating this
>> significantly. However, one of the primary reasons we want to do this is to
>> have the ability to respond to user requests and bug reports on shorter
>> timetables- even if we don't plan to make changes, it's hard to predict
>> either bug reports or feature requests.
>>
>> > The real issue is 3rd party dependency convergence and managing a BOM
>> that works for your users.
>>
>> Agreed- however, our library <largely> uses the google cloud BOM
>> (actually its underlying dependency list in google-shared-dependencies) for
>> shared dependencies, so this should be mostly a non-issue, even more so if
>> beam eventually moves to use the cloud BOM.
>>
>> > The core SDK does not depend on any IO (and we should keep it this way,
>> for sure).
>>
>> Agreed.
>>
>> > I have to also push on whether we can do this the "normal" way: refer
>> to it in docs, and have examples for users to copy/paste/modify that
>> already include the needed deps.
>>
>> We can, the primary issue with this is discoverability: the current
>> expectation of users is that they pull in the google-cloud-platform
>> artifact and get all I/Os available. Without the alias, we need to tell
>> users to use the different artifact, and they may not know that they even
>> need to look for the different artifact; instead assuming it doesn't exist.
>> This is purely for U/X purposes.
>>
>> > 1. There need to be tests in Beam to ensure the IO isn't broken by
>> changes to Beam.
>>
>> Agreed, and in https://github.com/apache/beam/pull/15076 I've added one.
>> If you have any more concrete suggestions for testing that would better
>> ensure compatibility, I'd appreciate them.
>>
>> >  I think we also need a plan to back this out if it gets us in a bad
>> state. For example, there is potentially a state where we need to make a
>> change to Beam core (such as updating a dependency) but can't make it
>> because it requires this IO to be recompiled.
>>
>> As stated above, nearly all dependencies (except flogger in the current state IIRC)
>> are using the versions from the google-shared-dependencies BOM. A
>> dependency version bump should not introduce a compatibility issue without
>> also breaking many other google dependencies. I'd also note: this issue
>> already exists, since the beam repo already needs to depend on the Pub/Sub
>> Lite client library, which has 80+% of the code from this repo.
>>
>> > I think we also need a plan to back this out if it gets us in a bad
>> state
>>
>> I think this is premature perhaps: as far as I know, there is no such
>> plan in place for many other I/Os which exist in beam and pull in other
>> maven dependencies which may cause dependency conflicts. This should not
>> make such conflicts any worse.
>>
>> I don't think there's a good workaround for breaking changes in public
>> beam API surfaces; if some hypothetical beam API changes the name of a
>> method, and is used by Pub/Sub Lite, this would break compilation in the
>> beam presubmit. However... I'm not entirely sure that's a bad thing? It
>> would seem to be a fairly good change detector for breaking changes in the
>> public API surfaces it uses. As long as Pub/Sub Lite doesn't use internal
>> API surfaces, this should be fine. I don't intend to use internal API
>> surfaces for this: indeed, any registrations that occur that use @Internal
>> annotated surfaces I plan on leaving in the beam repo.
>>
>> I think the worst-case workaround is a clone back from our repo to the
>> beam repo and acknowledgement that this deployment strategy doesn't work
>> (full rollback): and I'm willing to take responsibility for doing this if
>> it becomes necessary.
>>
>> -Daniel
>>
>>
>> On Fri, Jul 2, 2021 at 8:24 PM Tianzi Cai <ti...@google.com> wrote:
>>
>>> Just want to let everyone know that I'm drafting a doc for this. It will
>>> be great to have both teams' reviews+sign-offs on a final decision. Thank
>>> you all.
>>>
>>>
>>>  PubsubLiteIO Release Strategy
>>> <https://docs.google.com/document/d/1oTc0m3dQlrSMXGvUEhQ_GUBJM_IxLpf-6HnOLtDtxHw/edit?usp=drive_web>
>>>
>>>
>>> On Fri, Jul 2, 2021 at 11:22 AM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> If Beam is dependent on a library that is also dependent on Beam it
>>>> would be impossible to update dependencies in either. Beam is released as a
>>>> single atomic unit, we can't decouple beam-sdks-java-core
>>>> from beam-sdks-java-io-google-cloud-platform in our current release
>>>> process. (This is different from existing external IOs which only depend on
>>>> Beam.)
>>>>
>>>> On Fri, Jul 2, 2021 at 11:11 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> The reverse could also happen. If the IO needs a new version of core
>>>>> GCP libraries, realistically it can't be updated until Beam itself has
>>>>> updated its dependencies.
>>>>>
>>>>> On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> To clarify my "circular dependency" concern, I may have used poor
>>>>>> terminology to describe it. We have no tests to ensure we don't break
>>>>>> binary compatibility between versions of Beam. There is no guarantee that a
>>>>>> jar compiled against Beam 2.30 will work against Beam 2.31 without being
>>>>>> recompiled. To mitigate:
>>>>>>
>>>>>> 1. There need to be tests in Beam to ensure the IO isn't broken by
>>>>>> changes to Beam.
>>>>>> 2. I think we also need a plan to back this out if it gets us in a
>>>>>> bad state. For example, there is potentially a state where we need to make
>>>>>> a change to Beam core (such as updating a dependency) but can't make it
>>>>>> because it requires this IO to be recompiled. If this IO depends on a new
>>>>>> Beam release to be recompiled this would be impossible. I don't want to
>>>>>> push that friction down to Beam core developers.
>>>>>>
>>>>>> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> To add to Luke's concern - compatibility of GCP libraries has been a
>>>>>>> huge headache, and keeping GCP modules together helps at least a bit. It
>>>>>>> has happened not infrequently that users experience incompatibility between
>>>>>>> proto or grpc versions, because they link a library that wants one version
>>>>>>> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
>>>>>>> means that you as the package maintainer will have to deal with these
>>>>>>> issues.
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I think the goals are good:
>>>>>>>>
>>>>>>>>  - be able to release fixes quicker
>>>>>>>>  - have users discover PubsubLiteIO
>>>>>>>>
>>>>>>>> Just to clarify a little - a user currently has to depend on
>>>>>>>> (probably) org.apache.beam:beam-sdks-java-core,
>>>>>>>> org.apache.beam:beam-runners-direct-java,
>>>>>>>> org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
>>>>>>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>>>>>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>>>>>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>>>>>>> snapshot.
>>>>>>>>
>>>>>>>> As Luke mentioned, IOs outside of the Beam repo already exist and
>>>>>>>> it is fine. Decoupled releases are the hard part. I've had a few
>>>>>>>> discussions about decoupled releases within the same repo. It has all the
>>>>>>>> same problems whether it is in the same repo or not. In some ways it is
>>>>>>>> easier outside the repo because it removes the temptation to couple things
>>>>>>>> too much. I think getting a good version compatibility test matrix and
>>>>>>>> benchmarking might be the big task here. And you'd want to have much more
>>>>>>>> automation in the release. Incidentally, fixes already do not have to be
>>>>>>>> coupled with an upgrade of all of Beam. You can have a different version
>>>>>>>> for an IO. Or you can choose the snapshot just for an IO dep. The missing
>>>>>>>> piece is just the testing mentioned. You want to be sure your new version
>>>>>>>> of the IO is going to work with old versions of the core SDK.
>>>>>>>>
>>>>>>>> Regarding the circular dep; I agree that there should not be one:
>>>>>>>> in your proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform
>>>>>>>> depends on com.google.pubsublite:google-beam-pubsublite, and both of those
>>>>>>>> modules depend on org.apache.beam:beam-sdks-java-core. The core SDK does
>>>>>>>> not depend on any IO (and we should keep it this way, for sure).
>>>>>>>>
>>>>>>>> But in addition to Reuven's simple idea, I have to also push on
>>>>>>>> whether we can do this the "normal" way: refer to it in docs, and have
>>>>>>>> examples for users to copy/paste/modify that already include the needed
>>>>>>>> deps. Our current example pipelines do not serve this purpose because they
>>>>>>>> are integrated with our build system rather than being standalone, but it
>>>>>>>> is very easy to make an example "PubsubLite to blobstore" pipeline or
>>>>>>>> something, including the working pom.xml, and I expect most users would
>>>>>>>> start from that.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:
>>>>>>>>
>>>>>>>>> There already is a nightly snapshot that users can use.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Is there any possibility of changing the build cadence allowing
>>>>>>>>>> for builds released as alpha versions or similar? It’s not too uncommon for
>>>>>>>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>>>>>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>>>>>>>> this thread?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Evan
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I wouldn't say this is uncharted territory as there are Apache
>>>>>>>>>>> Beam IOs[1] that live outside of the Apache Beam git repo.
>>>>>>>>>>>
>>>>>>>>>>> The most annoying aspects will be the versioning story, i.e.
>>>>>>>>>>> users will want to use the library with different versions of Apache Beam
>>>>>>>>>>> since some people won't want to upgrade since they have something working
>>>>>>>>>>> and others will want it against the latest version since they want some
>>>>>>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>>>>>>> users.
>>>>>>>>>>>
>>>>>>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>> I think you are in a better place to make this decision. You
>>>>>>>>>>>> are the primary contributor and maintainer for this IO and you clearly know
>>>>>>>>>>>> the pubsub lite user base as well. If you think this is the best course of
>>>>>>>>>>>> action I will support that.
>>>>>>>>>>>>
>>>>>>>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>>>>>>>> questions raised here about support, testing, discoverability,
>>>>>>>>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>>>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>>>>>>> that happens.
>>>>>>>>>>>>
>>>>>>>>>>>> I like that this model still allows discoverability through
>>>>>>>>>>>> Beam and by default supports an out of the box tested version already. I
>>>>>>>>>>>> guess that will be good enough for most beam + pubsub lite users.  And I
>>>>>>>>>>>> hope the model will, as you predict, give you a quick way to address user
>>>>>>>>>>>> requests.
>>>>>>>>>>>>
>>>>>>>>>>>> One more question of my own: Do you expect pubsub lite io to
>>>>>>>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>>>>>>>> pubsub io no longer needs or gets frequent updates.). If not, eventually
>>>>>>>>>>>> keeping the io external might become irrelevant.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you need from this community to make progress on this
>>>>>>>>>>>> question?
>>>>>>>>>>>>
>>>>>>>>>>>> Ahmet
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> > How will this be communicated to the user?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If
>>>>>>>>>>>>> they get the one subject to the long release cycle, that's usually okay,
>>>>>>>>>>>>> unless they need recently added features/fixes. Pub/Sub Lite's
>>>>>>>>>>>>> documentation will state to prefer the one from our artifact, but the
>>>>>>>>>>>>> expectation is the one in beam will work fine in recent releases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Will it just be documented somewhere that users should
>>>>>>>>>>>>> prefer com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent
>>>>>>>>>>>>> fix they need?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> An interesting side effect of subclassing in this way is that
>>>>>>>>>>>>> if the user adds a newer version of the PubsubLiteIO
>>>>>>>>>>>>> implementation-specific artifact in their pom, they won't actually need to
>>>>>>>>>>>>> make any code changes: the beam PubsubLiteIO will transparently refer to
>>>>>>>>>>>>> the new implementation version.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <
>>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> How will this be communicated to the user? The idea is that
>>>>>>>>>>>>>> they will discover PubsubLiteIO through their IDE as you described, but
>>>>>>>>>>>>>> that will get them to the Beam one that's subject to the long release
>>>>>>>>>>>>>> cycle. Will it just be documented somewhere that users should prefer
>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>>>>> need?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wonder if a similar result could be achieved just by making
>>>>>>>>>>>>>> Beam's PubsubLiteIO a stub with no implementation that directs users to the
>>>>>>>>>>>>>> com.google.cloud one somehow?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> junit's matcher interface comes to mind as a precedent here.
>>>>>>>>>>>>>> I have been warned many times by
>>>>>>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to run the integration tests in both locations.
>>>>>>>>>>>>>>> They would only be meaningful in the beam setup when we went to validate a
>>>>>>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Question 2: in the code below, what is the purpose of
>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Visibility and autocomplete. It means the core class will be
>>>>>>>>>>>>>>> in the beam javadoc and if you type `import
>>>>>>>>>>>>>>> org.apache.beam.sdk.io.gcp.pubsu` in an IDE you'll see pubsublite and
>>>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <
>>>>>>>>>>>>>>> suztomo@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>> (You helped me apply some change to this strange setup a
>>>>>>>>>>>>>>>> few months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not
>>>>>>>>>>>>>>>> trigger Beam repo's CI. You want to deliver things to your customers after
>>>>>>>>>>>>>>>> they are tested as much as possible.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Question 2: in the code below, what is the purpose of
>>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The backward compatibility came to my mind but I thought
>>>>>>>>>>>>>>>> you may have more reasons.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My memo:
>>>>>>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>>>>>>> beam repo has:
>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM
>>>>>>>>>>>>>>>> (yet) because of its pre-1.0 status.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't know that the cycle would cause a problem-
>>>>>>>>>>>>>>>>> wouldn't it override and cause it to use beam-sdks-java-core:2.30.0 (at
>>>>>>>>>>>>>>>>> least until beam goes to 3.X.X)?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How do you plan to address the circular dependency? Won't
>>>>>>>>>>>>>>>>>> this end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd
>>>>>>>>>>>>>>>>>>> like to get some feedback on a change to the model for hosting this I/O in
>>>>>>>>>>>>>>>>>>> beam. Our team has been frustrated by the fact that we have no way to
>>>>>>>>>>>>>>>>>>> release features or fixes for bugs to customers on time scales shorter than
>>>>>>>>>>>>>>>>>>> the 1-2 months of the beam release cycle, and that those fixes are
>>>>>>>>>>>>>>>>>>> necessarily coupled with a beam version upgrade. To work around this, I
>>>>>>>>>>>>>>>>>>> forked the I/O in beam to our own repo about 6 months ago and have been
>>>>>>>>>>>>>>>>>>> maintaining both copies in parallel.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and
>>>>>>>>>>>>>>>>>>> improve the I/O while retaining end-user visibility within the beam
>>>>>>>>>>>>>>>>>>> repo. To do this, I'd like to remove all the implementation from the beam
>>>>>>>>>>>>>>>>>>> repo, and leave the I/O there implemented as:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and
>>>>>>>>>>>>>>>>>>> suggestions surrounding this.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Tomo
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Andrew Pilloud <ap...@google.com>.
> I think this is premature perhaps: as far as I know, there is no such
> plan in place for many other I/Os which exist in beam and pull in other
> maven dependencies which may cause dependency conflicts. This should not
> make such conflicts any worse.

To my knowledge there exists no other IO in Beam that depends on an
external library that also depends on Beam. That is where I believe you are
going into uncharted territory. I am fine with the plan being a full
rollback (or even just a copy of the external repo into Beam) if we hit
this problem. It sounds like you have it covered.
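
For reference, a minimal sketch of how Maven-style "nearest wins" mediation
would treat the dependency chain raised earlier in this thread. The artifact
versions are the ones from that example; the `user-pipeline` root and the
direct edge from the GCP IO artifact to beam-sdks-java-core:2.30.0 are
assumptions made for illustration, and this toy resolver is not Maven's
actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NearestWins {
  // "artifact:version" -> declared dependencies, in declaration order.
  // The user-pipeline root and the gcp-io -> core:2.30.0 edge are assumed.
  static final Map<String, List<String>> GRAPH = Map.of(
      "user-pipeline:1.0",
          List.of("beam-sdks-java-io-google-cloud-platform:2.30.0"),
      "beam-sdks-java-io-google-cloud-platform:2.30.0",
          List.of("beam-sdks-java-core:2.30.0", "pubsublite-beam-io:0.16.0"),
      "pubsublite-beam-io:0.16.0",
          List.of("beam-sdks-java-core:2.29.0"),
      "beam-sdks-java-core:2.30.0", List.of(),
      "beam-sdks-java-core:2.29.0", List.of());

  // Breadth-first walk: the first version seen for an artifact is the
  // nearest to the root (ties go to the earlier declaration) and wins.
  static Map<String, String> resolve(String root) {
    Map<String, String> chosen = new LinkedHashMap<>();
    Deque<String> queue = new ArrayDeque<>(List.of(root));
    while (!queue.isEmpty()) {
      String node = queue.poll();
      int split = node.lastIndexOf(':');
      String artifact = node.substring(0, split);
      String version = node.substring(split + 1);
      if (chosen.putIfAbsent(artifact, version) != null) {
        continue; // a nearer version already won; skip this subtree
      }
      queue.addAll(GRAPH.getOrDefault(node, List.of()));
    }
    return chosen;
  }

  public static void main(String[] args) {
    // prints 2.30.0
    System.out.println(resolve("user-pipeline:1.0").get("beam-sdks-java-core"));
  }
}
```

Under these assumptions the 2.30.0 core wins because it is declared closer to
the root, so the older 2.29.0 is never linked. This says nothing about whether
a jar compiled against 2.29.0 is binary compatible with 2.30.0, which is the
part that still needs tests.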

On Fri, Jul 2, 2021 at 8:22 PM Daniel Collins <dp...@google.com> wrote:

> Hi All,
>
> I'm on vacation, so sorry if I missed a lot of discussion here. Going to
> try to reply to a bunch of the comments in this thread:
>
> > One more question of my own: Do you expect pubsub lite io to continue to
> receive frequent updates in the long term? (For example, afaik pubsub io no
> longer needs or gets frequent updates.). If not, eventually keeping the io
> external might become irrelevant.
>
> We expect the I/O to receive updates over the short term to handle the
> availability of dataflow-runner-v2 for java without an opt-in allowlist,
> but in the longer term (1 year +) we don't expect to keep updating this
> significantly. However, one of the primary reasons we want to do this is to
> have the ability to respond to user requests and bug reports on shorter
> timetables- even if we don't plan to make changes, it's hard to predict
> either bug reports or feature requests.
>
> > The real issue is 3rd party dependency convergence and managing a BOM
> that works for your users.
>
> Agreed- however, our library <largely> uses the google cloud BOM (actually
> its underlying dependency list in google-shared-dependencies) for shared
> dependencies, so this should be mostly a non-issue, even more so if beam
> eventually moves to use the cloud BOM.
>
> > The core SDK does not depend on any IO (and we should keep it this way,
> for sure).
>
> Agreed.
>
> > I have to also push on whether we can do this the "normal" way: refer to
> it in docs, and have examples for users to copy/paste/modify that already
> includes the needed deps.
>
> We can, the primary issue with this is discoverability: the current
> expectation of users is that they pull in the google-cloud-platform
> artifact and get all I/Os available. Without the alias, we need to tell
> users to use the different artifact, and they may not know that they even
> need to look for the different artifact; instead assuming it doesn't exist.
> This is purely for U/X purposes.
>
> > 1. There needs to be tests in Beam to ensure the IO isn't broken by
> changes to Beam.
>
> Agreed, and in https://github.com/apache/beam/pull/15076 I've added one.
> If you have any more concrete suggestions for testing that would better
> ensure compatibility, I'd appreciate them.
>
> >  I think we also need a plan to back this out if it gets us in a bad
> state. For example, there is potentially a state where we need to make a
> change to Beam core (such as updating a dependency) but can't make it
> because it requires this IO to be recompiled.
>
> As stated above, nearly all (except flogger in the current state IIRC) are
> using the versions from the google-shared-dependencies BOM. A dependency
> version bump should not introduce a compatibility issue without also
> breaking many other google dependencies. I'd also note: this issue already
> exists, since the beam repo already needs to depend on the Pub/Sub Lite
> client library, which has 80+% of the code from this repo.
>
> > I think we also need a plan to back this out if it gets us in a bad state
>
> I think this is premature perhaps: as far as I know, there is no such plan
> in place for many other I/Os which exist in beam and pull in other maven
> dependencies which may cause dependency conflicts. This should not make
> such conflicts any worse.
>
> I don't think there's a good workaround for breaking changes in public
> beam API surfaces; if some hypothetical beam API changes the name of a
> method, and is used by Pub/Sub Lite, this would break compilation in the
> beam presubmit. However... I'm not entirely sure that's a bad thing? It
> would seem to be a fairly good change detector for breaking changes in the
> public API surfaces it uses. As long as Pub/Sub Lite doesn't use internal
> API surfaces, this should be fine. I don't intend to use internal API
> surfaces for this: indeed, any registrations that occur that use @Internal
> annotated surfaces I plan on leaving in the beam repo.
>
> I think the worst-case workaround is a clone back from our repo to the
> beam repo and acknowledgement that this deployment strategy doesn't work
> (full rollback): and I'm willing to take responsibility for doing this if
> it becomes necessary.
>
> -Daniel
>
>
> On Fri, Jul 2, 2021 at 8:24 PM Tianzi Cai <ti...@google.com> wrote:
>
>> Just want to let everyone know that I'm drafting a doc for this. It will
>> be great to have both teams' reviews+sign-offs on a final decision. Thank
>> you all.
>>
>>
>>  PubsubLiteIO Release Strategy
>> <https://docs.google.com/document/d/1oTc0m3dQlrSMXGvUEhQ_GUBJM_IxLpf-6HnOLtDtxHw/edit?usp=drive_web>
>>
>>
>> On Fri, Jul 2, 2021 at 11:22 AM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> If Beam is dependent on a library that is also dependent on Beam it
>>> would be impossible to update dependencies in either. Beam is released as a
>>> single atomic unit, we can't decouple beam-sdks-java-core
>>> from beam-sdks-java-io-google-cloud-platform in our current release
>>> process. (This is different from existing external IOs which only depend on
>>> Beam.)
>>>
>>> On Fri, Jul 2, 2021 at 11:11 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> The reverse could also happen. If the IO needs a new version of core
>>>> GCP libraries, realistically it can't be updated until Beam itself has
>>>> updated its dependencies.
>>>>
>>>> On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> To clarify my "circular dependency" concern, I may have used poor
>>>>> terminology to describe it. We have no tests to ensure we don't break
>>>>> binary compatibility between versions of Beam. There is no guarantee that a
>>>>> jar compiled against Beam 2.30 will work against Beam 2.31 without being
>>>>> recompiled. To mitigate:
>>>>>
>>>>> 1. There needs to be tests in Beam to ensure the IO isn't broken by
>>>>> changes to Beam.
>>>>> 2. I think we also need a plan to back this out if it gets us in a bad
>>>>> state. For example, there is potentially a state where we need to make a
>>>>> change to Beam core (such as updating a dependency) but can't make it
>>>>> because it requires this IO to be recompiled. If this IO depends on a new
>>>>> Beam release to be recompiled this would be impossible. I don't want to
>>>>> push that friction down to Beam core developers.
>>>>>
>>>>> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> To add to Luke's concern - compatibility of GCP libraries has been a
>>>>>> huge headache, and keeping GCP modules together helps at least a bit. It
>>>>>> has happened not infrequently that users experience incompatibility between
>>>>>> proto or grpc versions, because they link a library that wants one version
>>>>>> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
>>>>>> means that you as the package maintainer will have to deal with these
>>>>>> issues.
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I think the goals are good:
>>>>>>>
>>>>>>>  - be able to release fixes quicker
>>>>>>>  - have users discover PubsubLiteIO
>>>>>>>
>>>>>>> Just to clarify a little - a user currently has to depend on
>>>>>>> (probably) org.apache.beam:beam-sdks-java-core,
>>>>>>> org.apache.beam:beam-runners-direct-java,
>>>>>>> org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
>>>>>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>>>>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>>>>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>>>>>> snapshot.
>>>>>>>
>>>>>>> As Luke mentioned, IOs outside of the Beam repo already exist and it
>>>>>>> is fine. Decoupled releases are the hard part. I've had a few discussions
>>>>>>> about decoupled releases within the same repo. It has all the same problems
>>>>>>> whether it is in the same repo or not. In some ways it is easier outside
>>>>>>> the repo because it removes the temptation to couple things too much. I
>>>>>>> think getting good version compatibility test matrix and benchmarking might
>>>>>>> be the big task here. And you'd want to have much more automation in the
>>>>>>> release. Incidentally, fixes already do not have to be coupled with an
>>>>>>> upgrade of all of Beam. You can have a different version for an IO. Or you
>>>>>>> can choose the snapshot just for an IO dep. The missing piece is just the
>>>>>>> testing mentioned. You want to be sure your new version of the IO is going
>>>>>>> to work with old versions of the core SDK.
>>>>>>>
>>>>>>> Regarding the circular dep; I agree that there should not be one: in
>>>>>>> your proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform
>>>>>>> depends on com.google.pubsublite:google-beam-pubsublite, and both of those
>>>>>>> modules depend on org.apache.beam:beam-sdks-java-core. The core SDK does
>>>>>>> not depend on any IO (and we should keep it this way, for sure).
>>>>>>>
>>>>>>> But in addition to Reuven's simple idea, I have to also push on
>>>>>>> whether we can do this the "normal" way: refer to it in docs, and have
>>>>>>> examples for users to copy/paste/modify that already includes the needed
>>>>>>> deps. Our current example pipelines do not serve this purpose because they
>>>>>>> are integrated with our build system rather than being standalone, but it
>>>>>>> is very easy to make an example "PubsubLite to blobstore" pipeline or
>>>>>>> something, including the working pom.xml, and I expect most users would
>>>>>>> start from that.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:
>>>>>>>
>>>>>>>> There already is a nightly snapshot that users can use.
>>>>>>>>
>>>>>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Is there any possibility of changing the build cadence allowing
>>>>>>>>> for builds released as alpha versions or similar? It’s not too uncommon for
>>>>>>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>>>>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>>>>>>> this thread?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Evan
>>>>>>>>>
>>>>>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> I wouldn't say this is uncharted territory as there are Apache
>>>>>>>>>> Beam IOs[1] that live outside of the Apache Beam git repo.
>>>>>>>>>>
>>>>>>>>>> The most annoying aspects will be the versioning story, i.e.
>>>>>>>>>> users will want to use the library with different versions of Apache Beam
>>>>>>>>>> since some people won't want to upgrade since they have something working
>>>>>>>>>> and others will want it against the latest version since they want some
>>>>>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>>>>>> users.
>>>>>>>>>>
>>>>>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> I think you are in a better place to make this decision. You are
>>>>>>>>>>> the primary contributor and maintainer for this IO and you clearly know the
>>>>>>>>>>> pubsub lite user base as well. If you think this is the best course of
>>>>>>>>>>> action I will support that.
>>>>>>>>>>>
>>>>>>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>>>>>>> questions raised here about support, testing, discoverability,
>>>>>>>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>>>>>> that happens.
>>>>>>>>>>>
>>>>>>>>>>> I like that this model still allows discoverability through Beam
>>>>>>>>>>> and by default supports an out of the box tested version already. I guess
>>>>>>>>>>> that will be good enough for most beam + pubsub lite users.  And I hope the
>>>>>>>>>>> model will, as you predict, give you a quick way to address user requests.
>>>>>>>>>>>
>>>>>>>>>>> One more question of my own: Do you expect pubsub lite io to
>>>>>>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>>>>>>> pubsub io no longer needs or gets frequent updates.). If not, eventually
>>>>>>>>>>> keeping the io external might become irrelevant.
>>>>>>>>>>>
>>>>>>>>>>> What do you need from this community to make progress on this
>>>>>>>>>>> question?
>>>>>>>>>>>
>>>>>>>>>>> Ahmet
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> > How will this be communicated to the user?
>>>>>>>>>>>>
>>>>>>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If
>>>>>>>>>>>> they get the one subject to the long release cycle, that's usually okay,
>>>>>>>>>>>> unless they need recently added features/fixes. Pub/Sub Lite's
>>>>>>>>>>>> documentation will state to prefer the one from our artifact, but the
>>>>>>>>>>>> expectation is the one in beam will work fine in recent releases.
>>>>>>>>>>>>
>>>>>>>>>>>> > Will it just be documented somewhere that users should prefer
>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>>> need?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>
>>>>>>>>>>>> An interesting side effect of subclassing in this way is that
>>>>>>>>>>>> if the user adds a newer version of the PubsubLiteIO
>>>>>>>>>>>> implementation-specific artifact in their pom, they won't actually need to
>>>>>>>>>>>> make any code changes: the beam PubsubLiteIO will transparently refer to
>>>>>>>>>>>> the new implementation version.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <
>>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> How will this be communicated to the user? The idea is that
>>>>>>>>>>>>> they will discover PubsubLiteIO through their IDE as you described, but
>>>>>>>>>>>>> that will get them to the Beam one that's subject to the long release
>>>>>>>>>>>>> cycle. Will it just be documented somewhere that users should prefer
>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>>>> need?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I wonder if a similar result could be achieved just by making
>>>>>>>>>>>>> Beam's PubsubLiteIO a stub with no implementation that directs users to the
>>>>>>>>>>>>> com.google.cloud one somehow?
>>>>>>>>>>>>>
>>>>>>>>>>>>> junit's matcher interface comes to mind as a precedent here. I
>>>>>>>>>>>>> have been warned many times by
>>>>>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>>>>>
>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to run the integration tests in both locations. They
>>>>>>>>>>>>>> would only be meaningful in the beam setup when we went to validate a
>>>>>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Question 2: in the code below, what is the purpose of
>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Visibility and autocomplete. It means the core class will be
>>>>>>>>>>>>>> in the beam javadoc and if you type `import
>>>>>>>>>>>>>> org.apache.beam.sdk.io.gcp.pubsu` in an IDE you'll see pubsublite and
>>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <
>>>>>>>>>>>>>> suztomo@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>> (You helped me apply some change to this strange setup a few
>>>>>>>>>>>>>>> months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Question 2: in the code below, what is the purpose of
>>>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The backward compatibility came to my mind but I thought you
>>>>>>>>>>>>>>> may have more reasons.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My memo:
>>>>>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>>>>>> beam repo has:
>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM
>>>>>>>>>>>>>>> (yet) because of its pre-1.0 status.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't know that the cycle would cause a problem- wouldn't
>>>>>>>>>>>>>>>> it override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>>>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How do you plan to address the circular dependency? Won't
>>>>>>>>>>>>>>>>> this end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd
>>>>>>>>>>>>>>>>>> like to get some feedback on a change to the model for hosting this I/O in
>>>>>>>>>>>>>>>>>> beam. Our team has been frustrated by the fact that we have no way to
>>>>>>>>>>>>>>>>>> release features or fixes for bugs to customers on time scales shorter than
>>>>>>>>>>>>>>>>>> the 1-2 months of the beam release cycle, and that those fixes are
>>>>>>>>>>>>>>>>>> necessarily coupled with a beam version upgrade. To work around this, I
>>>>>>>>>>>>>>>>>> forked the I/O in beam to our own repo about 6 months ago and have been
>>>>>>>>>>>>>>>>>> maintaining both copies in parallel.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve
>>>>>>>>>>>>>>>>>> the I/O while retaining end-user visibility within the beam repo. To do
>>>>>>>>>>>>>>>>>> this, I'd like to remove all the implementation from the beam repo, and
>>>>>>>>>>>>>>>>>> leave the I/O there implemented as:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and
>>>>>>>>>>>>>>>>>> suggestions surrounding this.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Tomo
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Daniel Collins <dp...@google.com>.
Hi All,

I'm on vacation, so sorry if I missed a lot of discussion here. Going to
try to reply to a bunch of the comments in this thread:

> One more question of my own: Do you expect pubsub lite io to continue to
> receive frequent updates in the long term? (For example, afaik pubsub io no
> longer needs or gets frequent updates.). If not, eventually keeping the io
> external might become irrelevant.

We expect the I/O to receive updates over the short term to handle the
availability of dataflow-runner-v2 for java without an opt-in allowlist,
but in the longer term (1 year +) we don't expect to keep updating this
significantly. However, one of the primary reasons we want to do this is to
have the ability to respond to user requests and bug reports on shorter
timetables: even if we don't plan to make changes, it's hard to predict
either bug reports or feature requests.

> The real issue is 3rd party dependency convergence and managing a BOM
> that works for your users.

Agreed. However, our library largely uses the google cloud BOM (actually
its underlying dependency list in google-shared-dependencies) for shared
dependencies, so this should be mostly a non-issue, even more so if beam
eventually moves to use the cloud BOM.

> The core SDK does not depend on any IO (and we should keep it this way,
> for sure).

Agreed.

> I have to also push on whether we can do this the "normal" way: refer to
> it in docs, and have examples for users to copy/paste/modify that already
> includes the needed deps.

We can, but the primary issue with this is discoverability: the current
expectation of users is that they pull in the google-cloud-platform
artifact and get all I/Os available. Without the alias, we need to tell
users to use the different artifact, and they may not know that they even
need to look for it, instead assuming it doesn't exist. This is purely for
UX purposes.
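
As a rough illustration of the alias (a sketch only, not the real classes):
ExternalPubsubLiteIO below is a made-up stand-in for
com.google.cloud.pubsublite.beam.PubsubLiteIO, BeamPubsubLiteIO is a made-up
stand-in for the class left in the beam repo, and read() is a placeholder
rather than the actual API. The point is only that static methods declared
on the parent resolve through the empty subclass.

```java
// Hypothetical stand-in for com.google.cloud.pubsublite.beam.PubsubLiteIO.
class ExternalPubsubLiteIO {
  // Protected so that the alias class can extend it.
  protected ExternalPubsubLiteIO() {}

  // Placeholder static factory; the real I/O exposes different methods.
  static String read() {
    return "read transform from the external artifact";
  }
}

// Hypothetical stand-in for the class kept in the beam repo.
// It has no implementation of its own.
class BeamPubsubLiteIO extends ExternalPubsubLiteIO {}

public class AliasDemo {
  public static void main(String[] args) {
    // The static method declared on the parent is reachable via the alias.
    System.out.println(BeamPubsubLiteIO.read());
  }
}
```

Because the alias holds no logic, which implementation a user gets would be
decided by the version of the external artifact on their classpath.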

> 1. There needs to be tests in Beam to ensure the IO isn't broken by
changes to Beam.

Agreed, and in https://github.com/apache/beam/pull/15076 I've added one. If
you have any more concrete suggestions for testing that would better ensure
compatibility, I'd appreciate them.

>  I think we also need a plan to back this out if it gets us in a bad
state. For example, there is potentially a state where we need to make a
change to Beam core (such as updating a dependency) but can't make it
because it requires this IO to be recompiled.

As stated above, nearly all dependencies (except flogger in the current
state, IIRC) use the versions from the google-shared-dependencies BOM. A dependency
version bump should not introduce a compatibility issue without also
breaking many other google dependencies. I'd also note: this issue already
exists, since the beam repo already needs to depend on the Pub/Sub Lite
client library, which has 80+% of the code from this repo.

> I think we also need a plan to back this out if it gets us in a bad state

I think this is perhaps premature: as far as I know, there is no such plan
in place for many other I/Os which exist in beam and pull in other maven
dependencies which may cause dependency conflicts. This should not make
such conflicts any worse.

I don't think there's a good workaround for breaking changes in public beam
API surfaces; if some hypothetical beam API changes the name of a method,
and is used by Pub/Sub Lite, this would break compilation in the beam
presubmit. However... I'm not entirely sure that's a bad thing? It would
seem to be a fairly good change detector for breaking changes in the public
API surfaces it uses. As long as Pub/Sub Lite doesn't use internal API
surfaces, this should be fine. I don't intend to use internal API surfaces
for this: indeed, any registrations that occur that use @Internal annotated
surfaces I plan on leaving in the beam repo.
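
The presubmit catches source-level renames at compile time. For the
binary-compatibility case raised earlier in the thread (a jar compiled
against one beam version running against another), a reflective check is
one possible complement. This is only a sketch: CoreApi and its expand()
method are hypothetical stand-ins for a public beam surface, and this is
not the test added in the PR above.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for a public beam API surface used by the I/O.
class CoreApi {
  public static String expand(String input) {
    return "expanded:" + input;
  }
}

public class ApiSurfaceCheck {
  // True if `owner` exposes a public static method with this name and
  // these parameter types; false if it was renamed, removed, or made
  // non-static.
  static boolean hasPublicStatic(Class<?> owner, String name, Class<?>... params) {
    try {
      Method m = owner.getMethod(name, params);
      return Modifier.isStatic(m.getModifiers());
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // The surface the I/O relies on is still present.
    System.out.println(hasPublicStatic(CoreApi.class, "expand", String.class));
    // A renamed method would be reported as missing.
    System.out.println(hasPublicStatic(CoreApi.class, "renamed", String.class));
  }
}
```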

I think the worst-case workaround is a clone back from our repo to the beam
repo and acknowledgement that this deployment strategy doesn't work (full
rollback), and I'm willing to take responsibility for doing this if it
becomes necessary.

-Daniel


On Fri, Jul 2, 2021 at 8:24 PM Tianzi Cai <ti...@google.com> wrote:

> Just want to let everyone know that I'm drafting a doc for this. It will
> be great to have both teams' reviews+sign-offs on a final decision. Thank
> you all.
>
>
>  PubsubLiteIO Release Strategy
> <https://docs.google.com/document/d/1oTc0m3dQlrSMXGvUEhQ_GUBJM_IxLpf-6HnOLtDtxHw/edit?usp=drive_web>
>
>
> On Fri, Jul 2, 2021 at 11:22 AM Andrew Pilloud <ap...@google.com>
> wrote:
>
>> If Beam is dependent on a library that is also dependent on Beam it would
>> be impossible to update dependencies in either. Beam is released as a
>> single atomic unit, we can't decouple beam-sdks-java-core
>> from beam-sdks-java-io-google-cloud-platform in our current release
>> process. (This is different from existing external IOs which only depend on
>> Beam.)
>>
>> On Fri, Jul 2, 2021 at 11:11 AM Reuven Lax <re...@google.com> wrote:
>>
>>> The reverse could also happen. If the IO needs a new version of core GCP
>>> libraries, realistically it can't be updated until Beam itself has updated
>>> its dependencies.
>>>
>>> On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> To clarify my "circular dependency" concern, I may have used poor
>>>> terminology to describe it. We have no tests to ensure we don't break
>>>> binary compatibility between versions of Beam. There is no guarantee that a
>>>> jar compiled against Beam 2.30 will work against Beam 2.31 without being
>>>> recompiled. To mitigate:
>>>>
>>>> 1. There needs to be tests in Beam to ensure the IO isn't broken by
>>>> changes to Beam.
>>>> 2. I think we also need a plan to back this out if it gets us in a bad
>>>> state. For example, there is potentially a state where we need to make a
>>>> change to Beam core (such as updating a dependency) but can't make it
>>>> because it requires this IO to be recompiled. If this IO depends on a new
>>>> Beam release to be recompiled this would be impossible. I don't want to
>>>> push that friction down to Beam core developers.
>>>>
>>>> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> To add to Luke's concern - compatibility of GCP libraries has been a
>>>>> huge headache, and keeping GCP modules together helps at least a bit. It
>>>>> has happened not infrequently that users experience incompatibility between
>>>>> proto or grpc versions, because they link a library that wants one version
>>>>> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
>>>>> means that you as the package maintainer will have to deal with these
>>>>> issues.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I think the goals are good:
>>>>>>
>>>>>>  - be able to release fixes quicker
>>>>>>  - have users discover PubsubLiteIO
>>>>>>
>>>>>> Just to clarify a little - a user currently has to depend on
>>>>>> (probably) org.apache.beam:beam-sdks-java-core,
>>>>>> org.apache.beam:beam-runners-direct-java,
>>>>>> org.apache.beam:beam-sdks-java-google-cloud-platform, and some production
>>>>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>>>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>>>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>>>>> snapshot.
>>>>>>
>>>>>> As Luke mentioned, IOs outside of the Beam repo already exist and it
>>>>>> is fine. Decoupled releases are the hard part. I've had a few discussions
>>>>>> about decoupled releases within the same repo. It has all the same problems
>>>>>> whether it is in the same repo or not. In some ways it is easier outside
>>>>>> the repo because it removes the temptation to couple things too much. I
>>>>>> think getting good version compatibility test matrix and benchmarking might
>>>>>> be the big task here. And you'd want to have much more automation in the
>>>>>> release. Incidentally, fixes already do not have to be coupled with an
>>>>>> upgrade of all of Beam. You can have a different version for an IO. Or you
>>>>>> can choose the snapshot just for an IO dep. The missing piece is just the
>>>>>> testing mentioned. You want to be sure your new version of the IO is going
>>>>>> to work with old versions of the core SDK.
>>>>>>
>>>>>> Regarding the circular dep; I agree that there should not be one: in
>>>>>> your proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform
>>>>>> depends on com.google.pubsublite:google-beam-pubsublite, and both of those
>>>>>> modules depend on org.apache.beam:beam-sdks-java-core. The core SDK does
>>>>>> not depend on any IO (and we should keep it this way, for sure).
>>>>>>
>>>>>> But in addition to Reuven's simple idea, I have to also push on
>>>>>> whether we can do this the "normal" way: refer to it in docs, and have
>>>>>> examples for users to copy/paste/modify that already includes the needed
>>>>>> deps. Our current example pipelines do not serve this purpose because they
>>>>>> are integrated with our build system rather than being standalone, but it
>>>>>> is very easy to make an example "PubsubLite to blobstore" pipeline or
>>>>>> something, including the working pom.xml, and I expect most users would
>>>>>> start from that.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> There already is a nightly snapshot that users can use.
>>>>>>>
>>>>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Is there any possibility of changing the build cadence allowing for
>>>>>>>> builds released as alpha versions or similar? It’s not too uncommon for
>>>>>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>>>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>>>>>> this thread?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Evan
>>>>>>>>
>>>>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>>>>>>
>>>>>>>>> I wouldn't say this is uncharted territory as there are Apache
>>>>>>>>> Beam IOs[1] that live outside of the Apache Beam git repo.
>>>>>>>>>
>>>>>>>>> The most annoying aspects will be the versioning story, i.e. users
>>>>>>>>> will want to use the library with different versions of Apache Beam since
>>>>>>>>> some people won't want to upgrade since they have something working and
>>>>>>>>> others will want it against the latest version since they want some
>>>>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>>>>> users.
>>>>>>>>>
>>>>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>>>>
>>>>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> I think you are in a better place to make this decision. You are
>>>>>>>>>> the primary contributor and maintainer for this IO and you clearly know the
>>>>>>>>>> pubsub lite user base as well. If you think this is the best course of
>>>>>>>>>> action I will support that.
>>>>>>>>>>
>>>>>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>>>>>> questions raised here are about support, testing, discoverability,
>>>>>>>>>> compatibility, potential circular dependencies issues are all unknowns for
>>>>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>>>>> that happens.
>>>>>>>>>>
>>>>>>>>>> I like that this model still allows discoverability through Beam
>>>>>>>>>> and by default supports an out of the box tested version already. I guess
>>>>>>>>>> that will be good enough for most beam + pubsub lite users.  And I hope the
>>>>>>>>>> model will, as you predict, give you a quick way to address user requests.
>>>>>>>>>>
>>>>>>>>>> One more question of my own: Do you expect pubsub lite io to
>>>>>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>>>>>> pubsub io no longer needs or gets frequent updates.). If not, eventually
>>>>>>>>>> keeping the io external might become irrelevant.
>>>>>>>>>>
>>>>>>>>>> What do you need from this community to make progress on this
>>>>>>>>>> question?
>>>>>>>>>>
>>>>>>>>>> Ahmet
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> > How will this be communicated to the user?
>>>>>>>>>>>
>>>>>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If they
>>>>>>>>>>> get the one subject to the long release cycle, that's usually okay, unless
>>>>>>>>>>> they need recently added features/fixes. Pub/Sub Lite's documentation will
>>>>>>>>>>> state to prefer the one from our artifact, but the expectation is the one
>>>>>>>>>>> in beam will work fine in recent releases.
>>>>>>>>>>>
>>>>>>>>>>> > Will it just be documented somewhere that users should prefer
>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>> need?
>>>>>>>>>>>
>>>>>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>
>>>>>>>>>>> An interesting side effect of subclassing in this way is that if
>>>>>>>>>>> the user adds a newer version of the PubsubLiteIO implementation-specific
>>>>>>>>>>> artifact in their pom, they won't actually need to make any code changes:
>>>>>>>>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>>>>>>>>> version.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <
>>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How will this be communicated to the user? The idea is that
>>>>>>>>>>>> they will discover PubsubLiteIO through their IDE as you described, but
>>>>>>>>>>>> that will get them to the Beam one that's subject to the long release
>>>>>>>>>>>> cycle. Will it just be documented somewhere that users should prefer
>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>>> need?
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if a similar result could be achieved just by making
>>>>>>>>>>>> Beam's PubsubLiteIO a stub with no implementation that directs users to the
>>>>>>>>>>>> com.google.cloud one somehow?
>>>>>>>>>>>>
>>>>>>>>>>>> junit's matcher interface comes to mind as a precedent here. I
>>>>>>>>>>>> have been warned many times by
>>>>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>>>>
>>>>>>>>>>>> Brian
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to run the integration tests in both locations. They
>>>>>>>>>>>>> would only be meaningful in the beam setup when we went to validate a
>>>>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Question2 : in the code below, what is the purpose of
>>>>>>>>>>>>> keeping the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Visibility and autocomplete. It means the core class will be
>>>>>>>>>>>>> in the beam javadoc and if you type `import
>>>>>>>>>>>>> org.apache.beam.sdk.io.gcp.pubsu` in an IDE you'll see pubsublite and
>>>>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <
>>>>>>>>>>>>> suztomo@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>> (You helped me apply some change to this strange setup a few
>>>>>>>>>>>>>> months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Question2 : in the code below, what is the purpose of keeping
>>>>>>>>>>>>>> the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>> ````
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The backward compatibility came to my mind but I thought you
>>>>>>>>>>>>>> may have more reasons.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My memo:
>>>>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>>>>> beam repo has:
>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM
>>>>>>>>>>>>>> (yet) because of its pre-1.0 status.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't know that the cycle would cause a problem- wouldn't
>>>>>>>>>>>>>>> it override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How do you plan to address the circular dependency? Won't
>>>>>>>>>>>>>>>> this end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd
>>>>>>>>>>>>>>>>> like to get some feedback on a change to the model for hosting this I/O in
>>>>>>>>>>>>>>>>> beam. Our team has been frustrated by the fact that we have no way to
>>>>>>>>>>>>>>>>> release features or fixes for bugs to customers on time scales shorter than
>>>>>>>>>>>>>>>>> the 1-2 months of the beam release cycle, and that those fixes are
>>>>>>>>>>>>>>>>> necessarily coupled with a beam version upgrade. To work around this, I
>>>>>>>>>>>>>>>>> forked the I/O in beam to our own repo about 6 months ago and have been
>>>>>>>>>>>>>>>>> maintaining both copies in parallel.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve
>>>>>>>>>>>>>>>>> the I/O while retaining end-user visibility within the beam repo. To do
>>>>>>>>>>>>>>>>> this, I'd like to remove all the implementation from the beam repo, and
>>>>>>>>>>>>>>>>> leave the I/O there implemented as:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>>> ````
>>>>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and suggestions
>>>>>>>>>>>>>>>>> surrounding this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Tomo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Tianzi Cai <ti...@google.com>.
Just want to let everyone know that I'm drafting a doc for this. It will be
great to have both teams' reviews+sign-offs on a final decision. Thank you
all.


 PubsubLiteIO Release Strategy
<https://docs.google.com/document/d/1oTc0m3dQlrSMXGvUEhQ_GUBJM_IxLpf-6HnOLtDtxHw/edit?usp=drive_web>


On Fri, Jul 2, 2021 at 11:22 AM Andrew Pilloud <ap...@google.com> wrote:

> If Beam is dependent on a library that is also dependent on Beam it would
> be impossible to update dependencies in either. Beam is released as a
> single atomic unit, we can't decouple beam-sdks-java-core
> from beam-sdks-java-io-google-cloud-platform in our current release
> process. (This is different from existing external IOs which only depend on
> Beam.)
>
> On Fri, Jul 2, 2021 at 11:11 AM Reuven Lax <re...@google.com> wrote:
>
>> The reverse could also happen. If the IO needs a new version of core GCP
>> libraries, realistically it can't be updated until Beam itself has updated
>> its dependencies.
>>
>> On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> To clarify my "circular dependency" concern, I may have used poor
>>> terminology to describe it. We have no tests to ensure we don't break
>>> binary compatibility between versions of Beam. There is no guarantee that a
>>> jar compiled against Beam 2.30 will work against Beam 2.31 without being
>>> recompiled. To mitigate:
>>>
>>> 1. There needs to be tests in Beam to ensure the IO isn't broken by
>>> changes to Beam.
>>> 2. I think we also need a plan to back this out if it gets us in a bad
>>> state. For example, there is potentially a state where we need to make a
>>> change to Beam core (such as updating a dependency) but can't make it
>>> because it requires this IO to be recompiled. If this IO depends on a new
>>> Beam release to be recompiled this would be impossible. I don't want to
>>> push that friction down to Beam core developers.
>>>
>>> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> To add to Luke's concern - compatibility of GCP libraries has been a
>>>> huge headache, and keeping GCP modules together helps at least a bit. It
>>>> has happened not infrequently that users experience incompatibility between
>>>> proto or grpc versions, because they link a library that wants one version
>>>> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
>>>> means that you as the package maintainer will have to deal with these
>>>> issues.
>>>>
>>>> Reuven
>>>>
>>>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org>
>>>> wrote:
>>>>
>>>>> I think the goals are good:
>>>>>
>>>>>  - be able to release fixes quicker
>>>>>  - have users discover PubsubLiteIO
>>>>>
>>>>> Just to clarify a little - a user currently has to depend on
>>>>> (probably) org.apache.beam:beam-sdks-java-core,
>>>>> org.apache.beam:beam-runners-direct-java,
>>>>> org.apache.beam:beam-sdks-java-google-cloud-platform, and some production
>>>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>>>> snapshot.
>>>>>
>>>>> As Luke mentioned, IOs outside of the Beam repo already exist and it
>>>>> is fine. Decoupled releases are the hard part. I've had a few discussions
>>>>> about decoupled releases within the same repo. It has all the same problems
>>>>> whether it is in the same repo or not. In some ways it is easier outside
>>>>> the repo because it removes the temptation to couple things too much. I
>>>>> think getting good version compatibility test matrix and benchmarking might
>>>>> be the big task here. And you'd want to have much more automation in the
>>>>> release. Incidentally, fixes already do not have to be coupled with an
>>>>> upgrade of all of Beam. You can have a different version for an IO. Or you
>>>>> can choose the snapshot just for an IO dep. The missing piece is just the
>>>>> testing mentioned. You want to be sure your new version of the IO is going
>>>>> to work with old versions of the core SDK.
>>>>>
>>>>> Regarding the circular dep; I agree that there should not be one: in
>>>>> your proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform
>>>>> depends on com.google.pubsublite:google-beam-pubsublite, and both of those
>>>>> modules depend on org.apache.beam:beam-sdks-java-core. The core SDK does
>>>>> not depend on any IO (and we should keep it this way, for sure).
>>>>>
>>>>> But in addition to Reuven's simple idea, I have to also push on
>>>>> whether we can do this the "normal" way: refer to it in docs, and have
>>>>> examples for users to copy/paste/modify that already includes the needed
>>>>> deps. Our current example pipelines do not serve this purpose because they
>>>>> are integrated with our build system rather than being standalone, but it
>>>>> is very easy to make an example "PubsubLite to blobstore" pipeline or
>>>>> something, including the working pom.xml, and I expect most users would
>>>>> start from that.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> There already is a nightly snapshot that users can use.
>>>>>>
>>>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Is there any possibility of changing the build cadence allowing for
>>>>>>> builds released as alpha versions or similar? It’s not too uncommon for
>>>>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>>>>> this thread?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Evan
>>>>>>>
>>>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>>>>>
>>>>>>>> I wouldn't say this is uncharted territory as there are Apache Beam
>>>>>>>> IOs[1] that live outside of the Apache Beam git repo.
>>>>>>>>
>>>>>>>> The most annoying aspects will be the versioning story, i.e. users
>>>>>>>> will want to use the library with different versions of Apache Beam since
>>>>>>>> some people won't want to upgrade since they have something working and
>>>>>>>> others will want it against the latest version since they want some
>>>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>>>> users.
>>>>>>>>
>>>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>>>
>>>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> I think you are in a better place to make this decision. You are
>>>>>>>>> the primary contributor and maintainer for this IO and you clearly know the
>>>>>>>>> pubsub lite user base as well. If you think this is the best course of
>>>>>>>>> action I will support that.
>>>>>>>>>
>>>>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>>>>> questions raised here are about support, testing, discoverability,
>>>>>>>>> compatibility, potential circular dependencies issues are all unknowns for
>>>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>>>> that happens.
>>>>>>>>>
>>>>>>>>> I like that this model still allows discoverability through Beam
>>>>>>>>> and by default supports an out of the box tested version already. I guess
>>>>>>>>> that will be good enough for most beam + pubsub lite users.  And I hope the
>>>>>>>>> model will, as you predict, give you a quick way to address user requests.
>>>>>>>>>
>>>>>>>>> One more question of my own: Do you expect pubsub lite io to
>>>>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>>>>> pubsub io no longer needs or gets frequent updates.). If not, eventually
>>>>>>>>> keeping the io external might become irrelevant.
>>>>>>>>>
>>>>>>>>> What do you need from this community to make progress on this
>>>>>>>>> question?
>>>>>>>>>
>>>>>>>>> Ahmet
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> > How will this be communicated to the user?
>>>>>>>>>>
>>>>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If they
>>>>>>>>>> get the one subject to the long release cycle, that's usually okay, unless
>>>>>>>>>> they need recently added features/fixes. Pub/Sub Lite's documentation will
>>>>>>>>>> state to prefer the one from our artifact, but the expectation is the one
>>>>>>>>>> in beam will work fine in recent releases.
>>>>>>>>>>
>>>>>>>>>> > Will it just be documented somewhere that users should prefer
>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>> need?
>>>>>>>>>>
>>>>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>>>>> PubsubLiteIO.
>>>>>>>>>>
>>>>>>>>>> An interesting side effect of subclassing in this way is that if
>>>>>>>>>> the user adds a newer version of the PubsubLiteIO implementation-specific
>>>>>>>>>> artifact in their pom, they won't actually need to make any code changes:
>>>>>>>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>>>>>>>> version.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <
>>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> How will this be communicated to the user? The idea is that they
>>>>>>>>>>> will discover PubsubLiteIO through their IDE as you described, but that
>>>>>>>>>>> will get them to the Beam one that's subject to the long release cycle.
>>>>>>>>>>> Will it just be documented somewhere that users should prefer
>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>>> need?
>>>>>>>>>>>
>>>>>>>>>>> I wonder if a similar result could be achieved just by making
>>>>>>>>>>> Beam's PubsubLiteIO a stub with no implementation that directs users to the
>>>>>>>>>>> com.google.cloud one somehow?
>>>>>>>>>>>
>>>>>>>>>>> junit's matcher interface comes to mind as a precedent here. I
>>>>>>>>>>> have been warned many times by
>>>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>>>
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to run the integration tests in both locations. They
>>>>>>>>>>>> would only be meaningful in the beam setup when we went to validate a
>>>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>>>
>>>>>>>>>>>> > Question2 : in the code below, what is the purpose of keeping
>>>>>>>>>>>> the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>
>>>>>>>>>>>> Visibility and autocomplete. It means the core class will be in
>>>>>>>>>>>> the beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu`
>>>>>>>>>>>> in an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>> (You helped me apply some change to this strange setup a few
>>>>>>>>>>>>> months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Question 2: in the code below, what is the purpose of keeping
>>>>>>>>>>>>> the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>> ````
>>>>>>>>>>>>>
>>>>>>>>>>>>> The backward compatibility came to my mind but I thought you
>>>>>>>>>>>>> may have more reasons.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> My memo:
>>>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>>>> beam repo has:
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>>>>>>>>> because of its pre-1.0 status.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know that the cycle would cause a problem- wouldn't
>>>>>>>>>>>>>> it override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How do you plan to address the circular dependency? Won't
>>>>>>>>>>>>>>> this end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd
>>>>>>>>>>>>>>>> like to get some feedback on a change to the model for hosting this I/O in
>>>>>>>>>>>>>>>> beam. Our team has been frustrated by the fact that we have no way to
>>>>>>>>>>>>>>>> release features or fixes for bugs to customers on time scales shorter than
>>>>>>>>>>>>>>>> the 1-2 months of the beam release cycle, and that those fixes are
>>>>>>>>>>>>>>>> necessarily coupled with a beam version upgrade. To work around this, I
>>>>>>>>>>>>>>>> forked the I/O in beam to our own repo about 6 months ago and have been
>>>>>>>>>>>>>>>> maintaining both copies in parallel.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve
>>>>>>>>>>>>>>>> the I/O while retaining end-user visibility within the beam repo. To do
>>>>>>>>>>>>>>>> this, I'd like to remove all the implementation from the beam repo, and
>>>>>>>>>>>>>>>> leave the I/O there implemented as:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>>> ````
>>>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and suggestions
>>>>>>>>>>>>>>>> surrounding this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Tomo
>>>>>>>>>>>>>
>>>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Andrew Pilloud <ap...@google.com>.
If Beam depends on a library that also depends on Beam, it would be
impossible to update dependencies in either. Beam is released as a
single atomic unit; we can't decouple beam-sdks-java-core
from beam-sdks-java-io-google-cloud-platform in our current release
process. (This is different from existing external IOs, which only depend on
Beam.)
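
A minimal sketch of how the chain from earlier in the thread
(beam-sdks-java-io-google-cloud-platform:2.30.0 -> pubsublite-beam-io:0.16.0
-> beam-sdks-java-core:2.29.0) would resolve, assuming Maven-style "nearest
wins" mediation and a hand-written graph. The class name NearestWins and the
graph are illustrative only, not Beam's actual build logic (Gradle, for
instance, picks the highest version by default). Either way the core ends up
at 2.30.0, so the 0.16.0 jar runs against a core it was not compiled with:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NearestWins {
  // Hypothetical, hand-written dependency graph: "artifact:version" -> direct deps.
  // Assumes the GCP IO module declares beam-sdks-java-core directly.
  static final Map<String, List<String>> GRAPH = Map.of(
      "beam-sdks-java-io-google-cloud-platform:2.30.0",
      List.of("beam-sdks-java-core:2.30.0", "pubsublite-beam-io:0.16.0"),
      "pubsublite-beam-io:0.16.0",
      List.of("beam-sdks-java-core:2.29.0"));

  // Breadth-first walk: the first version seen for an artifact is the
  // nearest declaration, and it wins over anything deeper in the tree.
  static Map<String, String> resolve(String root) {
    Map<String, String> chosen = new LinkedHashMap<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      String coord = queue.poll();
      int idx = coord.lastIndexOf(':');
      String artifact = coord.substring(0, idx);
      String version = coord.substring(idx + 1);
      if (chosen.putIfAbsent(artifact, version) != null) {
        continue; // a nearer declaration already won; ignore this one
      }
      queue.addAll(GRAPH.getOrDefault(coord, List.of()));
    }
    return chosen;
  }

  public static void main(String[] args) {
    // Prints the resolved version of every artifact reachable from the root.
    System.out.println(resolve("beam-sdks-java-io-google-cloud-platform:2.30.0"));
  }
}
```

The sketch only models version selection; it says nothing about whether the
selected core is binary compatible with the jar built against 2.29.0, which
is the part we have no tests for.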

On Fri, Jul 2, 2021 at 11:11 AM Reuven Lax <re...@google.com> wrote:

> The reverse could also happen. If the IO needs a new version of core GCP
> libraries, realistically it can't be updated until Beam itself has updated
> its dependencies.
>
> On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <ap...@google.com>
> wrote:
>
>> To clarify my "circular dependency" concern, I may have used poor
>> terminology to describe it. We have no tests to ensure we don't break
>> binary compatibility between versions of Beam. There is no guarantee that a
>> jar compiled against Beam 2.30 will work against Beam 2.31 without being
>> recompiled. To mitigate:
>>
>> 1. There need to be tests in Beam to ensure the IO isn't broken by
>> changes to Beam.
>> 2. I think we also need a plan to back this out if it gets us in a bad
>> state. For example, there is potentially a state where we need to make a
>> change to Beam core (such as updating a dependency) but can't make it
>> because it requires this IO to be recompiled. If this IO depends on a new
>> Beam release to be recompiled this would be impossible. I don't want to
>> push that friction down to Beam core developers.
>>
>> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com> wrote:
>>
>>> To add to Luke's concern - compatibility of GCP libraries has been a
>>> huge headache, and keeping GCP modules together helps at least a bit. It
>>> has happened not infrequently that users experience incompatibility between
>>> proto or grpc versions, because they link a library that wants one version
>>> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
>>> means that you as the package maintainer will have to deal with these
>>> issues.
>>>
>>> Reuven
>>>
>>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org> wrote:
>>>
>>>> I think the goals are good:
>>>>
>>>>  - be able to release fixes quicker
>>>>  - have users discover PubsubLiteIO
>>>>
>>>> Just to clarify a little - a user currently has to depend on (probably)
>>>> org.apache.beam:beam-sdks-java-core,
>>>> org.apache.beam:beam-runners-direct-java,
>>>> org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
>>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>>> snapshot.
>>>>
>>>> As Luke mentioned, IOs outside of the Beam repo already exist and it is
>>>> fine. Decoupled releases are the hard part. I've had a few discussions
>>>> about decoupled releases within the same repo. It has all the same problems
>>>> whether it is in the same repo or not. In some ways it is easier outside
>>>> the repo because it removes the temptation to couple things too much. I
>>>> think getting a good version compatibility test matrix and benchmarking might
>>>> be the big task here. And you'd want to have much more automation in the
>>>> release. Incidentally, fixes already do not have to be coupled with an
>>>> upgrade of all of Beam. You can have a different version for an IO. Or you
>>>> can choose the snapshot just for an IO dep. The missing piece is just the
>>>> testing mentioned. You want to be sure your new version of the IO is going
>>>> to work with old versions of the core SDK.
>>>>
>>>> Regarding the circular dep; I agree that there should not be one: in
>>>> your proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform
>>>> depends on com.google.pubsublite:google-beam-pubsublite, and both of those
>>>> modules depend on org.apache.beam:beam-sdks-java-core. The core SDK does
>>>> not depend on any IO (and we should keep it this way, for sure).
>>>>
>>>> But in addition to Reuven's simple idea, I have to also push on whether
>>>> we can do this the "normal" way: refer to it in docs, and have examples for
>>>> users to copy/paste/modify that already include the needed deps. Our
>>>> current example pipelines do not serve this purpose because they are
>>>> integrated with our build system rather than being standalone, but it is
>>>> very easy to make an example "PubsubLite to blobstore" pipeline or
>>>> something, including the working pom.xml, and I expect most users would
>>>> start from that.
>>>>
>>>> Kenn
>>>>
>>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> There already is a nightly snapshot that users can use.
>>>>>
>>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is there any possibility of changing the build cadence allowing for
>>>>>> builds released as alpha versions or similar? It’s not too uncommon for
>>>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>>>> this thread?
>>>>>>
>>>>>> Thanks,
>>>>>> Evan
>>>>>>
>>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> I wouldn't say this is uncharted territory as there are Apache Beam
>>>>>>> IOs[1] that live outside of the Apache Beam git repo.
>>>>>>>
>>>>>>> The most annoying aspects will be the versioning story, i.e. users
>>>>>>> will want to use the library with different versions of Apache Beam since
>>>>>>> some people won't want to upgrade since they have something working and
>>>>>>> others will want it against the latest version since they want some
>>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>>> users.
>>>>>>>
>>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>>
>>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>>
>>>>>>>> I think you are in a better place to make this decision. You are
>>>>>>>> the primary contributor and maintainer for this IO and you clearly know the
>>>>>>>> pubsub lite user base as well. If you think this is the best course of
>>>>>>>> action I will support that.
>>>>>>>>
>>>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>>>> questions raised here about support, testing, discoverability,
>>>>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>>> that happens.
>>>>>>>>
>>>>>>>> I like that this model still allows discoverability through Beam
>>>>>>>> and by default supports an out of the box tested version already. I guess
>>>>>>>> that will be good enough for most beam + pubsub lite users.  And I hope the
>>>>>>>> model will, as you predict, give you a quick way to address user requests.
>>>>>>>>
>>>>>>>> One more question of my own: Do you expect pubsub lite io to
>>>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>>>> pubsub io no longer needs or gets frequent updates.) If not, eventually
>>>>>>>> keeping the io external might become irrelevant.
>>>>>>>>
>>>>>>>> What do you need from this community to make progress on this
>>>>>>>> question?
>>>>>>>>
>>>>>>>> Ahmet
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>
>>>>>>>>> > How will this be communicated to the user?
>>>>>>>>>
>>>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If they
>>>>>>>>> get the one subject to the long release cycle, that's usually okay, unless
>>>>>>>>> they need recently added features/fixes. Pub/Sub Lite's documentation will
>>>>>>>>> state to prefer the one from our artifact, but the expectation is the one
>>>>>>>>> in beam will work fine in recent releases.
>>>>>>>>>
>>>>>>>>> > Will it just be documented somewhere that users should prefer
>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>> need?
>>>>>>>>>
>>>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>>>> PubsubLiteIO.
>>>>>>>>>
>>>>>>>>> An interesting side effect of subclassing in this way is that if
>>>>>>>>> the user adds a newer version of the PubsubLiteIO implementation-specific
>>>>>>>>> artifact in their pom, they won't actually need to make any code changes:
>>>>>>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>>>>>>> version.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> How will this be communicated to the user? The idea is that they
>>>>>>>>>> will discover PubsubLiteIO through their IDE as you described, but that
>>>>>>>>>> will get them to the Beam one that's subject to the long release cycle.
>>>>>>>>>> Will it just be documented somewhere that users should prefer
>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>>> need?
>>>>>>>>>>
>>>>>>>>>> I wonder if a similar result could be achieved just by making
>>>>>>>>>> Beam's PubsubLiteIO a stub with no implementation that directs users to the
>>>>>>>>>> com.google.cloud one somehow?
>>>>>>>>>>
>>>>>>>>>> junit's matcher interface comes to mind as a precedent here. I
>>>>>>>>>> have been warned many times by
>>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>>
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>
>>>>>>>>>>> I'd like to run the integration tests in both locations. They
>>>>>>>>>>> would only be meaningful in the beam setup when we went to validate a
>>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>>
>>>>>>>>>>> > Question 2: in the code below, what is the purpose of keeping
>>>>>>>>>>> the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>
>>>>>>>>>>> Visibility and autocomplete. It means the core class will be in
>>>>>>>>>>> the beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu`
>>>>>>>>>>> in an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>> (You helped me apply some change to this strange setup a few
>>>>>>>>>>>> months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>>
>>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>>
>>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Question 2: in the code below, what is the purpose of keeping
>>>>>>>>>>>> the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>> ````
>>>>>>>>>>>>
>>>>>>>>>>>> The backward compatibility came to my mind but I thought you
>>>>>>>>>>>> may have more reasons.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> My memo:
>>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>>> beam repo has:
>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>>>>>>>> because of its pre-1.0 status.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>>>>>>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> How do you plan to address the circular dependency? Won't
>>>>>>>>>>>>>> this end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like
>>>>>>>>>>>>>>> to get some feedback on a change to the model for hosting this I/O in beam.
>>>>>>>>>>>>>>> Our team has been frustrated by the fact that we have no way to release
>>>>>>>>>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>>>>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>>>>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>>>>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>>>>>>>>>> copies in parallel.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve
>>>>>>>>>>>>>>> the I/O while retaining end-user visibility within the beam repo. To do
>>>>>>>>>>>>>>> this, I'd like to remove all the implementation from the beam repo, and
>>>>>>>>>>>>>>> leave the I/O there implemented as:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>>> ````
>>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and suggestions
>>>>>>>>>>>>>>> surrounding this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Tomo
>>>>>>>>>>>>
>>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Reuven Lax <re...@google.com>.
The reverse could also happen. If the IO needs a new version of core GCP
libraries, realistically it can't be updated until Beam itself has updated
its dependencies.
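
A rough sketch of the check this implies, with hypothetical library names
and version numbers: list every shared library where the IO needs something
newer than what Beam pins. ConvergenceCheck and its methods are made up for
illustration and are not part of Beam or the Pub/Sub Lite artifacts:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConvergenceCheck {
  // Compares dotted numeric versions such as "1.38.0" component by component.
  // Missing components are treated as 0, so "1.38" equals "1.38.0".
  static int compareVersions(String a, String b) {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  // Returns, in sorted order, the libraries where the IO needs a newer
  // version than the one Beam pins. Libraries Beam does not pin are skipped.
  static List<String> blocked(Map<String, String> beamPins, Map<String, String> ioNeeds) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> need : new TreeMap<>(ioNeeds).entrySet()) {
      String pinned = beamPins.get(need.getKey());
      if (pinned != null && compareVersions(need.getValue(), pinned) > 0) {
        out.add(need.getKey());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical version numbers, for illustration only.
    Map<String, String> beamPins = Map.of("grpc", "1.37.0", "protobuf", "3.15.8");
    Map<String, String> ioNeeds = Map.of("grpc", "1.38.0", "protobuf", "3.15.8");
    System.out.println(blocked(beamPins, ioNeeds));
  }
}
```

Any library this reports is one where an IO release would have to wait for
Beam to move first.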

On Fri, Jul 2, 2021 at 11:02 AM Andrew Pilloud <ap...@google.com> wrote:

> To clarify my "circular dependency" concern, I may have used poor
> terminology to describe it. We have no tests to ensure we don't break
> binary compatibility between versions of Beam. There is no guarantee that a
> jar compiled against Beam 2.30 will work against Beam 2.31 without being
> recompiled. To mitigate:
>
> 1. There need to be tests in Beam to ensure the IO isn't broken by
> changes to Beam.
> 2. I think we also need a plan to back this out if it gets us in a bad
> state. For example, there is potentially a state where we need to make a
> change to Beam core (such as updating a dependency) but can't make it
> because it requires this IO to be recompiled. If this IO depends on a new
> Beam release to be recompiled this would be impossible. I don't want to
> push that friction down to Beam core developers.
>
> On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com> wrote:
>
>> To add to Luke's concern - compatibility of GCP libraries has been a huge
>> headache, and keeping GCP modules together helps at least a bit. It has
>> happened not infrequently that users experience incompatibility between
>> proto or grpc versions, because they link a library that wants one version
>> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
>> means that you as the package maintainer will have to deal with these
>> issues.
>>
>> Reuven
>>
>> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> I think the goals are good:
>>>
>>>  - be able to release fixes quicker
>>>  - have users discover PubsubLiteIO
>>>
>>> Just to clarify a little - a user currently has to depend on (probably)
>>> org.apache.beam:beam-sdks-java-core,
>>> org.apache.beam:beam-runners-direct-java,
>>> org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
>>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>>> anyhow. So the proposal is almost entirely to avoid the user having to add
>>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>>> snapshot.
>>>
>>> As Luke mentioned, IOs outside of the Beam repo already exist and it is
>>> fine. Decoupled releases are the hard part. I've had a few discussions
>>> about decoupled releases within the same repo. It has all the same problems
>>> whether it is in the same repo or not. In some ways it is easier outside
>>> the repo because it removes the temptation to couple things too much. I
>>> think getting a good version compatibility test matrix and benchmarking might
>>> be the big task here. And you'd want to have much more automation in the
>>> release. Incidentally, fixes already do not have to be coupled with an
>>> upgrade of all of Beam. You can have a different version for an IO. Or you
>>> can choose the snapshot just for an IO dep. The missing piece is just the
>>> testing mentioned. You want to be sure your new version of the IO is going
>>> to work with old versions of the core SDK.
>>>
>>> Regarding the circular dep; I agree that there should not be one: in
>>> your proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform
>>> depends on com.google.pubsublite:google-beam-pubsublite, and both of those
>>> modules depend on org.apache.beam:beam-sdks-java-core. The core SDK does
>>> not depend on any IO (and we should keep it this way, for sure).
>>>
>>> But in addition to Reuven's simple idea, I have to also push on whether
>>> we can do this the "normal" way: refer to it in docs, and have examples for
>>> users to copy/paste/modify that already include the needed deps. Our
>>> current example pipelines do not serve this purpose because they are
>>> integrated with our build system rather than being standalone, but it is
>>> very easy to make an example "PubsubLite to blobstore" pipeline or
>>> something, including the working pom.xml, and I expect most users would
>>> start from that.
>>>
>>> Kenn
>>>
>>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> There already is a nightly snapshot that users can use.
>>>>
>>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com>
>>>> wrote:
>>>>
>>>>> Is there any possibility of changing the build cadence allowing for
>>>>> builds released as alpha versions or similar? It’s not too uncommon for
>>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>>> this thread?
>>>>>
>>>>> Thanks,
>>>>> Evan
>>>>>
>>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> I wouldn't say this is uncharted territory as there are Apache Beam
>>>>>> IOs[1] that live outside of the Apache Beam git repo.
>>>>>>
>>>>>> The most annoying aspects will be the versioning story, i.e. users
>>>>>> will want to use the library with different versions of Apache Beam since
>>>>>> some people won't want to upgrade since they have something working and
>>>>>> others will want it against the latest version since they want some
>>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>>> users.
>>>>>>
>>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>>
>>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> I think you are in a better place to make this decision. You are the
>>>>>>> primary contributor and maintainer for this IO and you clearly know the
>>>>>>> pubsub lite user base as well. If you think this is the best course of
>>>>>>> action I will support that.
>>>>>>>
>>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>>> questions raised here about support, testing, discoverability,
>>>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>>> that happens.
>>>>>>>
>>>>>>> I like that this model still allows discoverability through Beam and
>>>>>>> by default supports an out of the box tested version already. I guess that
>>>>>>> will be good enough for most beam + pubsub lite users.  And I hope the
>>>>>>> model will, as you predict, give you a quick way to address user requests.
>>>>>>>
>>>>>>> One more question of my own: Do you expect pubsub lite io to
>>>>>>> continue to receive frequent updates in the long term? (For example, afaik
>>>>>>> pubsub io no longer needs or gets frequent updates.) If not, eventually
>>>>>>> keeping the io external might become irrelevant.
>>>>>>>
>>>>>>> What do you need from this community to make progress on this
>>>>>>> question?
>>>>>>>
>>>>>>> Ahmet
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <
>>>>>>> dpcollins@google.com> wrote:
>>>>>>>
>>>>>>>> > How will this be communicated to the user?
>>>>>>>>
>>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If they
>>>>>>>> get the one subject to the long release cycle, that's usually okay, unless
>>>>>>>> they need recently added features/fixes. Pub/Sub Lite's documentation will
>>>>>>>> state to prefer the one from our artifact, but the expectation is the one
>>>>>>>> in beam will work fine in recent releases.
>>>>>>>>
>>>>>>>> > Will it just be documented somewhere that users should prefer
>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>> need?
>>>>>>>>
>>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>>> PubsubLiteIO.
>>>>>>>>
>>>>>>>> An interesting side effect of subclassing in this way is that if
>>>>>>>> the user adds a newer version of the PubsubLiteIO implementation-specific
>>>>>>>> artifact in their pom, they won't actually need to make any code changes:
>>>>>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>>>>>> version.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> How will this be communicated to the user? The idea is that they
>>>>>>>>> will discover PubsubLiteIO through their IDE as you described, but that
>>>>>>>>> will get them to the Beam one that's subject to the long release cycle.
>>>>>>>>> Will it just be documented somewhere that users should prefer
>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>>> need?
>>>>>>>>>
>>>>>>>>> I wonder if a similar result could be achieved just by making
>>>>>>>>> Beam's PubsubLiteIO a stub with no implementation that directs users to the
>>>>>>>>> com.google.cloud one somehow?
>>>>>>>>>
>>>>>>>>> junit's matcher interface comes to mind as a precedent here. I
>>>>>>>>> have been warned many times by
>>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>> tested as much as possible.
>>>>>>>>>>
>>>>>>>>>> I'd like to run the integration tests in both locations. They
>>>>>>>>>> would only be meaningful in the beam setup when we went to validate a
>>>>>>>>>> version bump on the I/O.
>>>>>>>>>>
>>>>>>>>>> > Question 2: in the code below, what is the purpose of keeping
>>>>>>>>>> the PubsubLiteIO in the Beam repo?
>>>>>>>>>>
>>>>>>>>>> Visibility and autocomplete. It means the core class will be in
>>>>>>>>>> the beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu`
>>>>>>>>>> in an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>> (You helped me apply some change to this strange setup a few
>>>>>>>>>>> months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>>
>>>>>>>>>>> I like that idea overall.
>>>>>>>>>>>
>>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger
>>>>>>>>>>> Beam repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>>> tested as much as possible.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Question 2: in the code below, what is the purpose of keeping
>>>>>>>>>>> the PubsubLiteIO in the Beam repo?
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>> ````
>>>>>>>>>>>
>>>>>>>>>>> The backward compatibility came to my mind but I thought you may
>>>>>>>>>>> have more reasons.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> My memo:
>>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>>> beam repo has:
>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>>>>>>> because of its pre-1.0 status.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>>>>>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>>>>>
>>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> How do you plan to address the circular dependency? Won't this
>>>>>>>>>>>>> end up with Beam depending on older versions of itself?
>>>>>>>>>>>>>
>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like
>>>>>>>>>>>>>> to get some feedback on a change to the model for hosting this I/O in beam.
>>>>>>>>>>>>>> Our team has been frustrated by the fact that we have no way to release
>>>>>>>>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>>>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>>>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>>>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>>>>>>>>> copies in parallel.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve the
>>>>>>>>>>>>>> I/O while retaining end-user visibility within the beam repo. To do this,
>>>>>>>>>>>>>> I'd like to remove all the implementation from the beam repo, and leave the
>>>>>>>>>>>>>> I/O there implemented as:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>>> ````
>>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and suggestions
>>>>>>>>>>>>>> surrounding this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> Tomo
>>>>>>>>>>>
>>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Andrew Pilloud <ap...@google.com>.
To clarify my "circular dependency" concern, I may have used poor
terminology to describe it. We have no tests to ensure we don't break
binary compatibility between versions of Beam. There is no guarantee that a
jar compiled against Beam 2.30 will work against Beam 2.31 without being
recompiled. To mitigate:

1. There need to be tests in Beam to ensure the IO isn't broken by changes
to Beam.
2. I think we also need a plan to back this out if it gets us in a bad
state. For example, there is potentially a state where we need to make a
change to Beam core (such as updating a dependency) but can't make it
because it requires this IO to be recompiled. If this IO depends on a new
Beam release to be recompiled this would be impossible. I don't want to
push that friction down to Beam core developers.
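
One way to picture the test in point 1 is a small check that runs against the current Beam classes and verifies that the methods a precompiled IO jar links against are still present. The sketch below is only an illustration: `CompatCheck`, `Expected`, and `missingMethods` are hypothetical names, and `java.lang.StringBuilder` stands in for a Beam class so the example runs on its own. A real test would more likely run the IO's integration tests against Beam at head.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reports methods a precompiled jar expects but cannot find. */
public class CompatCheck {

  /** Describes one public method that the precompiled jar was linked against. */
  static final class Expected {
    final String className;
    final String methodName;
    final Class<?>[] paramTypes;

    Expected(String className, String methodName, Class<?>... paramTypes) {
      this.className = className;
      this.methodName = methodName;
      this.paramTypes = paramTypes;
    }
  }

  /** Returns "class#method" for every expected method missing from the classpath. */
  static List<String> missingMethods(List<Expected> expected) {
    List<String> missing = new ArrayList<>();
    for (Expected e : expected) {
      try {
        // Looks up a public method by name and parameter types via reflection.
        Class.forName(e.className).getMethod(e.methodName, e.paramTypes);
      } catch (ClassNotFoundException | NoSuchMethodException ex) {
        missing.add(e.className + "#" + e.methodName);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    List<Expected> expected = new ArrayList<>();
    // StringBuilder is a stand-in for a Beam class; the second entry is deliberately absent.
    expected.add(new Expected("java.lang.StringBuilder", "append", String.class));
    expected.add(new Expected("java.lang.StringBuilder", "noSuchMethod"));
    System.out.println(missingMethods(expected));
  }
}
```

This lookup ignores return types, which are also part of a method's binary signature, so it is a smoke test and not a full binary-compatibility check.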

On Fri, Jul 2, 2021 at 10:38 AM Reuven Lax <re...@google.com> wrote:

> To add to Luke's concern - compatibility of GCP libraries has been a huge
> headache, and keeping GCP modules together helps at least a bit. It has
> happened not infrequently that users experience incompatibility between
> proto or grpc versions, because they link a library that wants one version
> and Beam depends on another version. Moving PubsubLiteIO outside of Beam
> means that you as the package maintainer will have to deal with these
> issues.
>
> Reuven
>
> On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>> I think the goals are good:
>>
>>  - be able to release fixes quicker
>>  - have users discover PubsubLiteIO
>>
>> Just to clarify a little - a user currently has to depend on (probably)
>> org.apache.beam:beam-sdks-java-core,
>> org.apache.beam:beam-runners-direct-java,
>> org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
>> runner. Without the GCP IO dependency, there will be no IDE autocomplete
>> anyhow. So the proposal is almost entirely to avoid the user having to add
>> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
>> snapshot.
>>
>> As Luke mentioned, IOs outside of the Beam repo already exist and it is
>> fine. Decoupled releases are the hard part. I've had a few discussions
>> about decoupled releases within the same repo. It has all the same problems
>> whether it is in the same repo or not. In some ways it is easier outside
>> the repo because it removes the temptation to couple things too much. I
>> think getting a good version compatibility test matrix and benchmarking might
>> be the big task here. And you'd want to have much more automation in the
>> release. Incidentally, fixes already do not have to be coupled with an
>> upgrade of all of Beam. You can have a different version for an IO. Or you
>> can choose the snapshot just for an IO dep. The missing piece is just the
>> testing mentioned. You want to be sure your new version of the IO is going
>> to work with old versions of the core SDK.
>>
>> Regarding the circular dep; I agree that there should not be one: in your
>> proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform depends
>> on com.google.pubsublite:google-beam-pubsublite, and both of those modules
>> depend on org.apache.beam:beam-sdks-java-core. The core SDK does not depend
>> on any IO (and we should keep it this way, for sure).
>>
>> But in addition to Reuven's simple idea, I have to also push on whether
>> we can do this the "normal" way: refer to it in docs, and have examples for
>> users to copy/paste/modify that already include the needed deps. Our
>> current example pipelines do not serve this purpose because they are
>> integrated with our build system rather than being standalone, but it is
>> very easy to make an example "PubsubLite to blobstore" pipeline or
>> something, including the working pom.xml, and I expect most users would
>> start from that.
>>
>> Kenn
>>
>> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:
>>
>>> There already is a nightly snapshot that users can use.
>>>
>>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com>
>>> wrote:
>>>
>>>> Is there any possibility of changing the build cadence allowing for
>>>> builds released as alpha versions or similar? It’s not too uncommon for
>>>> projects to have nightly builds for example. Could that help deliver fixes
>>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>>> this thread?
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> I wouldn't say this is uncharted territory as there are Apache Beam
>>>>> IOs[1] that live outside of the Apache Beam git repo.
>>>>>
>>>>> The most annoying aspects will be the versioning story, i.e. users
>>>>> will want to use the library with different versions of Apache Beam since
>>>>> some people won't want to upgrade since they have something working and
>>>>> others will want it against the latest version since they want some
>>>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>>> users.
>>>>>
>>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>>
>>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> I think you are in a better place to make this decision. You are the
>>>>>> primary contributor and maintainer for this IO and you clearly know the
>>>>>> pubsub lite user base as well. If you think this is the best course of
>>>>>> action I will support that.
>>>>>>
>>>>>> That said, afaik you are moving into uncharted territory. The
>>>>>> questions raised here about support, testing, discoverability,
>>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>>> them (or some other unknowns) might still become problematic and result in
>>>>>> user confusion and frustration and you will have to address those if/when
>>>>>> that happens.
>>>>>>
>>>>>> I like that this model still allows discoverability through Beam and
>>>>>> by default supports an out of the box tested version already. I guess that
>>>>>> will be good enough for most beam + pubsub lite users.  And I hope the
>>>>>> model will, as you predict, give you a quick way to address user requests.
>>>>>>
>>>>>> One more question of my own: Do you expect pubsub lite io to continue
>>>>>> to receive frequent updates in the long term? (For example, afaik pubsub io
>>>>>> no longer needs or gets frequent updates.). If not, eventually keeping the
>>>>>> io external might become irrelevant.
>>>>>>
>>>>>> What do you need from this community to make progress on this
>>>>>> question?
>>>>>>
>>>>>> Ahmet
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <dp...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > How will this be communicated to the user?
>>>>>>>
>>>>>>> The docstring on PubsubLiteIO in beam will mention this. If they get
>>>>>>> the one subject to the long release cycle, that's usually okay, unless they
>>>>>>> need recently added features/fixes. Pub/Sub Lite's documentation will state
>>>>>>> to prefer the one from our artifact, but the expectation is the one in beam
>>>>>>> will work fine in recent releases.
>>>>>>>
>>>>>>> > Will it just be documented somewhere that users should prefer
>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>> need?
>>>>>>>
>>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>>> PubsubLiteIO.
>>>>>>>
>>>>>>> An interesting side effect of subclassing in this way is that if the
>>>>>>> user adds a newer version of the PubsubLiteIO implementation-specific
>>>>>>> artifact in their pom, they won't actually need to make any code changes:
>>>>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>>>>> version.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> How will this be communicated to the user? The idea is that they
>>>>>>>> will discover PubsubLiteIO through their IDE as you described, but that
>>>>>>>> will get them to the Beam one that's subject to the long release cycle.
>>>>>>>> Will it just be documented somewhere that users should prefer
>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>>> need?
>>>>>>>>
>>>>>>>> I wonder if a similar result could be achieved just by making
>>>>>>>> Beam's PubsubLiteIO a stub with no implementation that directs users to the
>>>>>>>> com.google.cloud one somehow?
>>>>>>>>
>>>>>>>> junit's matcher interface comes to mind as a precedent here. I have
>>>>>>>> been warned many times by
>>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>>
>>>>>>>> Brian
>>>>>>>>
>>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <
>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>
>>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>>>>> tested as much as possible.
>>>>>>>>>
>>>>>>>>> I'd like to run the integration tests in both locations. They
>>>>>>>>> would only be meaningful in the beam setup when we went to validate a
>>>>>>>>> version bump on the I/O.
>>>>>>>>>
>>>>>>>>> > Question2 : in the code below, what is the purpose of keeping
>>>>>>>>> the PubsubLiteIO in the Beam repo?
>>>>>>>>>
>>>>>>>>> Visibility and autocomplete. It means the core class will be in
>>>>>>>>> the beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu`
>>>>>>>>> in an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>>>>>
>>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>> (You helped me apply some change to this strange setup a few
>>>>>>>>>> months back. Thank you for working on rectifying the situation.)
>>>>>>>>>>
>>>>>>>>>> I like that idea overall.
>>>>>>>>>>
>>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>>>>>> tested as much as possible.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Question2 : in the code below, what is the purpose of keeping the
>>>>>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>> ````
>>>>>>>>>>
>>>>>>>>>> The backward compatibility came to my mind but I thought you may
>>>>>>>>>> have more reasons.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> My memo:
>>>>>>>>>> java-pubsublite repository has:
>>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>>> beam repo has:
>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>>> (and other files in the same directory)
>>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>>>>>> because of its pre-1.0 status.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>>>>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>>>>
>>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How do you plan to address the circular dependency? Won't this
>>>>>>>>>>>> end up with Beam depending on older versions of itself?
>>>>>>>>>>>>
>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like
>>>>>>>>>>>>> to get some feedback on a change to the model for hosting this I/O in beam.
>>>>>>>>>>>>> Our team has been frustrated by the fact that we have no way to release
>>>>>>>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>>>>>>>> copies in parallel.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve the
>>>>>>>>>>>>> I/O while retaining end-user visibility within the beam repo. To do this,
>>>>>>>>>>>>> I'd like to remove all the implementation from the beam repo, and leave the
>>>>>>>>>>>>> I/O there implemented as:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>>> ````
>>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and suggestions
>>>>>>>>>>>>> surrounding this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Tomo
>>>>>>>>>>
>>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Reuven Lax <re...@google.com>.
To add to Luke's concern - compatibility of GCP libraries has been a huge
headache, and keeping GCP modules together helps at least a bit. It has
happened not infrequently that users experience incompatibility between
proto or grpc versions, because they link a library that wants one version
and Beam depends on another version. Moving PubsubLiteIO outside of Beam
means that you as the package maintainer will have to deal with these
issues.
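
A clash of this kind can in principle be caught early with a convergence check over the resolved dependency tree. The following is a rough sketch only: `ConvergenceCheck` and `conflicts` are hypothetical names, and the version numbers in `main` are made up for illustration, not taken from any real Beam or Pub/Sub Lite release.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: finds artifacts requested at more than one version. */
public class ConvergenceCheck {

  /**
   * Takes {artifact, version} pairs collected from a dependency tree and
   * returns only the artifacts that were requested at two or more versions.
   */
  static Map<String, Set<String>> conflicts(String[][] requests) {
    Map<String, Set<String>> versionsByArtifact = new TreeMap<>();
    for (String[] request : requests) {
      versionsByArtifact.computeIfAbsent(request[0], k -> new TreeSet<>()).add(request[1]);
    }
    // Drop every artifact on which all requesters agree.
    versionsByArtifact.values().removeIf(versions -> versions.size() < 2);
    return versionsByArtifact;
  }

  public static void main(String[] args) {
    String[][] requests = {
      {"com.google.protobuf:protobuf-java", "3.15.0"}, // made-up version, e.g. wanted by Beam
      {"com.google.protobuf:protobuf-java", "3.17.0"}, // made-up version, e.g. wanted by the IO
      {"io.grpc:grpc-api", "1.37.0"},                  // made-up version, both sides agree
      {"io.grpc:grpc-api", "1.37.0"},
    };
    System.out.println(conflicts(requests));
  }
}
```

With the made-up input above, only the protobuf entry is reported, since both requesters agree on the grpc version.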

Reuven

On Fri, Jul 2, 2021 at 10:31 AM Kenneth Knowles <ke...@apache.org> wrote:

> I think the goals are good:
>
>  - be able to release fixes quicker
>  - have users discover PubsubLiteIO
>
> Just to clarify a little - a user currently has to depend on (probably)
> org.apache.beam:beam-sdks-java-core,
> org.apache.beam:beam-runners-direct-java,
> org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
> runner. Without the GCP IO dependency, there will be no IDE autocomplete
> anyhow. So the proposal is almost entirely to avoid the user having to add
> com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
> snapshot.
>
> As Luke mentioned, IOs outside of the Beam repo already exist and it is
> fine. Decoupled releases are the hard part. I've had a few discussions
> about decoupled releases within the same repo. It has all the same problems
> whether it is in the same repo or not. In some ways it is easier outside
> the repo because it removes the temptation to couple things too much. I
> think getting a good version compatibility test matrix and benchmarking might
> be the big task here. And you'd want to have much more automation in the
> release. Incidentally, fixes already do not have to be coupled with an
> upgrade of all of Beam. You can have a different version for an IO. Or you
> can choose the snapshot just for an IO dep. The missing piece is just the
> testing mentioned. You want to be sure your new version of the IO is going
> to work with old versions of the core SDK.
>
> Regarding the circular dep; I agree that there should not be one: in your
> proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform depends
> on com.google.pubsublite:google-beam-pubsublite, and both of those modules
> depend on org.apache.beam:beam-sdks-java-core. The core SDK does not depend
> on any IO (and we should keep it this way, for sure).
>
> But in addition to Reuven's simple idea, I have to also push on whether we
> can do this the "normal" way: refer to it in docs, and have examples for
> users to copy/paste/modify that already include the needed deps. Our
> current example pipelines do not serve this purpose because they are
> integrated with our build system rather than being standalone, but it is
> very easy to make an example "PubsubLite to blobstore" pipeline or
> something, including the working pom.xml, and I expect most users would
> start from that.
>
> Kenn
>
> On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:
>
>> There already is a nightly snapshot that users can use.
>>
>> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com> wrote:
>>
>>> Is there any possibility of changing the build cadence allowing for
>>> builds released as alpha versions or similar? It’s not too uncommon for
>>> projects to have nightly builds for example. Could that help deliver fixes
>>> more quickly to customers, while also avoiding the nuisances mentioned in
>>> this thread?
>>>
>>> Thanks,
>>> Evan
>>>
>>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>>
>>>> I wouldn't say this is uncharted territory as there are Apache Beam
>>>> IOs[1] that live outside of the Apache Beam git repo.
>>>>
>>>> The most annoying aspects will be the versioning story, i.e. users will
>>>> want to use the library with different versions of Apache Beam since some
>>>> people won't want to upgrade since they have something working and others
>>>> will want it against the latest version since they want some feature.
>>>> Apache Beam has had a pretty good track record of maintaining API
>>>> compatibility so I wouldn't be too worried about that. The real issue is
>>>> 3rd party dependency convergence and managing a BOM that works for your
>>>> users.
>>>>
>>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>>
>>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> I think you are in a better place to make this decision. You are the
>>>>> primary contributor and maintainer for this IO and you clearly know the
>>>>> pubsub lite user base as well. If you think this is the best course of
>>>>> action I will support that.
>>>>>
>>>>> That said, afaik you are moving into uncharted territory. The
>>>>> questions raised here about support, testing, discoverability,
>>>>> compatibility, and potential circular dependency issues are all unknowns for
>>>>> this model. You might have good answers to them, but nevertheless some of
>>>>> them (or some other unknowns) might still become problematic and result in
>>>>> user confusion and frustration and you will have to address those if/when
>>>>> that happens.
>>>>>
>>>>> I like that this model still allows discoverability through Beam and
>>>>> by default supports an out of the box tested version already. I guess that
>>>>> will be good enough for most beam + pubsub lite users.  And I hope the
>>>>> model will, as you predict, give you a quick way to address user requests.
>>>>>
>>>>> One more question of my own: Do you expect pubsub lite io to continue
>>>>> to receive frequent updates in the long term? (For example, afaik pubsub io
>>>>> no longer needs or gets frequent updates.). If not, eventually keeping the
>>>>> io external might become irrelevant.
>>>>>
>>>>> What do you need from this community to make progress on this question?
>>>>>
>>>>> Ahmet
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <dp...@google.com>
>>>>> wrote:
>>>>>
>>>>>> > How will this be communicated to the user?
>>>>>>
>>>>>> The docstring on PubsubLiteIO in beam will mention this. If they get
>>>>>> the one subject to the long release cycle, that's usually okay, unless they
>>>>>> need recently added features/fixes. Pub/Sub Lite's documentation will state
>>>>>> to prefer the one from our artifact, but the expectation is the one in beam
>>>>>> will work fine in recent releases.
>>>>>>
>>>>>> > Will it just be documented somewhere that users should prefer
>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>> need?
>>>>>>
>>>>>> Yes, both in our public docs and the docstring for the beam
>>>>>> PubsubLiteIO.
>>>>>>
>>>>>> An interesting side effect of subclassing in this way is that if the
>>>>>> user adds a newer version of the PubsubLiteIO implementation-specific
>>>>>> artifact in their pom, they won't actually need to make any code changes:
>>>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>>>> version.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> How will this be communicated to the user? The idea is that they
>>>>>>> will discover PubsubLiteIO through their IDE as you described, but that
>>>>>>> will get them to the Beam one that's subject to the long release cycle.
>>>>>>> Will it just be documented somewhere that users should prefer
>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>>> need?
>>>>>>>
>>>>>>> I wonder if a similar result could be achieved just by making Beam's
>>>>>>> PubsubLiteIO a stub with no implementation that directs users to the
>>>>>>> com.google.cloud one somehow?
>>>>>>>
>>>>>>> junit's matcher interface comes to mind as a precedent here. I have
>>>>>>> been warned many times by
>>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>>
>>>>>>> [1]
>>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <dp...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>>>> tested as much as possible.
>>>>>>>>
>>>>>>>> I'd like to run the integration tests in both locations. They would
>>>>>>>> only be meaningful in the beam setup when we went to validate a version
>>>>>>>> bump on the I/O.
>>>>>>>>
>>>>>>>> > Question2 : in the code below, what is the purpose of keeping the
>>>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>>>
>>>>>>>> Visibility and autocomplete. It means the core class will be in the
>>>>>>>> beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in
>>>>>>>> an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>>>>
>>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>> (You helped me apply some change to this strange setup a few
>>>>>>>>> months back. Thank you for working on rectifying the situation.)
>>>>>>>>>
>>>>>>>>> I like that idea overall.
>>>>>>>>>
>>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>>>>> tested as much as possible.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Question2 : in the code below, what is the purpose of keeping the
>>>>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>> ````
>>>>>>>>>
>>>>>>>>> The backward compatibility came to my mind but I thought you may
>>>>>>>>> have more reasons.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> My memo:
>>>>>>>>> java-pubsublite repository has:
>>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>>> beam repo has:
>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>>> (and other files in the same directory)
>>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>>>>> because of its pre-1.0 status.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>>>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>>>
>>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> How do you plan to address the circular dependency? Won't this
>>>>>>>>>>> end up with Beam depending on older versions of itself?
>>>>>>>>>>>
>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to
>>>>>>>>>>>> get some feedback on a change to the model for hosting this I/O in beam.
>>>>>>>>>>>> Our team has been frustrated by the fact that we have no way to release
>>>>>>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>>>>>>> copies in parallel.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve the
>>>>>>>>>>>> I/O while retaining end-user visibility within the beam repo. To do this,
>>>>>>>>>>>> I'd like to remove all the implementation from the beam repo, and leave the
>>>>>>>>>>>> I/O there implemented as:
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>>> ````
>>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>>
>>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd be interested to hear anyone's thoughts and suggestions
>>>>>>>>>>>> surrounding this.
>>>>>>>>>>>>
>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Tomo
>>>>>>>>>
>>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Kenneth Knowles <ke...@apache.org>.
I think the goals are good:

 - be able to release fixes quicker
 - have users discover PubsubLiteIO

Just to clarify a little - a user currently has to depend on (probably)
org.apache.beam:beam-sdks-java-core,
org.apache.beam:beam-runners-direct-java,
org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
runner. Without the GCP IO dependency, there will be no IDE autocomplete
anyhow. So the proposal is almost entirely to avoid the user having to add
com.google.pubsublite:google-beam-pubsublite or to depend on the nightly
snapshot.

As Luke mentioned, IOs outside of the Beam repo already exist and it is
fine. Decoupled releases are the hard part. I've had a few discussions
about decoupled releases within the same repo. It has all the same problems
whether it is in the same repo or not. In some ways it is easier outside
the repo because it removes the temptation to couple things too much. I
think getting a good version-compatibility test matrix and benchmarking in
place might be the big task here. And you'd want to have much more automation in the
release. Incidentally, fixes already do not have to be coupled with an
upgrade of all of Beam. You can have a different version for an IO. Or you
can choose the snapshot just for an IO dep. The missing piece is just the
testing mentioned. You want to be sure your new version of the IO is going
to work with old versions of the core SDK.
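
A minimal sketch of the compatibility matrix mentioned above, assuming each
cell would map to one CI job that runs the IO's integration tests against
that core SDK version. The class and method names are hypothetical; the
version numbers are the ones quoted elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: enumerates (IO version, core SDK version) pairs to test. */
final class CompatibilityMatrix {

  /** Returns one entry per combination, IO versions in the outer loop. */
  static List<String> combinations(List<String> ioVersions, List<String> coreVersions) {
    List<String> cells = new ArrayList<>();
    for (String io : ioVersions) {
      for (String core : coreVersions) {
        cells.add("io=" + io + " core=" + core);
      }
    }
    return cells;
  }

  public static void main(String[] args) {
    // pubsublite-beam-io 0.16.0 against the two beam-sdks-java-core
    // versions mentioned in the thread.
    List<String> cells = combinations(List.of("0.16.0"), List.of("2.29.0", "2.30.0"));
    for (String cell : cells) {
      System.out.println("would test: " + cell);
    }
  }
}
```

The matrix grows multiplicatively, so in practice one would probably cap it
at the last few core SDK releases.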

Regarding the circular dep, I agree that there should not be one: in your
proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform depends
on com.google.pubsublite:google-beam-pubsublite, and both of those modules
depend on org.apache.beam:beam-sdks-java-core. The core SDK does not depend
on any IO (and we should keep it this way, for sure).
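
The module graph described above can be checked for cycles mechanically. The
following is a toy model, not a real Maven or Gradle resolver: the class name
and the edge list are assumptions that just restate the paragraph (the GCP IO
module depends on the Pub/Sub Lite artifact, both depend on the core SDK, and
the core SDK depends on no IO).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the proposed module graph; names restate the text above. */
final class ModuleGraphCheck {

  /** Edges read as "key depends on each value". */
  static Map<String, List<String>> proposedGraph() {
    Map<String, List<String>> deps = new HashMap<>();
    deps.put(
        "org.apache.beam:beam-sdks-java-io-google-cloud-platform",
        List.of(
            "com.google.pubsublite:google-beam-pubsublite",
            "org.apache.beam:beam-sdks-java-core"));
    deps.put(
        "com.google.pubsublite:google-beam-pubsublite",
        List.of("org.apache.beam:beam-sdks-java-core"));
    deps.put("org.apache.beam:beam-sdks-java-core", List.of());
    return deps;
  }

  /** Depth-first search; a module seen again on the current path is a cycle. */
  static boolean hasCycle(Map<String, List<String>> deps) {
    Set<String> done = new HashSet<>();
    Set<String> onPath = new HashSet<>();
    for (String module : deps.keySet()) {
      if (visit(module, deps, done, onPath)) {
        return true;
      }
    }
    return false;
  }

  private static boolean visit(
      String module, Map<String, List<String>> deps, Set<String> done, Set<String> onPath) {
    if (onPath.contains(module)) {
      return true; // back edge: module depends on itself transitively
    }
    if (done.contains(module)) {
      return false; // already fully explored, no cycle through here
    }
    onPath.add(module);
    for (String dep : deps.getOrDefault(module, List.of())) {
      if (visit(dep, deps, done, onPath)) {
        return true;
      }
    }
    onPath.remove(module);
    done.add(module);
    return false;
  }

  public static void main(String[] args) {
    System.out.println("cycle in proposed graph: " + hasCycle(proposedGraph()));
  }
}
```

Adding an edge from the core SDK back to any IO module would make hasCycle
return true, which is the situation to keep avoiding.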

But in addition to Reuven's simple idea, I have to also push on whether we
can do this the "normal" way: refer to it in docs, and have examples for
users to copy/paste/modify that already include the needed deps. Our
current example pipelines do not serve this purpose because they are
integrated with our build system rather than being standalone, but it is
very easy to make an example "PubsubLite to blobstore" pipeline or
something, including the working pom.xml, and I expect most users would
start from that.

Kenn

On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax <re...@google.com> wrote:

> There already is a nightly snapshot that users can use.
>
> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com> wrote:
>
>> Is there any possibility of changing the build cadence allowing for
>> builds released as alpha versions or similar? It’s not too uncommon for
>> projects to have nightly builds for example. Could that help deliver fixes
>> more quickly to customers, while also avoiding the nuisances mentioned in
>> this thread?
>>
>> Thanks,
>> Evan
>>
>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>>
>>> I wouldn't say this is uncharted territory as there are Apache Beam
>>> IOs[1] that live outside of the Apache Beam git repo.
>>>
>>> The most annoying aspects will be the versioning story, i.e. users will
>>> want to use the library with different versions of Apache Beam since some
>>> people won't want to upgrade since they have something working and others
>>> will want it against the latest version since they want some feature.
>>> Apache Beam has had a pretty good track record of maintaining API
>>> compatibility so I wouldn't be too worried about that. The real issue is
>>> 3rd party dependency convergence and managing a BOM that works for your
>>> users.
>>>
>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>
>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I think you are in a better place to make this decision. You are the
>>>> primary contributor and maintainer for this IO and you clearly know the
>>>> pubsub lite user base as well. If you think this is the best course of
>>>> action I will support that.
>>>>
>>>> That said, afaik you are moving into uncharted territory. The
>>>> questions raised here are about support, testing, discoverability,
>>>> compatibility, potential circular dependencies issues are all unknowns for
>>>> this model. You might have good answers to them, but nevertheless some of
>>>> them (or some other unknowns) might still become problematic and result in
>>>> user confusion and frustration and you will have to address those if/when
>>>> that happens.
>>>>
>>>> I like that this model still allows discoverability through Beam and by
>>>> default supports an out of the box tested version already. I guess that
>>>> will be good enough for most beam + pubsub lite users.  And I hope the
>>>> model will, as you predict, give you a quick way to address user requests.
>>>>
>>>> One more question of my own: Do you expect pubsub lite io to continue
>>>> to receive frequent updates in the long term? (For example, afaik pubsub io
>>>> no longer needs or gets frequent updates.). If not, eventually keeping the
>>>> io external might become irrelevant.
>>>>
>>>> What do you need from this community to make progress on this question?
>>>>
>>>> Ahmet
>>>>
>>>>
>>>>
>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <dp...@google.com>
>>>> wrote:
>>>>
>>>>> > How will this be communicated to the user?
>>>>>
>>>>> The docstring on PubsubLiteIO in beam will mention this. If they get
>>>>> the one subject to the long release cycle, that's usually okay, unless they
>>>>> need recently added features/fixes. Pub/Sub Lite's documentation will state
>>>>> to prefer the one from our artifact, but the expectation is the one in beam
>>>>> will work fine in recent releases.
>>>>>
>>>>> > Will it just be documented somewhere that users should prefer
>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>> need?
>>>>>
>>>>> Yes, both in our public docs and the docstring for the beam
>>>>> PubsubLiteIO.
>>>>>
>>>>> An interesting side effect of subclassing in this way is that if the
>>>>> user adds a newer version of the PubsubLiteIO implementation-specific
>>>>> artifact in their pom, they won't actually need to make any code changes:
>>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>>> version.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com>
>>>>> wrote:
>>>>>
>>>>>> How will this be communicated to the user? The idea is that they will
>>>>>> discover PubsubLiteIO through their IDE as you described, but that will get
>>>>>> them to the Beam one that's subject to the long release cycle. Will it just
>>>>>> be documented somewhere that users should prefer
>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>>> need?
>>>>>>
>>>>>> I wonder if a similar result could be achieved just by making Beam's
>>>>>> PubsubLiteIO a stub with no implementation that directs users to the
>>>>>> com.google.cloud one somehow?
>>>>>>
>>>>>> junit's matcher interface comes to mind as a precedent here. I have
>>>>>> been warned many times by
>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>
>>>>>> [1]
>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <dp...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>>> tested as much as possible.
>>>>>>>
>>>>>>> I'd like to run the integration tests in both locations. They would
>>>>>>> only be meaningful in the beam setup when we went to validate a version
>>>>>>> bump on the I/O.
>>>>>>>
>>>>>>> > Question2 : in the code below, what is the purpose of keeping the
>>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>>
>>>>>>> Visibility and autocomplete. It means the core class will be in the
>>>>>>> beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in
>>>>>>> an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>>>
>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>> (You helped me apply some change to this strange setup a few months
>>>>>>>> back. Thank you for working on rectifying the situation.)
>>>>>>>>
>>>>>>>> I like that idea overall.
>>>>>>>>
>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>>>> tested as much as possible.
>>>>>>>>
>>>>>>>>
>>>>>>>> Question2 : in the code below, what is the purpose of keeping the
>>>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>>>
>>>>>>>> ```
>>>>>>>> class PubsubLiteIO extends
>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>> ````
>>>>>>>>
>>>>>>>> The backward compatibility came to my mind but I thought you may
>>>>>>>> have more reasons.
>>>>>>>>
>>>>>>>>
>>>>>>>> My memo:
>>>>>>>> java-pubsublite repsitory has:
>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>> beam repo has:
>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>> (and other files in the same directory)
>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>>>> because of its pre-1.0 status.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <
>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>
>>>>>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>>
>>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>>
>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <
>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> How do you plan to address the circular dependency? Won't this
>>>>>>>>>> end up with Beam depending on older versions of itself?
>>>>>>>>>>
>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello beam developers,
>>>>>>>>>>>
>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to
>>>>>>>>>>> get some feedback on a change to the model for hosting this I/O in beam.
>>>>>>>>>>> Our team has been frustrated by the fact that we have no way to release
>>>>>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>>>>>> copies in parallel.
>>>>>>>>>>>
>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve the
>>>>>>>>>>> I/O while retaining end-user visibility within the beam repo. To do this,
>>>>>>>>>>> I'd like to remove all the implementation from the beam repo, and leave the
>>>>>>>>>>> I/O there implemented as:
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>> ````
>>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>>
>>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>>
>>>>>>>>>>> I'd be interested to hear anyones thoughts and suggestions
>>>>>>>>>>> surrounding this.
>>>>>>>>>>>
>>>>>>>>>>> -Daniel
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Tomo
>>>>>>>>
>>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Reuven Lax <re...@google.com>.
There already is a nightly snapshot that users can use.

On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin <ev...@gmail.com> wrote:

> Is there any possibility of changing the build cadence allowing for builds
> released as alpha versions or similar? It’s not too uncommon for projects
> to have nightly builds for example. Could that help deliver fixes more
> quickly to customers, while also avoiding the nuisances mentioned in this
> thread?
>
> Thanks,
> Evan
>
> On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:
>
>> I wouldn't say this is uncharted territory as there are Apache Beam
>> IOs[1] that live outside of the Apache Beam git repo.
>>
>> The most annoying aspects will be the versioning story, i.e. users will
>> want to use the library with different versions of Apache Beam since some
>> people won't want to upgrade since they have something working and others
>> will want it against the latest version since they want some feature.
>> Apache Beam has had a pretty good track record of maintaining API
>> compatibility so I wouldn't be too worried about that. The real issue is
>> 3rd party dependency convergence and managing a BOM that works for your
>> users.
>>
>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>
>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> Hi Daniel,
>>>
>>> I think you are in a better place to make this decision. You are the
>>> primary contributor and maintainer for this IO and you clearly know the
>>> pubsub lite user base as well. If you think this is the best course of
>>> action I will support that.
>>>
>>> That said, afaik you are moving into uncharted territory. The
>>> questions raised here are about support, testing, discoverability,
>>> compatibility, potential circular dependencies issues are all unknowns for
>>> this model. You might have good answers to them, but nevertheless some of
>>> them (or some other unknowns) might still become problematic and result in
>>> user confusion and frustration and you will have to address those if/when
>>> that happens.
>>>
>>> I like that this model still allows discoverability through Beam and by
>>> default supports an out of the box tested version already. I guess that
>>> will be good enough for most beam + pubsub lite users.  And I hope the
>>> model will, as you predict, give you a quick way to address user requests.
>>>
>>> One more question of my own: Do you expect pubsub lite io to continue to
>>> receive frequent updates in the long term? (For example, afaik pubsub io no
>>> longer needs or gets frequent updates.). If not, eventually keeping the io
>>> external might become irrelevant.
>>>
>>> What do you need from this community to make progress on this question?
>>>
>>> Ahmet
>>>
>>>
>>>
>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <dp...@google.com>
>>> wrote:
>>>
>>>> > How will this be communicated to the user?
>>>>
>>>> The docstring on PubsubLiteIO in beam will mention this. If they get
>>>> the one subject to the long release cycle, that's usually okay, unless they
>>>> need recently added features/fixes. Pub/Sub Lite's documentation will state
>>>> to prefer the one from our artifact, but the expectation is the one in beam
>>>> will work fine in recent releases.
>>>>
>>>> > Will it just be documented somewhere that users should prefer
>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>> need?
>>>>
>>>> Yes, both in our public docs and the docstring for the beam
>>>> PubsubLiteIO.
>>>>
>>>> An interesting side effect of subclassing in this way is that if the
>>>> user adds a newer version of the PubsubLiteIO implementation-specific
>>>> artifact in their pom, they won't actually need to make any code changes:
>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>> version.
>>>>
>>>>
>>>>
>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com>
>>>> wrote:
>>>>
>>>>> How will this be communicated to the user? The idea is that they will
>>>>> discover PubsubLiteIO through their IDE as you described, but that will get
>>>>> them to the Beam one that's subject to the long release cycle. Will it just
>>>>> be documented somewhere that users should prefer
>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>>> need?
>>>>>
>>>>> I wonder if a similar result could be achieved just by making Beam's
>>>>> PubsubLiteIO a stub with no implementation that directs users to the
>>>>> com.google.cloud one somehow?
>>>>>
>>>>> junit's matcher interface comes to mind as a precedent here. I have
>>>>> been warned many times by
>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>
>>>>> [1]
>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>
>>>>> Brian
>>>>>
>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <dp...@google.com>
>>>>> wrote:
>>>>>
>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>> tested as much as possible.
>>>>>>
>>>>>> I'd like to run the integration tests in both locations. They would
>>>>>> only be meaningful in the beam setup when we went to validate a version
>>>>>> bump on the I/O.
>>>>>>
>>>>>> > Question2 : in the code below, what is the purpose of keeping the
>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>
>>>>>> Visibility and autocomplete. It means the core class will be in the
>>>>>> beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in
>>>>>> an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>>
>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>> (You helped me apply some change to this strange setup a few months
>>>>>>> back. Thank you for working on rectifying the situation.)
>>>>>>>
>>>>>>> I like that idea overall.
>>>>>>>
>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>>> tested as much as possible.
>>>>>>>
>>>>>>>
>>>>>>> Question2 : in the code below, what is the purpose of keeping the
>>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>>
>>>>>>> ```
>>>>>>> class PubsubLiteIO extends
>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>> ````
>>>>>>>
>>>>>>> The backward compatibility came to my mind but I thought you may
>>>>>>> have more reasons.
>>>>>>>
>>>>>>>
>>>>>>> My memo:
>>>>>>> java-pubsublite repsitory has:
>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>> beam repo has:
>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>> (and other files in the same directory)
>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>>> because of its pre-1.0 status.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <dp...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>>> beam goes to 3.X.X)?
>>>>>>>>
>>>>>>>> Something we can do if this is an issue is mark
>>>>>>>> pubsublite-beam-io's dep on beam-sdks-java-core as 'provided'. But I'd
>>>>>>>> prefer to avoid this and just let overriding fix it if that works.
>>>>>>>>
>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> How do you plan to address the circular dependency? Won't this end
>>>>>>>>> up with Beam depending on older versions of itself?
>>>>>>>>>
>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>
>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello beam developers,
>>>>>>>>>>
>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to
>>>>>>>>>> get some feedback on a change to the model for hosting this I/O in beam.
>>>>>>>>>> Our team has been frustrated by the fact that we have no way to release
>>>>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>>>>> copies in parallel.
>>>>>>>>>>
>>>>>>>>>> I'd like to retain our ability to quickly fix and improve the I/O
>>>>>>>>>> while retaining end-user visibility within the beam repo. To do this, I'd
>>>>>>>>>> like to remove all the implementation from the beam repo, and leave the I/O
>>>>>>>>>> there implemented as:
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>> ````
>>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>>
>>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>>
>>>>>>>>>> I'd be interested to hear anyones thoughts and suggestions
>>>>>>>>>> surrounding this.
>>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Tomo
>>>>>>>
>>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Evan Galpin <ev...@gmail.com>.
Is there any possibility of changing the build cadence to allow for builds
released as alpha versions or similar? It’s not too uncommon for projects
to have nightly builds, for example. Could that help deliver fixes more
quickly to customers, while also avoiding the nuisances mentioned in this
thread?

Thanks,
Evan

On Thu, Jul 1, 2021 at 12:39 Luke Cwik <lc...@google.com> wrote:

> I wouldn't say this is uncharted territory as there are Apache Beam IOs[1]
> that live outside of the Apache Beam git repo.
>
> The most annoying aspects will be the versioning story, i.e. users will
> want to use the library with different versions of Apache Beam since some
> people won't want to upgrade since they have something working and others
> will want it against the latest version since they want some feature.
> Apache Beam has had a pretty good track record of maintaining API
> compatibility so I wouldn't be too worried about that. The real issue is
> 3rd party dependency convergence and managing a BOM that works for your
> users.
>
> 1: https://github.com/SolaceProducts/solace-apache-beam
>
> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com> wrote:
>
>> Hi Daniel,
>>
>> I think you are in a better place to make this decision. You are the
>> primary contributor and maintainer for this IO and you clearly know the
>> pubsub lite user base as well. If you think this is the best course of
>> action I will support that.
>>
>> That said, afaik you are moving into uncharted territory. The
>> questions raised here are about support, testing, discoverability,
>> compatibility, potential circular dependencies issues are all unknowns for
>> this model. You might have good answers to them, but nevertheless some of
>> them (or some other unknowns) might still become problematic and result in
>> user confusion and frustration and you will have to address those if/when
>> that happens.
>>
>> I like that this model still allows discoverability through Beam and by
>> default supports an out of the box tested version already. I guess that
>> will be good enough for most beam + pubsub lite users.  And I hope the
>> model will, as you predict, give you a quick way to address user requests.
>>
>> One more question of my own: Do you expect pubsub lite io to continue to
>> receive frequent updates in the long term? (For example, afaik pubsub io no
>> longer needs or gets frequent updates.). If not, eventually keeping the io
>> external might become irrelevant.
>>
>> What do you need from this community to make progress on this question?
>>
>> Ahmet
>>
>>
>>
>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <dp...@google.com>
>> wrote:
>>
>>> > How will this be communicated to the user?
>>>
>>> The docstring on PubsubLiteIO in beam will mention this. If they get the
>>> one subject to the long release cycle, that's usually okay, unless they
>>> need recently added features/fixes. Pub/Sub Lite's documentation will state
>>> to prefer the one from our artifact, but the expectation is the one in beam
>>> will work fine in recent releases.
>>>
>>> > Will it just be documented somewhere that users should prefer
>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>> need?
>>>
>>> Yes, both in our public docs and the docstring for the beam PubsubLiteIO.
>>>
>>> An interesting side effect of subclassing in this way is that if the
>>> user adds a newer version of the PubsubLiteIO implementation-specific
>>> artifact in their pom, they won't actually need to make any code changes:
>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>> version.
>>>
>>>
>>>
>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>>
>>>> How will this be communicated to the user? The idea is that they will
>>>> discover PubsubLiteIO through their IDE as you described, but that will get
>>>> them to the Beam one that's subject to the long release cycle. Will it just
>>>> be documented somewhere that users should prefer
>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>>> need?
>>>>
>>>> I wonder if a similar result could be achieved just by making Beam's
>>>> PubsubLiteIO a stub with no implementation that directs users to the
>>>> com.google.cloud one somehow?
>>>>
>>>> junit's matcher interface comes to mind as a precedent here. I have
>>>> been warned many times by
>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>
>>>> [1]
>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>
>>>> Brian
>>>>
>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <dp...@google.com>
>>>> wrote:
>>>>
>>>>> > Question 1: How are you going to approach testing/CI?
>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>> tested as much as possible.
>>>>>
>>>>> I'd like to run the integration tests in both locations. They would
>>>>> only be meaningful in the beam setup when we went to validate a version
>>>>> bump on the I/O.
>>>>>
>>>>> > Question2 : in the code below, what is the purpose of keeping the
>>>>> PubsubLiteIO in the Beam repo?
>>>>>
>>>>> Visibility and autocomplete. It means the core class will be in the
>>>>> beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in
>>>>> an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>
>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Daniel,
>>>>>> (You helped me apply some change to this strange setup a few months
>>>>>> back. Thank you for working on rectifying the situation.)
>>>>>>
>>>>>> I like that idea overall.
>>>>>>
>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>>> tested as much as possible.
>>>>>>
>>>>>>
>>>>>> Question2 : in the code below, what is the purpose of keeping the
>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>
>>>>>> ```
>>>>>> class PubsubLiteIO extends
>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>> ````
>>>>>>
>>>>>> The backward compatibility came to my mind but I thought you may have
>>>>>> more reasons.
>>>>>>
>>>>>>
>>>>>> My memo:
>>>>>> java-pubsublite repsitory has:
>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>> beam repo has:
>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>> (and other files in the same directory)
>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>> because of its pre-1.0 status.
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <dp...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>>> beam goes to 3.X.X)?
>>>>>>>
>>>>>>> Something we can do if this is an issue is mark pubsublite-beam-io's
>>>>>>> dep on beam-sdks-java-core as 'provided'. But I'd prefer to avoid this and
>>>>>>> just let overriding fix it if that works.
>>>>>>>
>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> How do you plan to address the circular dependency? Won't this end
>>>>>>>> up with Beam depending on older versions of itself?
>>>>>>>>
>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>
>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>>> dpcollins@google.com> wrote:
>>>>>>>>
>>>>>>>>> Hello beam developers,
>>>>>>>>>
>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to
>>>>>>>>> get some feedback on a change to the model for hosting this I/O in beam.
>>>>>>>>> Our team has been frustrated by the fact that we have no way to release
>>>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>>>> copies in parallel.
>>>>>>>>>
>>>>>>>>> I'd like to retain our ability to quickly fix and improve the I/O
>>>>>>>>> while retaining end-user visibility within the beam repo. To do this, I'd
>>>>>>>>> like to remove all the implementation from the beam repo, and leave the I/O
>>>>>>>>> there implemented as:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>> ````
>>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>>
>>>>>>>>> This enables beam users who want to just use the
>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>>
>>>>>>>>> I'd be interested to hear anyones thoughts and suggestions
>>>>>>>>> surrounding this.
>>>>>>>>>
>>>>>>>>> -Daniel
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Tomo
>>>>>>
>>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Luke Cwik <lc...@google.com>.
I wouldn't say this is uncharted territory as there are Apache Beam IOs[1]
that live outside of the Apache Beam git repo.

The most annoying aspect will be the versioning story: users will want to
use the library with different versions of Apache Beam. Some won't want to
upgrade because they have something working, while others will want it
built against the latest version because they need some feature.
Apache Beam has had a pretty good track record of maintaining API
compatibility so I wouldn't be too worried about that. The real issue is
3rd party dependency convergence and managing a BOM that works for your
users.

1: https://github.com/SolaceProducts/solace-apache-beam

On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay <al...@google.com> wrote:

> Hi Daniel,
>
> I think you are in a better place to make this decision. You are the
> primary contributor and maintainer for this IO and you clearly know the
> pubsub lite user base as well. If you think this is the best course of
> action I will support that.
>
> That said, afaik you are moving into uncharted territory. The
> questions raised here are about support, testing, discoverability,
> compatibility, potential circular dependencies issues are all unknowns for
> this model. You might have good answers to them, but nevertheless some of
> them (or some other unknowns) might still become problematic and result in
> user confusion and frustration and you will have to address those if/when
> that happens.
>
> I like that this model still allows discoverability through Beam and by
> default supports an out of the box tested version already. I guess that
> will be good enough for most beam + pubsub lite users.  And I hope the
> model will, as you predict, give you a quick way to address user requests.
>
> One more question of my own: Do you expect pubsub lite io to continue to
> receive frequent updates in the long term? (For example, afaik pubsub io no
> longer needs or gets frequent updates.). If not, eventually keeping the io
> external might become irrelevant.
>
> What do you need from this community to make progress on this question?
>
> Ahmet
>
>
>
> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <dp...@google.com>
> wrote:
>
>> > How will this be communicated to the user?
>>
>> The docstring on PubsubLiteIO in beam will mention this. If they get the
>> one subject to the long release cycle, that's usually okay, unless they
>> need recently added features/fixes. Pub/Sub Lite's documentation will state
>> to prefer the one from our artifact, but the expectation is the one in beam
>> will work fine in recent releases.
>>
>> > Will it just be documented somewhere that users should prefer
>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>> need?
>>
>> Yes, both in our public docs and the docstring for the beam PubsubLiteIO.
>>
>> An interesting side effect of subclassing in this way is that if the user
>> adds a newer version of the PubsubLiteIO implementation-specific artifact
>> in their pom, they won't actually need to make any code changes: the beam
>> PubsubLiteIO will transparently refer to the new implementation version.
>>
>>
>>
>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com>
>> wrote:
>>
>>> How will this be communicated to the user? The idea is that they will
>>> discover PubsubLiteIO through their IDE as you described, but that will get
>>> them to the Beam one that's subject to the long release cycle. Will it just
>>> be documented somewhere that users should prefer
>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>>> need?
>>>
>>> I wonder if a similar result could be achieved just by making Beam's
>>> PubsubLiteIO a stub with no implementation that directs users to the
>>> com.google.cloud one somehow?
>>>
>>> junit's matcher interface comes to mind as a precedent here. I have been
>>> warned many times by
>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>
>>> [1]
>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>
>>> Brian
>>>
>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <dp...@google.com>
>>> wrote:
>>>
>>>> > Question 1: How are you going to approach testing/CI?
>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>> repo's CI. You want to deliver things to your customers after they are
>>>> tested as much as possible.
>>>>
>>>> I'd like to run the integration tests in both locations. They would
>>>> only be meaningful in the beam setup when we went to validate a version
>>>> bump on the I/O.
>>>>
>>>> > Question2 : in the code below, what is the purpose of keeping the
>>>> PubsubLiteIO in the Beam repo?
>>>>
>>>> Visibility and autocomplete. It means the core class will be in the
>>>> beam javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in
>>>> an IDE you'll see pubsublite and PubsubLiteIO.
>>>>
>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com> wrote:
>>>>
>>>>> Hi Daniel,
>>>>> (You helped me apply some change to this strange setup a few months
>>>>> back. Thank you for working on rectifying the situation.)
>>>>>
>>>>> I like that idea overall.
>>>>>
>>>>> Question 1: How are you going to approach testing/CI?
>>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>>> repo's CI. You want to deliver things to your customers after they are
>>>>> tested as much as possible.
>>>>>
>>>>>
>>>>> Question2 : in the code below, what is the purpose of keeping the
>>>>> PubsubLiteIO in the Beam repo?
>>>>>
>>>>> ```
>>>>> class PubsubLiteIO extends
>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>> ````
>>>>>
>>>>> The backward compatibility came to my mind but I thought you may have
>>>>> more reasons.
>>>>>
>>>>>
>>>>> My memo:
>>>>> java-pubsublite repsitory has:
>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>> beam repo has:
>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>> (and other files in the same directory)
>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet) because
>>>>> of its pre-1.0 status.
>>>>>
>>>>>
>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <dp...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>>> beam goes to 3.X.X)?
>>>>>>
>>>>>> Something we can do if this is an issue is mark pubsublite-beam-io's
>>>>>> dep on beam-sdks-java-core as 'provided'. But I'd prefer to avoid this and
>>>>>> just let overriding fix it if that works.
>>>>>>
>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> How do you plan to address the circular dependency? Won't this end
>>>>>>> up with Beam depending on older versions of itself?
>>>>>>>
>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>
>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <
>>>>>>> dpcollins@google.com> wrote:
>>>>>>>
>>>>>>>> Hello beam developers,
>>>>>>>>
>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get
>>>>>>>> some feedback on a change to the model for hosting this I/O in beam. Our
>>>>>>>> team has been frustrated by the fact that we have no way to release
>>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>>> copies in parallel.
>>>>>>>>
>>>>>>>> I'd like to retain our ability to quickly fix and improve the I/O
>>>>>>>> while retaining end-user visibility within the beam repo. To do this, I'd
>>>>>>>> like to remove all the implementation from the beam repo, and leave the I/O
>>>>>>>> there implemented as:
>>>>>>>>
>>>>>>>> ```
>>>>>>>> class PubsubLiteIO extends
>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>> ````
>>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>>
>>>>>>>> This enables beam users who want to just use the
>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>>> would be available on the class in the beam repo.
>>>>>>>>
>>>>>>>> I'd be interested to hear anyones thoughts and suggestions
>>>>>>>> surrounding this.
>>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Tomo
>>>>>
>>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Ahmet Altay <al...@google.com>.
Hi Daniel,

I think you are in a better place to make this decision. You are the
primary contributor and maintainer for this IO and you clearly know the
pubsub lite user base as well. If you think this is the best course of
action I will support that.

That said, afaik you are moving into uncharted territory. The
questions raised here about support, testing, discoverability,
compatibility, and potential circular dependency issues are all unknowns for
this model. You might have good answers to them, but nevertheless some of
them (or some other unknowns) might still become problematic and result in
user confusion and frustration and you will have to address those if/when
that happens.

I like that this model still allows discoverability through Beam and by
default provides an already tested version out of the box. I guess that
will be good enough for most beam + pubsub lite users. And I hope the
model will, as you predict, give you a quick way to address user requests.

One more question of my own: Do you expect pubsub lite io to continue to
receive frequent updates in the long term? (For example, afaik pubsub io no
longer needs or gets frequent updates.) If not, eventually keeping the io
external might become irrelevant.

What do you need from this community to make progress on this question?

Ahmet



On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins <dp...@google.com>
wrote:

> > How will this be communicated to the user?
>
> The docstring on PubsubLiteIO in beam will mention this. If they get the
> one subject to the long release cycle, that's usually okay, unless they
> need recently added features/fixes. Pub/Sub Lite's documentation will state
> to prefer the one from our artifact, but the expectation is the one in beam
> will work fine in recent releases.
>
> > Will it just be documented somewhere that users should prefer
> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
> need?
>
> Yes, both in our public docs and the docstring for the beam PubsubLiteIO.
>
> An interesting side effect of subclassing in this way is that if the user
> adds a newer version of the PubsubLiteIO implementation-specific artifact
> in their pom, they won't actually need to make any code changes: the beam
> PubsubLiteIO will transparently refer to the new implementation version.
>
>
>
> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com> wrote:
>
>> How will this be communicated to the user? The idea is that they will
>> discover PubsubLiteIO through their IDE as you described, but that will get
>> them to the Beam one that's subject to the long release cycle. Will it just
>> be documented somewhere that users should prefer
>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
>> need?
>>
>> I wonder if a similar result could be achieved just by making Beam's
>> PubsubLiteIO a stub with no implementation that directs users to the
>> com.google.cloud one somehow?
>>
>> junit's matcher interface comes to mind as a precedent here. I have been
>> warned many times by
>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>
>> [1]
>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>
>> Brian
>>
>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <dp...@google.com>
>> wrote:
>>
>>> > Question 1: How are you going to approach testing/CI?
>>> The pull requests in the java-pubsublite repo do not trigger Beam repo's
>>> CI. You want to deliver things to your customers after they are tested as
>>> much as possible.
>>>
>>> I'd like to run the integration tests in both locations. They would only
>>> be meaningful in the beam setup when we went to validate a version bump on
>>> the I/O.
>>>
>>> > Question2 : in the code below, what is the purpose of keeping the
>>> PubsubLiteIO in the Beam repo?
>>>
>>> Visibility and autocomplete. It means the core class will be in the beam
>>> javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in an IDE
>>> you'll see pubsublite and PubsubLiteIO.
>>>
>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com> wrote:
>>>
>>>> Hi Daniel,
>>>> (You helped me apply some change to this strange setup a few months
>>>> back. Thank you for working on rectifying the situation.)
>>>>
>>>> I like that idea overall.
>>>>
>>>> Question 1: How are you going to approach testing/CI?
>>>> The pull requests in the java-pubsublite repo do not trigger Beam
>>>> repo's CI. You want to deliver things to your customers after they are
>>>> tested as much as possible.
>>>>
>>>>
>>>> Question2 : in the code below, what is the purpose of keeping the
>>>> PubsubLiteIO in the Beam repo?
>>>>
>>>> ```
>>>> class PubsubLiteIO extends
>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>> ````
>>>>
>>>> The backward compatibility came to my mind but I thought you may have
>>>> more reasons.
>>>>
>>>>
>>>> My memo:
>>>> java-pubsublite repsitory has:
>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>> beam repo has:
>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>> (and other files in the same directory)
>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet) because
>>>> of its pre-1.0 status.
>>>>
>>>>
>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <dp...@google.com>
>>>> wrote:
>>>>
>>>>> I don't know that the cycle would cause a problem- wouldn't it
>>>>> override and cause it to use beam-sdks-java-core:2.30.0 (at least until
>>>>> beam goes to 3.X.X)?
>>>>>
>>>>> Something we can do if this is an issue is mark pubsublite-beam-io's
>>>>> dep on beam-sdks-java-core as 'provided'. But I'd prefer to avoid this and
>>>>> just let overriding fix it if that works.
>>>>>
>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> How do you plan to address the circular dependency? Won't this end up
>>>>>> with Beam depending on older versions of itself?
>>>>>>
>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>
>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <dp...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello beam developers,
>>>>>>>
>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get
>>>>>>> some feedback on a change to the model for hosting this I/O in beam. Our
>>>>>>> team has been frustrated by the fact that we have no way to release
>>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>>> copies in parallel.
>>>>>>>
>>>>>>> I'd like to retain our ability to quickly fix and improve the I/O
>>>>>>> while retaining end-user visibility within the beam repo. To do this, I'd
>>>>>>> like to remove all the implementation from the beam repo, and leave the I/O
>>>>>>> there implemented as:
>>>>>>>
>>>>>>> ```
>>>>>>> class PubsubLiteIO extends
>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>> ````
>>>>>>> , and add a dependency on our beam artifact.
>>>>>>>
>>>>>>> This enables beam users who want to just use the
>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>>> would be available on the class in the beam repo.
>>>>>>>
>>>>>>> I'd be interested to hear anyones thoughts and suggestions
>>>>>>> surrounding this.
>>>>>>>
>>>>>>> -Daniel
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Tomo
>>>>
>>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Daniel Collins <dp...@google.com>.
> How will this be communicated to the user?

The docstring on PubsubLiteIO in beam will mention this. If they get the
one subject to the long release cycle, that's usually okay, unless they
need recently added features/fixes. Pub/Sub Lite's documentation will
recommend the one from our artifact, but the expectation is that the one in
beam will work fine in recent releases.

> Will it just be documented somewhere that users should prefer
> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
> need?

Yes, both in our public docs and the docstring for the beam PubsubLiteIO.

An interesting side effect of subclassing in this way is that if the user
adds a newer version of the PubsubLiteIO implementation-specific artifact
in their pom, they won't actually need to make any code changes: the beam
PubsubLiteIO will transparently refer to the new implementation version.
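
To sketch why adding the newer artifact directly should be enough under
Maven: Maven mediates version conflicts by "nearest wins", so a version
declared directly in the user's pom (depth 1) beats the one pulled in
transitively through beam-sdks-java-io-google-cloud-platform (depth 2).
The small Java simulation below only illustrates that rule. The
`NearestWins` and `Dep` names and the 0.17.0 version are hypothetical,
and this is not Maven's actual resolver code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation only; this is not Maven's actual resolver code.
class NearestWins {
  // One occurrence of an artifact in a dependency tree (depth 1 = declared directly).
  static final class Dep {
    final String artifact;
    final String version;
    final int depth;

    Dep(String artifact, String version, int depth) {
      this.artifact = artifact;
      this.version = version;
      this.depth = depth;
    }
  }

  // For each artifact keep the occurrence closest to the root;
  // at equal depth the first one encountered wins.
  static Map<String, String> resolve(List<Dep> occurrences) {
    Map<String, Dep> chosen = new LinkedHashMap<>();
    for (Dep dep : occurrences) {
      Dep current = chosen.get(dep.artifact);
      if (current == null || dep.depth < current.depth) {
        chosen.put(dep.artifact, dep);
      }
    }
    Map<String, String> versions = new LinkedHashMap<>();
    for (Dep dep : chosen.values()) {
      versions.put(dep.artifact, dep.version);
    }
    return versions;
  }

  public static void main(String[] args) {
    List<Dep> tree = List.of(
        new Dep("beam-sdks-java-io-google-cloud-platform", "2.30.0", 1),
        // Pulled in transitively by the Beam GCP IO artifact.
        new Dep("pubsublite-beam-io", "0.16.0", 2),
        // Hypothetical newer version declared directly in the user's pom.
        new Dep("pubsublite-beam-io", "0.17.0", 1));
    System.out.println(resolve(tree));
  }
}
```

Note that Gradle resolves conflicts differently (by default it picks the
highest requested version), so the outcome there depends on which version
is newer rather than on where it is declared.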



On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette <bh...@google.com> wrote:

> How will this be communicated to the user? The idea is that they will
> discover PubsubLiteIO through their IDE as you described, but that will get
> them to the Beam one that's subject to the long release cycle. Will it just
> be documented somewhere that users should prefer
> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
> need?
>
> I wonder if a similar result could be achieved just by making Beam's
> PubsubLiteIO a stub with no implementation that directs users to the
> com.google.cloud one somehow?
>
> junit's matcher interface comes to mind as a precedent here. I have been
> warned many times by
> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>
> [1]
> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>
> Brian
>
> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <dp...@google.com>
> wrote:
>
>> > Question 1: How are you going to approach testing/CI?
>> The pull requests in the java-pubsublite repo do not trigger Beam repo's
>> CI. You want to deliver things to your customers after they are tested as
>> much as possible.
>>
>> I'd like to run the integration tests in both locations. They would only
>> be meaningful in the beam setup when we went to validate a version bump on
>> the I/O.
>>
>> > Question2 : in the code below, what is the purpose of keeping the
>> PubsubLiteIO in the Beam repo?
>>
>> Visibility and autocomplete. It means the core class will be in the beam
>> javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in an IDE
>> you'll see pubsublite and PubsubLiteIO.
>>
>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com> wrote:
>>
>>> Hi Daniel,
>>> (You helped me apply some change to this strange setup a few months
>>> back. Thank you for working on rectifying the situation.)
>>>
>>> I like that idea overall.
>>>
>>> Question 1: How are you going to approach testing/CI?
>>> The pull requests in the java-pubsublite repo do not trigger Beam repo's
>>> CI. You want to deliver things to your customers after they are tested as
>>> much as possible.
>>>
>>>
>>> Question2 : in the code below, what is the purpose of keeping the
>>> PubsubLiteIO in the Beam repo?
>>>
>>> ```
>>> class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO
>>> {}
>>> ````
>>>
>>> The backward compatibility came to my mind but I thought you may have
>>> more reasons.
>>>
>>>
>>> My memo:
>>> java-pubsublite repsitory has:
>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>> beam repo has:
>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>> (and other files in the same directory)
>>> google-cloud-pubsublite is not part of the Libraries BOM (yet) because
>>> of its pre-1.0 status.
>>>
>>>
>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <dp...@google.com>
>>> wrote:
>>>
>>>> I don't know that the cycle would cause a problem- wouldn't it override
>>>> and cause it to use beam-sdks-java-core:2.30.0 (at least until beam goes to
>>>> 3.X.X)?
>>>>
>>>> Something we can do if this is an issue is mark pubsublite-beam-io's
>>>> dep on beam-sdks-java-core as 'provided'. But I'd prefer to avoid this and
>>>> just let overriding fix it if that works.
>>>>
>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> How do you plan to address the circular dependency? Won't this end up
>>>>> with Beam depending on older versions of itself?
>>>>>
>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>
>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <dp...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hello beam developers,
>>>>>>
>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get
>>>>>> some feedback on a change to the model for hosting this I/O in beam. Our
>>>>>> team has been frustrated by the fact that we have no way to release
>>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>>> copies in parallel.
>>>>>>
>>>>>> I'd like to retain our ability to quickly fix and improve the I/O
>>>>>> while retaining end-user visibility within the beam repo. To do this, I'd
>>>>>> like to remove all the implementation from the beam repo, and leave the I/O
>>>>>> there implemented as:
>>>>>>
>>>>>> ```
>>>>>> class PubsubLiteIO extends
>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>> ````
>>>>>> , and add a dependency on our beam artifact.
>>>>>>
>>>>>> This enables beam users who want to just use the
>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>>> also track the canonical version separately in our repo to get fixes and
>>>>>> improvements at a faster rate. All static methods from the parent class
>>>>>> would be available on the class in the beam repo.
>>>>>>
>>>>>> I'd be interested to hear anyones thoughts and suggestions
>>>>>> surrounding this.
>>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Brian Hulette <bh...@google.com>.
How will this be communicated to the user? The idea is that they will
discover PubsubLiteIO through their IDE as you described, but that will get
them to the Beam one that's subject to the long release cycle. Will it just
be documented somewhere that users should prefer
com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix they
need?

I wonder if a similar result could be achieved just by making Beam's
PubsubLiteIO a stub with no implementation that directs users to the
com.google.cloud one somehow?

Hamcrest's Matcher interface (bundled with JUnit 4) comes to mind as a
precedent here. I have been warned many times by
Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].

[1]
https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
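
For reference, a stub along those lines could look roughly like the sketch
below. It borrows the Hamcrest trick of a deprecated marker method whose
name is the message. Everything here is hypothetical: the class name
`PubsubLiteIOStub`, the marker method, and the exception message are made
up for illustration, and this is not what the proposal in this thread does.

```java
// Hypothetical stub; not the real Beam class and not part of the proposal.
final class PubsubLiteIOStub {
  private PubsubLiteIOStub() {} // no instances, no implementation

  /**
   * Marker whose only purpose is to show up in autocomplete and javadoc.
   *
   * @deprecated use com.google.cloud.pubsublite.beam.PubsubLiteIO instead.
   */
  @Deprecated
  static void _use_com_google_cloud_pubsublite_beam_PubsubLiteIO_instead_() {
    throw new UnsupportedOperationException(
        "Use com.google.cloud.pubsublite.beam.PubsubLiteIO instead");
  }
}

class StubDemo {
  @SuppressWarnings("deprecation")
  public static void main(String[] args) {
    try {
      PubsubLiteIOStub._use_com_google_cloud_pubsublite_beam_PubsubLiteIO_instead_();
    } catch (UnsupportedOperationException e) {
      // The message points the caller at the external class.
      System.out.println(e.getMessage());
    }
  }
}
```

The trade-off against the subclass alias is that a stub like this forces
users to change their imports, whereas the alias keeps existing code
compiling.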

Brian

On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins <dp...@google.com> wrote:

> > Question 1: How are you going to approach testing/CI?
> The pull requests in the java-pubsublite repo do not trigger Beam repo's
> CI. You want to deliver things to your customers after they are tested as
> much as possible.
>
> I'd like to run the integration tests in both locations. They would only
> be meaningful in the beam setup when we went to validate a version bump on
> the I/O.
>
> > Question2 : in the code below, what is the purpose of keeping the
> PubsubLiteIO in the Beam repo?
>
> Visibility and autocomplete. It means the core class will be in the beam
> javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in an IDE
> you'll see pubsublite and PubsubLiteIO.
>
> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com> wrote:
>
>> Hi Daniel,
>> (You helped me apply some change to this strange setup a few months back.
>> Thank you for working on rectifying the situation.)
>>
>> I like that idea overall.
>>
>> Question 1: How are you going to approach testing/CI?
>> The pull requests in the java-pubsublite repo do not trigger Beam repo's
>> CI. You want to deliver things to your customers after they are tested as
>> much as possible.
>>
>>
>> Question2 : in the code below, what is the purpose of keeping the
>> PubsubLiteIO in the Beam repo?
>>
>> ```
>> class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO
>> {}
>> ````
>>
>> The backward compatibility came to my mind but I thought you may have
>> more reasons.
>>
>>
>> My memo:
>> java-pubsublite repsitory has:
>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>> beam repo has:
>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>> (and other files in the same directory)
>> google-cloud-pubsublite is not part of the Libraries BOM (yet) because of
>> its pre-1.0 status.
>>
>>
>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <dp...@google.com>
>> wrote:
>>
>>> I don't know that the cycle would cause a problem- wouldn't it override
>>> and cause it to use beam-sdks-java-core:2.30.0 (at least until beam goes to
>>> 3.X.X)?
>>>
>>> Something we can do if this is an issue is mark pubsublite-beam-io's dep
>>> on beam-sdks-java-core as 'provided'. But I'd prefer to avoid this and just
>>> let overriding fix it if that works.
>>>
>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> How do you plan to address the circular dependency? Won't this end up
>>>> with Beam depending on older versions of itself?
>>>>
>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>
>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <dp...@google.com>
>>>> wrote:
>>>>
>>>>> Hello beam developers,
>>>>>
>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get
>>>>> some feedback on a change to the model for hosting this I/O in beam. Our
>>>>> team has been frustrated by the fact that we have no way to release
>>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>>> months of the beam release cycle, and that those fixes are necessarily
>>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>>> copies in parallel.
>>>>>
>>>>> I'd like to retain our ability to quickly fix and improve the I/O
>>>>> while retaining end-user visibility within the beam repo. To do this, I'd
>>>>> like to remove all the implementation from the beam repo, and leave the I/O
>>>>> there implemented as:
>>>>>
>>>>> ```
>>>>> class PubsubLiteIO extends
>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>> ````
>>>>> , and add a dependency on our beam artifact.
>>>>>
>>>>> This enables beam users who want to just use the
>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>>> also track the canonical version separately in our repo to get fixes and
>>>>> improvements at a faster rate. All static methods from the parent class
>>>>> would be available on the class in the beam repo.
>>>>>
>>>>> I'd be interested to hear anyones thoughts and suggestions surrounding
>>>>> this.
>>>>>
>>>>> -Daniel
>>>>>
>>>>
>>
>> --
>> Regards,
>> Tomo
>>
>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Daniel Collins <dp...@google.com>.
> Question 1: How are you going to approach testing/CI?
> The pull requests in the java-pubsublite repo do not trigger Beam repo's
> CI. You want to deliver things to your customers after they are tested as
> much as possible.

I'd like to run the integration tests in both locations. They would only be
meaningful in the beam setup when we go to validate a version bump on the
I/O.

> Question2 : in the code below, what is the purpose of keeping the
> PubsubLiteIO in the Beam repo?

Visibility and autocomplete. It means the core class will be in the beam
javadoc and if you type `import org.apache.beam.sdk.io.gcp.pubsu` in an IDE
you'll see pubsublite and PubsubLiteIO.
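
As a minimal sketch of the aliasing mechanics (with stand-in classes, not
the real ones): in Java, static methods declared on a parent class are
reachable through an empty subclass, which is what lets the Beam-side class
have no body. `ExternalPubsubLiteIO`, `BeamPubsubLiteIO`, and the `read`
method below are hypothetical names chosen for illustration; the real
classes and their signatures may differ.

```java
// Hypothetical stand-in for com.google.cloud.pubsublite.beam.PubsubLiteIO.
class ExternalPubsubLiteIO {
  // Illustrative static factory; the name and return type are assumptions.
  static String read(String subscription) {
    return "read from " + subscription;
  }
}

// Stand-in for the alias class kept in the Beam repo: no body at all.
class BeamPubsubLiteIO extends ExternalPubsubLiteIO {}

class AliasDemo {
  public static void main(String[] args) {
    // The static method is declared on the parent but is reachable via the alias.
    System.out.println(BeamPubsubLiteIO.read("my-subscription"));
  }
}
```

For this to compile, the parent class must not be final and must expose a
constructor the subclass can call (the implicit default constructor is
enough in this sketch).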

On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki <su...@google.com> wrote:

> Hi Daniel,
> (You helped me apply some change to this strange setup a few months back.
> Thank you for working on rectifying the situation.)
>
> I like that idea overall.
>
> Question 1: How are you going to approach testing/CI?
> The pull requests in the java-pubsublite repo do not trigger Beam repo's
> CI. You want to deliver things to your customers after they are tested as
> much as possible.
>
>
> Question2 : in the code below, what is the purpose of keeping the
> PubsubLiteIO in the Beam repo?
>
> ```
> class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO {}
> ````
>
> The backward compatibility came to my mind but I thought you may have more
> reasons.
>
>
> My memo:
> java-pubsublite repsitory has:
> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
> beam repo has:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
> (and other files in the same directory)
> google-cloud-pubsublite is not part of the Libraries BOM (yet) because of
> its pre-1.0 status.
>
>
> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <dp...@google.com>
> wrote:
>
>> I don't know that the cycle would cause a problem- wouldn't it override
>> and cause it to use beam-sdks-java-core:2.30.0 (at least until beam goes to
>> 3.X.X)?
>>
>> Something we can do if this is an issue is mark pubsublite-beam-io's dep
>> on beam-sdks-java-core as 'provided'. But I'd prefer to avoid this and just
>> let overriding fix it if that works.
>>
>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> How do you plan to address the circular dependency? Won't this end up
>>> with Beam depending on older versions of itself?
>>>
>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>
>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <dp...@google.com>
>>> wrote:
>>>
>>>> Hello beam developers,
>>>>
>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get
>>>> some feedback on a change to the model for hosting this I/O in beam. Our
>>>> team has been frustrated by the fact that we have no way to release
>>>> features or fixes for bugs to customers on time scales shorter than the 1-2
>>>> months of the beam release cycle, and that those fixes are necessarily
>>>> coupled with a beam version upgrade. To work around this, I forked the I/O
>>>> in beam to our own repo about 6 months ago and have been maintaining both
>>>> copies in parallel.
>>>>
>>>> I'd like to retain our ability to quickly fix and improve the I/O while
>>>> retaining end-user visibility within the beam repo. To do this, I'd like
>>>> to remove all the implementation from the beam repo, and leave the I/O
>>>> there implemented as:
>>>>
>>>> ```
>>>> class PubsubLiteIO extends
>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>> ```
>>>> , and add a dependency on our beam artifact.
>>>>
>>>> This enables beam users who want to just use the
>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>>> also track the canonical version separately in our repo to get fixes and
>>>> improvements at a faster rate. All static methods from the parent class
>>>> would be available on the class in the beam repo.
>>>>
>>>> I'd be interested to hear anyone's thoughts and suggestions surrounding
>>>> this.
>>>>
>>>> -Daniel
>>>>
>>>
>
> --
> Regards,
> Tomo
>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Tomo Suzuki <su...@google.com>.
Hi Daniel,
(You helped me apply some change to this strange setup a few months back.
Thank you for working on rectifying the situation.)

I like that idea overall.

Question 1: How are you going to approach testing/CI?
The pull requests in the java-pubsublite repo do not trigger Beam repo's
CI. You want to deliver things to your customers after they are tested as
much as possible.


Question 2: in the code below, what is the purpose of keeping the
PubsubLiteIO in the Beam repo?

```
class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO {}
```

Backward compatibility came to mind, but I thought you might have other
reasons.


My memo:
java-pubsublite repository has:
https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
beam repo has:
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
(and other files in the same directory)
google-cloud-pubsublite is not part of the Libraries BOM (yet) because of
its pre-1.0 status.


On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins <dp...@google.com> wrote:

> I don't know that the cycle would cause a problem- wouldn't it override
> and cause it to use beam-sdks-java-core:2.30.0 (at least until beam goes to
> 3.X.X)?
>
> Something we can do if this is an issue is mark pubsublite-beam-io's dep
> on beam-sdks-java-core as 'provided'. But I'd prefer to avoid this and just
> let overriding fix it if that works.
>
> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com>
> wrote:
>
>> How do you plan to address the circular dependency? Won't this end up
>> with Beam depending on older versions of itself?
>>
>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>
>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <dp...@google.com>
>> wrote:
>>
>>> Hello beam developers,
>>>
>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get some
>>> feedback on a change to the model for hosting this I/O in beam. Our team
>>> has been frustrated by the fact that we have no way to release features or
>>> fixes for bugs to customers on time scales shorter than the 1-2 months of
>>> the beam release cycle, and that those fixes are necessarily coupled with a
>>> beam version upgrade. To work around this, I forked the I/O in beam to our
>>> own repo about 6 months ago and have been maintaining both copies in
>>> parallel.
>>>
>>> I'd like to retain our ability to quickly fix and improve the I/O while
>>> retaining end-user visibility within the beam repo. To do this, I'd like
>>> to remove all the implementation from the beam repo, and leave the I/O
>>> there implemented as:
>>>
>>> ```
>>> class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO
>>> {}
>>> ```
>>> , and add a dependency on our beam artifact.
>>>
>>> This enables beam users who want to just use the
>>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>>> also track the canonical version separately in our repo to get fixes and
>>> improvements at a faster rate. All static methods from the parent class
>>> would be available on the class in the beam repo.
>>>
>>> I'd be interested to hear anyone's thoughts and suggestions surrounding
>>> this.
>>>
>>> -Daniel
>>>
>>

-- 
Regards,
Tomo

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Daniel Collins <dp...@google.com>.
I don't know that the cycle would cause a problem. Wouldn't dependency
resolution override the older version and use beam-sdks-java-core:2.30.0
(at least until beam goes to 3.X.X)?

Something we can do if this is an issue is mark pubsublite-beam-io's dep on
beam-sdks-java-core as 'provided'. But I'd prefer to avoid this and just
let overriding fix it if that works.
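
As a rough sketch of the overriding I have in mind, assuming the build tool resolves a version conflict by picking the highest requested version (Gradle's default; Maven instead picks the declaration nearest the root of the dependency tree, which here would likely also be 2.30.0). The class and method names below are made up for illustration and are not part of any build tool's API.

```java
import java.util.Arrays;
import java.util.List;

class VersionConflictDemo {
  // Compares dotted numeric versions such as "2.29.0" and "2.30.0".
  static int compareVersions(String a, String b) {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  // Assumed "highest version wins" conflict resolution.
  static String highestWins(List<String> requested) {
    String best = requested.get(0);
    for (String version : requested) {
      if (compareVersions(version, best) > 0) {
        best = version;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // beam-sdks-java-io-google-cloud-platform:2.30.0 requests core 2.30.0;
    // pubsublite-beam-io:0.16.0 requests core 2.29.0.
    List<String> requested = Arrays.asList("2.30.0", "2.29.0");
    System.out.println("beam-sdks-java-core resolves to " + highestWins(requested));
  }
}
```

This only holds while the newer core stays compatible with what pubsublite-beam-io was compiled against, which is why the 3.X.X caveat matters.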

On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud <ap...@google.com> wrote:

> How do you plan to address the circular dependency? Won't this end up with
> Beam depending on older versions of itself?
>
> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>
> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <dp...@google.com>
> wrote:
>
>> Hello beam developers,
>>
>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get some
>> feedback on a change to the model for hosting this I/O in beam. Our team
>> has been frustrated by the fact that we have no way to release features or
>> fixes for bugs to customers on time scales shorter than the 1-2 months of
>> the beam release cycle, and that those fixes are necessarily coupled with a
>> beam version upgrade. To work around this, I forked the I/O in beam to our
>> own repo about 6 months ago and have been maintaining both copies in
>> parallel.
>>
>> I'd like to retain our ability to quickly fix and improve the I/O while
>> retaining end-user visibility within the beam repo. To do this, I'd like
>> to remove all the implementation from the beam repo, and leave the I/O
>> there implemented as:
>>
>> ```
>> class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO
>> {}
>> ```
>> , and add a dependency on our beam artifact.
>>
>> This enables beam users who want to just use the
>> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
>> also track the canonical version separately in our repo to get fixes and
>> improvements at a faster rate. All static methods from the parent class
>> would be available on the class in the beam repo.
>>
>> I'd be interested to hear anyone's thoughts and suggestions surrounding
>> this.
>>
>> -Daniel
>>
>

Re: Aliasing Pub/Sub Lite IO in external repo

Posted by Andrew Pilloud <ap...@google.com>.
How do you plan to address the circular dependency? Won't this end up with
Beam depending on older versions of itself?

beam-sdks-java-io-google-cloud-platform:2.30.0 -> pubsublite-beam-io:0.16.0
-> beam-sdks-java-core:2.29.0

On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins <dp...@google.com>
wrote:

> Hello beam developers,
>
> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to get some
> feedback on a change to the model for hosting this I/O in beam. Our team
> has been frustrated by the fact that we have no way to release features or
> fixes for bugs to customers on time scales shorter than the 1-2 months of
> the beam release cycle, and that those fixes are necessarily coupled with a
> beam version upgrade. To work around this, I forked the I/O in beam to our
> own repo about 6 months ago and have been maintaining both copies in
> parallel.
>
> I'd like to retain our ability to quickly fix and improve the I/O while
> retaining end-user visibility within the beam repo. To do this, I'd like
> to remove all the implementation from the beam repo, and leave the I/O
> there implemented as:
>
> ```
> class PubsubLiteIO extends com.google.cloud.pubsublite.beam.PubsubLiteIO {}
> ```
> , and add a dependency on our beam artifact.
>
> This enables beam users who want to just use the
> beam-sdks-java-io-google-cloud-platform artifact to do so, but they can
> also track the canonical version separately in our repo to get fixes and
> improvements at a faster rate. All static methods from the parent class
> would be available on the class in the beam repo.
>
> I'd be interested to hear anyone's thoughts and suggestions surrounding
> this.
>
> -Daniel
>