You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Kenneth Knowles <kl...@google.com> on 2018/10/19 17:39:34 UTC

[DISCUSS] Publish vendored dependencies independently

Hi all,

A while ago we had pretty good consensus that we should vendor
("pre-shade") specific dependencies, and there's start on it here:
https://github.com/apache/beam/tree/master/vendor

IntelliJ notes:

 - With shading, it is hard (impossible?) to step into dependency code in
IntelliJ's debugger, because the actual symbol at runtime does not match
what is in the external jars
 - With vendoring, if the vendored dependencies are part of the project
then IntelliJ gets confused because it operates on source, not the produced
jars

The second one has a quick fix for most cases*: don't make the vendored dep
a subproject, but just separately build and publish it. Since a vendored
dependency should change much more infrequently (or if we bake the version
into the name, ~never) this means we publish once and save headache and
build time forever.

WDYT? Have I overlooked something? How about we set up vendored versions of
guava, protobuf, grpc, and publish them. We don't have to actually start
using them yet, and can do it incrementally.

(side note: what do other projects like Flink do?)

Kenn

*for generated proto classes, they need to be altered after being generated
so shading happens there, but actually only relocation and the shared
vendored dep should work elsewhere in the project

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Ryan Williams <ry...@runsascoded.com>.
Really interesting ideas and attempts, Kenn/Luke. One inline:

On Fri, Oct 19, 2018 at 7:11 PM Lukasz Cwik <lc...@google.com> wrote:

> I have tried several times to improve the build system and intellij
> integration and each attempt ended with little progress when dealing with
> vendored code. My latest attempt has been the most promising where I take
> the vendored classes/jars and decompile them generating the source that
> Intellij can then use. I have a branch[1] that demonstrates the idea. It
> works pretty well (and up until a change where we started vendoring gRPC,
> was impractical to do. Instructions to try it out are:
>
> // Clean up any remnants of prior builds/intellij projects
> git clean -fdx
> // Generated the source for vendored/shaded modules
> ./gradlew decompile
>
> // Remove the "generated" Java sources for protos so they don't conflict with the decompiled sources.
> rm -rf model/pipeline/build/generated/source/proto
> rm -rf model/job-management/build/generated/source/proto
> rm -rf model/fn-execution/build/generated/source/proto
> // Import the project into Intellij, most code completion now works still some issues with a few classes.
> // Note that the Java decompiler doesn't generate valid source so still need to delegate to Gradle for build/run/test actions
> // Other decompilers may do a better/worse job but haven't tried them.
>
>
> The problems that I face are that the generated Java source from the
> protos and the decompiled source from the compiled version of that source
> post shading are both being imported as content roots and then conflict.
> Also, the CFR decompiler isn't producing valid source, if people could try
> others and report their mileage, we may find one that works and then we
> would be able to use intellij to build/run our code and not need to
> delegate all our build/run/test actions to Gradle.
>
> After all these attempts I have done, vendoring the dependencies outside
> of the project seems like a sane approach and unless someone wants to take
> a stab at the best progress I have made above, I would go with what Kenn is
> suggesting even though it will mean that we will need to perform releases
> every time we want to change the version of one of our vendored
> dependencies.
>
> 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>
>
> On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Another reason to push on this is to get build times down. Once only
>> generated proto classes use the shadow plugin we'll cut the build time in
>> ~half? And there is no reason to constantly re-vendor.
>>
>> Kenn
>>
>> On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <kl...@google.com> wrote:
>>
>>> Hi all,
>>>
>>> A while ago we had pretty good consensus that we should vendor
>>> ("pre-shade") specific dependencies, and there's start on it here:
>>> https://github.com/apache/beam/tree/master/vendor
>>>
>>> IntelliJ notes:
>>>
>>>  - With shading, it is hard (impossible?) to step into dependency code
>>> in IntelliJ's debugger, because the actual symbol at runtime does not match
>>> what is in the external jars
>>>
>>
> Intellij can step through the classes if they were published outside the
> project since it can decompile them. The source won't be legible.
> Decompiling the source as in my example effectively shows the same issue.
>

Sorry if this was covered elsewhere in this thread, but it sounds like what
we need is to publish `-sources.jar`s alongside our vendored `.jar`s, to
allow IntelliJ's normal flow for [aligning .class-files and corresponding
sources] to work.

If we started with our dependencies' existing `-sources.jar`s, and renamed
the files in those JARs to appear to live in the vendored packages (but
left the files' contents alone, so they'd contain package- and
import-statements with the un-vendored names), my impression is that
IntelliJ would still align the .class-files to the intended files and line
numbers (even though the bytecode technically wouldn't match the source
files). This assumes the bytecode-vendoring doesn't change the line-numbers
in the bytecode, which I hope is true, but don't know for sure.

Another option would be doing the vendoring at the source-level, rather
than the bytecode-level, and publishing vendored `.jar`s and
`-sources.jar`s that actually match. These would be indistinguishable from
any other library, to IntelliJ (and in general), so all downstream tooling
should work. We'd have to maintain a fork of each vendored dep where we
check out the release we want, apply a script that renames all package- and
import-statements, and publish the results. It would be a bit more
involved, but possibly less brittle. If tricking IntelliJ with a modified
`-sources.jar` as I described previously would work, we could do that, and
not bother with this.


>
>
>>  - With vendoring, if the vendored dependencies are part of the project
>>> then IntelliJ gets confused because it operates on source, not the produced
>>> jars
>>>
>>
> Yes, I tried several ways to get intellij to ignore the source and use the
> output jars but no luck.
>
>
>> The second one has a quick fix for most cases*: don't make the vendored
>>> dep a subproject, but just separately build and publish it. Since a
>>> vendored dependency should change much more infrequently (or if we bake the
>>> version into the name, ~never) this means we publish once and save headache
>>> and build time forever.
>>>
>>> WDYT? Have I overlooked something? How about we set up vendored versions
>>> of guava, protobuf, grpc, and publish them. We don't have to actually start
>>> using them yet, and can do it incrementally.
>>>
>>
> Currently we are relocating code depending on the version string. If the
> major version is >= 1, we use only the major version within the package
> string and rely on semantic versioning provided by the dependency to not
> break people. If the major version is 0, we assume the dependency is
> unstable and use the full version as part of the package string during
> relocation.
>
> The downside of using the full version string for relocated packages:
> 1) Users will end up with multiple copies of dependencies that differ only
> by the minor or patch version increasing the size of their application.
> 2) Bumping up the version of a dependency now requires the import
> statement in all java files to be updated (not too difficult with some
> sed/grep skills)
>
> The upside of using the full version string in the relocated package:
> 1) We don't have to worry about whether a dependency maintains semantic
> versioning which means our users won't have to worry about that either.
> 2) This increases the odds that a user will load multiple slightly
> different versions of the same dependency which is known to be incompatible
> in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
> Netty 4.1.28 even though they are both shaded due to issues of how JNI with
> tcnative works).
>
>
>>
>>> (side note: what do other projects like Flink do?)
>>>
>>> Kenn
>>>
>>> *for generated proto classes, they need to be altered after being
>>> generated so shading happens there, but actually only relocation and the
>>> shared vendored dep should work elsewhere in the project
>>>
>>
> We could publish the protos and treat them as "external" dependencies
> within the Java projects which would also remove this pain point.
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
A vendored Guava 20.0 and gRPC 1.13.1 have been published. PR 7105[1] will
enable consuming the first one which should help many Intellij/Eclipse
contributors with import issues.

https://github.com/apache/beam/pull/7105

On Thu, Nov 15, 2018 at 6:06 PM Lukasz Cwik <lc...@google.com> wrote:

> I have created the artifacts and sent out a vote thread.
>
> On Thu, Nov 15, 2018 at 5:23 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> On Thu, Nov 15, 2018 at 11:47 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> In case any of this affects how artifacts are published in general,
>>> please make sure that publishing to 3rd party repo continues to work.
>>>
>>> For example: ./gradlew :beam-runners-flink_2.11-job-server:publish
>>> -PisRelease -PnoSigning -PdistMgmtSnapshotsUrl=
>>> https://mycustomrepo/libs-release
>>>
>>> Yes, I still kept this around since I used the code that we currently
>> use for publishing.
>>
>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Thu, Nov 15, 2018 at 11:27 AM Kenneth Knowles <ke...@apache.org>
>>> wrote:
>>>
>>>> Agree on the low bar. We should just make them all 0.x releases to send
>>>> the right message (don't use, and no compatibility) and not worry as much
>>>> about bad releases, which we
>>>> would never actually depend on in the project.
>>>>
>>>> QQ: What does the new -P flag do? I was also hoping to eliminate the
>>>> redundant -PisRelease flag, especially for vendored deps that should really
>>>> be straight line.
>>>>
>>>
>> I found having the -PisRelease flag useful for local testing because I
>> could publish -SNAPSHOT builds but it isn't strictly necessary.
>> The -PvendoredDependenciesOnly enables publishing of vendored
>> dependencies so they aren't part of the regular release process. The name
>> could be changed to be something more appropriate.
>>
>>
>>>
>>>> Kenn
>>>>
>>>> On Wed, Nov 14, 2018 at 12:38 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> Its a small hassle but could be checked in with some changes, my
>>>>> example commit was so that people could try it out as is.
>>>>>
>>>>> I'll work towards getting it checked in and then start a release for
>>>>> gRPC and guava.
>>>>>
>>>>> On Wed, Nov 14, 2018 at 11:45 AM Scott Wegner <sc...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Thanks for pushing this forward Luke.
>>>>>>
>>>>>> My understanding is that these vendored grpc artifacts will only be
>>>>>> consumed directly by Beam internal components (as opposed to Beam user
>>>>>> projects). So there should be a fairly low bar for publishing them. But
>>>>>> perhaps we should have some short checklist for releasing them for
>>>>>> consistency.
>>>>>>
>>>>>> One item I would suggest for such a checklist would be to publish
>>>>>> artifacts from checked-in apache/beam sources and then tag the release
>>>>>> commit. Is it possible to get your changes merged in first, or is there a
>>>>>> chicken-and-egg problem that artifacts need to be published and available
>>>>>> for consumption?
>>>>>>
>>>>>> On Wed, Nov 14, 2018 at 10:51 AM Lukasz Cwik <lc...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Note, I could also release the vendored version of guava 20 in
>>>>>>> preparation for us to start consuming it. Any concerns?
>>>>>>>
>>>>>>> On Tue, Nov 13, 2018 at 3:59 PM Lukasz Cwik <lc...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I have made some incremental progress on this and wanted to release
>>>>>>>> our first vendored dependency of gRPC 1.13.1 since I was able to fix a good
>>>>>>>> number of the import/code completion errors that Intellij was experiencing.
>>>>>>>> I have published an example of what the jar/pom looks like in the Apache
>>>>>>>> Staging repo:
>>>>>>>>
>>>>>>>> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/
>>>>>>>>
>>>>>>>> You can also checkout[1] and from a clean workspace run:
>>>>>>>> ./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
>>>>>>>> -PvendoredDependenciesOnly
>>>>>>>> which will build a vendored version of gRPC that is published to
>>>>>>>> your local maven repository. All the projects that depended on the gradle
>>>>>>>> beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
>>>>>>>> org.apache.beam:beam-vendor-grpc-1_13_1:0.1
>>>>>>>>
>>>>>>>> I was planning to follow the Apache Beam release process but only
>>>>>>>> for this specific artifact and start a vote thread if there aren't any
>>>>>>>> concerns.
>>>>>>>>
>>>>>>>> 1:
>>>>>>>> https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thats a good point Thomas, hadn't considered the lib/ case. I also
>>>>>>>>> am recommending what Thomas is suggesting as well.
>>>>>>>>>
>>>>>>>>> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <
>>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> On 25.10.18 19:23, Lukasz Cwik wrote:
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <
>>>>>>>>>> mxm@apache.org
>>>>>>>>>> > <ma...@apache.org>> wrote:
>>>>>>>>>> >
>>>>>>>>>> >     Question: How would a user end up with the same shaded
>>>>>>>>>> dependency
>>>>>>>>>> >     twice?
>>>>>>>>>> >     The shaded dependencies are transitive dependencies of Beam
>>>>>>>>>> and thus,
>>>>>>>>>> >     this shouldn't happen. Is this a safe-guard when running
>>>>>>>>>> different
>>>>>>>>>> >     versions of Beam in the same JVM?
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > What I was referring to was that they aren't exactly the same
>>>>>>>>>> dependency
>>>>>>>>>> > but slightly different versions of the same dependency. Since
>>>>>>>>>> we are
>>>>>>>>>> > planning to vendor each dependency and its transitive
>>>>>>>>>> dependencies as
>>>>>>>>>> > part of the same jar, we can have  vendor-A that contains
>>>>>>>>>> shaded
>>>>>>>>>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0
>>>>>>>>>> both with
>>>>>>>>>> > different package prefixes. It can be that transitive-C 1.0 and
>>>>>>>>>> > transitive-C 2.0 can't be on the same classpath because they
>>>>>>>>>> can't be
>>>>>>>>>> > perfectly shaded due to JNI, java reflection, magical property
>>>>>>>>>> > files/strings, ...
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>> Ah yes. Get it. Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>
>>>>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
I have created the artifacts and sent out a vote thread.

On Thu, Nov 15, 2018 at 5:23 PM Lukasz Cwik <lc...@google.com> wrote:

> On Thu, Nov 15, 2018 at 11:47 AM Thomas Weise <th...@apache.org> wrote:
>
>> In case any of this affects how artifacts are published in general,
>> please make sure that publishing to 3rd party repo continues to work.
>>
>> For example: ./gradlew :beam-runners-flink_2.11-job-server:publish
>> -PisRelease -PnoSigning -PdistMgmtSnapshotsUrl=
>> https://mycustomrepo/libs-release
>>
>> Yes, I still kept this around since I used the code that we currently use
> for publishing.
>
>
>> Thanks,
>> Thomas
>>
>>
>> On Thu, Nov 15, 2018 at 11:27 AM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> Agree on the low bar. We should just make them all 0.x releases to send
>>> the right message (don't use, and no compatibility) and not worry as much
>>> about bad releases, which we
>>> would never actually depend on in the project.
>>>
>>> QQ: What does the new -P flag do? I was also hoping to eliminate the
>>> redundant -PisRelease flag, especially for vendored deps that should really
>>> be straight line.
>>>
>>
> I found having the -PisRelease flag useful for local testing because I
> could publish -SNAPSHOT builds but it isn't strictly necessary.
> The -PvendoredDependenciesOnly enables publishing of vendored dependencies
> so they aren't part of the regular release process. The name could be
> changed to be something more appropriate.
>
>
>>
>>> Kenn
>>>
>>> On Wed, Nov 14, 2018 at 12:38 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> Its a small hassle but could be checked in with some changes, my
>>>> example commit was so that people could try it out as is.
>>>>
>>>> I'll work towards getting it checked in and then start a release for
>>>> gRPC and guava.
>>>>
>>>> On Wed, Nov 14, 2018 at 11:45 AM Scott Wegner <sc...@apache.org> wrote:
>>>>
>>>>> Thanks for pushing this forward Luke.
>>>>>
>>>>> My understanding is that these vendored grpc artifacts will only be
>>>>> consumed directly by Beam internal components (as opposed to Beam user
>>>>> projects). So there should be a fairly low bar for publishing them. But
>>>>> perhaps we should have some short checklist for releasing them for
>>>>> consistency.
>>>>>
>>>>> One item I would suggest for such a checklist would be to publish
>>>>> artifacts from checked-in apache/beam sources and then tag the release
>>>>> commit. Is it possible to get your changes merged in first, or is there a
>>>>> chicken-and-egg problem that artifacts need to be published and available
>>>>> for consumption?
>>>>>
>>>>> On Wed, Nov 14, 2018 at 10:51 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> Note, I could also release the vendored version of guava 20 in
>>>>>> preparation for us to start consuming it. Any concerns?
>>>>>>
>>>>>> On Tue, Nov 13, 2018 at 3:59 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> I have made some incremental progress on this and wanted to release
>>>>>>> our first vendored dependency of gRPC 1.13.1 since I was able to fix a good
>>>>>>> number of the import/code completion errors that Intellij was experiencing.
>>>>>>> I have published an example of what the jar/pom looks like in the Apache
>>>>>>> Staging repo:
>>>>>>>
>>>>>>> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/
>>>>>>>
>>>>>>> You can also checkout[1] and from a clean workspace run:
>>>>>>> ./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
>>>>>>> -PvendoredDependenciesOnly
>>>>>>> which will build a vendored version of gRPC that is published to
>>>>>>> your local maven repository. All the projects that depended on the gradle
>>>>>>> beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
>>>>>>> org.apache.beam:beam-vendor-grpc-1_13_1:0.1
>>>>>>>
>>>>>>> I was planning to follow the Apache Beam release process but only
>>>>>>> for this specific artifact and start a vote thread if there aren't any
>>>>>>> concerns.
>>>>>>>
>>>>>>> 1:
>>>>>>> https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thats a good point Thomas, hadn't considered the lib/ case. I also
>>>>>>>> am recommending what Thomas is suggesting as well.
>>>>>>>>
>>>>>>>> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 25.10.18 19:23, Lukasz Cwik wrote:
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <
>>>>>>>>> mxm@apache.org
>>>>>>>>> > <ma...@apache.org>> wrote:
>>>>>>>>> >
>>>>>>>>> >     Question: How would a user end up with the same shaded
>>>>>>>>> dependency
>>>>>>>>> >     twice?
>>>>>>>>> >     The shaded dependencies are transitive dependencies of Beam
>>>>>>>>> and thus,
>>>>>>>>> >     this shouldn't happen. Is this a safe-guard when running
>>>>>>>>> different
>>>>>>>>> >     versions of Beam in the same JVM?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > What I was referring to was that they aren't exactly the same
>>>>>>>>> dependency
>>>>>>>>> > but slightly different versions of the same dependency. Since we
>>>>>>>>> are
>>>>>>>>> > planning to vendor each dependency and its transitive
>>>>>>>>> dependencies as
>>>>>>>>> > part of the same jar, we can have  vendor-A that contains shaded
>>>>>>>>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0
>>>>>>>>> both with
>>>>>>>>> > different package prefixes. It can be that transitive-C 1.0 and
>>>>>>>>> > transitive-C 2.0 can't be on the same classpath because they
>>>>>>>>> can't be
>>>>>>>>> > perfectly shaded due to JNI, java reflection, magical property
>>>>>>>>> > files/strings, ...
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> Ah yes. Get it. Thanks!
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>
>>>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
On Thu, Nov 15, 2018 at 11:47 AM Thomas Weise <th...@apache.org> wrote:

> In case any of this affects how artifacts are published in general, please
> make sure that publishing to 3rd party repo continues to work.
>
> For example: ./gradlew :beam-runners-flink_2.11-job-server:publish
> -PisRelease -PnoSigning -PdistMgmtSnapshotsUrl=
> https://mycustomrepo/libs-release
>
> Yes, I still kept this around since I used the code that we currently use
for publishing.


> Thanks,
> Thomas
>
>
> On Thu, Nov 15, 2018 at 11:27 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Agree on the low bar. We should just make them all 0.x releases to send
>> the right message (don't use, and no compatibility) and not worry as much
>> about bad releases, which we
>> would never actually depend on in the project.
>>
>> QQ: What does the new -P flag do? I was also hoping to eliminate the
>> redundant -PisRelease flag, especially for vendored deps that should really
>> be straight line.
>>
>
I found having the -PisRelease flag useful for local testing because I
could publish -SNAPSHOT builds but it isn't strictly necessary.
The -PvendoredDependenciesOnly enables publishing of vendored dependencies
so they aren't part of the regular release process. The name could be
changed to be something more appropriate.


>
>> Kenn
>>
>> On Wed, Nov 14, 2018 at 12:38 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> Its a small hassle but could be checked in with some changes, my example
>>> commit was so that people could try it out as is.
>>>
>>> I'll work towards getting it checked in and then start a release for
>>> gRPC and guava.
>>>
>>> On Wed, Nov 14, 2018 at 11:45 AM Scott Wegner <sc...@apache.org> wrote:
>>>
>>>> Thanks for pushing this forward Luke.
>>>>
>>>> My understanding is that these vendored grpc artifacts will only be
>>>> consumed directly by Beam internal components (as opposed to Beam user
>>>> projects). So there should be a fairly low bar for publishing them. But
>>>> perhaps we should have some short checklist for releasing them for
>>>> consistency.
>>>>
>>>> One item I would suggest for such a checklist would be to publish
>>>> artifacts from checked-in apache/beam sources and then tag the release
>>>> commit. Is it possible to get your changes merged in first, or is there a
>>>> chicken-and-egg problem that artifacts need to be published and available
>>>> for consumption?
>>>>
>>>> On Wed, Nov 14, 2018 at 10:51 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> Note, I could also release the vendored version of guava 20 in
>>>>> preparation for us to start consuming it. Any concerns?
>>>>>
>>>>> On Tue, Nov 13, 2018 at 3:59 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> I have made some incremental progress on this and wanted to release
>>>>>> our first vendored dependency of gRPC 1.13.1 since I was able to fix a good
>>>>>> number of the import/code completion errors that Intellij was experiencing.
>>>>>> I have published an example of what the jar/pom looks like in the Apache
>>>>>> Staging repo:
>>>>>>
>>>>>> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/
>>>>>>
>>>>>> You can also checkout[1] and from a clean workspace run:
>>>>>> ./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
>>>>>> -PvendoredDependenciesOnly
>>>>>> which will build a vendored version of gRPC that is published to your
>>>>>> local maven repository. All the projects that depended on the gradle
>>>>>> beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
>>>>>> org.apache.beam:beam-vendor-grpc-1_13_1:0.1
>>>>>>
>>>>>> I was planning to follow the Apache Beam release process but only for
>>>>>> this specific artifact and start a vote thread if there aren't any concerns.
>>>>>>
>>>>>> 1:
>>>>>> https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thats a good point Thomas, hadn't considered the lib/ case. I also
>>>>>>> am recommending what Thomas is suggesting as well.
>>>>>>>
>>>>>>> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 25.10.18 19:23, Lukasz Cwik wrote:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <
>>>>>>>> mxm@apache.org
>>>>>>>> > <ma...@apache.org>> wrote:
>>>>>>>> >
>>>>>>>> >     Question: How would a user end up with the same shaded
>>>>>>>> dependency
>>>>>>>> >     twice?
>>>>>>>> >     The shaded dependencies are transitive dependencies of Beam
>>>>>>>> and thus,
>>>>>>>> >     this shouldn't happen. Is this a safe-guard when running
>>>>>>>> different
>>>>>>>> >     versions of Beam in the same JVM?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > What I was referring to was that they aren't exactly the same
>>>>>>>> dependency
>>>>>>>> > but slightly different versions of the same dependency. Since we
>>>>>>>> are
>>>>>>>> > planning to vendor each dependency and its transitive
>>>>>>>> dependencies as
>>>>>>>> > part of the same jar, we can have  vendor-A that contains shaded
>>>>>>>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both
>>>>>>>> with
>>>>>>>> > different package prefixes. It can be that transitive-C 1.0 and
>>>>>>>> > transitive-C 2.0 can't be on the same classpath because they
>>>>>>>> can't be
>>>>>>>> > perfectly shaded due to JNI, java reflection, magical property
>>>>>>>> > files/strings, ...
>>>>>>>> >
>>>>>>>>
>>>>>>>> Ah yes. Get it. Thanks!
>>>>>>>>
>>>>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>
>>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Thomas Weise <th...@apache.org>.
In case any of this affects how artifacts are published in general, please
make sure that publishing to 3rd party repo continues to work.

For example: ./gradlew :beam-runners-flink_2.11-job-server:publish
-PisRelease -PnoSigning -PdistMgmtSnapshotsUrl=
https://mycustomrepo/libs-release

Thanks,
Thomas


On Thu, Nov 15, 2018 at 11:27 AM Kenneth Knowles <ke...@apache.org> wrote:

> Agree on the low bar. We should just make them all 0.x releases to send
> the right message (don't use, and no compatibility) and not worry as much
> about bad releases, which we
> would never actually depend on in the project.
>
> QQ: What does the new -P flag do? I was also hoping to eliminate the
> redundant -PisRelease flag, especially for vendored deps that should really
> be straight line.
>
> Kenn
>
> On Wed, Nov 14, 2018 at 12:38 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> Its a small hassle but could be checked in with some changes, my example
>> commit was so that people could try it out as is.
>>
>> I'll work towards getting it checked in and then start a release for gRPC
>> and guava.
>>
>> On Wed, Nov 14, 2018 at 11:45 AM Scott Wegner <sc...@apache.org> wrote:
>>
>>> Thanks for pushing this forward Luke.
>>>
>>> My understanding is that these vendored grpc artifacts will only be
>>> consumed directly by Beam internal components (as opposed to Beam user
>>> projects). So there should be a fairly low bar for publishing them. But
>>> perhaps we should have some short checklist for releasing them for
>>> consistency.
>>>
>>> One item I would suggest for such a checklist would be to publish
>>> artifacts from checked-in apache/beam sources and then tag the release
>>> commit. Is it possible to get your changes merged in first, or is there a
>>> chicken-and-egg problem that artifacts need to be published and available
>>> for consumption?
>>>
>>> On Wed, Nov 14, 2018 at 10:51 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> Note, I could also release the vendored version of guava 20 in
>>>> preparation for us to start consuming it. Any concerns?
>>>>
>>>> On Tue, Nov 13, 2018 at 3:59 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> I have made some incremental progress on this and wanted to release
>>>>> our first vendored dependency of gRPC 1.13.1 since I was able to fix a good
>>>>> number of the import/code completion errors that Intellij was experiencing.
>>>>> I have published an example of what the jar/pom looks like in the Apache
>>>>> Staging repo:
>>>>>
>>>>> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/
>>>>>
>>>>> You can also checkout[1] and from a clean workspace run:
>>>>> ./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
>>>>> -PvendoredDependenciesOnly
>>>>> which will build a vendored version of gRPC that is published to your
>>>>> local maven repository. All the projects that depended on the gradle
>>>>> beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
>>>>> org.apache.beam:beam-vendor-grpc-1_13_1:0.1
>>>>>
>>>>> I was planning to follow the Apache Beam release process but only for
>>>>> this specific artifact and start a vote thread if there aren't any concerns.
>>>>>
>>>>> 1:
>>>>> https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9
>>>>>
>>>>>
>>>>> On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> Thats a good point Thomas, hadn't considered the lib/ case. I also am
>>>>>> recommending what Thomas is suggesting as well.
>>>>>>
>>>>>> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> On 25.10.18 19:23, Lukasz Cwik wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mxm@apache.org
>>>>>>> > <ma...@apache.org>> wrote:
>>>>>>> >
>>>>>>> >     Question: How would a user end up with the same shaded
>>>>>>> dependency
>>>>>>> >     twice?
>>>>>>> >     The shaded dependencies are transitive dependencies of Beam
>>>>>>> and thus,
>>>>>>> >     this shouldn't happen. Is this a safe-guard when running
>>>>>>> different
>>>>>>> >     versions of Beam in the same JVM?
>>>>>>> >
>>>>>>> >
>>>>>>> > What I was referring to was that they aren't exactly the same
>>>>>>> dependency
>>>>>>> > but slightly different versions of the same dependency. Since we
>>>>>>> are
>>>>>>> > planning to vendor each dependency and its transitive dependencies
>>>>>>> as
>>>>>>> > part of the same jar, we can have  vendor-A that contains shaded
>>>>>>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both
>>>>>>> with
>>>>>>> > different package prefixes. It can be that transitive-C 1.0 and
>>>>>>> > transitive-C 2.0 can't be on the same classpath because they can't
>>>>>>> be
>>>>>>> > perfectly shaded due to JNI, java reflection, magical property
>>>>>>> > files/strings, ...
>>>>>>> >
>>>>>>>
>>>>>>> Ah yes. Get it. Thanks!
>>>>>>>
>>>>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>> Got feedback? tinyurl.com/swegner-feedback
>>>
>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Kenneth Knowles <ke...@apache.org>.
Agree on the low bar. We should just make them all 0.x releases to send the
right message (don't use, and no compatibility) and not worry as much about
bad releases, which we
would never actually depend on in the project.

QQ: What does the new -P flag do? I was also hoping to eliminate the
redundant -PisRelease flag, especially for vendored deps that should really
be straight line.

Kenn

On Wed, Nov 14, 2018 at 12:38 PM Lukasz Cwik <lc...@google.com> wrote:

> Its a small hassle but could be checked in with some changes, my example
> commit was so that people could try it out as is.
>
> I'll work towards getting it checked in and then start a release for gRPC
> and guava.
>
> On Wed, Nov 14, 2018 at 11:45 AM Scott Wegner <sc...@apache.org> wrote:
>
>> Thanks for pushing this forward Luke.
>>
>> My understanding is that these vendored grpc artifacts will only be
>> consumed directly by Beam internal components (as opposed to Beam user
>> projects). So there should be a fairly low bar for publishing them. But
>> perhaps we should have some short checklist for releasing them for
>> consistency.
>>
>> One item I would suggest for such a checklist would be to publish
>> artifacts from checked-in apache/beam sources and then tag the release
>> commit. Is it possible to get your changes merged in first, or is there a
>> chicken-and-egg problem that artifacts need to be published and available
>> for consumption?
>>
>> On Wed, Nov 14, 2018 at 10:51 AM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> Note, I could also release the vendored version of guava 20 in
>>> preparation for us to start consuming it. Any concerns?
>>>
>>> On Tue, Nov 13, 2018 at 3:59 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> I have made some incremental progress on this and wanted to release our
>>>> first vendored dependency of gRPC 1.13.1 since I was able to fix a good
>>>> number of the import/code completion errors that Intellij was experiencing.
>>>> I have published an example of what the jar/pom looks like in the Apache
>>>> Staging repo:
>>>>
>>>> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/
>>>>
>>>> You can also checkout[1] and from a clean workspace run:
>>>> ./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
>>>> -PvendoredDependenciesOnly
>>>> which will build a vendored version of gRPC that is published to your
>>>> local maven repository. All the projects that depended on the gradle
>>>> beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
>>>> org.apache.beam:beam-vendor-grpc-1_13_1:0.1
>>>>
>>>> I was planning to follow the Apache Beam release process but only for
>>>> this specific artifact and start a vote thread if there aren't any concerns.
>>>>
>>>> 1:
>>>> https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9
>>>>
>>>>
>>>> On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> Thats a good point Thomas, hadn't considered the lib/ case. I also am
>>>>> recommending what Thomas is suggesting as well.
>>>>>
>>>>> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> On 25.10.18 19:23, Lukasz Cwik wrote:
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mxm@apache.org
>>>>>> > <ma...@apache.org>> wrote:
>>>>>> >
>>>>>> >     Question: How would a user end up with the same shaded
>>>>>> dependency
>>>>>> >     twice?
>>>>>> >     The shaded dependencies are transitive dependencies of Beam and
>>>>>> thus,
>>>>>> >     this shouldn't happen. Is this a safe-guard when running
>>>>>> different
>>>>>> >     versions of Beam in the same JVM?
>>>>>> >
>>>>>> >
>>>>>> > What I was referring to was that they aren't exactly the same
>>>>>> dependency
>>>>>> > but slightly different versions of the same dependency. Since we
>>>>>> are
>>>>>> > planning to vendor each dependency and its transitive dependencies
>>>>>> as
>>>>>> > part of the same jar, we can have  vendor-A that contains shaded
>>>>>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both
>>>>>> with
>>>>>> > different package prefixes. It can be that transitive-C 1.0 and
>>>>>> > transitive-C 2.0 can't be on the same classpath because they can't
>>>>>> be
>>>>>> > perfectly shaded due to JNI, java reflection, magical property
>>>>>> > files/strings, ...
>>>>>> >
>>>>>>
>>>>>> Ah yes. Get it. Thanks!
>>>>>>
>>>>>
>>
>> --
>>
>>
>>
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
Its a small hassle but could be checked in with some changes, my example
commit was so that people could try it out as is.

I'll work towards getting it checked in and then start a release for gRPC
and guava.

On Wed, Nov 14, 2018 at 11:45 AM Scott Wegner <sc...@apache.org> wrote:

> Thanks for pushing this forward Luke.
>
> My understanding is that these vendored grpc artifacts will only be
> consumed directly by Beam internal components (as opposed to Beam user
> projects). So there should be a fairly low bar for publishing them. But
> perhaps we should have some short checklist for releasing them for
> consistency.
>
> One item I would suggest for such a checklist would be to publish
> artifacts from checked-in apache/beam sources and then tag the release
> commit. Is it possible to get your changes merged in first, or is there a
> chicken-and-egg problem that artifacts need to be published and available
> for consumption?
>
> On Wed, Nov 14, 2018 at 10:51 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> Note, I could also release the vendored version of guava 20 in
>> preparation for us to start consuming it. Any concerns?
>>
>> On Tue, Nov 13, 2018 at 3:59 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> I have made some incremental progress on this and wanted to release our
>>> first vendored dependency of gRPC 1.13.1 since I was able to fix a good
>>> number of the import/code completion errors that Intellij was experiencing.
>>> I have published an example of what the jar/pom looks like in the Apache
>>> Staging repo:
>>>
>>> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/
>>>
>>> You can also checkout[1] and from a clean workspace run:
>>> ./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
>>> -PvendoredDependenciesOnly
>>> which will build a vendored version of gRPC that is published to your
>>> local maven repository. All the projects that depended on the gradle
>>> beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
>>> org.apache.beam:beam-vendor-grpc-1_13_1:0.1
>>>
>>> I was planning to follow the Apache Beam release process but only for
>>> this specific artifact and start a vote thread if there aren't any concerns.
>>>
>>> 1:
>>> https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9
>>>
>>>
>>> On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> Thats a good point Thomas, hadn't considered the lib/ case. I also am
>>>> recommending what Thomas is suggesting as well.
>>>>
>>>> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>>> wrote:
>>>>
>>>>> On 25.10.18 19:23, Lukasz Cwik wrote:
>>>>> >
>>>>> >
>>>>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mxm@apache.org
>>>>> > <ma...@apache.org>> wrote:
>>>>> >
>>>>> >     Question: How would a user end up with the same shaded dependency
>>>>> >     twice?
>>>>> >     The shaded dependencies are transitive dependencies of Beam and
>>>>> thus,
>>>>> >     this shouldn't happen. Is this a safe-guard when running
>>>>> different
>>>>> >     versions of Beam in the same JVM?
>>>>> >
>>>>> >
>>>>> > What I was referring to was that they aren't exactly the same
>>>>> dependency
>>>>> > but slightly different versions of the same dependency. Since we are
>>>>> > planning to vendor each dependency and its transitive dependencies
>>>>> as
>>>>> > part of the same jar, we can have  vendor-A that contains shaded
>>>>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both
>>>>> with
>>>>> > different package prefixes. It can be that transitive-C 1.0 and
>>>>> > transitive-C 2.0 can't be on the same classpath because they can't
>>>>> be
>>>>> > perfectly shaded due to JNI, java reflection, magical property
>>>>> > files/strings, ...
>>>>> >
>>>>>
>>>>> Ah yes. Get it. Thanks!
>>>>>
>>>>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Scott Wegner <sc...@apache.org>.
Thanks for pushing this forward Luke.

My understanding is that these vendored grpc artifacts will only be
consumed directly by Beam internal components (as opposed to Beam user
projects). So there should be a fairly low bar for publishing them. But
perhaps we should have some short checklist for releasing them for
consistency.

One item I would suggest for such a checklist would be to publish artifacts
from checked-in apache/beam sources and then tag the release commit. Is it
possible to get your changes merged in first, or is there a chicken-and-egg
problem that artifacts need to be published and available for consumption?

On Wed, Nov 14, 2018 at 10:51 AM Lukasz Cwik <lc...@google.com> wrote:

> Note, I could also release the vendored version of guava 20 in preparation
> for us to start consuming it. Any concerns?
>
> On Tue, Nov 13, 2018 at 3:59 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> I have made some incremental progress on this and wanted to release our
>> first vendored dependency of gRPC 1.13.1 since I was able to fix a good
>> number of the import/code completion errors that Intellij was experiencing.
>> I have published an example of what the jar/pom looks like in the Apache
>> Staging repo:
>>
>> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/
>>
>> You can also checkout[1] and from a clean workspace run:
>> ./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
>> -PvendoredDependenciesOnly
>> which will build a vendored version of gRPC that is published to your
>> local maven repository. All the projects that depended on the gradle
>> beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
>> org.apache.beam:beam-vendor-grpc-1_13_1:0.1
>>
>> I was planning to follow the Apache Beam release process but only for
>> this specific artifact and start a vote thread if there aren't any concerns.
>>
>> 1:
>> https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9
>>
>>
>> On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> Thats a good point Thomas, hadn't considered the lib/ case. I also am
>>> recommending what Thomas is suggesting as well.
>>>
>>> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>> wrote:
>>>
>>>> On 25.10.18 19:23, Lukasz Cwik wrote:
>>>> >
>>>> >
>>>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mxm@apache.org
>>>> > <ma...@apache.org>> wrote:
>>>> >
>>>> >     Question: How would a user end up with the same shaded dependency
>>>> >     twice?
>>>> >     The shaded dependencies are transitive dependencies of Beam and
>>>> thus,
>>>> >     this shouldn't happen. Is this a safe-guard when running different
>>>> >     versions of Beam in the same JVM?
>>>> >
>>>> >
>>>> > What I was referring to was that they aren't exactly the same
>>>> dependency
>>>> > but slightly different versions of the same dependency. Since we are
>>>> > planning to vendor each dependency and its transitive dependencies as
>>>> > part of the same jar, we can have  vendor-A that contains shaded
>>>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both
>>>> with
>>>> > different package prefixes. It can be that transitive-C 1.0 and
>>>> > transitive-C 2.0 can't be on the same classpath because they can't be
>>>> > perfectly shaded due to JNI, java reflection, magical property
>>>> > files/strings, ...
>>>> >
>>>>
>>>> Ah yes. Get it. Thanks!
>>>>
>>>

-- 




Got feedback? tinyurl.com/swegner-feedback

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
Note, I could also release the vendored version of guava 20 in preparation
for us to start consuming it. Any concerns?

On Tue, Nov 13, 2018 at 3:59 PM Lukasz Cwik <lc...@google.com> wrote:

> I have made some incremental progress on this and wanted to release our
> first vendored dependency of gRPC 1.13.1 since I was able to fix a good
> number of the import/code completion errors that Intellij was experiencing.
> I have published an example of what the jar/pom looks like in the Apache
> Staging repo:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/
>
> You can also checkout[1] and from a clean workspace run:
> ./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
> -PvendoredDependenciesOnly
> which will build a vendored version of gRPC that is published to your
> local maven repository. All the projects that depended on the gradle
> beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
> org.apache.beam:beam-vendor-grpc-1_13_1:0.1
>
> I was planning to follow the Apache Beam release process but only for this
> specific artifact and start a vote thread if there aren't any concerns.
>
> 1:
> https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9
>
>
> On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> Thats a good point Thomas, hadn't considered the lib/ case. I also am
>> recommending what Thomas is suggesting as well.
>>
>> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org>
>> wrote:
>>
>>> On 25.10.18 19:23, Lukasz Cwik wrote:
>>> >
>>> >
>>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mxm@apache.org
>>> > <ma...@apache.org>> wrote:
>>> >
>>> >     Question: How would a user end up with the same shaded dependency
>>> >     twice?
>>> >     The shaded dependencies are transitive dependencies of Beam and
>>> thus,
>>> >     this shouldn't happen. Is this a safe-guard when running different
>>> >     versions of Beam in the same JVM?
>>> >
>>> >
>>> > What I was referring to was that they aren't exactly the same
>>> dependency
>>> > but slightly different versions of the same dependency. Since we are
>>> > planning to vendor each dependency and its transitive dependencies as
>>> > part of the same jar, we can have  vendor-A that contains shaded
>>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both with
>>> > different package prefixes. It can be that transitive-C 1.0 and
>>> > transitive-C 2.0 can't be on the same classpath because they can't be
>>> > perfectly shaded due to JNI, java reflection, magical property
>>> > files/strings, ...
>>> >
>>>
>>> Ah yes. Get it. Thanks!
>>>
>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
I have made some incremental progress on this and wanted to release our
first vendored dependency of gRPC 1.13.1 since I was able to fix a good
number of the import/code completion errors that Intellij was experiencing.
I have published an example of what the jar/pom looks like in the Apache
Staging repo:
https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-vendor-grpc-1_13_1/

You can also checkout[1] and from a clean workspace run:
./gradlew :beam-vendor-grpc-1_13_1:publishToMavenLocal -PisRelease
-PvendoredDependenciesOnly
which will build a vendored version of gRPC that is published to your local
maven repository. All the projects that depended on the gradle
beam-vendor-grpc-1_13_1 project are now pointing at the Maven artifact
org.apache.beam:beam-vendor-grpc-1_13_1:0.1

I was planning to follow the Apache Beam release process but only for this
specific artifact and start a vote thread if there aren't any concerns.

1:
https://github.com/lukecwik/incubator-beam/commit/4b1b7b40ef316559f81c42dfdd44da988db201e9


On Thu, Oct 25, 2018 at 10:59 AM Lukasz Cwik <lc...@google.com> wrote:

> Thats a good point Thomas, hadn't considered the lib/ case. I also am
> recommending what Thomas is suggesting as well.
>
> On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org>
> wrote:
>
>> On 25.10.18 19:23, Lukasz Cwik wrote:
>> >
>> >
>> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mxm@apache.org
>> > <ma...@apache.org>> wrote:
>> >
>> >     Question: How would a user end up with the same shaded dependency
>> >     twice?
>> >     The shaded dependencies are transitive dependencies of Beam and
>> thus,
>> >     this shouldn't happen. Is this a safe-guard when running different
>> >     versions of Beam in the same JVM?
>> >
>> >
>> > What I was referring to was that they aren't exactly the same
>> dependency
>> > but slightly different versions of the same dependency. Since we are
>> > planning to vendor each dependency and its transitive dependencies as
>> > part of the same jar, we can have  vendor-A that contains shaded
>> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both with
>> > different package prefixes. It can be that transitive-C 1.0 and
>> > transitive-C 2.0 can't be on the same classpath because they can't be
>> > perfectly shaded due to JNI, java reflection, magical property
>> > files/strings, ...
>> >
>>
>> Ah yes. Get it. Thanks!
>>
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
Thats a good point Thomas, hadn't considered the lib/ case. I also am
recommending what Thomas is suggesting as well.

On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels <mx...@apache.org> wrote:

> On 25.10.18 19:23, Lukasz Cwik wrote:
> >
> >
> > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mxm@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Question: How would a user end up with the same shaded dependency
> >     twice?
> >     The shaded dependencies are transitive dependencies of Beam and thus,
> >     this shouldn't happen. Is this a safe-guard when running different
> >     versions of Beam in the same JVM?
> >
> >
> > What I was referring to was that they aren't exactly the same dependency
> > but slightly different versions of the same dependency. Since we are
> > planning to vendor each dependency and its transitive dependencies as
> > part of the same jar, we can have  vendor-A that contains shaded
> > transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both with
> > different package prefixes. It can be that transitive-C 1.0 and
> > transitive-C 2.0 can't be on the same classpath because they can't be
> > perfectly shaded due to JNI, java reflection, magical property
> > files/strings, ...
> >
>
> Ah yes. Get it. Thanks!
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Maximilian Michels <mx...@apache.org>.
On 25.10.18 19:23, Lukasz Cwik wrote:
> 
> 
> On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mxm@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Question: How would a user end up with the same shaded dependency
>     twice?
>     The shaded dependencies are transitive dependencies of Beam and thus,
>     this shouldn't happen. Is this a safe-guard when running different
>     versions of Beam in the same JVM?
> 
> 
> What I was referring to was that they aren't exactly the same dependency 
> but slightly different versions of the same dependency. Since we are 
> planning to vendor each dependency and its transitive dependencies as 
> part of the same jar, we can have  vendor-A that contains shaded 
> transitive-C 1.0 and vendor-B that contains transitive-C 2.0 both with 
> different package prefixes. It can be that transitive-C 1.0 and 
> transitive-C 2.0 can't be on the same classpath because they can't be 
> perfectly shaded due to JNI, java reflection, magical property 
> files/strings, ...
> 

Ah yes. Get it. Thanks!

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <mx...@apache.org> wrote:

> Question: How would a user end up with the same shaded dependency twice?
> The shaded dependencies are transitive dependencies of Beam and thus,
> this shouldn't happen. Is this a safe-guard when running different
> versions of Beam in the same JVM?
>

What I was referring to was that they aren't exactly the same dependency
but slightly different versions of the same dependency. Since we are
planning to vendor each dependency and its transitive dependencies as part
of the same jar, we can have  vendor-A that contains shaded transitive-C
1.0 and vendor-B that contains transitive-C 2.0 both with different package
prefixes. It can be that transitive-C 1.0 and transitive-C 2.0 can't be on
the same classpath because they can't be perfectly shaded due to JNI, java
reflection, magical property files/strings, ...


>
> Other than the search/replace issue, the full version string is probably
> more elegant and less error-prone, so I'm fine with that.
>
> For the groupid/artifact name I prefer:
>
> groupid: org.apache.beam.vendored
> artifact: guava-20_0
>
> On 24.10.18 21:53, Lukasz Cwik wrote:
> > On Wed, Oct 24, 2018 at 11:31 AM Kenneth Knowles <kenn@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     OK. I just opened https://github.com/apache/beam/pull/6809 to push
> >     Guava through. I made some comments there, and also I agree with
> >     Luke that full version string makes sense. For this purpose it seems
> >     easy and fine to do a search/replace to swap 20.0 for 20.1, and
> >     compatibility between them should not be a concern.
> >
> >     I have minor suggestions and clarifications:
> >
> >       - Is there value to `beam` in the artifactId? I would leave it off
> >     unless there's a special need
> >
> > It would only provide consistency with all our other artifactIds that we
> > publish but there isn't a special need that I'm aware of.
> >
> >       - Users should never use these and we make it extremely clear they
> >     are not supported for any reasons
> >
> >       - Use 0.x versions indicating no intention of semantic versioning
> >
> > I like this idea a lot.
> >
> >
> >     Bringing my comments and Luke's together, here's the proposal:
> >
> >          groupId: org.apache.beam
> >          artifactId: vendored-guava-20_0
> >          namespace: org.apache.beam.vendored.guava.v20_0
> >          version: 0.1
> >
> >     Alternatively it could be
> >
> >          groupId: org.apache.beam-vendored
> >          artifactid: guava-20_0
> >          namespace: org.apache.beam.vendored.guava.v20_0
> >          version: 0.1
> >
> >     I like the latter but I haven't gone through the process of
> >     establishing a new groupId.
> >
> > Based on
> > https://maven.apache.org/guides/mini/guide-naming-conventions.html, the
> > alternative groupId should be org.apache.beam.vendored and not
> > org.apache.beam-vendored
> > I slightly prefer org.apache.beam over org.apache.beam.vendored but not
> > enough to object to either choice as long as we maintain consistency for
> > all vendored dependencies we produce going forward.
> >
> >     And for now we do not publish source jars. A couple of TODOs to get
> >     the build in good shape (classifiers, jars, interaction with plugins)
> >
> >     Kenn
> >
> >
> >     On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <lcwik@google.com
> >     <ma...@google.com>> wrote:
> >
> >         It looks like we are agreeing to make each vendored dependency
> >         self contained and have all their own internal dependencies
> >         packaged. For example, gRPC and all its transitive dependencies
> >         would use org.apache.beam.vendored.grpc.vYYY and Calcite and all
> >         its transitive dependencies would use
> >         org.apache.beam.vendored.calcite.vZZZ.
> >
> >         I also wanted to circle back on this question I had earlier that
> >         didn't have any follow-up:
> >         Currently we are relocating code depending on the version
> >         string. If the major version is >= 1, we use only the major
> >         version within the package string and rely on semantic
> >         versioning provided by the dependency to not break people. If
> >         the major version is 0, we assume the dependency is unstable and
> >         use the full version as part of the package string during
> >         relocation.
> >
> >         The downside of using the full version string for relocated
> >         packages:
> >         1) Users will end up with multiple copies of dependencies that
> >         differ only by the minor or patch version increasing the size of
> >         their application.
> >         2) Bumping up the version of a dependency now requires the
> >         import statement in all java files to be updated (not too
> >         difficult with some sed/grep skills)
> >
> >         The upside of using the full version string in the relocated
> >         package:
> >         1) We don't have to worry about whether a dependency maintains
> >         semantic versioning which means our users won't have to worry
> >         about that either.
> >         2) This increases the odds that a user will load multiple
> >         slightly different versions of the same dependency which is
> >         known to be incompatible in certain situations (e.g. Netty
> >         4.1.25 can't be on the classpath with Netty 4.1.28 even though
> >         they are both shaded due to issues of how JNI with tcnative
> works).
> >
> >         My preference would be to use the full version string for import
> >         statements (so org.apache.beam.vendor.grpc.v1_13_1...) since
> >         this would allow multiple copies to not conflict with each other
> >         since in my opinion it is a lot more difficult to help a user
> >         debug a dependency issue then to use string replacement during
> >         dependency upgrades to fix import statements. Also I would
> >         suggest we name the artifacts in Maven as follows:
> >         groupId: org.apache.beam
> >         artifactId: beam-vendor-grpc-v1_13_1
> >         version: 1.0.0 (first version and subsequent versions such as
> >         1.0.1 are only for patch upgrades that fix any shading issues we
> >         may have had when producing the vendored jar)
> >
> >
> >         On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels
> >         <mxm@apache.org <ma...@apache.org>> wrote:
> >
> >             Would also keep it simple and optimize for the JAR size only
> >             if necessary.
> >
> >             On 24.10.18 00:06, Kenneth Knowles wrote:
> >              > I think it makes sense for each vendored dependency to be
> >             self-contained
> >              > as much as possible. It should keep it fairly simple.
> >             Things that cross
> >              > their API surface cannot be hidden, of course. Jar size
> >             is not a concern
> >              > IMO.
> >              >
> >              > Kenn
> >              >
> >              > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik
> >             <lcwik@google.com <ma...@google.com>
> >              > <mailto:lcwik@google.com <ma...@google.com>>>
> wrote:
> >              >
> >              >     How should we handle the transitive dependencies of
> >             the things we
> >              >     want to vendor?
> >              >
> >              >     For example we use gRPC which depends on Guava 20 and
> >             we also use
> >              >     Calcite which depends on Guava 19.
> >              >
> >              >     Should the vendored gRPC/Calcite/... be
> >             self-contained so it
> >              >     contains all its dependencies, hence vendored gRPC
> >             would contain
> >              >     Guava 20 and vendored Calcite would contain Guava 19
> >             (both under
> >              >     different namespaces)?
> >              >     This leads to larger jars but less vendored
> >             dependencies to maintain.
> >              >
> >              >     Or should we produce a vendored library for those
> >             that we want to
> >              >     share, e.g. Guava 20 that could be reused across
> >             multiple vendored
> >              >     libraries?
> >              >     Makes the vendoring process slightly more
> >             complicated, more
> >              >     dependencies to maintain, smaller jars.
> >              >
> >              >     Or should we produce a vendored library for each
> >             dependency?
> >              >     Lots of vendoring needed, likely tooling required to
> >             be built to
> >              >     maintain this.
> >              >
> >              >
> >              >
> >              >
> >              >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles
> >             <klk@google.com <ma...@google.com>
> >              >     <mailto:klk@google.com <ma...@google.com>>>
> wrote:
> >              >
> >              >         I actually created the subtasks by finding things
> >             shaded by at
> >              >         least one module. I think each one should
> >             definitely have an on
> >              >         list discussion that clarifies the target
> >             artifact, namespace,
> >              >         version, possible complications, etc.
> >              >
> >              >         My impression is that many many modules shade
> >             only Guava. So for
> >              >         build time and simplification that is a big win.
> >              >
> >              >         Kenn
> >              >
> >              >         On Tue, Oct 23, 2018, 08:16 Thomas Weise
> >             <thw@apache.org <ma...@apache.org>
> >              >         <mailto:thw@apache.org <ma...@apache.org>>>
> >             wrote:
> >              >
> >              >             +1 for separate artifacts
> >              >
> >              >             I would request that we explicitly discuss
> >             and agree which
> >              >             dependencies we vendor though.
> >              >
> >              >             Not everything listed in the JIRA subtasks is
> >             currently
> >              >             relocated.
> >              >
> >              >             Thomas
> >              >
> >              >
> >              >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
> >              >             <david.moravek@gmail.com
> >             <ma...@gmail.com>
> >             <mailto:david.moravek@gmail.com
> >             <ma...@gmail.com>>>
> >              >             wrote:
> >              >
> >              >                 +1 This should improve build times a lot.
> >             It would be
> >              >                 great if vendored deps could stay in the
> >             main repository.
> >              >
> >              >                 D.
> >              >
> >              >                 On Tue, Oct 23, 2018 at 12:21 PM
> >             Maximilian Michels
> >              >                 <mxm@apache.org <ma...@apache.org>
> >             <mailto:mxm@apache.org <ma...@apache.org>>> wrote:
> >              >
> >              >                     Looks great, Kenn!
> >              >
> >              >                      > Max: what is the story behind
> >             having a separate
> >              >                     flink-shaded repo? Did that make it
> >             easier to manage
> >              >                     in some way?
> >              >
> >              >                     Better separation of concerns, but I
> >             don't think
> >              >                     releasing the shaded
> >              >                     artifacts from the main repo is a
> >             problem. I'd even
> >              >                     prefer not to split
> >              >                     up the repo because it makes updates
> >             to the vendored
> >              >                     dependencies
> >              >                     slightly easier.
> >              >
> >              >                     On 23.10.18 03:25, Kenneth Knowles
> wrote:
> >              >                      > OK, I've filed
> >              > https://issues.apache.org/jira/browse/BEAM-5819 to
> >              >                      > collect sub-tasks. This has enough
> >             upsides
> >              >                     throughout lots of areas of
> >              >                      > the project that even though it is
> >             not glamorous
> >              >                     it seems pretty
> >              >                      > valuable to start on immediately.
> >             And I want to
> >              >                     find out if there's a
> >              >                      > pitfall lurking.
> >              >                      >
> >              >                      > Max: what is the story behind
> >             having a separate
> >              >                     flink-shaded repo? Did
> >              >                      > that make it easier to manage in
> >             some way?
> >              >                      >
> >              >                      > Kenn
> >              >                      >
> >              >                      > On Mon, Oct 22, 2018 at 2:55 AM
> >             Maximilian
> >              >                     Michels <mxm@apache.org
> >             <ma...@apache.org> <mailto:mxm@apache.org
> >             <ma...@apache.org>>
> >              >                      > <mailto:mxm@apache.org
> >             <ma...@apache.org> <mailto:mxm@apache.org
> >             <ma...@apache.org>>>>
> >              >                     wrote:
> >              >                      >
> >              >                      >     +1 for publishing vendored Jars
> >              >                     independently. It will improve build
> >              >                      >     time and ease IntelliJ
> >             integration.
> >              >                      >
> >              >                      >     Flink also publishes shaded
> >             dependencies
> >              >                     separately:
> >              >                      >
> >              >                      >     -
> >             https://github.com/apache/flink-shaded
> >              >                      >     -
> >              > https://issues.apache.org/jira/browse/FLINK-6529
> >              >                      >
> >              >                      >     AFAIK their main motivation
> >             was to get rid of
> >              >                     duplicate shaded classes
> >              >                      >     on the classpath. We don't
> >             appear to have
> >              >                     that problem because we
> >              >                      >     already have a separate
> >             "vendor" project.
> >              >                      >
> >              >                      >      >  - With shading, it is hard
> >             (impossible?)
> >              >                     to step into dependency
> >              >                      >     code in IntelliJ's debugger,
> >             because the
> >              >                     actual symbol at runtime
> >              >                      >     does not match what is in the
> >             external jars
> >              >                      >
> >              >                      >     This would be solved by
> >             releasing the sources
> >              >                     of the shaded jars.
> >              >                      >      From a
> >              >                      >     legal perspective, this could
> >             be problematic
> >              >                     as alluded to here:
> >              >                      >
> >             https://github.com/apache/flink-shaded/issues/25
> >              >                      >
> >              >                      >     -Max
> >              >                      >
> >              >                      >     On 20.10.18 01:11, Lukasz Cwik
> >             wrote:
> >              >                      >      > I have tried several times
> >             to improve the
> >              >                     build system and intellij
> >              >                      >      > integration and each
> >             attempt ended with
> >              >                     little progress when dealing
> >              >                      >      > with vendored code. My
> >             latest attempt has
> >              >                     been the most promising
> >              >                      >     where
> >              >                      >      > I take the vendored
> >             classes/jars and
> >              >                     decompile them generating the
> >              >                      >      > source that Intellij can
> >             then use. I have
> >              >                     a branch[1] that
> >              >                      >     demonstrates
> >              >                      >      > the idea. It works pretty
> >             well (and up
> >              >                     until a change where we
> >              >                      >     started
> >              >                      >      > vendoring gRPC, was
> >             impractical to do.
> >              >                     Instructions to try it out
> >              >                      >     are:
> >              >                      >      >
> >              >                      >      > // Clean up any remnants of
> >             prior
> >              >                     builds/intellij projects
> >              >                      >      > git clean -fdx
> >              >                      >      > // Generated the source for
> >              >                     vendored/shaded modules
> >              >                      >      > ./gradlew decompile
> >              >                      >      >
> >              >                      >      > // Remove the "generated"
> >             Java sources for
> >              >                     protos so they don't
> >              >                      >     conflict with the decompiled
> >             sources.
> >              >                      >      > rm -rf
> >              >
> >               model/pipeline/build/generated/source/proto
> >              >                      >      > rm -rf
> >              >
> >               model/job-management/build/generated/source/proto
> >              >                      >      > rm -rf
> >              >
> >               model/fn-execution/build/generated/source/proto
> >              >                      >      > // Import the project into
> >             Intellij, most
> >              >                     code completion now
> >              >                      >     works still some issues with a
> >             few classes.
> >              >                      >      > // Note that the Java
> >             decompiler doesn't
> >              >                     generate valid source so
> >              >                      >     still need to delegate to
> >             Gradle for
> >              >                     build/run/test actions
> >              >                      >      > // Other decompilers may do
> >             a better/worse
> >              >                     job but haven't tried
> >              >                      >     them.
> >              >                      >      >
> >              >                      >      >
> >              >                      >      > The problems that I face
> >             are that the
> >              >                     generated Java source from the
> >              >                      >      > protos and the decompiled
> >             source from the
> >              >                     compiled version of that
> >              >                      >      > source post shading are
> >             both being
> >              >                     imported as content roots and
> >              >                      >     then
> >              >                      >      > conflict. Also, the CFR
> >             decompiler isn't
> >              >                     producing valid source, if
> >              >                      >      > people could try others and
> >             report their
> >              >                     mileage, we may find one
> >              >                      >     that
> >              >                      >      > works and then we would be
> >             able to use
> >              >                     intellij to build/run our
> >              >                      >     code
> >              >                      >      > and not need to delegate
> >             all our
> >              >                     build/run/test actions to Gradle.
> >              >                      >      >
> >              >                      >      > After all these attempts I
> >             have done,
> >              >                     vendoring the dependencies
> >              >                      >     outside
> >              >                      >      > of the project seems like a
> >             sane approach
> >              >                     and unless someone
> >              >                      >     wants to
> >              >                      >      > take a stab at the best
> >             progress I have
> >              >                     made above, I would go
> >              >                      >     with what
> >              >                      >      > Kenn is suggesting even
> >             though it will
> >              >                     mean that we will need to
> >              >                      >     perform
> >              >                      >      > releases every time we want
> >             to change the
> >              >                     version of one of our
> >              >                      >     vendored
> >              >                      >      > dependencies.
> >              >                      >      >
> >              >                      >      > 1:
> >              > https://github.com/lukecwik/incubator-beam/tree/intellij
> >              >                      >      >
> >              >                      >      >
> >              >                      >      > On Fri, Oct 19, 2018 at
> >             10:43 AM Kenneth
> >              >                     Knowles <kenn@apache.org
> >             <ma...@apache.org> <mailto:kenn@apache.org
> >             <ma...@apache.org>>
> >              >                      >     <mailto:kenn@apache.org
> >             <ma...@apache.org> <mailto:kenn@apache.org
> >             <ma...@apache.org>>>
> >              >                      >      > <mailto:kenn@apache.org
> >             <ma...@apache.org>
> >              >                     <mailto:kenn@apache.org
> >             <ma...@apache.org>> <mailto:kenn@apache.org
> >             <ma...@apache.org>
> >              >                     <mailto:kenn@apache.org
> >             <ma...@apache.org>>>>> wrote:
> >              >                      >      >
> >              >                      >      >     Another reason to push
> >             on this is to
> >              >                     get build times down.
> >              >                      >     Once only
> >              >                      >      >     generated proto classes
> >             use the shadow
> >              >                     plugin we'll cut the build
> >              >                      >      >     time in ~half? And
> >             there is no reason
> >              >                     to constantly re-vendor.
> >              >                      >      >
> >              >                      >      >     Kenn
> >              >                      >      >
> >              >                      >      >     On Fri, Oct 19, 2018 at
> >             10:39 AM
> >              >                     Kenneth Knowles
> >              >                      >     <klk@google.com
> >             <ma...@google.com> <mailto:klk@google.com
> >             <ma...@google.com>>
> >              >                     <mailto:klk@google.com
> >             <ma...@google.com> <mailto:klk@google.com
> >             <ma...@google.com>>>
> >              >                      >      >     <mailto:klk@google.com
> >             <ma...@google.com>
> >              >                     <mailto:klk@google.com
> >             <ma...@google.com>> <mailto:klk@google.com
> >             <ma...@google.com>
> >              >                     <mailto:klk@google.com
> >             <ma...@google.com>>>>> wrote:
> >              >                      >      >
> >              >                      >      >         Hi all,
> >              >                      >      >
> >              >                      >      >         A while ago we had
> >             pretty good
> >              >                     consensus that we should
> >              >                      >     vendor
> >              >                      >      >         ("pre-shade")
> specific
> >              >                     dependencies, and there's start on it
> >              >                      >      >         here:
> >              > https://github.com/apache/beam/tree/master/vendor
> >              >                      >      >
> >              >                      >      >         IntelliJ notes:
> >              >                      >      >
> >              >                      >      >           - With shading,
> >             it is hard
> >              >                     (impossible?) to step into
> >              >                      >      >         dependency code in
> >             IntelliJ's
> >              >                     debugger, because the actual
> >              >                      >      >         symbol at runtime
> >             does not match
> >              >                     what is in the external jars
> >              >                      >      >
> >              >                      >      >
> >              >                      >      > Intellij can step through
> >             the classes if
> >              >                     they were published
> >              >                      >     outside the
> >              >                      >      > project since it can
> >             decompile them. The
> >              >                     source won't be legible.
> >              >                      >      > Decompiling the source as
> >             in my example
> >              >                     effectively shows the
> >              >                      >     same issue.
> >              >                      >      >
> >              >                      >      >           - With vendoring,
> >             if the
> >              >                     vendored dependencies are part
> >              >                      >     of the
> >              >                      >      >         project then
> >             IntelliJ gets
> >              >                     confused because it operates on
> >              >                      >      >         source, not the
> >             produced jars
> >              >                      >      >
> >              >                      >      >
> >              >                      >      > Yes, I tried several ways
> >             to get intellij
> >              >                     to ignore the source
> >              >                      >     and use
> >              >                      >      > the output jars but no luck.
> >              >                      >      >
> >              >                      >      >         The second one has
> >             a quick fix for
> >              >                     most cases*: don't
> >              >                      >     make the
> >              >                      >      >         vendored dep a
> >             subproject, but
> >              >                     just separately build and
> >              >                      >     publish
> >              >                      >      >         it. Since a
> >             vendored dependency
> >              >                     should change much more
> >              >                      >      >         infrequently (or if
> >             we bake the
> >              >                     version into the name,
> >              >                      >     ~never)
> >              >                      >      >         this means we
> >             publish once and
> >              >                     save headache and build
> >              >                      >     time forever.
> >              >                      >      >
> >              >                      >      >         WDYT? Have I
> >             overlooked something?
> >              >                     How about we set up
> >              >                      >     vendored
> >              >                      >      >         versions of guava,
> >             protobuf, grpc,
> >              >                     and publish them. We don't
> >              >                      >      >         have to actually
> >             start using them
> >              >                     yet, and can do it
> >              >                      >     incrementally.
> >              >                      >      >
> >              >                      >      >
> >              >                      >      > Currently we are relocating
> >             code depending
> >              >                     on the version string.
> >              >                      >     If the
> >              >                      >      > major version is >= 1, we
> >             use only the
> >              >                     major version within the
> >              >                      >     package
> >              >                      >      > string and rely on semantic
> >             versioning
> >              >                     provided by the dependency
> >              >                      >     to not
> >              >                      >      > break people. If the major
> >             version is 0,
> >              >                     we assume the dependency is
> >              >                      >      > unstable and use the full
> >             version as part
> >              >                     of the package string
> >              >                      >     during
> >              >                      >      > relocation.
> >              >                      >      >
> >              >                      >      > The downside of using the
> >             full version
> >              >                     string for relocated packages:
> >              >                      >      > 1) Users will end up with
> >             multiple copies
> >              >                     of dependencies that
> >              >                      >     differ
> >              >                      >      > only by the minor or patch
> >             version
> >              >                     increasing the size of their
> >              >                      >     application.
> >              >                      >      > 2) Bumping up the version
> >             of a dependency
> >              >                     now requires the import
> >              >                      >      > statement in all java files
> >             to be updated
> >              >                     (not too difficult with
> >              >                      >     some
> >              >                      >      > sed/grep skills)
> >              >                      >      >
> >              >                      >      > The upside of using the
> >             full version
> >              >                     string in the relocated package:
> >              >                      >      > 1) We don't have to worry
> >             about whether a
> >              >                     dependency maintains
> >              >                      >     semantic
> >              >                      >      > versioning which means our
> >             users won't
> >              >                     have to worry about that
> >              >                      >     either.
> >              >                      >      > 2) This increases the odds
> >             that a user
> >              >                     will load multiple slightly
> >              >                      >      > different versions of the
> >             same dependency
> >              >                     which is known to be
> >              >                      >      > incompatible in certain
> >             situations (e.g.
> >              >                     Netty 4.1.25 can't be on
> >              >                      >     the
> >              >                      >      > classpath with Netty 4.1.28
> >             even though
> >              >                     they are both shaded due to
> >              >                      >      > issues of how JNI with
> >             tcnative works).
> >              >                      >      >
> >              >                      >      >
> >              >                      >      >         (side note: what do
> >             other projects
> >              >                     like Flink do?)
> >              >                      >      >
> >              >                      >      >         Kenn
> >              >                      >      >
> >              >                      >      >         *for generated
> >             proto classes, they
> >              >                     need to be altered after
> >              >                      >      >         being generated so
> >             shading happens
> >              >                     there, but actually only
> >              >                      >      >         relocation and the
> >             shared vendored
> >              >                     dep should work
> >              >                      >     elsewhere in
> >              >                      >      >         the project
> >              >                      >      >
> >              >                      >      >
> >              >                      >      > We could publish the protos
> >             and treat them
> >              >                     as "external"
> >              >                      >     dependencies
> >              >                      >      > within the Java projects
> >             which would also
> >              >                     remove this pain point.
> >              >                      >
> >              >
> >
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Maximilian Michels <mx...@apache.org>.
Question: How would a user end up with the same shaded dependency twice? 
The shaded dependencies are transitive dependencies of Beam and thus, 
this shouldn't happen. Is this a safe-guard when running different 
versions of Beam in the same JVM?

Other than the search/replace issue, the full version string is probably 
more elegant and less error-prone, so I'm fine with that.

For the groupid/artifact name I prefer:

groupid: org.apache.beam.vendored
artifact: guava-20_0

On 24.10.18 21:53, Lukasz Cwik wrote:
> On Wed, Oct 24, 2018 at 11:31 AM Kenneth Knowles <kenn@apache.org 
> <ma...@apache.org>> wrote:
> 
>     OK. I just opened https://github.com/apache/beam/pull/6809 to push
>     Guava through. I made some comments there, and also I agree with
>     Luke that full version string makes sense. For this purpose it seems
>     easy and fine to do a search/replace to swap 20.0 for 20.1, and
>     compatibility between them should not be a concern.
> 
>     I have minor suggestions and clarifications:
> 
>       - Is there value to `beam` in the artifactId? I would leave it off
>     unless there's a special need
> 
> It would only provide consistency with all our other artifactIds that we 
> publish but there isn't a special need that I'm aware of.
> 
>       - Users should never use these and we make it extremely clear they
>     are not supported for any reasons
> 
>       - Use 0.x versions indicating no intention of semantic versioning
> 
> I like this idea a lot.
> 
> 
>     Bringing my comments and Luke's together, here's the proposal:
> 
>          groupId: org.apache.beam
>          artifactId: vendored-guava-20_0
>          namespace: org.apache.beam.vendored.guava.v20_0
>          version: 0.1
> 
>     Alternatively it could be
> 
>          groupId: org.apache.beam-vendored
>          artifactid: guava-20_0
>          namespace: org.apache.beam.vendored.guava.v20_0
>          version: 0.1
> 
>     I like the latter but I haven't gone through the process of
>     establishing a new groupId.
> 
> Based on 
> https://maven.apache.org/guides/mini/guide-naming-conventions.html, the 
> alternative groupId should be org.apache.beam.vendored and not 
> org.apache.beam-vendored
> I slightly prefer org.apache.beam over org.apache.beam.vendored but not 
> enough to object to either choice as long as we maintain consistency for 
> all vendored dependencies we produce going forward.
> 
>     And for now we do not publish source jars. A couple of TODOs to get
>     the build in good shape (classifiers, jars, interaction with plugins)
> 
>     Kenn
> 
> 
>     On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <lcwik@google.com
>     <ma...@google.com>> wrote:
> 
>         It looks like we are agreeing to make each vendored dependency
>         self contained and have all their own internal dependencies
>         packaged. For example, gRPC and all its transitive dependencies
>         would use org.apache.beam.vendored.grpc.vYYY and Calcite and all
>         its transitive dependencies would use
>         org.apache.beam.vendored.calcite.vZZZ.
> 
>         I also wanted to circle back on this question I had earlier that
>         didn't have any follow-up:
>         Currently we are relocating code depending on the version
>         string. If the major version is >= 1, we use only the major
>         version within the package string and rely on semantic
>         versioning provided by the dependency to not break people. If
>         the major version is 0, we assume the dependency is unstable and
>         use the full version as part of the package string during
>         relocation.
> 
>         The downside of using the full version string for relocated
>         packages:
>         1) Users will end up with multiple copies of dependencies that
>         differ only by the minor or patch version increasing the size of
>         their application.
>         2) Bumping up the version of a dependency now requires the
>         import statement in all java files to be updated (not too
>         difficult with some sed/grep skills)
> 
>         The upside of using the full version string in the relocated
>         package:
>         1) We don't have to worry about whether a dependency maintains
>         semantic versioning which means our users won't have to worry
>         about that either.
>         2) This increases the odds that a user will load multiple
>         slightly different versions of the same dependency which is
>         known to be incompatible in certain situations (e.g. Netty
>         4.1.25 can't be on the classpath with Netty 4.1.28 even though
>         they are both shaded due to issues of how JNI with tcnative works).
> 
>         My preference would be to use the full version string for import
>         statements (so org.apache.beam.vendor.grpc.v1_13_1...) since
>         this would allow multiple copies to not conflict with each other
>         since in my opinion it is a lot more difficult to help a user
>         debug a dependency issue then to use string replacement during
>         dependency upgrades to fix import statements. Also I would
>         suggest we name the artifacts in Maven as follows:
>         groupId: org.apache.beam
>         artifactId: beam-vendor-grpc-v1_13_1
>         version: 1.0.0 (first version and subsequent versions such as
>         1.0.1 are only for patch upgrades that fix any shading issues we
>         may have had when producing the vendored jar)
> 
> 
>         On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels
>         <mxm@apache.org <ma...@apache.org>> wrote:
> 
>             Would also keep it simple and optimize for the JAR size only
>             if necessary.
> 
>             On 24.10.18 00:06, Kenneth Knowles wrote:
>              > I think it makes sense for each vendored dependency to be
>             self-contained
>              > as much as possible. It should keep it fairly simple.
>             Things that cross
>              > their API surface cannot be hidden, of course. Jar size
>             is not a concern
>              > IMO.
>              >
>              > Kenn
>              >
>              > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik
>             <lcwik@google.com <ma...@google.com>
>              > <mailto:lcwik@google.com <ma...@google.com>>> wrote:
>              >
>              >     How should we handle the transitive dependencies of
>             the things we
>              >     want to vendor?
>              >
>              >     For example we use gRPC which depends on Guava 20 and
>             we also use
>              >     Calcite which depends on Guava 19.
>              >
>              >     Should the vendored gRPC/Calcite/... be
>             self-contained so it
>              >     contains all its dependencies, hence vendored gRPC
>             would contain
>              >     Guava 20 and vendored Calcite would contain Guava 19
>             (both under
>              >     different namespaces)?
>              >     This leads to larger jars but less vendored
>             dependencies to maintain.
>              >
>              >     Or should we produce a vendored library for those
>             that we want to
>              >     share, e.g. Guava 20 that could be reused across
>             multiple vendored
>              >     libraries?
>              >     Makes the vendoring process slightly more
>             complicated, more
>              >     dependencies to maintain, smaller jars.
>              >
>              >     Or should we produce a vendored library for each
>             dependency?
>              >     Lots of vendoring needed, likely tooling required to
>             be built to
>              >     maintain this.
>              >
>              >
>              >
>              >
>              >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles
>             <klk@google.com <ma...@google.com>
>              >     <mailto:klk@google.com <ma...@google.com>>> wrote:
>              >
>              >         I actually created the subtasks by finding things
>             shaded by at
>              >         least one module. I think each one should
>             definitely have an on
>              >         list discussion that clarifies the target
>             artifact, namespace,
>              >         version, possible complications, etc.
>              >
>              >         My impression is that many many modules shade
>             only Guava. So for
>              >         build time and simplification that is a big win.
>              >
>              >         Kenn
>              >
>              >         On Tue, Oct 23, 2018, 08:16 Thomas Weise
>             <thw@apache.org <ma...@apache.org>
>              >         <mailto:thw@apache.org <ma...@apache.org>>>
>             wrote:
>              >
>              >             +1 for separate artifacts
>              >
>              >             I would request that we explicitly discuss
>             and agree which
>              >             dependencies we vendor though.
>              >
>              >             Not everything listed in the JIRA subtasks is
>             currently
>              >             relocated.
>              >
>              >             Thomas
>              >
>              >
>              >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
>              >             <david.moravek@gmail.com
>             <ma...@gmail.com>
>             <mailto:david.moravek@gmail.com
>             <ma...@gmail.com>>>
>              >             wrote:
>              >
>              >                 +1 This should improve build times a lot.
>             It would be
>              >                 great if vendored deps could stay in the
>             main repository.
>              >
>              >                 D.
>              >
>              >                 On Tue, Oct 23, 2018 at 12:21 PM
>             Maximilian Michels
>              >                 <mxm@apache.org <ma...@apache.org>
>             <mailto:mxm@apache.org <ma...@apache.org>>> wrote:
>              >
>              >                     Looks great, Kenn!
>              >
>              >                      > Max: what is the story behind
>             having a separate
>              >                     flink-shaded repo? Did that make it
>             easier to manage
>              >                     in some way?
>              >
>              >                     Better separation of concerns, but I
>             don't think
>              >                     releasing the shaded
>              >                     artifacts from the main repo is a
>             problem. I'd even
>              >                     prefer not to split
>              >                     up the repo because it makes updates
>             to the vendored
>              >                     dependencies
>              >                     slightly easier.
>              >
>              >                     On 23.10.18 03:25, Kenneth Knowles wrote:
>              >                      > OK, I've filed
>              > https://issues.apache.org/jira/browse/BEAM-5819 to
>              >                      > collect sub-tasks. This has enough
>             upsides
>              >                     throughout lots of areas of
>              >                      > the project that even though it is
>             not glamorous
>              >                     it seems pretty
>              >                      > valuable to start on immediately.
>             And I want to
>              >                     find out if there's a
>              >                      > pitfall lurking.
>              >                      >
>              >                      > Max: what is the story behind
>             having a separate
>              >                     flink-shaded repo? Did
>              >                      > that make it easier to manage in
>             some way?
>              >                      >
>              >                      > Kenn
>              >                      >
>              >                      > On Mon, Oct 22, 2018 at 2:55 AM
>             Maximilian
>              >                     Michels <mxm@apache.org
>             <ma...@apache.org> <mailto:mxm@apache.org
>             <ma...@apache.org>>
>              >                      > <mailto:mxm@apache.org
>             <ma...@apache.org> <mailto:mxm@apache.org
>             <ma...@apache.org>>>>
>              >                     wrote:
>              >                      >
>              >                      >     +1 for publishing vendored Jars
>              >                     independently. It will improve build
>              >                      >     time and ease IntelliJ
>             integration.
>              >                      >
>              >                      >     Flink also publishes shaded
>             dependencies
>              >                     separately:
>              >                      >
>              >                      >     -
>             https://github.com/apache/flink-shaded
>              >                      >     -
>              > https://issues.apache.org/jira/browse/FLINK-6529
>              >                      >
>              >                      >     AFAIK their main motivation
>             was to get rid of
>              >                     duplicate shaded classes
>              >                      >     on the classpath. We don't
>             appear to have
>              >                     that problem because we
>              >                      >     already have a separate
>             "vendor" project.
>              >                      >
>              >                      >      >  - With shading, it is hard
>             (impossible?)
>              >                     to step into dependency
>              >                      >     code in IntelliJ's debugger,
>             because the
>              >                     actual symbol at runtime
>              >                      >     does not match what is in the
>             external jars
>              >                      >
>              >                      >     This would be solved by
>             releasing the sources
>              >                     of the shaded jars.
>              >                      >      From a
>              >                      >     legal perspective, this could
>             be problematic
>              >                     as alluded to here:
>              >                      >
>             https://github.com/apache/flink-shaded/issues/25
>              >                      >
>              >                      >     -Max
>              >                      >
>              >                      >     On 20.10.18 01:11, Lukasz Cwik
>             wrote:
>              >                      >      > I have tried several times
>             to improve the
>              >                     build system and intellij
>              >                      >      > integration and each
>             attempt ended with
>              >                     little progress when dealing
>              >                      >      > with vendored code. My
>             latest attempt has
>              >                     been the most promising
>              >                      >     where
>              >                      >      > I take the vendored
>             classes/jars and
>              >                     decompile them generating the
>              >                      >      > source that Intellij can
>             then use. I have
>              >                     a branch[1] that
>              >                      >     demonstrates
>              >                      >      > the idea. It works pretty
>             well (and up
>              >                     until a change where we
>              >                      >     started
>              >                      >      > vendoring gRPC, was
>             impractical to do.
>              >                     Instructions to try it out
>              >                      >     are:
>              >                      >      >
>              >                      >      > // Clean up any remnants of
>             prior
>              >                     builds/intellij projects
>              >                      >      > git clean -fdx
>              >                      >      > // Generated the source for
>              >                     vendored/shaded modules
>              >                      >      > ./gradlew decompile
>              >                      >      >
>              >                      >      > // Remove the "generated"
>             Java sources for
>              >                     protos so they don't
>              >                      >     conflict with the decompiled
>             sources.
>              >                      >      > rm -rf
>              >                   
>               model/pipeline/build/generated/source/proto
>              >                      >      > rm -rf
>              >                   
>               model/job-management/build/generated/source/proto
>              >                      >      > rm -rf
>              >                   
>               model/fn-execution/build/generated/source/proto
>              >                      >      > // Import the project into
>             Intellij, most
>              >                     code completion now
>              >                      >     works still some issues with a
>             few classes.
>              >                      >      > // Note that the Java
>             decompiler doesn't
>              >                     generate valid source so
>              >                      >     still need to delegate to
>             Gradle for
>              >                     build/run/test actions
>              >                      >      > // Other decompilers may do
>             a better/worse
>              >                     job but haven't tried
>              >                      >     them.
>              >                      >      >
>              >                      >      >
>              >                      >      > The problems that I face
>             are that the
>              >                     generated Java source from the
>              >                      >      > protos and the decompiled
>             source from the
>              >                     compiled version of that
>              >                      >      > source post shading are
>             both being
>              >                     imported as content roots and
>              >                      >     then
>              >                      >      > conflict. Also, the CFR
>             decompiler isn't
>              >                     producing valid source, if
>              >                      >      > people could try others and
>             report their
>              >                     mileage, we may find one
>              >                      >     that
>              >                      >      > works and then we would be
>             able to use
>              >                     intellij to build/run our
>              >                      >     code
>              >                      >      > and not need to delegate
>             all our
>              >                     build/run/test actions to Gradle.
>              >                      >      >
>              >                      >      > After all these attempts I
>             have done,
>              >                     vendoring the dependencies
>              >                      >     outside
>              >                      >      > of the project seems like a
>             sane approach
>              >                     and unless someone
>              >                      >     wants to
>              >                      >      > take a stab at the best
>             progress I have
>              >                     made above, I would go
>              >                      >     with what
>              >                      >      > Kenn is suggesting even
>             though it will
>              >                     mean that we will need to
>              >                      >     perform
>              >                      >      > releases every time we want
>             to change the
>              >                     version of one of our
>              >                      >     vendored
>              >                      >      > dependencies.
>              >                      >      >
>              >                      >      > 1:
>              > https://github.com/lukecwik/incubator-beam/tree/intellij
>              >                      >      >
>              >                      >      >
>              >                      >      > On Fri, Oct 19, 2018 at
>             10:43 AM Kenneth
>              >                     Knowles <kenn@apache.org
>             <ma...@apache.org> <mailto:kenn@apache.org
>             <ma...@apache.org>>
>              >                      >     <mailto:kenn@apache.org
>             <ma...@apache.org> <mailto:kenn@apache.org
>             <ma...@apache.org>>>
>              >                      >      > <mailto:kenn@apache.org
>             <ma...@apache.org>
>              >                     <mailto:kenn@apache.org
>             <ma...@apache.org>> <mailto:kenn@apache.org
>             <ma...@apache.org>
>              >                     <mailto:kenn@apache.org
>             <ma...@apache.org>>>>> wrote:
>              >                      >      >
>              >                      >      >     Another reason to push
>             on this is to
>              >                     get build times down.
>              >                      >     Once only
>              >                      >      >     generated proto classes
>             use the shadow
>              >                     plugin we'll cut the build
>              >                      >      >     time in ~half? And
>             there is no reason
>              >                     to constantly re-vendor.
>              >                      >      >
>              >                      >      >     Kenn
>              >                      >      >
>              >                      >      >     On Fri, Oct 19, 2018 at
>             10:39 AM
>              >                     Kenneth Knowles
>              >                      >     <klk@google.com
>             <ma...@google.com> <mailto:klk@google.com
>             <ma...@google.com>>
>              >                     <mailto:klk@google.com
>             <ma...@google.com> <mailto:klk@google.com
>             <ma...@google.com>>>
>              >                      >      >     <mailto:klk@google.com
>             <ma...@google.com>
>              >                     <mailto:klk@google.com
>             <ma...@google.com>> <mailto:klk@google.com
>             <ma...@google.com>
>              >                     <mailto:klk@google.com
>             <ma...@google.com>>>>> wrote:
>              >                      >      >
>              >                      >      >         Hi all,
>              >                      >      >
>              >                      >      >         A while ago we had
>             pretty good
>              >                     consensus that we should
>              >                      >     vendor
>              >                      >      >         ("pre-shade") specific
>              >                     dependencies, and there's start on it
>              >                      >      >         here:
>              > https://github.com/apache/beam/tree/master/vendor
>              >                      >      >
>              >                      >      >         IntelliJ notes:
>              >                      >      >
>              >                      >      >           - With shading,
>             it is hard
>              >                     (impossible?) to step into
>              >                      >      >         dependency code in
>             IntelliJ's
>              >                     debugger, because the actual
>              >                      >      >         symbol at runtime
>             does not match
>              >                     what is in the external jars
>              >                      >      >
>              >                      >      >
>              >                      >      > Intellij can step through
>             the classes if
>              >                     they were published
>              >                      >     outside the
>              >                      >      > project since it can
>             decompile them. The
>              >                     source won't be legible.
>              >                      >      > Decompiling the source as
>             in my example
>              >                     effectively shows the
>              >                      >     same issue.
>              >                      >      >
>              >                      >      >           - With vendoring,
>             if the
>              >                     vendored dependencies are part
>              >                      >     of the
>              >                      >      >         project then
>             IntelliJ gets
>              >                     confused because it operates on
>              >                      >      >         source, not the
>             produced jars
>              >                      >      >
>              >                      >      >
>              >                      >      > Yes, I tried several ways
>             to get intellij
>              >                     to ignore the source
>              >                      >     and use
>              >                      >      > the output jars but no luck.
>              >                      >      >
>              >                      >      >         The second one has
>             a quick fix for
>              >                     most cases*: don't
>              >                      >     make the
>              >                      >      >         vendored dep a
>             subproject, but
>              >                     just separately build and
>              >                      >     publish
>              >                      >      >         it. Since a
>             vendored dependency
>              >                     should change much more
>              >                      >      >         infrequently (or if
>             we bake the
>              >                     version into the name,
>              >                      >     ~never)
>              >                      >      >         this means we
>             publish once and
>              >                     save headache and build
>              >                      >     time forever.
>              >                      >      >
>              >                      >      >         WDYT? Have I
>             overlooked something?
>              >                     How about we set up
>              >                      >     vendored
>              >                      >      >         versions of guava,
>             protobuf, grpc,
>              >                     and publish them. We don't
>              >                      >      >         have to actually
>             start using them
>              >                     yet, and can do it
>              >                      >     incrementally.
>              >                      >      >
>              >                      >      >
>              >                      >      > Currently we are relocating
>             code depending
>              >                     on the version string.
>              >                      >     If the
>              >                      >      > major version is >= 1, we
>             use only the
>              >                     major version within the
>              >                      >     package
>              >                      >      > string and rely on semantic
>             versioning
>              >                     provided by the dependency
>              >                      >     to not
>              >                      >      > break people. If the major
>             version is 0,
>              >                     we assume the dependency is
>              >                      >      > unstable and use the full
>             version as part
>              >                     of the package string
>              >                      >     during
>              >                      >      > relocation.
>              >                      >      >
>              >                      >      > The downside of using the
>             full version
>              >                     string for relocated packages:
>              >                      >      > 1) Users will end up with
>             multiple copies
>              >                     of dependencies that
>              >                      >     differ
>              >                      >      > only by the minor or patch
>             version
>              >                     increasing the size of their
>              >                      >     application.
>              >                      >      > 2) Bumping up the version
>             of a dependency
>              >                     now requires the import
>              >                      >      > statement in all java files
>             to be updated
>              >                     (not too difficult with
>              >                      >     some
>              >                      >      > sed/grep skills)
>              >                      >      >
>              >                      >      > The upside of using the
>             full version
>              >                     string in the relocated package:
>              >                      >      > 1) We don't have to worry
>             about whether a
>              >                     dependency maintains
>              >                      >     semantic
>              >                      >      > versioning which means our
>             users won't
>              >                     have to worry about that
>              >                      >     either.
>              >                      >      > 2) This increases the odds
>             that a user
>              >                     will load multiple slightly
>              >                      >      > different versions of the
>             same dependency
>              >                     which is known to be
>              >                      >      > incompatible in certain
>             situations (e.g.
>              >                     Netty 4.1.25 can't be on
>              >                      >     the
>              >                      >      > classpath with Netty 4.1.28
>             even though
>              >                     they are both shaded due to
>              >                      >      > issues of how JNI with
>             tcnative works).
>              >                      >      >
>              >                      >      >
>              >                      >      >         (side note: what do
>             other projects
>              >                     like Flink do?)
>              >                      >      >
>              >                      >      >         Kenn
>              >                      >      >
>              >                      >      >         *for generated
>             proto classes, they
>              >                     need to be altered after
>              >                      >      >         being generated so
>             shading happens
>              >                     there, but actually only
>              >                      >      >         relocation and the
>             shared vendored
>              >                     dep should work
>              >                      >     elsewhere in
>              >                      >      >         the project
>              >                      >      >
>              >                      >      >
>              >                      >      > We could publish the protos
>             and treat them
>              >                     as "external"
>              >                      >     dependencies
>              >                      >      > within the Java projects
>             which would also
>              >                     remove this pain point.
>              >                      >
>              >
> 

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
On Wed, Oct 24, 2018 at 11:31 AM Kenneth Knowles <ke...@apache.org> wrote:

> OK. I just opened https://github.com/apache/beam/pull/6809 to push Guava
> through. I made some comments there, and also I agree with Luke that full
> version string makes sense. For this purpose it seems easy and fine to do a
> search/replace to swap 20.0 for 20.1, and compatibility between them should
> not be a concern.
>
> I have minor suggestions and clarifications:
>
>  - Is there value to `beam` in the artifactId? I would leave it off unless
> there's a special need
>
It would only provide consistency with all our other artifactIds that we
publish but there isn't a special need that I'm aware of.


>  - Users should never use these and we make it extremely clear they are
> not supported for any reasons
>
 - Use 0.x versions indicating no intention of semantic versioning
>
I like this idea a lot.


>
> Bringing my comments and Luke's together, here's the proposal:
>
>     groupId: org.apache.beam
>     artifactId: vendored-guava-20_0
>     namespace: org.apache.beam.vendored.guava.v20_0
>     version: 0.1
>
> Alternatively it could be
>
>     groupId: org.apache.beam-vendored
>     artifactid: guava-20_0
>     namespace: org.apache.beam.vendored.guava.v20_0
>     version: 0.1
>
> I like the latter but I haven't gone through the process of establishing a
> new groupId.
>
> Based on
https://maven.apache.org/guides/mini/guide-naming-conventions.html, the
alternative groupId should be org.apache.beam.vendored and not
org.apache.beam-vendored
I slightly prefer org.apache.beam over org.apache.beam.vendored but not
enough to object to either choice as long as we maintain consistency for
all vendored dependencies we produce going forward.


> And for now we do not publish source jars. A couple of TODOs to get the
> build in good shape (classifiers, jars, interaction with plugins)
>
> Kenn
>
>
> On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> It looks like we are agreeing to make each vendored dependency self
>> contained and have all their own internal dependencies packaged. For
>> example, gRPC and all its transitive dependencies would use
>> org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive
>> dependencies would use org.apache.beam.vendored.calcite.vZZZ.
>>
>> I also wanted to circle back on this question I had earlier that didn't
>> have any follow-up:
>> Currently we are relocating code depending on the version string. If the
>> major version is >= 1, we use only the major version within the package
>> string and rely on semantic versioning provided by the dependency to not
>> break people. If the major version is 0, we assume the dependency is
>> unstable and use the full version as part of the package string during
>> relocation.
>>
>> The downside of using the full version string for relocated packages:
>> 1) Users will end up with multiple copies of dependencies that differ
>> only by the minor or patch version increasing the size of their application.
>> 2) Bumping up the version of a dependency now requires the import
>> statement in all java files to be updated (not too difficult with some
>> sed/grep skills)
>>
>> The upside of using the full version string in the relocated package:
>> 1) We don't have to worry about whether a dependency maintains semantic
>> versioning which means our users won't have to worry about that either.
>> 2) This increases the odds that a user will load multiple slightly
>> different versions of the same dependency which is known to be incompatible
>> in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
>> Netty 4.1.28 even though they are both shaded due to issues of how JNI with
>> tcnative works).
>>
>> My preference would be to use the full version string for import
>> statements (so org.apache.beam.vendor.grpc.v1_13_1...) since this would
>> allow multiple copies to not conflict with each other since in my opinion
>> it is a lot more difficult to help a user debug a dependency issue then to
>> use string replacement during dependency upgrades to fix import statements.
>> Also I would suggest we name the artifacts in Maven as follows:
>> groupId: org.apache.beam
>> artifactId: beam-vendor-grpc-v1_13_1
>> version: 1.0.0 (first version and subsequent versions such as 1.0.1 are
>> only for patch upgrades that fix any shading issues we may have had when
>> producing the vendored jar)
>>
>>
>> On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <mx...@apache.org>
>> wrote:
>>
>>> Would also keep it simple and optimize for the JAR size only if
>>> necessary.
>>>
>>> On 24.10.18 00:06, Kenneth Knowles wrote:
>>> > I think it makes sense for each vendored dependency to be
>>> self-contained
>>> > as much as possible. It should keep it fairly simple. Things that
>>> cross
>>> > their API surface cannot be hidden, of course. Jar size is not a
>>> concern
>>> > IMO.
>>> >
>>> > Kenn
>>> >
>>> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <lcwik@google.com
>>> > <ma...@google.com>> wrote:
>>> >
>>> >     How should we handle the transitive dependencies of the things we
>>> >     want to vendor?
>>> >
>>> >     For example we use gRPC which depends on Guava 20 and we also use
>>> >     Calcite which depends on Guava 19.
>>> >
>>> >     Should the vendored gRPC/Calcite/... be self-contained so it
>>> >     contains all its dependencies, hence vendored gRPC would contain
>>> >     Guava 20 and vendored Calcite would contain Guava 19 (both under
>>> >     different namespaces)?
>>> >     This leads to larger jars but less vendored dependencies to
>>> maintain.
>>> >
>>> >     Or should we produce a vendored library for those that we want to
>>> >     share, e.g. Guava 20 that could be reused across multiple vendored
>>> >     libraries?
>>> >     Makes the vendoring process slightly more complicated, more
>>> >     dependencies to maintain, smaller jars.
>>> >
>>> >     Or should we produce a vendored library for each dependency?
>>> >     Lots of vendoring needed, likely tooling required to be built to
>>> >     maintain this.
>>> >
>>> >
>>> >
>>> >
>>> >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <klk@google.com
>>> >     <ma...@google.com>> wrote:
>>> >
>>> >         I actually created the subtasks by finding things shaded by at
>>> >         least one module. I think each one should definitely have an on
>>> >         list discussion that clarifies the target artifact, namespace,
>>> >         version, possible complications, etc.
>>> >
>>> >         My impression is that many many modules shade only Guava. So
>>> for
>>> >         build time and simplification that is a big win.
>>> >
>>> >         Kenn
>>> >
>>> >         On Tue, Oct 23, 2018, 08:16 Thomas Weise <thw@apache.org
>>> >         <ma...@apache.org>> wrote:
>>> >
>>> >             +1 for separate artifacts
>>> >
>>> >             I would request that we explicitly discuss and agree which
>>> >             dependencies we vendor though.
>>> >
>>> >             Not everything listed in the JIRA subtasks is currently
>>> >             relocated.
>>> >
>>> >             Thomas
>>> >
>>> >
>>> >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
>>> >             <david.moravek@gmail.com <ma...@gmail.com>>
>>> >             wrote:
>>> >
>>> >                 +1 This should improve build times a lot. It would be
>>> >                 great if vendored deps could stay in the main
>>> repository.
>>> >
>>> >                 D.
>>> >
>>> >                 On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
>>> >                 <mxm@apache.org <ma...@apache.org>> wrote:
>>> >
>>> >                     Looks great, Kenn!
>>> >
>>> >                      > Max: what is the story behind having a separate
>>> >                     flink-shaded repo? Did that make it easier to
>>> manage
>>> >                     in some way?
>>> >
>>> >                     Better separation of concerns, but I don't think
>>> >                     releasing the shaded
>>> >                     artifacts from the main repo is a problem. I'd even
>>> >                     prefer not to split
>>> >                     up the repo because it makes updates to the
>>> vendored
>>> >                     dependencies
>>> >                     slightly easier.
>>> >
>>> >                     On 23.10.18 03:25, Kenneth Knowles wrote:
>>> >                      > OK, I've filed
>>> >                     https://issues.apache.org/jira/browse/BEAM-5819 to
>>> >                      > collect sub-tasks. This has enough upsides
>>> >                     throughout lots of areas of
>>> >                      > the project that even though it is not glamorous
>>> >                     it seems pretty
>>> >                      > valuable to start on immediately. And I want to
>>> >                     find out if there's a
>>> >                      > pitfall lurking.
>>> >                      >
>>> >                      > Max: what is the story behind having a separate
>>> >                     flink-shaded repo? Did
>>> >                      > that make it easier to manage in some way?
>>> >                      >
>>> >                      > Kenn
>>> >                      >
>>> >                      > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
>>> >                     Michels <mxm@apache.org <ma...@apache.org>
>>> >                      > <mailto:mxm@apache.org <mailto:mxm@apache.org
>>> >>>
>>> >                     wrote:
>>> >                      >
>>> >                      >     +1 for publishing vendored Jars
>>> >                     independently. It will improve build
>>> >                      >     time and ease IntelliJ integration.
>>> >                      >
>>> >                      >     Flink also publishes shaded dependencies
>>> >                     separately:
>>> >                      >
>>> >                      >     - https://github.com/apache/flink-shaded
>>> >                      >     -
>>> >                     https://issues.apache.org/jira/browse/FLINK-6529
>>> >                      >
>>> >                      >     AFAIK their main motivation was to get rid
>>> of
>>> >                     duplicate shaded classes
>>> >                      >     on the classpath. We don't appear to have
>>> >                     that problem because we
>>> >                      >     already have a separate "vendor" project.
>>> >                      >
>>> >                      >      >  - With shading, it is hard (impossible?)
>>> >                     to step into dependency
>>> >                      >     code in IntelliJ's debugger, because the
>>> >                     actual symbol at runtime
>>> >                      >     does not match what is in the external jars
>>> >                      >
>>> >                      >     This would be solved by releasing the
>>> sources
>>> >                     of the shaded jars.
>>> >                      >      From a
>>> >                      >     legal perspective, this could be problematic
>>> >                     as alluded to here:
>>> >                      >
>>> https://github.com/apache/flink-shaded/issues/25
>>> >                      >
>>> >                      >     -Max
>>> >                      >
>>> >                      >     On 20.10.18 01:11, Lukasz Cwik wrote:
>>> >                      >      > I have tried several times to improve the
>>> >                     build system and intellij
>>> >                      >      > integration and each attempt ended with
>>> >                     little progress when dealing
>>> >                      >      > with vendored code. My latest attempt has
>>> >                     been the most promising
>>> >                      >     where
>>> >                      >      > I take the vendored classes/jars and
>>> >                     decompile them generating the
>>> >                      >      > source that Intellij can then use. I have
>>> >                     a branch[1] that
>>> >                      >     demonstrates
>>> >                      >      > the idea. It works pretty well (and up
>>> >                     until a change where we
>>> >                      >     started
>>> >                      >      > vendoring gRPC, was impractical to do.
>>> >                     Instructions to try it out
>>> >                      >     are:
>>> >                      >      >
>>> >                      >      > // Clean up any remnants of prior
>>> >                     builds/intellij projects
>>> >                      >      > git clean -fdx
>>> >                      >      > // Generated the source for
>>> >                     vendored/shaded modules
>>> >                      >      > ./gradlew decompile
>>> >                      >      >
>>> >                      >      > // Remove the "generated" Java sources
>>> for
>>> >                     protos so they don't
>>> >                      >     conflict with the decompiled sources.
>>> >                      >      > rm -rf
>>> >                     model/pipeline/build/generated/source/proto
>>> >                      >      > rm -rf
>>> >                     model/job-management/build/generated/source/proto
>>> >                      >      > rm -rf
>>> >                     model/fn-execution/build/generated/source/proto
>>> >                      >      > // Import the project into Intellij, most
>>> >                     code completion now
>>> >                      >     works still some issues with a few classes.
>>> >                      >      > // Note that the Java decompiler doesn't
>>> >                     generate valid source so
>>> >                      >     still need to delegate to Gradle for
>>> >                     build/run/test actions
>>> >                      >      > // Other decompilers may do a
>>> better/worse
>>> >                     job but haven't tried
>>> >                      >     them.
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > The problems that I face are that the
>>> >                     generated Java source from the
>>> >                      >      > protos and the decompiled source from the
>>> >                     compiled version of that
>>> >                      >      > source post shading are both being
>>> >                     imported as content roots and
>>> >                      >     then
>>> >                      >      > conflict. Also, the CFR decompiler isn't
>>> >                     producing valid source, if
>>> >                      >      > people could try others and report their
>>> >                     mileage, we may find one
>>> >                      >     that
>>> >                      >      > works and then we would be able to use
>>> >                     intellij to build/run our
>>> >                      >     code
>>> >                      >      > and not need to delegate all our
>>> >                     build/run/test actions to Gradle.
>>> >                      >      >
>>> >                      >      > After all these attempts I have done,
>>> >                     vendoring the dependencies
>>> >                      >     outside
>>> >                      >      > of the project seems like a sane approach
>>> >                     and unless someone
>>> >                      >     wants to
>>> >                      >      > take a stab at the best progress I have
>>> >                     made above, I would go
>>> >                      >     with what
>>> >                      >      > Kenn is suggesting even though it will
>>> >                     mean that we will need to
>>> >                      >     perform
>>> >                      >      > releases every time we want to change the
>>> >                     version of one of our
>>> >                      >     vendored
>>> >                      >      > dependencies.
>>> >                      >      >
>>> >                      >      > 1:
>>> >
>>> https://github.com/lukecwik/incubator-beam/tree/intellij
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
>>> >                     Knowles <kenn@apache.org <ma...@apache.org>
>>> >                      >     <mailto:kenn@apache.org <mailto:
>>> kenn@apache.org>>
>>> >                      >      > <mailto:kenn@apache.org
>>> >                     <ma...@apache.org> <mailto:kenn@apache.org
>>> >                     <ma...@apache.org>>>> wrote:
>>> >                      >      >
>>> >                      >      >     Another reason to push on this is to
>>> >                     get build times down.
>>> >                      >     Once only
>>> >                      >      >     generated proto classes use the
>>> shadow
>>> >                     plugin we'll cut the build
>>> >                      >      >     time in ~half? And there is no reason
>>> >                     to constantly re-vendor.
>>> >                      >      >
>>> >                      >      >     Kenn
>>> >                      >      >
>>> >                      >      >     On Fri, Oct 19, 2018 at 10:39 AM
>>> >                     Kenneth Knowles
>>> >                      >     <klk@google.com <ma...@google.com>
>>> >                     <mailto:klk@google.com <ma...@google.com>>
>>> >                      >      >     <mailto:klk@google.com
>>> >                     <ma...@google.com> <mailto:klk@google.com
>>> >                     <ma...@google.com>>>> wrote:
>>> >                      >      >
>>> >                      >      >         Hi all,
>>> >                      >      >
>>> >                      >      >         A while ago we had pretty good
>>> >                     consensus that we should
>>> >                      >     vendor
>>> >                      >      >         ("pre-shade") specific
>>> >                     dependencies, and there's start on it
>>> >                      >      >         here:
>>> >                     https://github.com/apache/beam/tree/master/vendor
>>> >                      >      >
>>> >                      >      >         IntelliJ notes:
>>> >                      >      >
>>> >                      >      >           - With shading, it is hard
>>> >                     (impossible?) to step into
>>> >                      >      >         dependency code in IntelliJ's
>>> >                     debugger, because the actual
>>> >                      >      >         symbol at runtime does not match
>>> >                     what is in the external jars
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Intellij can step through the classes if
>>> >                     they were published
>>> >                      >     outside the
>>> >                      >      > project since it can decompile them. The
>>> >                     source won't be legible.
>>> >                      >      > Decompiling the source as in my example
>>> >                     effectively shows the
>>> >                      >     same issue.
>>> >                      >      >
>>> >                      >      >           - With vendoring, if the
>>> >                     vendored dependencies are part
>>> >                      >     of the
>>> >                      >      >         project then IntelliJ gets
>>> >                     confused because it operates on
>>> >                      >      >         source, not the produced jars
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Yes, I tried several ways to get intellij
>>> >                     to ignore the source
>>> >                      >     and use
>>> >                      >      > the output jars but no luck.
>>> >                      >      >
>>> >                      >      >         The second one has a quick fix
>>> for
>>> >                     most cases*: don't
>>> >                      >     make the
>>> >                      >      >         vendored dep a subproject, but
>>> >                     just separately build and
>>> >                      >     publish
>>> >                      >      >         it. Since a vendored dependency
>>> >                     should change much more
>>> >                      >      >         infrequently (or if we bake the
>>> >                     version into the name,
>>> >                      >     ~never)
>>> >                      >      >         this means we publish once and
>>> >                     save headache and build
>>> >                      >     time forever.
>>> >                      >      >
>>> >                      >      >         WDYT? Have I overlooked
>>> something?
>>> >                     How about we set up
>>> >                      >     vendored
>>> >                      >      >         versions of guava, protobuf,
>>> grpc,
>>> >                     and publish them. We don't
>>> >                      >      >         have to actually start using them
>>> >                     yet, and can do it
>>> >                      >     incrementally.
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Currently we are relocating code
>>> depending
>>> >                     on the version string.
>>> >                      >     If the
>>> >                      >      > major version is >= 1, we use only the
>>> >                     major version within the
>>> >                      >     package
>>> >                      >      > string and rely on semantic versioning
>>> >                     provided by the dependency
>>> >                      >     to not
>>> >                      >      > break people. If the major version is 0,
>>> >                     we assume the dependency is
>>> >                      >      > unstable and use the full version as part
>>> >                     of the package string
>>> >                      >     during
>>> >                      >      > relocation.
>>> >                      >      >
>>> >                      >      > The downside of using the full version
>>> >                     string for relocated packages:
>>> >                      >      > 1) Users will end up with multiple copies
>>> >                     of dependencies that
>>> >                      >     differ
>>> >                      >      > only by the minor or patch version
>>> >                     increasing the size of their
>>> >                      >     application.
>>> >                      >      > 2) Bumping up the version of a dependency
>>> >                     now requires the import
>>> >                      >      > statement in all java files to be updated
>>> >                     (not too difficult with
>>> >                      >     some
>>> >                      >      > sed/grep skills)
>>> >                      >      >
>>> >                      >      > The upside of using the full version
>>> >                     string in the relocated package:
>>> >                      >      > 1) We don't have to worry about whether a
>>> >                     dependency maintains
>>> >                      >     semantic
>>> >                      >      > versioning which means our users won't
>>> >                     have to worry about that
>>> >                      >     either.
>>> >                      >      > 2) This increases the odds that a user
>>> >                     will load multiple slightly
>>> >                      >      > different versions of the same dependency
>>> >                     which is known to be
>>> >                      >      > incompatible in certain situations (e.g.
>>> >                     Netty 4.1.25 can't be on
>>> >                      >     the
>>> >                      >      > classpath with Netty 4.1.28 even though
>>> >                     they are both shaded due to
>>> >                      >      > issues of how JNI with tcnative works).
>>> >                      >      >
>>> >                      >      >
>>> >                      >      >         (side note: what do other
>>> projects
>>> >                     like Flink do?)
>>> >                      >      >
>>> >                      >      >         Kenn
>>> >                      >      >
>>> >                      >      >         *for generated proto classes,
>>> they
>>> >                     need to be altered after
>>> >                      >      >         being generated so shading
>>> happens
>>> >                     there, but actually only
>>> >                      >      >         relocation and the shared
>>> vendored
>>> >                     dep should work
>>> >                      >     elsewhere in
>>> >                      >      >         the project
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > We could publish the protos and treat
>>> them
>>> >                     as "external"
>>> >                      >     dependencies
>>> >                      >      > within the Java projects which would also
>>> >                     remove this pain point.
>>> >                      >
>>> >
>>>
>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Thomas Weise <th...@apache.org>.
I would prefer very explicit artifact names, to avoid confusion or
collision where the jar files end up in a flat namespace (like a /lib
folder).

Example:

groupId: org.apache.beam
artifactId beam-vendored-guava-20_0

Thanks,
Thomas






On Wed, Oct 24, 2018 at 11:31 AM Kenneth Knowles <ke...@apache.org> wrote:

> OK. I just opened https://github.com/apache/beam/pull/6809 to push Guava
> through. I made some comments there, and also I agree with Luke that full
> version string makes sense. For this purpose it seems easy and fine to do a
> search/replace to swap 20.0 for 20.1, and compatibility between them should
> not be a concern.
>
> I have minor suggestions and clarifications:
>
>  - Is there value to `beam` in the artifactId? I would leave it off unless
> there's a special need
>  - Users should never use these and we make it extremely clear they are
> not supported for any reasons
>  - Use 0.x versions indicating no intention of semantic versioning
>
> Bringing my comments and Luke's together, here's the proposal:
>
>     groupId: org.apache.beam
>     artifactId: vendored-guava-20_0
>     namespace: org.apache.beam.vendored.guava.v20_0
>     version: 0.1
>
> Alternatively it could be
>
>     groupId: org.apache.beam-vendored
>     artifactid: guava-20_0
>     namespace: org.apache.beam.vendored.guava.v20_0
>     version: 0.1
>
> I like the latter but I haven't gone through the process of establishing a
> new groupId.
>
> And for now we do not publish source jars. A couple of TODOs to get the
> build in good shape (classifiers, jars, interaction with plugins)
>
> Kenn
>
>
> On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> It looks like we are agreeing to make each vendored dependency self
>> contained and have all their own internal dependencies packaged. For
>> example, gRPC and all its transitive dependencies would use
>> org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive
>> dependencies would use org.apache.beam.vendored.calcite.vZZZ.
>>
>> I also wanted to circle back on this question I had earlier that didn't
>> have any follow-up:
>> Currently we are relocating code depending on the version string. If the
>> major version is >= 1, we use only the major version within the package
>> string and rely on semantic versioning provided by the dependency to not
>> break people. If the major version is 0, we assume the dependency is
>> unstable and use the full version as part of the package string during
>> relocation.
>>
>> The downside of using the full version string for relocated packages:
>> 1) Users will end up with multiple copies of dependencies that differ
>> only by the minor or patch version increasing the size of their application.
>> 2) Bumping up the version of a dependency now requires the import
>> statement in all java files to be updated (not too difficult with some
>> sed/grep skills)
>>
>> The upside of using the full version string in the relocated package:
>> 1) We don't have to worry about whether a dependency maintains semantic
>> versioning which means our users won't have to worry about that either.
>> 2) This increases the odds that a user will load multiple slightly
>> different versions of the same dependency which is known to be incompatible
>> in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
>> Netty 4.1.28 even though they are both shaded due to issues of how JNI with
>> tcnative works).
>>
>> My preference would be to use the full version string for import
>> statements (so org.apache.beam.vendor.grpc.v1_13_1...) since this would
>> allow multiple copies to not conflict with each other since in my opinion
>> it is a lot more difficult to help a user debug a dependency issue then to
>> use string replacement during dependency upgrades to fix import statements.
>> Also I would suggest we name the artifacts in Maven as follows:
>> groupId: org.apache.beam
>> artifactId: beam-vendor-grpc-v1_13_1
>> version: 1.0.0 (first version and subsequent versions such as 1.0.1 are
>> only for patch upgrades that fix any shading issues we may have had when
>> producing the vendored jar)
>>
>>
>> On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <mx...@apache.org>
>> wrote:
>>
>>> Would also keep it simple and optimize for the JAR size only if
>>> necessary.
>>>
>>> On 24.10.18 00:06, Kenneth Knowles wrote:
>>> > I think it makes sense for each vendored dependency to be
>>> self-contained
>>> > as much as possible. It should keep it fairly simple. Things that
>>> cross
>>> > their API surface cannot be hidden, of course. Jar size is not a
>>> concern
>>> > IMO.
>>> >
>>> > Kenn
>>> >
>>> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <lcwik@google.com
>>> > <ma...@google.com>> wrote:
>>> >
>>> >     How should we handle the transitive dependencies of the things we
>>> >     want to vendor?
>>> >
>>> >     For example we use gRPC which depends on Guava 20 and we also use
>>> >     Calcite which depends on Guava 19.
>>> >
>>> >     Should the vendored gRPC/Calcite/... be self-contained so it
>>> >     contains all its dependencies, hence vendored gRPC would contain
>>> >     Guava 20 and vendored Calcite would contain Guava 19 (both under
>>> >     different namespaces)?
>>> >     This leads to larger jars but less vendored dependencies to
>>> maintain.
>>> >
>>> >     Or should we produce a vendored library for those that we want to
>>> >     share, e.g. Guava 20 that could be reused across multiple vendored
>>> >     libraries?
>>> >     Makes the vendoring process slightly more complicated, more
>>> >     dependencies to maintain, smaller jars.
>>> >
>>> >     Or should we produce a vendored library for each dependency?
>>> >     Lots of vendoring needed, likely tooling required to be built to
>>> >     maintain this.
>>> >
>>> >
>>> >
>>> >
>>> >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <klk@google.com
>>> >     <ma...@google.com>> wrote:
>>> >
>>> >         I actually created the subtasks by finding things shaded by at
>>> >         least one module. I think each one should definitely have an on
>>> >         list discussion that clarifies the target artifact, namespace,
>>> >         version, possible complications, etc.
>>> >
>>> >         My impression is that many many modules shade only Guava. So
>>> for
>>> >         build time and simplification that is a big win.
>>> >
>>> >         Kenn
>>> >
>>> >         On Tue, Oct 23, 2018, 08:16 Thomas Weise <thw@apache.org
>>> >         <ma...@apache.org>> wrote:
>>> >
>>> >             +1 for separate artifacts
>>> >
>>> >             I would request that we explicitly discuss and agree which
>>> >             dependencies we vendor though.
>>> >
>>> >             Not everything listed in the JIRA subtasks is currently
>>> >             relocated.
>>> >
>>> >             Thomas
>>> >
>>> >
>>> >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
>>> >             <david.moravek@gmail.com <ma...@gmail.com>>
>>> >             wrote:
>>> >
>>> >                 +1 This should improve build times a lot. It would be
>>> >                 great if vendored deps could stay in the main
>>> repository.
>>> >
>>> >                 D.
>>> >
>>> >                 On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
>>> >                 <mxm@apache.org <ma...@apache.org>> wrote:
>>> >
>>> >                     Looks great, Kenn!
>>> >
>>> >                      > Max: what is the story behind having a separate
>>> >                     flink-shaded repo? Did that make it easier to
>>> manage
>>> >                     in some way?
>>> >
>>> >                     Better separation of concerns, but I don't think
>>> >                     releasing the shaded
>>> >                     artifacts from the main repo is a problem. I'd even
>>> >                     prefer not to split
>>> >                     up the repo because it makes updates to the
>>> vendored
>>> >                     dependencies
>>> >                     slightly easier.
>>> >
>>> >                     On 23.10.18 03:25, Kenneth Knowles wrote:
>>> >                      > OK, I've filed
>>> >                     https://issues.apache.org/jira/browse/BEAM-5819 to
>>> >                      > collect sub-tasks. This has enough upsides
>>> >                     throughout lots of areas of
>>> >                      > the project that even though it is not glamorous
>>> >                     it seems pretty
>>> >                      > valuable to start on immediately. And I want to
>>> >                     find out if there's a
>>> >                      > pitfall lurking.
>>> >                      >
>>> >                      > Max: what is the story behind having a separate
>>> >                     flink-shaded repo? Did
>>> >                      > that make it easier to manage in some way?
>>> >                      >
>>> >                      > Kenn
>>> >                      >
>>> >                      > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
>>> >                     Michels <mxm@apache.org <ma...@apache.org>
>>> >                      > <mailto:mxm@apache.org <mailto:mxm@apache.org
>>> >>>
>>> >                     wrote:
>>> >                      >
>>> >                      >     +1 for publishing vendored Jars
>>> >                     independently. It will improve build
>>> >                      >     time and ease IntelliJ integration.
>>> >                      >
>>> >                      >     Flink also publishes shaded dependencies
>>> >                     separately:
>>> >                      >
>>> >                      >     - https://github.com/apache/flink-shaded
>>> >                      >     -
>>> >                     https://issues.apache.org/jira/browse/FLINK-6529
>>> >                      >
>>> >                      >     AFAIK their main motivation was to get rid
>>> of
>>> >                     duplicate shaded classes
>>> >                      >     on the classpath. We don't appear to have
>>> >                     that problem because we
>>> >                      >     already have a separate "vendor" project.
>>> >                      >
>>> >                      >      >  - With shading, it is hard (impossible?)
>>> >                     to step into dependency
>>> >                      >     code in IntelliJ's debugger, because the
>>> >                     actual symbol at runtime
>>> >                      >     does not match what is in the external jars
>>> >                      >
>>> >                      >     This would be solved by releasing the
>>> sources
>>> >                     of the shaded jars.
>>> >                      >      From a
>>> >                      >     legal perspective, this could be problematic
>>> >                     as alluded to here:
>>> >                      >
>>> https://github.com/apache/flink-shaded/issues/25
>>> >                      >
>>> >                      >     -Max
>>> >                      >
>>> >                      >     On 20.10.18 01:11, Lukasz Cwik wrote:
>>> >                      >      > I have tried several times to improve the
>>> >                     build system and intellij
>>> >                      >      > integration and each attempt ended with
>>> >                     little progress when dealing
>>> >                      >      > with vendored code. My latest attempt has
>>> >                     been the most promising
>>> >                      >     where
>>> >                      >      > I take the vendored classes/jars and
>>> >                     decompile them generating the
>>> >                      >      > source that Intellij can then use. I have
>>> >                     a branch[1] that
>>> >                      >     demonstrates
>>> >                      >      > the idea. It works pretty well (and up
>>> >                     until a change where we
>>> >                      >     started
>>> >                      >      > vendoring gRPC, was impractical to do.
>>> >                     Instructions to try it out
>>> >                      >     are:
>>> >                      >      >
>>> >                      >      > // Clean up any remnants of prior
>>> >                     builds/intellij projects
>>> >                      >      > git clean -fdx
>>> >                      >      > // Generated the source for
>>> >                     vendored/shaded modules
>>> >                      >      > ./gradlew decompile
>>> >                      >      >
>>> >                      >      > // Remove the "generated" Java sources
>>> for
>>> >                     protos so they don't
>>> >                      >     conflict with the decompiled sources.
>>> >                      >      > rm -rf
>>> >                     model/pipeline/build/generated/source/proto
>>> >                      >      > rm -rf
>>> >                     model/job-management/build/generated/source/proto
>>> >                      >      > rm -rf
>>> >                     model/fn-execution/build/generated/source/proto
>>> >                      >      > // Import the project into Intellij, most
>>> >                     code completion now
>>> >                      >     works still some issues with a few classes.
>>> >                      >      > // Note that the Java decompiler doesn't
>>> >                     generate valid source so
>>> >                      >     still need to delegate to Gradle for
>>> >                     build/run/test actions
>>> >                      >      > // Other decompilers may do a
>>> better/worse
>>> >                     job but haven't tried
>>> >                      >     them.
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > The problems that I face are that the
>>> >                     generated Java source from the
>>> >                      >      > protos and the decompiled source from the
>>> >                     compiled version of that
>>> >                      >      > source post shading are both being
>>> >                     imported as content roots and
>>> >                      >     then
>>> >                      >      > conflict. Also, the CFR decompiler isn't
>>> >                     producing valid source, if
>>> >                      >      > people could try others and report their
>>> >                     mileage, we may find one
>>> >                      >     that
>>> >                      >      > works and then we would be able to use
>>> >                     intellij to build/run our
>>> >                      >     code
>>> >                      >      > and not need to delegate all our
>>> >                     build/run/test actions to Gradle.
>>> >                      >      >
>>> >                      >      > After all these attempts I have done,
>>> >                     vendoring the dependencies
>>> >                      >     outside
>>> >                      >      > of the project seems like a sane approach
>>> >                     and unless someone
>>> >                      >     wants to
>>> >                      >      > take a stab at the best progress I have
>>> >                     made above, I would go
>>> >                      >     with what
>>> >                      >      > Kenn is suggesting even though it will
>>> >                     mean that we will need to
>>> >                      >     perform
>>> >                      >      > releases every time we want to change the
>>> >                     version of one of our
>>> >                      >     vendored
>>> >                      >      > dependencies.
>>> >                      >      >
>>> >                      >      > 1:
>>> >
>>> https://github.com/lukecwik/incubator-beam/tree/intellij
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
>>> >                     Knowles <kenn@apache.org <ma...@apache.org>
>>> >                      >     <mailto:kenn@apache.org <mailto:
>>> kenn@apache.org>>
>>> >                      >      > <mailto:kenn@apache.org
>>> >                     <ma...@apache.org> <mailto:kenn@apache.org
>>> >                     <ma...@apache.org>>>> wrote:
>>> >                      >      >
>>> >                      >      >     Another reason to push on this is to
>>> >                     get build times down.
>>> >                      >     Once only
>>> >                      >      >     generated proto classes use the
>>> shadow
>>> >                     plugin we'll cut the build
>>> >                      >      >     time in ~half? And there is no reason
>>> >                     to constantly re-vendor.
>>> >                      >      >
>>> >                      >      >     Kenn
>>> >                      >      >
>>> >                      >      >     On Fri, Oct 19, 2018 at 10:39 AM
>>> >                     Kenneth Knowles
>>> >                      >     <klk@google.com <ma...@google.com>
>>> >                     <mailto:klk@google.com <ma...@google.com>>
>>> >                      >      >     <mailto:klk@google.com
>>> >                     <ma...@google.com> <mailto:klk@google.com
>>> >                     <ma...@google.com>>>> wrote:
>>> >                      >      >
>>> >                      >      >         Hi all,
>>> >                      >      >
>>> >                      >      >         A while ago we had pretty good
>>> >                     consensus that we should
>>> >                      >     vendor
>>> >                      >      >         ("pre-shade") specific
>>> >                     dependencies, and there's start on it
>>> >                      >      >         here:
>>> >                     https://github.com/apache/beam/tree/master/vendor
>>> >                      >      >
>>> >                      >      >         IntelliJ notes:
>>> >                      >      >
>>> >                      >      >           - With shading, it is hard
>>> >                     (impossible?) to step into
>>> >                      >      >         dependency code in IntelliJ's
>>> >                     debugger, because the actual
>>> >                      >      >         symbol at runtime does not match
>>> >                     what is in the external jars
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Intellij can step through the classes if
>>> >                     they were published
>>> >                      >     outside the
>>> >                      >      > project since it can decompile them. The
>>> >                     source won't be legible.
>>> >                      >      > Decompiling the source as in my example
>>> >                     effectively shows the
>>> >                      >     same issue.
>>> >                      >      >
>>> >                      >      >           - With vendoring, if the
>>> >                     vendored dependencies are part
>>> >                      >     of the
>>> >                      >      >         project then IntelliJ gets
>>> >                     confused because it operates on
>>> >                      >      >         source, not the produced jars
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Yes, I tried several ways to get intellij
>>> >                     to ignore the source
>>> >                      >     and use
>>> >                      >      > the output jars but no luck.
>>> >                      >      >
>>> >                      >      >         The second one has a quick fix
>>> for
>>> >                     most cases*: don't
>>> >                      >     make the
>>> >                      >      >         vendored dep a subproject, but
>>> >                     just separately build and
>>> >                      >     publish
>>> >                      >      >         it. Since a vendored dependency
>>> >                     should change much more
>>> >                      >      >         infrequently (or if we bake the
>>> >                     version into the name,
>>> >                      >     ~never)
>>> >                      >      >         this means we publish once and
>>> >                     save headache and build
>>> >                      >     time forever.
>>> >                      >      >
>>> >                      >      >         WDYT? Have I overlooked
>>> something?
>>> >                     How about we set up
>>> >                      >     vendored
>>> >                      >      >         versions of guava, protobuf,
>>> grpc,
>>> >                     and publish them. We don't
>>> >                      >      >         have to actually start using them
>>> >                     yet, and can do it
>>> >                      >     incrementally.
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Currently we are relocating code
>>> depending
>>> >                     on the version string.
>>> >                      >     If the
>>> >                      >      > major version is >= 1, we use only the
>>> >                     major version within the
>>> >                      >     package
>>> >                      >      > string and rely on semantic versioning
>>> >                     provided by the dependency
>>> >                      >     to not
>>> >                      >      > break people. If the major version is 0,
>>> >                     we assume the dependency is
>>> >                      >      > unstable and use the full version as part
>>> >                     of the package string
>>> >                      >     during
>>> >                      >      > relocation.
>>> >                      >      >
>>> >                      >      > The downside of using the full version
>>> >                     string for relocated packages:
>>> >                      >      > 1) Users will end up with multiple copies
>>> >                     of dependencies that
>>> >                      >     differ
>>> >                      >      > only by the minor or patch version
>>> >                     increasing the size of their
>>> >                      >     application.
>>> >                      >      > 2) Bumping up the version of a dependency
>>> >                     now requires the import
>>> >                      >      > statement in all java files to be updated
>>> >                     (not too difficult with
>>> >                      >     some
>>> >                      >      > sed/grep skills)
>>> >                      >      >
>>> >                      >      > The upside of using the full version
>>> >                     string in the relocated package:
>>> >                      >      > 1) We don't have to worry about whether a
>>> >                     dependency maintains
>>> >                      >     semantic
>>> >                      >      > versioning which means our users won't
>>> >                     have to worry about that
>>> >                      >     either.
>>> >                      >      > 2) This increases the odds that a user
>>> >                     will load multiple slightly
>>> >                      >      > different versions of the same dependency
>>> >                     which is known to be
>>> >                      >      > incompatible in certain situations (e.g.
>>> >                     Netty 4.1.25 can't be on
>>> >                      >     the
>>> >                      >      > classpath with Netty 4.1.28 even though
>>> >                     they are both shaded due to
>>> >                      >      > issues of how JNI with tcnative works).
>>> >                      >      >
>>> >                      >      >
>>> >                      >      >         (side note: what do other
>>> projects
>>> >                     like Flink do?)
>>> >                      >      >
>>> >                      >      >         Kenn
>>> >                      >      >
>>> >                      >      >         *for generated proto classes,
>>> they
>>> >                     need to be altered after
>>> >                      >      >         being generated so shading
>>> happens
>>> >                     there, but actually only
>>> >                      >      >         relocation and the shared
>>> vendored
>>> >                     dep should work
>>> >                      >     elsewhere in
>>> >                      >      >         the project
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > We could publish the protos and treat
>>> them
>>> >                     as "external"
>>> >                      >     dependencies
>>> >                      >      > within the Java projects which would also
>>> >                     remove this pain point.
>>> >                      >
>>> >
>>>
>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Kenneth Knowles <ke...@apache.org>.
OK. I just opened https://github.com/apache/beam/pull/6809 to push Guava
through. I made some comments there, and also I agree with Luke that full
version string makes sense. For this purpose it seems easy and fine to do a
search/replace to swap 20.0 for 20.1, and compatibility between them should
not be a concern.

I have minor suggestions and clarifications:

 - Is there value to `beam` in the artifactId? I would leave it off unless
there's a special need
 - Users should never use these and we make it extremely clear they are not
supported for any reasons
 - Use 0.x versions indicating no intention of semantic versioning

Bringing my comments and Luke's together, here's the proposal:

    groupId: org.apache.beam
    artifactId: vendored-guava-20_0
    namespace: org.apache.beam.vendored.guava.v20_0
    version: 0.1

Alternatively it could be

    groupId: org.apache.beam-vendored
    artifactid: guava-20_0
    namespace: org.apache.beam.vendored.guava.v20_0
    version: 0.1

I like the latter but I haven't gone through the process of establishing a
new groupId.

And for now we do not publish source jars. A couple of TODOs to get the
build in good shape (classifiers, jars, interaction with plugins)

Kenn


On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <lc...@google.com> wrote:

> It looks like we are agreeing to make each vendored dependency self
> contained and have all their own internal dependencies packaged. For
> example, gRPC and all its transitive dependencies would use
> org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive
> dependencies would use org.apache.beam.vendored.calcite.vZZZ.
>
> I also wanted to circle back on this question I had earlier that didn't
> have any follow-up:
> Currently we are relocating code depending on the version string. If the
> major version is >= 1, we use only the major version within the package
> string and rely on semantic versioning provided by the dependency to not
> break people. If the major version is 0, we assume the dependency is
> unstable and use the full version as part of the package string during
> relocation.
>
> The downside of using the full version string for relocated packages:
> 1) Users will end up with multiple copies of dependencies that differ only
> by the minor or patch version increasing the size of their application.
> 2) Bumping up the version of a dependency now requires the import
> statement in all java files to be updated (not too difficult with some
> sed/grep skills)
>
> The upside of using the full version string in the relocated package:
> 1) We don't have to worry about whether a dependency maintains semantic
> versioning which means our users won't have to worry about that either.
> 2) This increases the odds that a user will load multiple slightly
> different versions of the same dependency which is known to be incompatible
> in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
> Netty 4.1.28 even though they are both shaded due to issues of how JNI with
> tcnative works).
>
> My preference would be to use the full version string for import
> statements (so org.apache.beam.vendor.grpc.v1_13_1...) since this would
> allow multiple copies to not conflict with each other since in my opinion
> it is a lot more difficult to help a user debug a dependency issue then to
> use string replacement during dependency upgrades to fix import statements.
> Also I would suggest we name the artifacts in Maven as follows:
> groupId: org.apache.beam
> artifactId: beam-vendor-grpc-v1_13_1
> version: 1.0.0 (first version and subsequent versions such as 1.0.1 are
> only for patch upgrades that fix any shading issues we may have had when
> producing the vendored jar)
>
>
> On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <mx...@apache.org> wrote:
>
>> Would also keep it simple and optimize for the JAR size only if necessary.
>>
>> On 24.10.18 00:06, Kenneth Knowles wrote:
>> > I think it makes sense for each vendored dependency to be
>> self-contained
>> > as much as possible. It should keep it fairly simple. Things that cross
>> > their API surface cannot be hidden, of course. Jar size is not a
>> concern
>> > IMO.
>> >
>> > Kenn
>> >
>> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <lcwik@google.com
>> > <ma...@google.com>> wrote:
>> >
>> >     How should we handle the transitive dependencies of the things we
>> >     want to vendor?
>> >
>> >     For example we use gRPC which depends on Guava 20 and we also use
>> >     Calcite which depends on Guava 19.
>> >
>> >     Should the vendored gRPC/Calcite/... be self-contained so it
>> >     contains all its dependencies, hence vendored gRPC would contain
>> >     Guava 20 and vendored Calcite would contain Guava 19 (both under
>> >     different namespaces)?
>> >     This leads to larger jars but less vendored dependencies to
>> maintain.
>> >
>> >     Or should we produce a vendored library for those that we want to
>> >     share, e.g. Guava 20 that could be reused across multiple vendored
>> >     libraries?
>> >     Makes the vendoring process slightly more complicated, more
>> >     dependencies to maintain, smaller jars.
>> >
>> >     Or should we produce a vendored library for each dependency?
>> >     Lots of vendoring needed, likely tooling required to be built to
>> >     maintain this.
>> >
>> >
>> >
>> >
>> >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <klk@google.com
>> >     <ma...@google.com>> wrote:
>> >
>> >         I actually created the subtasks by finding things shaded by at
>> >         least one module. I think each one should definitely have an on
>> >         list discussion that clarifies the target artifact, namespace,
>> >         version, possible complications, etc.
>> >
>> >         My impression is that many many modules shade only Guava. So for
>> >         build time and simplification that is a big win.
>> >
>> >         Kenn
>> >
>> >         On Tue, Oct 23, 2018, 08:16 Thomas Weise <thw@apache.org
>> >         <ma...@apache.org>> wrote:
>> >
>> >             +1 for separate artifacts
>> >
>> >             I would request that we explicitly discuss and agree which
>> >             dependencies we vendor though.
>> >
>> >             Not everything listed in the JIRA subtasks is currently
>> >             relocated.
>> >
>> >             Thomas
>> >
>> >
>> >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
>> >             <david.moravek@gmail.com <ma...@gmail.com>>
>> >             wrote:
>> >
>> >                 +1 This should improve build times a lot. It would be
>> >                 great if vendored deps could stay in the main
>> repository.
>> >
>> >                 D.
>> >
>> >                 On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
>> >                 <mxm@apache.org <ma...@apache.org>> wrote:
>> >
>> >                     Looks great, Kenn!
>> >
>> >                      > Max: what is the story behind having a separate
>> >                     flink-shaded repo? Did that make it easier to manage
>> >                     in some way?
>> >
>> >                     Better separation of concerns, but I don't think
>> >                     releasing the shaded
>> >                     artifacts from the main repo is a problem. I'd even
>> >                     prefer not to split
>> >                     up the repo because it makes updates to the vendored
>> >                     dependencies
>> >                     slightly easier.
>> >
>> >                     On 23.10.18 03:25, Kenneth Knowles wrote:
>> >                      > OK, I've filed
>> >                     https://issues.apache.org/jira/browse/BEAM-5819 to
>> >                      > collect sub-tasks. This has enough upsides
>> >                     throughout lots of areas of
>> >                      > the project that even though it is not glamorous
>> >                     it seems pretty
>> >                      > valuable to start on immediately. And I want to
>> >                     find out if there's a
>> >                      > pitfall lurking.
>> >                      >
>> >                      > Max: what is the story behind having a separate
>> >                     flink-shaded repo? Did
>> >                      > that make it easier to manage in some way?
>> >                      >
>> >                      > Kenn
>> >                      >
>> >                      > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
>> >                     Michels <mxm@apache.org <ma...@apache.org>
>> >                      > <mailto:mxm@apache.org <ma...@apache.org>>>
>> >                     wrote:
>> >                      >
>> >                      >     +1 for publishing vendored Jars
>> >                     independently. It will improve build
>> >                      >     time and ease IntelliJ integration.
>> >                      >
>> >                      >     Flink also publishes shaded dependencies
>> >                     separately:
>> >                      >
>> >                      >     - https://github.com/apache/flink-shaded
>> >                      >     -
>> >                     https://issues.apache.org/jira/browse/FLINK-6529
>> >                      >
>> >                      >     AFAIK their main motivation was to get rid of
>> >                     duplicate shaded classes
>> >                      >     on the classpath. We don't appear to have
>> >                     that problem because we
>> >                      >     already have a separate "vendor" project.
>> >                      >
>> >                      >      >  - With shading, it is hard (impossible?)
>> >                     to step into dependency
>> >                      >     code in IntelliJ's debugger, because the
>> >                     actual symbol at runtime
>> >                      >     does not match what is in the external jars
>> >                      >
>> >                      >     This would be solved by releasing the sources
>> >                     of the shaded jars.
>> >                      >      From a
>> >                      >     legal perspective, this could be problematic
>> >                     as alluded to here:
>> >                      > https://github.com/apache/flink-shaded/issues/25
>> >                      >
>> >                      >     -Max
>> >                      >
>> >                      >     On 20.10.18 01:11, Lukasz Cwik wrote:
>> >                      >      > I have tried several times to improve the
>> >                     build system and intellij
>> >                      >      > integration and each attempt ended with
>> >                     little progress when dealing
>> >                      >      > with vendored code. My latest attempt has
>> >                     been the most promising
>> >                      >     where
>> >                      >      > I take the vendored classes/jars and
>> >                     decompile them generating the
>> >                      >      > source that Intellij can then use. I have
>> >                     a branch[1] that
>> >                      >     demonstrates
>> >                      >      > the idea. It works pretty well (and up
>> >                     until a change where we
>> >                      >     started
>> >                      >      > vendoring gRPC, was impractical to do.
>> >                     Instructions to try it out
>> >                      >     are:
>> >                      >      >
>> >                      >      > // Clean up any remnants of prior
>> >                     builds/intellij projects
>> >                      >      > git clean -fdx
>> >                      >      > // Generated the source for
>> >                     vendored/shaded modules
>> >                      >      > ./gradlew decompile
>> >                      >      >
>> >                      >      > // Remove the "generated" Java sources for
>> >                     protos so they don't
>> >                      >     conflict with the decompiled sources.
>> >                      >      > rm -rf
>> >                     model/pipeline/build/generated/source/proto
>> >                      >      > rm -rf
>> >                     model/job-management/build/generated/source/proto
>> >                      >      > rm -rf
>> >                     model/fn-execution/build/generated/source/proto
>> >                      >      > // Import the project into Intellij, most
>> >                     code completion now
>> >                      >     works still some issues with a few classes.
>> >                      >      > // Note that the Java decompiler doesn't
>> >                     generate valid source so
>> >                      >     still need to delegate to Gradle for
>> >                     build/run/test actions
>> >                      >      > // Other decompilers may do a better/worse
>> >                     job but haven't tried
>> >                      >     them.
>> >                      >      >
>> >                      >      >
>> >                      >      > The problems that I face are that the
>> >                     generated Java source from the
>> >                      >      > protos and the decompiled source from the
>> >                     compiled version of that
>> >                      >      > source post shading are both being
>> >                     imported as content roots and
>> >                      >     then
>> >                      >      > conflict. Also, the CFR decompiler isn't
>> >                     producing valid source, if
>> >                      >      > people could try others and report their
>> >                     mileage, we may find one
>> >                      >     that
>> >                      >      > works and then we would be able to use
>> >                     intellij to build/run our
>> >                      >     code
>> >                      >      > and not need to delegate all our
>> >                     build/run/test actions to Gradle.
>> >                      >      >
>> >                      >      > After all these attempts I have done,
>> >                     vendoring the dependencies
>> >                      >     outside
>> >                      >      > of the project seems like a sane approach
>> >                     and unless someone
>> >                      >     wants to
>> >                      >      > take a stab at the best progress I have
>> >                     made above, I would go
>> >                      >     with what
>> >                      >      > Kenn is suggesting even though it will
>> >                     mean that we will need to
>> >                      >     perform
>> >                      >      > releases every time we want to change the
>> >                     version of one of our
>> >                      >     vendored
>> >                      >      > dependencies.
>> >                      >      >
>> >                      >      > 1:
>> >
>> https://github.com/lukecwik/incubator-beam/tree/intellij
>> >                      >      >
>> >                      >      >
>> >                      >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
>> >                     Knowles <kenn@apache.org <ma...@apache.org>
>> >                      >     <mailto:kenn@apache.org <mailto:
>> kenn@apache.org>>
>> >                      >      > <mailto:kenn@apache.org
>> >                     <ma...@apache.org> <mailto:kenn@apache.org
>> >                     <ma...@apache.org>>>> wrote:
>> >                      >      >
>> >                      >      >     Another reason to push on this is to
>> >                     get build times down.
>> >                      >     Once only
>> >                      >      >     generated proto classes use the shadow
>> >                     plugin we'll cut the build
>> >                      >      >     time in ~half? And there is no reason
>> >                     to constantly re-vendor.
>> >                      >      >
>> >                      >      >     Kenn
>> >                      >      >
>> >                      >      >     On Fri, Oct 19, 2018 at 10:39 AM
>> >                     Kenneth Knowles
>> >                      >     <klk@google.com <ma...@google.com>
>> >                     <mailto:klk@google.com <ma...@google.com>>
>> >                      >      >     <mailto:klk@google.com
>> >                     <ma...@google.com> <mailto:klk@google.com
>> >                     <ma...@google.com>>>> wrote:
>> >                      >      >
>> >                      >      >         Hi all,
>> >                      >      >
>> >                      >      >         A while ago we had pretty good
>> >                     consensus that we should
>> >                      >     vendor
>> >                      >      >         ("pre-shade") specific
>> >                     dependencies, and there's start on it
>> >                      >      >         here:
>> >                     https://github.com/apache/beam/tree/master/vendor
>> >                      >      >
>> >                      >      >         IntelliJ notes:
>> >                      >      >
>> >                      >      >           - With shading, it is hard
>> >                     (impossible?) to step into
>> >                      >      >         dependency code in IntelliJ's
>> >                     debugger, because the actual
>> >                      >      >         symbol at runtime does not match
>> >                     what is in the external jars
>> >                      >      >
>> >                      >      >
>> >                      >      > Intellij can step through the classes if
>> >                     they were published
>> >                      >     outside the
>> >                      >      > project since it can decompile them. The
>> >                     source won't be legible.
>> >                      >      > Decompiling the source as in my example
>> >                     effectively shows the
>> >                      >     same issue.
>> >                      >      >
>> >                      >      >           - With vendoring, if the
>> >                     vendored dependencies are part
>> >                      >     of the
>> >                      >      >         project then IntelliJ gets
>> >                     confused because it operates on
>> >                      >      >         source, not the produced jars
>> >                      >      >
>> >                      >      >
>> >                      >      > Yes, I tried several ways to get intellij
>> >                     to ignore the source
>> >                      >     and use
>> >                      >      > the output jars but no luck.
>> >                      >      >
>> >                      >      >         The second one has a quick fix for
>> >                     most cases*: don't
>> >                      >     make the
>> >                      >      >         vendored dep a subproject, but
>> >                     just separately build and
>> >                      >     publish
>> >                      >      >         it. Since a vendored dependency
>> >                     should change much more
>> >                      >      >         infrequently (or if we bake the
>> >                     version into the name,
>> >                      >     ~never)
>> >                      >      >         this means we publish once and
>> >                     save headache and build
>> >                      >     time forever.
>> >                      >      >
>> >                      >      >         WDYT? Have I overlooked something?
>> >                     How about we set up
>> >                      >     vendored
>> >                      >      >         versions of guava, protobuf, grpc,
>> >                     and publish them. We don't
>> >                      >      >         have to actually start using them
>> >                     yet, and can do it
>> >                      >     incrementally.
>> >                      >      >
>> >                      >      >
>> >                      >      > Currently we are relocating code depending
>> >                     on the version string.
>> >                      >     If the
>> >                      >      > major version is >= 1, we use only the
>> >                     major version within the
>> >                      >     package
>> >                      >      > string and rely on semantic versioning
>> >                     provided by the dependency
>> >                      >     to not
>> >                      >      > break people. If the major version is 0,
>> >                     we assume the dependency is
>> >                      >      > unstable and use the full version as part
>> >                     of the package string
>> >                      >     during
>> >                      >      > relocation.
>> >                      >      >
>> >                      >      > The downside of using the full version
>> >                     string for relocated packages:
>> >                      >      > 1) Users will end up with multiple copies
>> >                     of dependencies that
>> >                      >     differ
>> >                      >      > only by the minor or patch version
>> >                     increasing the size of their
>> >                      >     application.
>> >                      >      > 2) Bumping up the version of a dependency
>> >                     now requires the import
>> >                      >      > statement in all java files to be updated
>> >                     (not too difficult with
>> >                      >     some
>> >                      >      > sed/grep skills)
>> >                      >      >
>> >                      >      > The upside of using the full version
>> >                     string in the relocated package:
>> >                      >      > 1) We don't have to worry about whether a
>> >                     dependency maintains
>> >                      >     semantic
>> >                      >      > versioning which means our users won't
>> >                     have to worry about that
>> >                      >     either.
>> >                      >      > 2) This increases the odds that a user
>> >                     will load multiple slightly
>> >                      >      > different versions of the same dependency
>> >                     which is known to be
>> >                      >      > incompatible in certain situations (e.g.
>> >                     Netty 4.1.25 can't be on
>> >                      >     the
>> >                      >      > classpath with Netty 4.1.28 even though
>> >                     they are both shaded due to
>> >                      >      > issues of how JNI with tcnative works).
>> >                      >      >
>> >                      >      >
>> >                      >      >         (side note: what do other projects
>> >                     like Flink do?)
>> >                      >      >
>> >                      >      >         Kenn
>> >                      >      >
>> >                      >      >         *for generated proto classes, they
>> >                     need to be altered after
>> >                      >      >         being generated so shading happens
>> >                     there, but actually only
>> >                      >      >         relocation and the shared vendored
>> >                     dep should work
>> >                      >     elsewhere in
>> >                      >      >         the project
>> >                      >      >
>> >                      >      >
>> >                      >      > We could publish the protos and treat them
>> >                     as "external"
>> >                      >     dependencies
>> >                      >      > within the Java projects which would also
>> >                     remove this pain point.
>> >                      >
>> >
>>
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
It looks like we are agreeing to make each vendored dependency self
contained and have all their own internal dependencies packaged. For
example, gRPC and all its transitive dependencies would use
org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive
dependencies would use org.apache.beam.vendored.calcite.vZZZ.

I also wanted to circle back on this question I had earlier that didn't
have any follow-up:
Currently we are relocating code depending on the version string. If the
major version is >= 1, we use only the major version within the package
string and rely on semantic versioning provided by the dependency to not
break people. If the major version is 0, we assume the dependency is
unstable and use the full version as part of the package string during
relocation.

The downside of using the full version string for relocated packages:
1) Users will end up with multiple copies of dependencies that differ only
by the minor or patch version increasing the size of their application.
2) Bumping up the version of a dependency now requires the import statement
in all java files to be updated (not too difficult with some sed/grep
skills)

The upside of using the full version string in the relocated package:
1) We don't have to worry about whether a dependency maintains semantic
versioning which means our users won't have to worry about that either.
2) This increases the odds that a user will load multiple slightly
different versions of the same dependency which is known to be incompatible
in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
Netty 4.1.28 even though they are both shaded due to issues of how JNI with
tcnative works).

My preference would be to use the full version string for import statements
(so org.apache.beam.vendor.grpc.v1_13_1...) since this would allow multiple
copies to not conflict with each other since in my opinion it is a lot more
difficult to help a user debug a dependency issue then to use string
replacement during dependency upgrades to fix import statements. Also I
would suggest we name the artifacts in Maven as follows:
groupId: org.apache.beam
artifactId: beam-vendor-grpc-v1_13_1
version: 1.0.0 (first version and subsequent versions such as 1.0.1 are
only for patch upgrades that fix any shading issues we may have had when
producing the vendored jar)


On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <mx...@apache.org> wrote:

> Would also keep it simple and optimize for the JAR size only if necessary.
>
> On 24.10.18 00:06, Kenneth Knowles wrote:
> > I think it makes sense for each vendored dependency to be self-contained
> > as much as possible. It should keep it fairly simple. Things that cross
> > their API surface cannot be hidden, of course. Jar size is not a concern
> > IMO.
> >
> > Kenn
> >
> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <lcwik@google.com
> > <ma...@google.com>> wrote:
> >
> >     How should we handle the transitive dependencies of the things we
> >     want to vendor?
> >
> >     For example we use gRPC which depends on Guava 20 and we also use
> >     Calcite which depends on Guava 19.
> >
> >     Should the vendored gRPC/Calcite/... be self-contained so it
> >     contains all its dependencies, hence vendored gRPC would contain
> >     Guava 20 and vendored Calcite would contain Guava 19 (both under
> >     different namespaces)?
> >     This leads to larger jars but less vendored dependencies to maintain.
> >
> >     Or should we produce a vendored library for those that we want to
> >     share, e.g. Guava 20 that could be reused across multiple vendored
> >     libraries?
> >     Makes the vendoring process slightly more complicated, more
> >     dependencies to maintain, smaller jars.
> >
> >     Or should we produce a vendored library for each dependency?
> >     Lots of vendoring needed, likely tooling required to be built to
> >     maintain this.
> >
> >
> >
> >
> >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <klk@google.com
> >     <ma...@google.com>> wrote:
> >
> >         I actually created the subtasks by finding things shaded by at
> >         least one module. I think each one should definitely have an on
> >         list discussion that clarifies the target artifact, namespace,
> >         version, possible complications, etc.
> >
> >         My impression is that many many modules shade only Guava. So for
> >         build time and simplification that is a big win.
> >
> >         Kenn
> >
> >         On Tue, Oct 23, 2018, 08:16 Thomas Weise <thw@apache.org
> >         <ma...@apache.org>> wrote:
> >
> >             +1 for separate artifacts
> >
> >             I would request that we explicitly discuss and agree which
> >             dependencies we vendor though.
> >
> >             Not everything listed in the JIRA subtasks is currently
> >             relocated.
> >
> >             Thomas
> >
> >
> >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
> >             <david.moravek@gmail.com <ma...@gmail.com>>
> >             wrote:
> >
> >                 +1 This should improve build times a lot. It would be
> >                 great if vendored deps could stay in the main repository.
> >
> >                 D.
> >
> >                 On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
> >                 <mxm@apache.org <ma...@apache.org>> wrote:
> >
> >                     Looks great, Kenn!
> >
> >                      > Max: what is the story behind having a separate
> >                     flink-shaded repo? Did that make it easier to manage
> >                     in some way?
> >
> >                     Better separation of concerns, but I don't think
> >                     releasing the shaded
> >                     artifacts from the main repo is a problem. I'd even
> >                     prefer not to split
> >                     up the repo because it makes updates to the vendored
> >                     dependencies
> >                     slightly easier.
> >
> >                     On 23.10.18 03:25, Kenneth Knowles wrote:
> >                      > OK, I've filed
> >                     https://issues.apache.org/jira/browse/BEAM-5819 to
> >                      > collect sub-tasks. This has enough upsides
> >                     throughout lots of areas of
> >                      > the project that even though it is not glamorous
> >                     it seems pretty
> >                      > valuable to start on immediately. And I want to
> >                     find out if there's a
> >                      > pitfall lurking.
> >                      >
> >                      > Max: what is the story behind having a separate
> >                     flink-shaded repo? Did
> >                      > that make it easier to manage in some way?
> >                      >
> >                      > Kenn
> >                      >
> >                      > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
> >                     Michels <mxm@apache.org <ma...@apache.org>
> >                      > <mailto:mxm@apache.org <ma...@apache.org>>>
> >                     wrote:
> >                      >
> >                      >     +1 for publishing vendored Jars
> >                     independently. It will improve build
> >                      >     time and ease IntelliJ integration.
> >                      >
> >                      >     Flink also publishes shaded dependencies
> >                     separately:
> >                      >
> >                      >     - https://github.com/apache/flink-shaded
> >                      >     -
> >                     https://issues.apache.org/jira/browse/FLINK-6529
> >                      >
> >                      >     AFAIK their main motivation was to get rid of
> >                     duplicate shaded classes
> >                      >     on the classpath. We don't appear to have
> >                     that problem because we
> >                      >     already have a separate "vendor" project.
> >                      >
> >                      >      >  - With shading, it is hard (impossible?)
> >                     to step into dependency
> >                      >     code in IntelliJ's debugger, because the
> >                     actual symbol at runtime
> >                      >     does not match what is in the external jars
> >                      >
> >                      >     This would be solved by releasing the sources
> >                     of the shaded jars.
> >                      >      From a
> >                      >     legal perspective, this could be problematic
> >                     as alluded to here:
> >                      > https://github.com/apache/flink-shaded/issues/25
> >                      >
> >                      >     -Max
> >                      >
> >                      >     On 20.10.18 01:11, Lukasz Cwik wrote:
> >                      >      > I have tried several times to improve the
> >                     build system and intellij
> >                      >      > integration and each attempt ended with
> >                     little progress when dealing
> >                      >      > with vendored code. My latest attempt has
> >                     been the most promising
> >                      >     where
> >                      >      > I take the vendored classes/jars and
> >                     decompile them generating the
> >                      >      > source that Intellij can then use. I have
> >                     a branch[1] that
> >                      >     demonstrates
> >                      >      > the idea. It works pretty well (and up
> >                     until a change where we
> >                      >     started
> >                      >      > vendoring gRPC, was impractical to do.
> >                     Instructions to try it out
> >                      >     are:
> >                      >      >
> >                      >      > // Clean up any remnants of prior
> >                     builds/intellij projects
> >                      >      > git clean -fdx
> >                      >      > // Generated the source for
> >                     vendored/shaded modules
> >                      >      > ./gradlew decompile
> >                      >      >
> >                      >      > // Remove the "generated" Java sources for
> >                     protos so they don't
> >                      >     conflict with the decompiled sources.
> >                      >      > rm -rf
> >                     model/pipeline/build/generated/source/proto
> >                      >      > rm -rf
> >                     model/job-management/build/generated/source/proto
> >                      >      > rm -rf
> >                     model/fn-execution/build/generated/source/proto
> >                      >      > // Import the project into Intellij, most
> >                     code completion now
> >                      >     works still some issues with a few classes.
> >                      >      > // Note that the Java decompiler doesn't
> >                     generate valid source so
> >                      >     still need to delegate to Gradle for
> >                     build/run/test actions
> >                      >      > // Other decompilers may do a better/worse
> >                     job but haven't tried
> >                      >     them.
> >                      >      >
> >                      >      >
> >                      >      > The problems that I face are that the
> >                     generated Java source from the
> >                      >      > protos and the decompiled source from the
> >                     compiled version of that
> >                      >      > source post shading are both being
> >                     imported as content roots and
> >                      >     then
> >                      >      > conflict. Also, the CFR decompiler isn't
> >                     producing valid source, if
> >                      >      > people could try others and report their
> >                     mileage, we may find one
> >                      >     that
> >                      >      > works and then we would be able to use
> >                     intellij to build/run our
> >                      >     code
> >                      >      > and not need to delegate all our
> >                     build/run/test actions to Gradle.
> >                      >      >
> >                      >      > After all these attempts I have done,
> >                     vendoring the dependencies
> >                      >     outside
> >                      >      > of the project seems like a sane approach
> >                     and unless someone
> >                      >     wants to
> >                      >      > take a stab at the best progress I have
> >                     made above, I would go
> >                      >     with what
> >                      >      > Kenn is suggesting even though it will
> >                     mean that we will need to
> >                      >     perform
> >                      >      > releases every time we want to change the
> >                     version of one of our
> >                      >     vendored
> >                      >      > dependencies.
> >                      >      >
> >                      >      > 1:
> >
> https://github.com/lukecwik/incubator-beam/tree/intellij
> >                      >      >
> >                      >      >
> >                      >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
> >                     Knowles <kenn@apache.org <ma...@apache.org>
> >                      >     <mailto:kenn@apache.org <mailto:
> kenn@apache.org>>
> >                      >      > <mailto:kenn@apache.org
> >                     <ma...@apache.org> <mailto:kenn@apache.org
> >                     <ma...@apache.org>>>> wrote:
> >                      >      >
> >                      >      >     Another reason to push on this is to
> >                     get build times down.
> >                      >     Once only
> >                      >      >     generated proto classes use the shadow
> >                     plugin we'll cut the build
> >                      >      >     time in ~half? And there is no reason
> >                     to constantly re-vendor.
> >                      >      >
> >                      >      >     Kenn
> >                      >      >
> >                      >      >     On Fri, Oct 19, 2018 at 10:39 AM
> >                     Kenneth Knowles
> >                      >     <klk@google.com <ma...@google.com>
> >                     <mailto:klk@google.com <ma...@google.com>>
> >                      >      >     <mailto:klk@google.com
> >                     <ma...@google.com> <mailto:klk@google.com
> >                     <ma...@google.com>>>> wrote:
> >                      >      >
> >                      >      >         Hi all,
> >                      >      >
> >                      >      >         A while ago we had pretty good
> >                     consensus that we should
> >                      >     vendor
> >                      >      >         ("pre-shade") specific
> >                     dependencies, and there's start on it
> >                      >      >         here:
> >                     https://github.com/apache/beam/tree/master/vendor
> >                      >      >
> >                      >      >         IntelliJ notes:
> >                      >      >
> >                      >      >           - With shading, it is hard
> >                     (impossible?) to step into
> >                      >      >         dependency code in IntelliJ's
> >                     debugger, because the actual
> >                      >      >         symbol at runtime does not match
> >                     what is in the external jars
> >                      >      >
> >                      >      >
> >                      >      > Intellij can step through the classes if
> >                     they were published
> >                      >     outside the
> >                      >      > project since it can decompile them. The
> >                     source won't be legible.
> >                      >      > Decompiling the source as in my example
> >                     effectively shows the
> >                      >     same issue.
> >                      >      >
> >                      >      >           - With vendoring, if the
> >                     vendored dependencies are part
> >                      >     of the
> >                      >      >         project then IntelliJ gets
> >                     confused because it operates on
> >                      >      >         source, not the produced jars
> >                      >      >
> >                      >      >
> >                      >      > Yes, I tried several ways to get intellij
> >                     to ignore the source
> >                      >     and use
> >                      >      > the output jars but no luck.
> >                      >      >
> >                      >      >         The second one has a quick fix for
> >                     most cases*: don't
> >                      >     make the
> >                      >      >         vendored dep a subproject, but
> >                     just separately build and
> >                      >     publish
> >                      >      >         it. Since a vendored dependency
> >                     should change much more
> >                      >      >         infrequently (or if we bake the
> >                     version into the name,
> >                      >     ~never)
> >                      >      >         this means we publish once and
> >                     save headache and build
> >                      >     time forever.
> >                      >      >
> >                      >      >         WDYT? Have I overlooked something?
> >                     How about we set up
> >                      >     vendored
> >                      >      >         versions of guava, protobuf, grpc,
> >                     and publish them. We don't
> >                      >      >         have to actually start using them
> >                     yet, and can do it
> >                      >     incrementally.
> >                      >      >
> >                      >      >
> >                      >      > Currently we are relocating code depending
> >                     on the version string.
> >                      >     If the
> >                      >      > major version is >= 1, we use only the
> >                     major version within the
> >                      >     package
> >                      >      > string and rely on semantic versioning
> >                     provided by the dependency
> >                      >     to not
> >                      >      > break people. If the major version is 0,
> >                     we assume the dependency is
> >                      >      > unstable and use the full version as part
> >                     of the package string
> >                      >     during
> >                      >      > relocation.
> >                      >      >
> >                      >      > The downside of using the full version
> >                     string for relocated packages:
> >                      >      > 1) Users will end up with multiple copies
> >                     of dependencies that
> >                      >     differ
> >                      >      > only by the minor or patch version
> >                     increasing the size of their
> >                      >     application.
> >                      >      > 2) Bumping up the version of a dependency
> >                     now requires the import
> >                      >      > statement in all java files to be updated
> >                     (not too difficult with
> >                      >     some
> >                      >      > sed/grep skills)
> >                      >      >
> >                      >      > The upside of using the full version
> >                     string in the relocated package:
> >                      >      > 1) We don't have to worry about whether a
> >                     dependency maintains
> >                      >     semantic
> >                      >      > versioning which means our users won't
> >                     have to worry about that
> >                      >     either.
> >                      >      > 2) This increases the odds that a user
> >                     will load multiple slightly
> >                      >      > different versions of the same dependency
> >                     which is known to be
> >                      >      > incompatible in certain situations (e.g.
> >                     Netty 4.1.25 can't be on
> >                      >     the
> >                      >      > classpath with Netty 4.1.28 even though
> >                     they are both shaded due to
> >                      >      > issues of how JNI with tcnative works).
> >                      >      >
> >                      >      >
> >                      >      >         (side note: what do other projects
> >                     like Flink do?)
> >                      >      >
> >                      >      >         Kenn
> >                      >      >
> >                      >      >         *for generated proto classes, they
> >                     need to be altered after
> >                      >      >         being generated so shading happens
> >                     there, but actually only
> >                      >      >         relocation and the shared vendored
> >                     dep should work
> >                      >     elsewhere in
> >                      >      >         the project
> >                      >      >
> >                      >      >
> >                      >      > We could publish the protos and treat them
> >                     as "external"
> >                      >     dependencies
> >                      >      > within the Java projects which would also
> >                     remove this pain point.
> >                      >
> >
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Maximilian Michels <mx...@apache.org>.
Would also keep it simple and optimize for the JAR size only if necessary.

On 24.10.18 00:06, Kenneth Knowles wrote:
> I think it makes sense for each vendored dependency to be self-contained 
> as much as possible. It should keep it fairly simple. Things that cross 
> their API surface cannot be hidden, of course. Jar size is not a concern 
> IMO.
> 
> Kenn
> 
> On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <lcwik@google.com 
> <ma...@google.com>> wrote:
> 
>     How should we handle the transitive dependencies of the things we
>     want to vendor?
> 
>     For example we use gRPC which depends on Guava 20 and we also use
>     Calcite which depends on Guava 19.
> 
>     Should the vendored gRPC/Calcite/... be self-contained so it
>     contains all its dependencies, hence vendored gRPC would contain
>     Guava 20 and vendored Calcite would contain Guava 19 (both under
>     different namespaces)?
>     This leads to larger jars but less vendored dependencies to maintain.
> 
>     Or should we produce a vendored library for those that we want to
>     share, e.g. Guava 20 that could be reused across multiple vendored
>     libraries?
>     Makes the vendoring process slightly more complicated, more
>     dependencies to maintain, smaller jars.
> 
>     Or should we produce a vendored library for each dependency?
>     Lots of vendoring needed, likely tooling required to be built to
>     maintain this.
> 
> 
> 
> 
>     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <klk@google.com
>     <ma...@google.com>> wrote:
> 
>         I actually created the subtasks by finding things shaded by at
>         least one module. I think each one should definitely have an on
>         list discussion that clarifies the target artifact, namespace,
>         version, possible complications, etc.
> 
>         My impression is that many many modules shade only Guava. So for
>         build time and simplification that is a big win.
> 
>         Kenn
> 
>         On Tue, Oct 23, 2018, 08:16 Thomas Weise <thw@apache.org
>         <ma...@apache.org>> wrote:
> 
>             +1 for separate artifacts
> 
>             I would request that we explicitly discuss and agree which
>             dependencies we vendor though.
> 
>             Not everything listed in the JIRA subtasks is currently
>             relocated.
> 
>             Thomas
> 
> 
>             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
>             <david.moravek@gmail.com <ma...@gmail.com>>
>             wrote:
> 
>                 +1 This should improve build times a lot. It would be
>                 great if vendored deps could stay in the main repository.
> 
>                 D.
> 
>                 On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
>                 <mxm@apache.org <ma...@apache.org>> wrote:
> 
>                     Looks great, Kenn!
> 
>                      > Max: what is the story behind having a separate
>                     flink-shaded repo? Did that make it easier to manage
>                     in some way?
> 
>                     Better separation of concerns, but I don't think
>                     releasing the shaded
>                     artifacts from the main repo is a problem. I'd even
>                     prefer not to split
>                     up the repo because it makes updates to the vendored
>                     dependencies
>                     slightly easier.
> 
>                     On 23.10.18 03:25, Kenneth Knowles wrote:
>                      > OK, I've filed
>                     https://issues.apache.org/jira/browse/BEAM-5819 to
>                      > collect sub-tasks. This has enough upsides
>                     throughout lots of areas of
>                      > the project that even though it is not glamorous
>                     it seems pretty
>                      > valuable to start on immediately. And I want to
>                     find out if there's a
>                      > pitfall lurking.
>                      >
>                      > Max: what is the story behind having a separate
>                     flink-shaded repo? Did
>                      > that make it easier to manage in some way?
>                      >
>                      > Kenn
>                      >
>                      > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
>                     Michels <mxm@apache.org <ma...@apache.org>
>                      > <mailto:mxm@apache.org <ma...@apache.org>>>
>                     wrote:
>                      >
>                      >     +1 for publishing vendored Jars
>                     independently. It will improve build
>                      >     time and ease IntelliJ integration.
>                      >
>                      >     Flink also publishes shaded dependencies
>                     separately:
>                      >
>                      >     - https://github.com/apache/flink-shaded
>                      >     -
>                     https://issues.apache.org/jira/browse/FLINK-6529
>                      >
>                      >     AFAIK their main motivation was to get rid of
>                     duplicate shaded classes
>                      >     on the classpath. We don't appear to have
>                     that problem because we
>                      >     already have a separate "vendor" project.
>                      >
>                      >      >  - With shading, it is hard (impossible?)
>                     to step into dependency
>                      >     code in IntelliJ's debugger, because the
>                     actual symbol at runtime
>                      >     does not match what is in the external jars
>                      >
>                      >     This would be solved by releasing the sources
>                     of the shaded jars.
>                      >      From a
>                      >     legal perspective, this could be problematic
>                     as alluded to here:
>                      > https://github.com/apache/flink-shaded/issues/25
>                      >
>                      >     -Max
>                      >
>                      >     On 20.10.18 01:11, Lukasz Cwik wrote:
>                      >      > I have tried several times to improve the
>                     build system and intellij
>                      >      > integration and each attempt ended with
>                     little progress when dealing
>                      >      > with vendored code. My latest attempt has
>                     been the most promising
>                      >     where
>                      >      > I take the vendored classes/jars and
>                     decompile them generating the
>                      >      > source that Intellij can then use. I have
>                     a branch[1] that
>                      >     demonstrates
>                      >      > the idea. It works pretty well (and up
>                     until a change where we
>                      >     started
>                      >      > vendoring gRPC, was impractical to do.
>                     Instructions to try it out
>                      >     are:
>                      >      >
>                      >      > // Clean up any remnants of prior
>                     builds/intellij projects
>                      >      > git clean -fdx
>                      >      > // Generated the source for
>                     vendored/shaded modules
>                      >      > ./gradlew decompile
>                      >      >
>                      >      > // Remove the "generated" Java sources for
>                     protos so they don't
>                      >     conflict with the decompiled sources.
>                      >      > rm -rf
>                     model/pipeline/build/generated/source/proto
>                      >      > rm -rf
>                     model/job-management/build/generated/source/proto
>                      >      > rm -rf
>                     model/fn-execution/build/generated/source/proto
>                      >      > // Import the project into Intellij, most
>                     code completion now
>                      >     works still some issues with a few classes.
>                      >      > // Note that the Java decompiler doesn't
>                     generate valid source so
>                      >     still need to delegate to Gradle for
>                     build/run/test actions
>                      >      > // Other decompilers may do a better/worse
>                     job but haven't tried
>                      >     them.
>                      >      >
>                      >      >
>                      >      > The problems that I face are that the
>                     generated Java source from the
>                      >      > protos and the decompiled source from the
>                     compiled version of that
>                      >      > source post shading are both being
>                     imported as content roots and
>                      >     then
>                      >      > conflict. Also, the CFR decompiler isn't
>                     producing valid source, if
>                      >      > people could try others and report their
>                     mileage, we may find one
>                      >     that
>                      >      > works and then we would be able to use
>                     intellij to build/run our
>                      >     code
>                      >      > and not need to delegate all our
>                     build/run/test actions to Gradle.
>                      >      >
>                      >      > After all these attempts I have done,
>                     vendoring the dependencies
>                      >     outside
>                      >      > of the project seems like a sane approach
>                     and unless someone
>                      >     wants to
>                      >      > take a stab at the best progress I have
>                     made above, I would go
>                      >     with what
>                      >      > Kenn is suggesting even though it will
>                     mean that we will need to
>                      >     perform
>                      >      > releases every time we want to change the
>                     version of one of our
>                      >     vendored
>                      >      > dependencies.
>                      >      >
>                      >      > 1:
>                     https://github.com/lukecwik/incubator-beam/tree/intellij
>                      >      >
>                      >      >
>                      >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
>                     Knowles <kenn@apache.org <ma...@apache.org>
>                      >     <mailto:kenn@apache.org <ma...@apache.org>>
>                      >      > <mailto:kenn@apache.org
>                     <ma...@apache.org> <mailto:kenn@apache.org
>                     <ma...@apache.org>>>> wrote:
>                      >      >
>                      >      >     Another reason to push on this is to
>                     get build times down.
>                      >     Once only
>                      >      >     generated proto classes use the shadow
>                     plugin we'll cut the build
>                      >      >     time in ~half? And there is no reason
>                     to constantly re-vendor.
>                      >      >
>                      >      >     Kenn
>                      >      >
>                      >      >     On Fri, Oct 19, 2018 at 10:39 AM
>                     Kenneth Knowles
>                      >     <klk@google.com <ma...@google.com>
>                     <mailto:klk@google.com <ma...@google.com>>
>                      >      >     <mailto:klk@google.com
>                     <ma...@google.com> <mailto:klk@google.com
>                     <ma...@google.com>>>> wrote:
>                      >      >
>                      >      >         Hi all,
>                      >      >
>                      >      >         A while ago we had pretty good
>                     consensus that we should
>                      >     vendor
>                      >      >         ("pre-shade") specific
>                     dependencies, and there's start on it
>                      >      >         here:
>                     https://github.com/apache/beam/tree/master/vendor
>                      >      >
>                      >      >         IntelliJ notes:
>                      >      >
>                      >      >           - With shading, it is hard
>                     (impossible?) to step into
>                      >      >         dependency code in IntelliJ's
>                     debugger, because the actual
>                      >      >         symbol at runtime does not match
>                     what is in the external jars
>                      >      >
>                      >      >
>                      >      > Intellij can step through the classes if
>                     they were published
>                      >     outside the
>                      >      > project since it can decompile them. The
>                     source won't be legible.
>                      >      > Decompiling the source as in my example
>                     effectively shows the
>                      >     same issue.
>                      >      >
>                      >      >           - With vendoring, if the
>                     vendored dependencies are part
>                      >     of the
>                      >      >         project then IntelliJ gets
>                     confused because it operates on
>                      >      >         source, not the produced jars
>                      >      >
>                      >      >
>                      >      > Yes, I tried several ways to get intellij
>                     to ignore the source
>                      >     and use
>                      >      > the output jars but no luck.
>                      >      >
>                      >      >         The second one has a quick fix for
>                     most cases*: don't
>                      >     make the
>                      >      >         vendored dep a subproject, but
>                     just separately build and
>                      >     publish
>                      >      >         it. Since a vendored dependency
>                     should change much more
>                      >      >         infrequently (or if we bake the
>                     version into the name,
>                      >     ~never)
>                      >      >         this means we publish once and
>                     save headache and build
>                      >     time forever.
>                      >      >
>                      >      >         WDYT? Have I overlooked something?
>                     How about we set up
>                      >     vendored
>                      >      >         versions of guava, protobuf, grpc,
>                     and publish them. We don't
>                      >      >         have to actually start using them
>                     yet, and can do it
>                      >     incrementally.
>                      >      >
>                      >      >
>                      >      > Currently we are relocating code depending
>                     on the version string.
>                      >     If the
>                      >      > major version is >= 1, we use only the
>                     major version within the
>                      >     package
>                      >      > string and rely on semantic versioning
>                     provided by the dependency
>                      >     to not
>                      >      > break people. If the major version is 0,
>                     we assume the dependency is
>                      >      > unstable and use the full version as part
>                     of the package string
>                      >     during
>                      >      > relocation.
>                      >      >
>                      >      > The downside of using the full version
>                     string for relocated packages:
>                      >      > 1) Users will end up with multiple copies
>                     of dependencies that
>                      >     differ
>                      >      > only by the minor or patch version
>                     increasing the size of their
>                      >     application.
>                      >      > 2) Bumping up the version of a dependency
>                     now requires the import
>                      >      > statement in all java files to be updated
>                     (not too difficult with
>                      >     some
>                      >      > sed/grep skills)
>                      >      >
>                      >      > The upside of using the full version
>                     string in the relocated package:
>                      >      > 1) We don't have to worry about whether a
>                     dependency maintains
>                      >     semantic
>                      >      > versioning which means our users won't
>                     have to worry about that
>                      >     either.
>                      >      > 2) This increases the odds that a user
>                     will load multiple slightly
>                      >      > different versions of the same dependency
>                     which is known to be
>                      >      > incompatible in certain situations (e.g.
>                     Netty 4.1.25 can't be on
>                      >     the
>                      >      > classpath with Netty 4.1.28 even though
>                     they are both shaded due to
>                      >      > issues of how JNI with tcnative works).
>                      >      >
>                      >      >
>                      >      >         (side note: what do other projects
>                     like Flink do?)
>                      >      >
>                      >      >         Kenn
>                      >      >
>                      >      >         *for generated proto classes, they
>                     need to be altered after
>                      >      >         being generated so shading happens
>                     there, but actually only
>                      >      >         relocation and the shared vendored
>                     dep should work
>                      >     elsewhere in
>                      >      >         the project
>                      >      >
>                      >      >
>                      >      > We could publish the protos and treat them
>                     as "external"
>                      >     dependencies
>                      >      > within the Java projects which would also
>                     remove this pain point.
>                      >
> 

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Kenneth Knowles <ke...@apache.org>.
I think it makes sense for each vendored dependency to be self-contained as
much as possible. It should keep it fairly simple. Things that cross their
API surface cannot be hidden, of course. Jar size is not a concern IMO.

Kenn

On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <lc...@google.com> wrote:

> How should we handle the transitive dependencies of the things we want to
> vendor?
>
> For example we use gRPC which depends on Guava 20 and we also use Calcite
> which depends on Guava 19.
>
> Should the vendored gRPC/Calcite/... be self-contained so it contains all
> its dependencies, hence vendored gRPC would contain Guava 20 and vendored
> Calcite would contain Guava 19 (both under different namespaces)?
> This leads to larger jars but less vendored dependencies to maintain.
>
> Or should we produce a vendored library for those that we want to share,
> e.g. Guava 20 that could be reused across multiple vendored libraries?
> Makes the vendoring process slightly more complicated, more dependencies
> to maintain, smaller jars.
>
> Or should we produce a vendored library for each dependency?
> Lots of vendoring needed, likely tooling required to be built to maintain
> this.
>
>
>
>
> On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <kl...@google.com> wrote:
>
>> I actually created the subtasks by finding things shaded by at least one
>> module. I think each one should definitely have an on list discussion that
>> clarifies the target artifact, namespace, version, possible complications,
>> etc.
>>
>> My impression is that many many modules shade only Guava. So for build
>> time and simplification that is a big win.
>>
>> Kenn
>>
>> On Tue, Oct 23, 2018, 08:16 Thomas Weise <th...@apache.org> wrote:
>>
>>> +1 for separate artifacts
>>>
>>> I would request that we explicitly discuss and agree which dependencies
>>> we vendor though.
>>>
>>> Not everything listed in the JIRA subtasks is currently relocated.
>>>
>>> Thomas
>>>
>>>
>>> On Tue, Oct 23, 2018 at 8:04 AM David Morávek <da...@gmail.com>
>>> wrote:
>>>
>>>> +1 This should improve build times a lot. It would be great if vendored
>>>> deps could stay in the main repository.
>>>>
>>>> D.
>>>>
>>>> On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels <mx...@apache.org>
>>>> wrote:
>>>>
>>>>> Looks great, Kenn!
>>>>>
>>>>> > Max: what is the story behind having a separate flink-shaded repo?
>>>>> Did that make it easier to manage in some way?
>>>>>
>>>>> Better separation of concerns, but I don't think releasing the shaded
>>>>> artifacts from the main repo is a problem. I'd even prefer not to
>>>>> split
>>>>> up the repo because it makes updates to the vendored dependencies
>>>>> slightly easier.
>>>>>
>>>>> On 23.10.18 03:25, Kenneth Knowles wrote:
>>>>> > OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to
>>>>> > collect sub-tasks. This has enough upsides throughout lots of areas
>>>>> of
>>>>> > the project that even though it is not glamorous it seems pretty
>>>>> > valuable to start on immediately. And I want to find out if there's
>>>>> a
>>>>> > pitfall lurking.
>>>>> >
>>>>> > Max: what is the story behind having a separate flink-shaded repo?
>>>>> Did
>>>>> > that make it easier to manage in some way?
>>>>> >
>>>>> > Kenn
>>>>> >
>>>>> > On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <mxm@apache.org
>>>>> > <ma...@apache.org>> wrote:
>>>>> >
>>>>> >     +1 for publishing vendored Jars independently. It will improve
>>>>> build
>>>>> >     time and ease IntelliJ integration.
>>>>> >
>>>>> >     Flink also publishes shaded dependencies separately:
>>>>> >
>>>>> >     - https://github.com/apache/flink-shaded
>>>>> >     - https://issues.apache.org/jira/browse/FLINK-6529
>>>>> >
>>>>> >     AFAIK their main motivation was to get rid of duplicate shaded
>>>>> classes
>>>>> >     on the classpath. We don't appear to have that problem because we
>>>>> >     already have a separate "vendor" project.
>>>>> >
>>>>> >      >  - With shading, it is hard (impossible?) to step into
>>>>> dependency
>>>>> >     code in IntelliJ's debugger, because the actual symbol at runtime
>>>>> >     does not match what is in the external jars
>>>>> >
>>>>> >     This would be solved by releasing the sources of the shaded jars.
>>>>> >      From a
>>>>> >     legal perspective, this could be problematic as alluded to here:
>>>>> >     https://github.com/apache/flink-shaded/issues/25
>>>>> >
>>>>> >     -Max
>>>>> >
>>>>> >     On 20.10.18 01:11, Lukasz Cwik wrote:
>>>>> >      > I have tried several times to improve the build system and
>>>>> intellij
>>>>> >      > integration and each attempt ended with little progress when
>>>>> dealing
>>>>> >      > with vendored code. My latest attempt has been the most
>>>>> promising
>>>>> >     where
>>>>> >      > I take the vendored classes/jars and decompile them
>>>>> generating the
>>>>> >      > source that Intellij can then use. I have a branch[1] that
>>>>> >     demonstrates
>>>>> >      > the idea. It works pretty well (and up until a change where we
>>>>> >     started
>>>>> >      > vendoring gRPC, was impractical to do. Instructions to try it
>>>>> out
>>>>> >     are:
>>>>> >      >
>>>>> >      > // Clean up any remnants of prior builds/intellij projects
>>>>> >      > git clean -fdx
>>>>> >      > // Generated the source for vendored/shaded modules
>>>>> >      > ./gradlew decompile
>>>>> >      >
>>>>> >      > // Remove the "generated" Java sources for protos so they
>>>>> don't
>>>>> >     conflict with the decompiled sources.
>>>>> >      > rm -rf model/pipeline/build/generated/source/proto
>>>>> >      > rm -rf model/job-management/build/generated/source/proto
>>>>> >      > rm -rf model/fn-execution/build/generated/source/proto
>>>>> >      > // Import the project into Intellij, most code completion now
>>>>> >     works still some issues with a few classes.
>>>>> >      > // Note that the Java decompiler doesn't generate valid
>>>>> source so
>>>>> >     still need to delegate to Gradle for build/run/test actions
>>>>> >      > // Other decompilers may do a better/worse job but haven't
>>>>> tried
>>>>> >     them.
>>>>> >      >
>>>>> >      >
>>>>> >      > The problems that I face are that the generated Java source
>>>>> from the
>>>>> >      > protos and the decompiled source from the compiled version of
>>>>> that
>>>>> >      > source post shading are both being imported as content roots
>>>>> and
>>>>> >     then
>>>>> >      > conflict. Also, the CFR decompiler isn't producing valid
>>>>> source, if
>>>>> >      > people could try others and report their mileage, we may find
>>>>> one
>>>>> >     that
>>>>> >      > works and then we would be able to use intellij to build/run
>>>>> our
>>>>> >     code
>>>>> >      > and not need to delegate all our build/run/test actions to
>>>>> Gradle.
>>>>> >      >
>>>>> >      > After all these attempts I have done, vendoring the
>>>>> dependencies
>>>>> >     outside
>>>>> >      > of the project seems like a sane approach and unless someone
>>>>> >     wants to
>>>>> >      > take a stab at the best progress I have made above, I would go
>>>>> >     with what
>>>>> >      > Kenn is suggesting even though it will mean that we will need
>>>>> to
>>>>> >     perform
>>>>> >      > releases every time we want to change the version of one of
>>>>> our
>>>>> >     vendored
>>>>> >      > dependencies.
>>>>> >      >
>>>>> >      > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>>>>> >      >
>>>>> >      >
>>>>> >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <
>>>>> kenn@apache.org
>>>>> >     <ma...@apache.org>
>>>>> >      > <mailto:kenn@apache.org <ma...@apache.org>>> wrote:
>>>>> >      >
>>>>> >      >     Another reason to push on this is to get build times down.
>>>>> >     Once only
>>>>> >      >     generated proto classes use the shadow plugin we'll cut
>>>>> the build
>>>>> >      >     time in ~half? And there is no reason to constantly
>>>>> re-vendor.
>>>>> >      >
>>>>> >      >     Kenn
>>>>> >      >
>>>>> >      >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles
>>>>> >     <klk@google.com <ma...@google.com>
>>>>> >      >     <mailto:klk@google.com <ma...@google.com>>> wrote:
>>>>> >      >
>>>>> >      >         Hi all,
>>>>> >      >
>>>>> >      >         A while ago we had pretty good consensus that we
>>>>> should
>>>>> >     vendor
>>>>> >      >         ("pre-shade") specific dependencies, and there's
>>>>> start on it
>>>>> >      >         here:
>>>>> https://github.com/apache/beam/tree/master/vendor
>>>>> >      >
>>>>> >      >         IntelliJ notes:
>>>>> >      >
>>>>> >      >           - With shading, it is hard (impossible?) to step
>>>>> into
>>>>> >      >         dependency code in IntelliJ's debugger, because the
>>>>> actual
>>>>> >      >         symbol at runtime does not match what is in the
>>>>> external jars
>>>>> >      >
>>>>> >      >
>>>>> >      > Intellij can step through the classes if they were published
>>>>> >     outside the
>>>>> >      > project since it can decompile them. The source won't be
>>>>> legible.
>>>>> >      > Decompiling the source as in my example effectively shows the
>>>>> >     same issue.
>>>>> >      >
>>>>> >      >           - With vendoring, if the vendored dependencies are
>>>>> part
>>>>> >     of the
>>>>> >      >         project then IntelliJ gets confused because it
>>>>> operates on
>>>>> >      >         source, not the produced jars
>>>>> >      >
>>>>> >      >
>>>>> >      > Yes, I tried several ways to get intellij to ignore the source
>>>>> >     and use
>>>>> >      > the output jars but no luck.
>>>>> >      >
>>>>> >      >         The second one has a quick fix for most cases*: don't
>>>>> >     make the
>>>>> >      >         vendored dep a subproject, but just separately build
>>>>> and
>>>>> >     publish
>>>>> >      >         it. Since a vendored dependency should change much
>>>>> more
>>>>> >      >         infrequently (or if we bake the version into the name,
>>>>> >     ~never)
>>>>> >      >         this means we publish once and save headache and build
>>>>> >     time forever.
>>>>> >      >
>>>>> >      >         WDYT? Have I overlooked something? How about we set up
>>>>> >     vendored
>>>>> >      >         versions of guava, protobuf, grpc, and publish them.
>>>>> We don't
>>>>> >      >         have to actually start using them yet, and can do it
>>>>> >     incrementally.
>>>>> >      >
>>>>> >      >
>>>>> >      > Currently we are relocating code depending on the version
>>>>> string.
>>>>> >     If the
>>>>> >      > major version is >= 1, we use only the major version within
>>>>> the
>>>>> >     package
>>>>> >      > string and rely on semantic versioning provided by the
>>>>> dependency
>>>>> >     to not
>>>>> >      > break people. If the major version is 0, we assume the
>>>>> dependency is
>>>>> >      > unstable and use the full version as part of the package
>>>>> string
>>>>> >     during
>>>>> >      > relocation.
>>>>> >      >
>>>>> >      > The downside of using the full version string for relocated
>>>>> packages:
>>>>> >      > 1) Users will end up with multiple copies of dependencies that
>>>>> >     differ
>>>>> >      > only by the minor or patch version increasing the size of
>>>>> their
>>>>> >     application.
>>>>> >      > 2) Bumping up the version of a dependency now requires the
>>>>> import
>>>>> >      > statement in all java files to be updated (not too difficult
>>>>> with
>>>>> >     some
>>>>> >      > sed/grep skills)
>>>>> >      >
>>>>> >      > The upside of using the full version string in the relocated
>>>>> package:
>>>>> >      > 1) We don't have to worry about whether a dependency maintains
>>>>> >     semantic
>>>>> >      > versioning which means our users won't have to worry about
>>>>> that
>>>>> >     either.
>>>>> >      > 2) This increases the odds that a user will load multiple
>>>>> slightly
>>>>> >      > different versions of the same dependency which is known to be
>>>>> >      > incompatible in certain situations (e.g. Netty 4.1.25 can't
>>>>> be on
>>>>> >     the
>>>>> >      > classpath with Netty 4.1.28 even though they are both shaded
>>>>> due to
>>>>> >      > issues of how JNI with tcnative works).
>>>>> >      >
>>>>> >      >
>>>>> >      >         (side note: what do other projects like Flink do?)
>>>>> >      >
>>>>> >      >         Kenn
>>>>> >      >
>>>>> >      >         *for generated proto classes, they need to be altered
>>>>> after
>>>>> >      >         being generated so shading happens there, but
>>>>> actually only
>>>>> >      >         relocation and the shared vendored dep should work
>>>>> >     elsewhere in
>>>>> >      >         the project
>>>>> >      >
>>>>> >      >
>>>>> >      > We could publish the protos and treat them as "external"
>>>>> >     dependencies
>>>>> >      > within the Java projects which would also remove this pain
>>>>> point.
>>>>> >
>>>>>
>>>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
How should we handle the transitive dependencies of the things we want to
vendor?

For example we use gRPC which depends on Guava 20 and we also use Calcite
which depends on Guava 19.

Should the vendored gRPC/Calcite/... be self-contained so it contains all
its dependencies, hence vendored gRPC would contain Guava 20 and vendored
Calcite would contain Guava 19 (both under different namespaces)?
This leads to larger jars but less vendored dependencies to maintain.

Or should we produce a vendored library for those that we want to share,
e.g. Guava 20 that could be reused across multiple vendored libraries?
Makes the vendoring process slightly more complicated, more dependencies to
maintain, smaller jars.

Or should we produce a vendored library for each dependency?
Lots of vendoring needed, likely tooling required to be built to maintain
this.




On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <kl...@google.com> wrote:

> I actually created the subtasks by finding things shaded by at least one
> module. I think each one should definitely have an on list discussion that
> clarifies the target artifact, namespace, version, possible complications,
> etc.
>
> My impression is that many many modules shade only Guava. So for build
> time and simplification that is a big win.
>
> Kenn
>
> On Tue, Oct 23, 2018, 08:16 Thomas Weise <th...@apache.org> wrote:
>
>> +1 for separate artifacts
>>
>> I would request that we explicitly discuss and agree which dependencies
>> we vendor though.
>>
>> Not everything listed in the JIRA subtasks is currently relocated.
>>
>> Thomas
>>
>>
>> On Tue, Oct 23, 2018 at 8:04 AM David Morávek <da...@gmail.com>
>> wrote:
>>
>>> +1 This should improve build times a lot. It would be great if vendored
>>> deps could stay in the main repository.
>>>
>>> D.
>>>
>>> On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels <mx...@apache.org>
>>> wrote:
>>>
>>>> Looks great, Kenn!
>>>>
>>>> > Max: what is the story behind having a separate flink-shaded repo?
>>>> Did that make it easier to manage in some way?
>>>>
>>>> Better separation of concerns, but I don't think releasing the shaded
>>>> artifacts from the main repo is a problem. I'd even prefer not to split
>>>> up the repo because it makes updates to the vendored dependencies
>>>> slightly easier.
>>>>
>>>> On 23.10.18 03:25, Kenneth Knowles wrote:
>>>> > OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to
>>>> > collect sub-tasks. This has enough upsides throughout lots of areas
>>>> of
>>>> > the project that even though it is not glamorous it seems pretty
>>>> > valuable to start on immediately. And I want to find out if there's a
>>>> > pitfall lurking.
>>>> >
>>>> > Max: what is the story behind having a separate flink-shaded repo?
>>>> Did
>>>> > that make it easier to manage in some way?
>>>> >
>>>> > Kenn
>>>> >
>>>> > On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <mxm@apache.org
>>>> > <ma...@apache.org>> wrote:
>>>> >
>>>> >     +1 for publishing vendored Jars independently. It will improve
>>>> build
>>>> >     time and ease IntelliJ integration.
>>>> >
>>>> >     Flink also publishes shaded dependencies separately:
>>>> >
>>>> >     - https://github.com/apache/flink-shaded
>>>> >     - https://issues.apache.org/jira/browse/FLINK-6529
>>>> >
>>>> >     AFAIK their main motivation was to get rid of duplicate shaded
>>>> classes
>>>> >     on the classpath. We don't appear to have that problem because we
>>>> >     already have a separate "vendor" project.
>>>> >
>>>> >      >  - With shading, it is hard (impossible?) to step into
>>>> dependency
>>>> >     code in IntelliJ's debugger, because the actual symbol at runtime
>>>> >     does not match what is in the external jars
>>>> >
>>>> >     This would be solved by releasing the sources of the shaded jars.
>>>> >      From a
>>>> >     legal perspective, this could be problematic as alluded to here:
>>>> >     https://github.com/apache/flink-shaded/issues/25
>>>> >
>>>> >     -Max
>>>> >
>>>> >     On 20.10.18 01:11, Lukasz Cwik wrote:
>>>> >      > I have tried several times to improve the build system and
>>>> intellij
>>>> >      > integration and each attempt ended with little progress when
>>>> dealing
>>>> >      > with vendored code. My latest attempt has been the most
>>>> promising
>>>> >     where
>>>> >      > I take the vendored classes/jars and decompile them generating
>>>> the
>>>> >      > source that Intellij can then use. I have a branch[1] that
>>>> >     demonstrates
>>>> >      > the idea. It works pretty well (and up until a change where we
>>>> >     started
>>>> >      > vendoring gRPC, was impractical to do. Instructions to try it
>>>> out
>>>> >     are:
>>>> >      >
>>>> >      > // Clean up any remnants of prior builds/intellij projects
>>>> >      > git clean -fdx
>>>> >      > // Generated the source for vendored/shaded modules
>>>> >      > ./gradlew decompile
>>>> >      >
>>>> >      > // Remove the "generated" Java sources for protos so they don't
>>>> >     conflict with the decompiled sources.
>>>> >      > rm -rf model/pipeline/build/generated/source/proto
>>>> >      > rm -rf model/job-management/build/generated/source/proto
>>>> >      > rm -rf model/fn-execution/build/generated/source/proto
>>>> >      > // Import the project into Intellij, most code completion now
>>>> >     works still some issues with a few classes.
>>>> >      > // Note that the Java decompiler doesn't generate valid source
>>>> so
>>>> >     still need to delegate to Gradle for build/run/test actions
>>>> >      > // Other decompilers may do a better/worse job but haven't
>>>> tried
>>>> >     them.
>>>> >      >
>>>> >      >
>>>> >      > The problems that I face are that the generated Java source
>>>> from the
>>>> >      > protos and the decompiled source from the compiled version of
>>>> that
>>>> >      > source post shading are both being imported as content roots
>>>> and
>>>> >     then
>>>> >      > conflict. Also, the CFR decompiler isn't producing valid
>>>> source, if
>>>> >      > people could try others and report their mileage, we may find
>>>> one
>>>> >     that
>>>> >      > works and then we would be able to use intellij to build/run
>>>> our
>>>> >     code
>>>> >      > and not need to delegate all our build/run/test actions to
>>>> Gradle.
>>>> >      >
>>>> >      > After all these attempts I have done, vendoring the
>>>> dependencies
>>>> >     outside
>>>> >      > of the project seems like a sane approach and unless someone
>>>> >     wants to
>>>> >      > take a stab at the best progress I have made above, I would go
>>>> >     with what
>>>> >      > Kenn is suggesting even though it will mean that we will need
>>>> to
>>>> >     perform
>>>> >      > releases every time we want to change the version of one of our
>>>> >     vendored
>>>> >      > dependencies.
>>>> >      >
>>>> >      > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>>>> >      >
>>>> >      >
>>>> >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <
>>>> kenn@apache.org
>>>> >     <ma...@apache.org>
>>>> >      > <mailto:kenn@apache.org <ma...@apache.org>>> wrote:
>>>> >      >
>>>> >      >     Another reason to push on this is to get build times down.
>>>> >     Once only
>>>> >      >     generated proto classes use the shadow plugin we'll cut
>>>> the build
>>>> >      >     time in ~half? And there is no reason to constantly
>>>> re-vendor.
>>>> >      >
>>>> >      >     Kenn
>>>> >      >
>>>> >      >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles
>>>> >     <klk@google.com <ma...@google.com>
>>>> >      >     <mailto:klk@google.com <ma...@google.com>>> wrote:
>>>> >      >
>>>> >      >         Hi all,
>>>> >      >
>>>> >      >         A while ago we had pretty good consensus that we should
>>>> >     vendor
>>>> >      >         ("pre-shade") specific dependencies, and there's start
>>>> on it
>>>> >      >         here:
>>>> https://github.com/apache/beam/tree/master/vendor
>>>> >      >
>>>> >      >         IntelliJ notes:
>>>> >      >
>>>> >      >           - With shading, it is hard (impossible?) to step into
>>>> >      >         dependency code in IntelliJ's debugger, because the
>>>> actual
>>>> >      >         symbol at runtime does not match what is in the
>>>> external jars
>>>> >      >
>>>> >      >
>>>> >      > Intellij can step through the classes if they were published
>>>> >     outside the
>>>> >      > project since it can decompile them. The source won't be
>>>> legible.
>>>> >      > Decompiling the source as in my example effectively shows the
>>>> >     same issue.
>>>> >      >
>>>> >      >           - With vendoring, if the vendored dependencies are
>>>> part
>>>> >     of the
>>>> >      >         project then IntelliJ gets confused because it
>>>> operates on
>>>> >      >         source, not the produced jars
>>>> >      >
>>>> >      >
>>>> >      > Yes, I tried several ways to get intellij to ignore the source
>>>> >     and use
>>>> >      > the output jars but no luck.
>>>> >      >
>>>> >      >         The second one has a quick fix for most cases*: don't
>>>> >     make the
>>>> >      >         vendored dep a subproject, but just separately build
>>>> and
>>>> >     publish
>>>> >      >         it. Since a vendored dependency should change much more
>>>> >      >         infrequently (or if we bake the version into the name,
>>>> >     ~never)
>>>> >      >         this means we publish once and save headache and build
>>>> >     time forever.
>>>> >      >
>>>> >      >         WDYT? Have I overlooked something? How about we set up
>>>> >     vendored
>>>> >      >         versions of guava, protobuf, grpc, and publish them.
>>>> We don't
>>>> >      >         have to actually start using them yet, and can do it
>>>> >     incrementally.
>>>> >      >
>>>> >      >
>>>> >      > Currently we are relocating code depending on the version
>>>> string.
>>>> >     If the
>>>> >      > major version is >= 1, we use only the major version within the
>>>> >     package
>>>> >      > string and rely on semantic versioning provided by the
>>>> dependency
>>>> >     to not
>>>> >      > break people. If the major version is 0, we assume the
>>>> dependency is
>>>> >      > unstable and use the full version as part of the package string
>>>> >     during
>>>> >      > relocation.
>>>> >      >
>>>> >      > The downside of using the full version string for relocated
>>>> packages:
>>>> >      > 1) Users will end up with multiple copies of dependencies that
>>>> >     differ
>>>> >      > only by the minor or patch version increasing the size of their
>>>> >     application.
>>>> >      > 2) Bumping up the version of a dependency now requires the
>>>> import
>>>> >      > statement in all java files to be updated (not too difficult
>>>> with
>>>> >     some
>>>> >      > sed/grep skills)
>>>> >      >
>>>> >      > The upside of using the full version string in the relocated
>>>> package:
>>>> >      > 1) We don't have to worry about whether a dependency maintains
>>>> >     semantic
>>>> >      > versioning which means our users won't have to worry about that
>>>> >     either.
>>>> >      > 2) This increases the odds that a user will load multiple
>>>> slightly
>>>> >      > different versions of the same dependency which is known to be
>>>> >      > incompatible in certain situations (e.g. Netty 4.1.25 can't be
>>>> on
>>>> >     the
>>>> >      > classpath with Netty 4.1.28 even though they are both shaded
>>>> due to
>>>> >      > issues of how JNI with tcnative works).
>>>> >      >
>>>> >      >
>>>> >      >         (side note: what do other projects like Flink do?)
>>>> >      >
>>>> >      >         Kenn
>>>> >      >
>>>> >      >         *for generated proto classes, they need to be altered
>>>> after
>>>> >      >         being generated so shading happens there, but actually
>>>> only
>>>> >      >         relocation and the shared vendored dep should work
>>>> >     elsewhere in
>>>> >      >         the project
>>>> >      >
>>>> >      >
>>>> >      > We could publish the protos and treat them as "external"
>>>> >     dependencies
>>>> >      > within the Java projects which would also remove this pain
>>>> point.
>>>> >
>>>>
>>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Kenneth Knowles <kl...@google.com>.
I actually created the subtasks by finding things shaded by at least one
module. I think each one should definitely have an on list discussion that
clarifies the target artifact, namespace, version, possible complications,
etc.

My impression is that many many modules shade only Guava. So for build time
and simplification that is a big win.

Kenn

On Tue, Oct 23, 2018, 08:16 Thomas Weise <th...@apache.org> wrote:

> +1 for separate artifacts
>
> I would request that we explicitly discuss and agree which dependencies we
> vendor though.
>
> Not everything listed in the JIRA subtasks is currently relocated.
>
> Thomas
>
>
> On Tue, Oct 23, 2018 at 8:04 AM David Morávek <da...@gmail.com>
> wrote:
>
>> +1 This should improve build times a lot. It would be great if vendored
>> deps could stay in the main repository.
>>
>> D.
>>
>> On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels <mx...@apache.org>
>> wrote:
>>
>>> Looks great, Kenn!
>>>
>>> > Max: what is the story behind having a separate flink-shaded repo? Did
>>> that make it easier to manage in some way?
>>>
>>> Better separation of concerns, but I don't think releasing the shaded
>>> artifacts from the main repo is a problem. I'd even prefer not to split
>>> up the repo because it makes updates to the vendored dependencies
>>> slightly easier.
>>>
>>> On 23.10.18 03:25, Kenneth Knowles wrote:
>>> > OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to
>>> > collect sub-tasks. This has enough upsides throughout lots of areas of
>>> > the project that even though it is not glamorous it seems pretty
>>> > valuable to start on immediately. And I want to find out if there's a
>>> > pitfall lurking.
>>> >
>>> > Max: what is the story behind having a separate flink-shaded repo? Did
>>> > that make it easier to manage in some way?
>>> >
>>> > Kenn
>>> >
>>> > On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <mxm@apache.org
>>> > <ma...@apache.org>> wrote:
>>> >
>>> >     +1 for publishing vendored Jars independently. It will improve
>>> build
>>> >     time and ease IntelliJ integration.
>>> >
>>> >     Flink also publishes shaded dependencies separately:
>>> >
>>> >     - https://github.com/apache/flink-shaded
>>> >     - https://issues.apache.org/jira/browse/FLINK-6529
>>> >
>>> >     AFAIK their main motivation was to get rid of duplicate shaded
>>> classes
>>> >     on the classpath. We don't appear to have that problem because we
>>> >     already have a separate "vendor" project.
>>> >
>>> >      >  - With shading, it is hard (impossible?) to step into
>>> dependency
>>> >     code in IntelliJ's debugger, because the actual symbol at runtime
>>> >     does not match what is in the external jars
>>> >
>>> >     This would be solved by releasing the sources of the shaded jars.
>>> >      From a
>>> >     legal perspective, this could be problematic as alluded to here:
>>> >     https://github.com/apache/flink-shaded/issues/25
>>> >
>>> >     -Max
>>> >
>>> >     On 20.10.18 01:11, Lukasz Cwik wrote:
>>> >      > I have tried several times to improve the build system and
>>> intellij
>>> >      > integration and each attempt ended with little progress when
>>> dealing
>>> >      > with vendored code. My latest attempt has been the most
>>> promising
>>> >     where
>>> >      > I take the vendored classes/jars and decompile them generating
>>> the
>>> >      > source that Intellij can then use. I have a branch[1] that
>>> >     demonstrates
>>> >      > the idea. It works pretty well (and up until a change where we
>>> >     started
>>> >      > vendoring gRPC, was impractical to do. Instructions to try it
>>> out
>>> >     are:
>>> >      >
>>> >      > // Clean up any remnants of prior builds/intellij projects
>>> >      > git clean -fdx
>>> >      > // Generated the source for vendored/shaded modules
>>> >      > ./gradlew decompile
>>> >      >
>>> >      > // Remove the "generated" Java sources for protos so they don't
>>> >     conflict with the decompiled sources.
>>> >      > rm -rf model/pipeline/build/generated/source/proto
>>> >      > rm -rf model/job-management/build/generated/source/proto
>>> >      > rm -rf model/fn-execution/build/generated/source/proto
>>> >      > // Import the project into Intellij, most code completion now
>>> >     works still some issues with a few classes.
>>> >      > // Note that the Java decompiler doesn't generate valid source
>>> so
>>> >     still need to delegate to Gradle for build/run/test actions
>>> >      > // Other decompilers may do a better/worse job but haven't tried
>>> >     them.
>>> >      >
>>> >      >
>>> >      > The problems that I face are that the generated Java source
>>> from the
>>> >      > protos and the decompiled source from the compiled version of
>>> that
>>> >      > source post shading are both being imported as content roots and
>>> >     then
>>> >      > conflict. Also, the CFR decompiler isn't producing valid
>>> source, if
>>> >      > people could try others and report their mileage, we may find
>>> one
>>> >     that
>>> >      > works and then we would be able to use intellij to build/run our
>>> >     code
>>> >      > and not need to delegate all our build/run/test actions to
>>> Gradle.
>>> >      >
>>> >      > After all these attempts I have done, vendoring the dependencies
>>> >     outside
>>> >      > of the project seems like a sane approach and unless someone
>>> >     wants to
>>> >      > take a stab at the best progress I have made above, I would go
>>> >     with what
>>> >      > Kenn is suggesting even though it will mean that we will need to
>>> >     perform
>>> >      > releases every time we want to change the version of one of our
>>> >     vendored
>>> >      > dependencies.
>>> >      >
>>> >      > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>>> >      >
>>> >      >
>>> >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <
>>> kenn@apache.org
>>> >     <ma...@apache.org>
>>> >      > <mailto:kenn@apache.org <ma...@apache.org>>> wrote:
>>> >      >
>>> >      >     Another reason to push on this is to get build times down.
>>> >     Once only
>>> >      >     generated proto classes use the shadow plugin we'll cut the
>>> build
>>> >      >     time in ~half? And there is no reason to constantly
>>> re-vendor.
>>> >      >
>>> >      >     Kenn
>>> >      >
>>> >      >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles
>>> >     <klk@google.com <ma...@google.com>
>>> >      >     <mailto:klk@google.com <ma...@google.com>>> wrote:
>>> >      >
>>> >      >         Hi all,
>>> >      >
>>> >      >         A while ago we had pretty good consensus that we should
>>> >     vendor
>>> >      >         ("pre-shade") specific dependencies, and there's start
>>> on it
>>> >      >         here: https://github.com/apache/beam/tree/master/vendor
>>> >      >
>>> >      >         IntelliJ notes:
>>> >      >
>>> >      >           - With shading, it is hard (impossible?) to step into
>>> >      >         dependency code in IntelliJ's debugger, because the
>>> actual
>>> >      >         symbol at runtime does not match what is in the
>>> external jars
>>> >      >
>>> >      >
>>> >      > Intellij can step through the classes if they were published
>>> >     outside the
>>> >      > project since it can decompile them. The source won't be
>>> legible.
>>> >      > Decompiling the source as in my example effectively shows the
>>> >     same issue.
>>> >      >
>>> >      >           - With vendoring, if the vendored dependencies are
>>> part
>>> >     of the
>>> >      >         project then IntelliJ gets confused because it operates
>>> on
>>> >      >         source, not the produced jars
>>> >      >
>>> >      >
>>> >      > Yes, I tried several ways to get intellij to ignore the source
>>> >     and use
>>> >      > the output jars but no luck.
>>> >      >
>>> >      >         The second one has a quick fix for most cases*: don't
>>> >     make the
>>> >      >         vendored dep a subproject, but just separately build and
>>> >     publish
>>> >      >         it. Since a vendored dependency should change much more
>>> >      >         infrequently (or if we bake the version into the name,
>>> >     ~never)
>>> >      >         this means we publish once and save headache and build
>>> >     time forever.
>>> >      >
>>> >      >         WDYT? Have I overlooked something? How about we set up
>>> >     vendored
>>> >      >         versions of guava, protobuf, grpc, and publish them. We
>>> don't
>>> >      >         have to actually start using them yet, and can do it
>>> >     incrementally.
>>> >      >
>>> >      >
>>> >      > Currently we are relocating code depending on the version
>>> string.
>>> >     If the
>>> >      > major version is >= 1, we use only the major version within the
>>> >     package
>>> >      > string and rely on semantic versioning provided by the
>>> dependency
>>> >     to not
>>> >      > break people. If the major version is 0, we assume the
>>> dependency is
>>> >      > unstable and use the full version as part of the package string
>>> >     during
>>> >      > relocation.
>>> >      >
>>> >      > The downside of using the full version string for relocated
>>> packages:
>>> >      > 1) Users will end up with multiple copies of dependencies that
>>> >     differ
>>> >      > only by the minor or patch version increasing the size of their
>>> >     application.
>>> >      > 2) Bumping up the version of a dependency now requires the
>>> import
>>> >      > statement in all java files to be updated (not too difficult
>>> with
>>> >     some
>>> >      > sed/grep skills)
>>> >      >
>>> >      > The upside of using the full version string in the relocated
>>> package:
>>> >      > 1) We don't have to worry about whether a dependency maintains
>>> >     semantic
>>> >      > versioning which means our users won't have to worry about that
>>> >     either.
>>> >      > 2) This increases the odds that a user will load multiple
>>> slightly
>>> >      > different versions of the same dependency which is known to be
>>> >      > incompatible in certain situations (e.g. Netty 4.1.25 can't be
>>> on
>>> >     the
>>> >      > classpath with Netty 4.1.28 even though they are both shaded
>>> due to
>>> >      > issues of how JNI with tcnative works).
>>> >      >
>>> >      >
>>> >      >         (side note: what do other projects like Flink do?)
>>> >      >
>>> >      >         Kenn
>>> >      >
>>> >      >         *for generated proto classes, they need to be altered
>>> after
>>> >      >         being generated so shading happens there, but actually
>>> only
>>> >      >         relocation and the shared vendored dep should work
>>> >     elsewhere in
>>> >      >         the project
>>> >      >
>>> >      >
>>> >      > We could publish the protos and treat them as "external"
>>> >     dependencies
>>> >      > within the Java projects which would also remove this pain
>>> point.
>>> >
>>>
>>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Thomas Weise <th...@apache.org>.
+1 for separate artifacts

I would request that we explicitly discuss and agree which dependencies we
vendor though.

Not everything listed in the JIRA subtasks is currently relocated.

Thomas


On Tue, Oct 23, 2018 at 8:04 AM David Morávek <da...@gmail.com>
wrote:

> +1 This should improve build times a lot. It would be great if vendored
> deps could stay in the main repository.
>
> D.
>
> On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels <mx...@apache.org>
> wrote:
>
>> Looks great, Kenn!
>>
>> > Max: what is the story behind having a separate flink-shaded repo? Did
>> that make it easier to manage in some way?
>>
>> Better separation of concerns, but I don't think releasing the shaded
>> artifacts from the main repo is a problem. I'd even prefer not to split
>> up the repo because it makes updates to the vendored dependencies
>> slightly easier.
>>
>> On 23.10.18 03:25, Kenneth Knowles wrote:
>> > OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to
>> > collect sub-tasks. This has enough upsides throughout lots of areas of
>> > the project that even though it is not glamorous it seems pretty
>> > valuable to start on immediately. And I want to find out if there's a
>> > pitfall lurking.
>> >
>> > Max: what is the story behind having a separate flink-shaded repo? Did
>> > that make it easier to manage in some way?
>> >
>> > Kenn
>> >
>> > On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <mxm@apache.org
>> > <ma...@apache.org>> wrote:
>> >
>> >     +1 for publishing vendored Jars independently. It will improve build
>> >     time and ease IntelliJ integration.
>> >
>> >     Flink also publishes shaded dependencies separately:
>> >
>> >     - https://github.com/apache/flink-shaded
>> >     - https://issues.apache.org/jira/browse/FLINK-6529
>> >
>> >     AFAIK their main motivation was to get rid of duplicate shaded
>> classes
>> >     on the classpath. We don't appear to have that problem because we
>> >     already have a separate "vendor" project.
>> >
>> >      >  - With shading, it is hard (impossible?) to step into dependency
>> >     code in IntelliJ's debugger, because the actual symbol at runtime
>> >     does not match what is in the external jars
>> >
>> >     This would be solved by releasing the sources of the shaded jars.
>> >      From a
>> >     legal perspective, this could be problematic as alluded to here:
>> >     https://github.com/apache/flink-shaded/issues/25
>> >
>> >     -Max
>> >
>> >     On 20.10.18 01:11, Lukasz Cwik wrote:
>> >      > I have tried several times to improve the build system and
>> intellij
>> >      > integration and each attempt ended with little progress when
>> dealing
>> >      > with vendored code. My latest attempt has been the most promising
>> >     where
>> >      > I take the vendored classes/jars and decompile them generating
>> the
>> >      > source that Intellij can then use. I have a branch[1] that
>> >     demonstrates
>> >      > the idea. It works pretty well (and up until a change where we
>> >     started
>> >      > vendoring gRPC, was impractical to do. Instructions to try it out
>> >     are:
>> >      >
>> >      > // Clean up any remnants of prior builds/intellij projects
>> >      > git clean -fdx
>> >      > // Generated the source for vendored/shaded modules
>> >      > ./gradlew decompile
>> >      >
>> >      > // Remove the "generated" Java sources for protos so they don't
>> >     conflict with the decompiled sources.
>> >      > rm -rf model/pipeline/build/generated/source/proto
>> >      > rm -rf model/job-management/build/generated/source/proto
>> >      > rm -rf model/fn-execution/build/generated/source/proto
>> >      > // Import the project into Intellij, most code completion now
>> >     works still some issues with a few classes.
>> >      > // Note that the Java decompiler doesn't generate valid source so
>> >     still need to delegate to Gradle for build/run/test actions
>> >      > // Other decompilers may do a better/worse job but haven't tried
>> >     them.
>> >      >
>> >      >
>> >      > The problems that I face are that the generated Java source from
>> the
>> >      > protos and the decompiled source from the compiled version of
>> that
>> >      > source post shading are both being imported as content roots and
>> >     then
>> >      > conflict. Also, the CFR decompiler isn't producing valid source,
>> if
>> >      > people could try others and report their mileage, we may find one
>> >     that
>> >      > works and then we would be able to use intellij to build/run our
>> >     code
>> >      > and not need to delegate all our build/run/test actions to
>> Gradle.
>> >      >
>> >      > After all these attempts I have done, vendoring the dependencies
>> >     outside
>> >      > of the project seems like a sane approach and unless someone
>> >     wants to
>> >      > take a stab at the best progress I have made above, I would go
>> >     with what
>> >      > Kenn is suggesting even though it will mean that we will need to
>> >     perform
>> >      > releases every time we want to change the version of one of our
>> >     vendored
>> >      > dependencies.
>> >      >
>> >      > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>> >      >
>> >      >
>> >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <
>> kenn@apache.org
>> >     <ma...@apache.org>
>> >      > <mailto:kenn@apache.org <ma...@apache.org>>> wrote:
>> >      >
>> >      >     Another reason to push on this is to get build times down.
>> >     Once only
>> >      >     generated proto classes use the shadow plugin we'll cut the
>> build
>> >      >     time in ~half? And there is no reason to constantly
>> re-vendor.
>> >      >
>> >      >     Kenn
>> >      >
>> >      >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles
>> >     <klk@google.com <ma...@google.com>
>> >      >     <mailto:klk@google.com <ma...@google.com>>> wrote:
>> >      >
>> >      >         Hi all,
>> >      >
>> >      >         A while ago we had pretty good consensus that we should
>> >     vendor
>> >      >         ("pre-shade") specific dependencies, and there's start
>> on it
>> >      >         here: https://github.com/apache/beam/tree/master/vendor
>> >      >
>> >      >         IntelliJ notes:
>> >      >
>> >      >           - With shading, it is hard (impossible?) to step into
>> >      >         dependency code in IntelliJ's debugger, because the
>> actual
>> >      >         symbol at runtime does not match what is in the external
>> jars
>> >      >
>> >      >
>> >      > Intellij can step through the classes if they were published
>> >     outside the
>> >      > project since it can decompile them. The source won't be legible.
>> >      > Decompiling the source as in my example effectively shows the
>> >     same issue.
>> >      >
>> >      >           - With vendoring, if the vendored dependencies are part
>> >     of the
>> >      >         project then IntelliJ gets confused because it operates
>> on
>> >      >         source, not the produced jars
>> >      >
>> >      >
>> >      > Yes, I tried several ways to get intellij to ignore the source
>> >     and use
>> >      > the output jars but no luck.
>> >      >
>> >      >         The second one has a quick fix for most cases*: don't
>> >     make the
>> >      >         vendored dep a subproject, but just separately build and
>> >     publish
>> >      >         it. Since a vendored dependency should change much more
>> >      >         infrequently (or if we bake the version into the name,
>> >     ~never)
>> >      >         this means we publish once and save headache and build
>> >     time forever.
>> >      >
>> >      >         WDYT? Have I overlooked something? How about we set up
>> >     vendored
>> >      >         versions of guava, protobuf, grpc, and publish them. We
>> don't
>> >      >         have to actually start using them yet, and can do it
>> >     incrementally.
>> >      >
>> >      >
>> >      > Currently we are relocating code depending on the version string.
>> >     If the
>> >      > major version is >= 1, we use only the major version within the
>> >     package
>> >      > string and rely on semantic versioning provided by the dependency
>> >     to not
>> >      > break people. If the major version is 0, we assume the
>> dependency is
>> >      > unstable and use the full version as part of the package string
>> >     during
>> >      > relocation.
>> >      >
>> >      > The downside of using the full version string for relocated
>> packages:
>> >      > 1) Users will end up with multiple copies of dependencies that
>> >     differ
>> >      > only by the minor or patch version increasing the size of their
>> >     application.
>> >      > 2) Bumping up the version of a dependency now requires the import
>> >      > statement in all java files to be updated (not too difficult with
>> >     some
>> >      > sed/grep skills)
>> >      >
>> >      > The upside of using the full version string in the relocated
>> package:
>> >      > 1) We don't have to worry about whether a dependency maintains
>> >     semantic
>> >      > versioning which means our users won't have to worry about that
>> >     either.
>> >      > 2) This increases the odds that a user will load multiple
>> slightly
>> >      > different versions of the same dependency which is known to be
>> >      > incompatible in certain situations (e.g. Netty 4.1.25 can't be on
>> >     the
>> >      > classpath with Netty 4.1.28 even though they are both shaded due
>> to
>> >      > issues of how JNI with tcnative works).
>> >      >
>> >      >
>> >      >         (side note: what do other projects like Flink do?)
>> >      >
>> >      >         Kenn
>> >      >
>> >      >         *for generated proto classes, they need to be altered
>> after
>> >      >         being generated so shading happens there, but actually
>> only
>> >      >         relocation and the shared vendored dep should work
>> >     elsewhere in
>> >      >         the project
>> >      >
>> >      >
>> >      > We could publish the protos and treat them as "external"
>> >     dependencies
>> >      > within the Java projects which would also remove this pain point.
>> >
>>
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by David Morávek <da...@gmail.com>.
+1 This should improve build times a lot. It would be great if vendored
deps could stay in the main repository.

D.

On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels <mx...@apache.org> wrote:

> Looks great, Kenn!
>
> > Max: what is the story behind having a separate flink-shaded repo? Did
> that make it easier to manage in some way?
>
> Better separation of concerns, but I don't think releasing the shaded
> artifacts from the main repo is a problem. I'd even prefer not to split
> up the repo because it makes updates to the vendored dependencies
> slightly easier.
>
> On 23.10.18 03:25, Kenneth Knowles wrote:
> > OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to
> > collect sub-tasks. This has enough upsides throughout lots of areas of
> > the project that even though it is not glamorous it seems pretty
> > valuable to start on immediately. And I want to find out if there's a
> > pitfall lurking.
> >
> > Max: what is the story behind having a separate flink-shaded repo? Did
> > that make it easier to manage in some way?
> >
> > Kenn
> >
> > On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <mxm@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     +1 for publishing vendored Jars independently. It will improve build
> >     time and ease IntelliJ integration.
> >
> >     Flink also publishes shaded dependencies separately:
> >
> >     - https://github.com/apache/flink-shaded
> >     - https://issues.apache.org/jira/browse/FLINK-6529
> >
> >     AFAIK their main motivation was to get rid of duplicate shaded
> classes
> >     on the classpath. We don't appear to have that problem because we
> >     already have a separate "vendor" project.
> >
> >      >  - With shading, it is hard (impossible?) to step into dependency
> >     code in IntelliJ's debugger, because the actual symbol at runtime
> >     does not match what is in the external jars
> >
> >     This would be solved by releasing the sources of the shaded jars.
> >      From a
> >     legal perspective, this could be problematic as alluded to here:
> >     https://github.com/apache/flink-shaded/issues/25
> >
> >     -Max
> >
> >     On 20.10.18 01:11, Lukasz Cwik wrote:
> >      > I have tried several times to improve the build system and
> intellij
> >      > integration and each attempt ended with little progress when
> dealing
> >      > with vendored code. My latest attempt has been the most promising
> >     where
> >      > I take the vendored classes/jars and decompile them generating the
> >      > source that Intellij can then use. I have a branch[1] that
> >     demonstrates
> >      > the idea. It works pretty well (and up until a change where we
> >     started
> >      > vendoring gRPC, was impractical to do. Instructions to try it out
> >     are:
> >      >
> >      > // Clean up any remnants of prior builds/intellij projects
> >      > git clean -fdx
> >      > // Generated the source for vendored/shaded modules
> >      > ./gradlew decompile
> >      >
> >      > // Remove the "generated" Java sources for protos so they don't
> >     conflict with the decompiled sources.
> >      > rm -rf model/pipeline/build/generated/source/proto
> >      > rm -rf model/job-management/build/generated/source/proto
> >      > rm -rf model/fn-execution/build/generated/source/proto
> >      > // Import the project into Intellij, most code completion now
> >     works still some issues with a few classes.
> >      > // Note that the Java decompiler doesn't generate valid source so
> >     still need to delegate to Gradle for build/run/test actions
> >      > // Other decompilers may do a better/worse job but haven't tried
> >     them.
> >      >
> >      >
> >      > The problems that I face are that the generated Java source from
> the
> >      > protos and the decompiled source from the compiled version of that
> >      > source post shading are both being imported as content roots and
> >     then
> >      > conflict. Also, the CFR decompiler isn't producing valid source,
> if
> >      > people could try others and report their mileage, we may find one
> >     that
> >      > works and then we would be able to use intellij to build/run our
> >     code
> >      > and not need to delegate all our build/run/test actions to Gradle.
> >      >
> >      > After all these attempts I have done, vendoring the dependencies
> >     outside
> >      > of the project seems like a sane approach and unless someone
> >     wants to
> >      > take a stab at the best progress I have made above, I would go
> >     with what
> >      > Kenn is suggesting even though it will mean that we will need to
> >     perform
> >      > releases every time we want to change the version of one of our
> >     vendored
> >      > dependencies.
> >      >
> >      > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
> >      >
> >      >
> >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <kenn@apache.org
> >     <ma...@apache.org>
> >      > <mailto:kenn@apache.org <ma...@apache.org>>> wrote:
> >      >
> >      >     Another reason to push on this is to get build times down.
> >     Once only
> >      >     generated proto classes use the shadow plugin we'll cut the
> build
> >      >     time in ~half? And there is no reason to constantly re-vendor.
> >      >
> >      >     Kenn
> >      >
> >      >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles
> >     <klk@google.com <ma...@google.com>
> >      >     <mailto:klk@google.com <ma...@google.com>>> wrote:
> >      >
> >      >         Hi all,
> >      >
> >      >         A while ago we had pretty good consensus that we should
> >     vendor
> >      >         ("pre-shade") specific dependencies, and there's start on
> it
> >      >         here: https://github.com/apache/beam/tree/master/vendor
> >      >
> >      >         IntelliJ notes:
> >      >
> >      >           - With shading, it is hard (impossible?) to step into
> >      >         dependency code in IntelliJ's debugger, because the actual
> >      >         symbol at runtime does not match what is in the external
> jars
> >      >
> >      >
> >      > Intellij can step through the classes if they were published
> >     outside the
> >      > project since it can decompile them. The source won't be legible.
> >      > Decompiling the source as in my example effectively shows the
> >     same issue.
> >      >
> >      >           - With vendoring, if the vendored dependencies are part
> >     of the
> >      >         project then IntelliJ gets confused because it operates on
> >      >         source, not the produced jars
> >      >
> >      >
> >      > Yes, I tried several ways to get intellij to ignore the source
> >     and use
> >      > the output jars but no luck.
> >      >
> >      >         The second one has a quick fix for most cases*: don't
> >     make the
> >      >         vendored dep a subproject, but just separately build and
> >     publish
> >      >         it. Since a vendored dependency should change much more
> >      >         infrequently (or if we bake the version into the name,
> >     ~never)
> >      >         this means we publish once and save headache and build
> >     time forever.
> >      >
> >      >         WDYT? Have I overlooked something? How about we set up
> >     vendored
> >      >         versions of guava, protobuf, grpc, and publish them. We
> don't
> >      >         have to actually start using them yet, and can do it
> >     incrementally.
> >      >
> >      >
> >      > Currently we are relocating code depending on the version string.
> >     If the
> >      > major version is >= 1, we use only the major version within the
> >     package
> >      > string and rely on semantic versioning provided by the dependency
> >     to not
> >      > break people. If the major version is 0, we assume the dependency
> is
> >      > unstable and use the full version as part of the package string
> >     during
> >      > relocation.
> >      >
> >      > The downside of using the full version string for relocated
> packages:
> >      > 1) Users will end up with multiple copies of dependencies that
> >     differ
> >      > only by the minor or patch version increasing the size of their
> >     application.
> >      > 2) Bumping up the version of a dependency now requires the import
> >      > statement in all java files to be updated (not too difficult with
> >     some
> >      > sed/grep skills)
> >      >
> >      > The upside of using the full version string in the relocated
> package:
> >      > 1) We don't have to worry about whether a dependency maintains
> >     semantic
> >      > versioning which means our users won't have to worry about that
> >     either.
> >      > 2) This increases the odds that a user will load multiple slightly
> >      > different versions of the same dependency which is known to be
> >      > incompatible in certain situations (e.g. Netty 4.1.25 can't be on
> >     the
> >      > classpath with Netty 4.1.28 even though they are both shaded due
> to
> >      > issues of how JNI with tcnative works).
> >      >
> >      >
> >      >         (side note: what do other projects like Flink do?)
> >      >
> >      >         Kenn
> >      >
> >      >         *for generated proto classes, they need to be altered
> after
> >      >         being generated so shading happens there, but actually
> only
> >      >         relocation and the shared vendored dep should work
> >     elsewhere in
> >      >         the project
> >      >
> >      >
> >      > We could publish the protos and treat them as "external"
> >     dependencies
> >      > within the Java projects which would also remove this pain point.
> >
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Maximilian Michels <mx...@apache.org>.
Looks great, Kenn!

> Max: what is the story behind having a separate flink-shaded repo? Did that make it easier to manage in some way?

Better separation of concerns, but I don't think releasing the shaded 
artifacts from the main repo is a problem. I'd even prefer not to split 
up the repo because it makes updates to the vendored dependencies 
slightly easier.

On 23.10.18 03:25, Kenneth Knowles wrote:
> OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to 
> collect sub-tasks. This has enough upsides throughout lots of areas of 
> the project that even though it is not glamorous it seems pretty 
> valuable to start on immediately. And I want to find out if there's a 
> pitfall lurking.
> 
> Max: what is the story behind having a separate flink-shaded repo? Did 
> that make it easier to manage in some way?
> 
> Kenn
> 
> On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <mxm@apache.org 
> <ma...@apache.org>> wrote:
> 
>     +1 for publishing vendored Jars independently. It will improve build
>     time and ease IntelliJ integration.
> 
>     Flink also publishes shaded dependencies separately:
> 
>     - https://github.com/apache/flink-shaded
>     - https://issues.apache.org/jira/browse/FLINK-6529
> 
>     AFAIK their main motivation was to get rid of duplicate shaded classes
>     on the classpath. We don't appear to have that problem because we
>     already have a separate "vendor" project.
> 
>      >  - With shading, it is hard (impossible?) to step into dependency
>     code in IntelliJ's debugger, because the actual symbol at runtime
>     does not match what is in the external jars
> 
>     This would be solved by releasing the sources of the shaded jars.
>      From a
>     legal perspective, this could be problematic as alluded to here:
>     https://github.com/apache/flink-shaded/issues/25
> 
>     -Max
> 
>     On 20.10.18 01:11, Lukasz Cwik wrote:
>      > I have tried several times to improve the build system and intellij
>      > integration and each attempt ended with little progress when dealing
>      > with vendored code. My latest attempt has been the most promising
>     where
>      > I take the vendored classes/jars and decompile them generating the
>      > source that Intellij can then use. I have a branch[1] that
>     demonstrates
>      > the idea. It works pretty well (and up until a change where we
>     started
>      > vendoring gRPC, was impractical to do. Instructions to try it out
>     are:
>      >
>      > // Clean up any remnants of prior builds/intellij projects
>      > git clean -fdx
>      > // Generated the source for vendored/shaded modules
>      > ./gradlew decompile
>      >
>      > // Remove the "generated" Java sources for protos so they don't
>     conflict with the decompiled sources.
>      > rm -rf model/pipeline/build/generated/source/proto
>      > rm -rf model/job-management/build/generated/source/proto
>      > rm -rf model/fn-execution/build/generated/source/proto
>      > // Import the project into Intellij, most code completion now
>     works still some issues with a few classes.
>      > // Note that the Java decompiler doesn't generate valid source so
>     still need to delegate to Gradle for build/run/test actions
>      > // Other decompilers may do a better/worse job but haven't tried
>     them.
>      >
>      >
>      > The problems that I face are that the generated Java source from the
>      > protos and the decompiled source from the compiled version of that
>      > source post shading are both being imported as content roots and
>     then
>      > conflict. Also, the CFR decompiler isn't producing valid source, if
>      > people could try others and report their mileage, we may find one
>     that
>      > works and then we would be able to use intellij to build/run our
>     code
>      > and not need to delegate all our build/run/test actions to Gradle.
>      >
>      > After all these attempts I have done, vendoring the dependencies
>     outside
>      > of the project seems like a sane approach and unless someone
>     wants to
>      > take a stab at the best progress I have made above, I would go
>     with what
>      > Kenn is suggesting even though it will mean that we will need to
>     perform
>      > releases every time we want to change the version of one of our
>     vendored
>      > dependencies.
>      >
>      > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>      >
>      >
>      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <kenn@apache.org
>     <ma...@apache.org>
>      > <mailto:kenn@apache.org <ma...@apache.org>>> wrote:
>      >
>      >     Another reason to push on this is to get build times down.
>     Once only
>      >     generated proto classes use the shadow plugin we'll cut the build
>      >     time in ~half? And there is no reason to constantly re-vendor.
>      >
>      >     Kenn
>      >
>      >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles
>     <klk@google.com <ma...@google.com>
>      >     <mailto:klk@google.com <ma...@google.com>>> wrote:
>      >
>      >         Hi all,
>      >
>      >         A while ago we had pretty good consensus that we should
>     vendor
>      >         ("pre-shade") specific dependencies, and there's start on it
>      >         here: https://github.com/apache/beam/tree/master/vendor
>      >
>      >         IntelliJ notes:
>      >
>      >           - With shading, it is hard (impossible?) to step into
>      >         dependency code in IntelliJ's debugger, because the actual
>      >         symbol at runtime does not match what is in the external jars
>      >
>      >
>      > Intellij can step through the classes if they were published
>     outside the
>      > project since it can decompile them. The source won't be legible.
>      > Decompiling the source as in my example effectively shows the
>     same issue.
>      >
>      >           - With vendoring, if the vendored dependencies are part
>     of the
>      >         project then IntelliJ gets confused because it operates on
>      >         source, not the produced jars
>      >
>      >
>      > Yes, I tried several ways to get intellij to ignore the source
>     and use
>      > the output jars but no luck.
>      >
>      >         The second one has a quick fix for most cases*: don't
>     make the
>      >         vendored dep a subproject, but just separately build and
>     publish
>      >         it. Since a vendored dependency should change much more
>      >         infrequently (or if we bake the version into the name,
>     ~never)
>      >         this means we publish once and save headache and build
>     time forever.
>      >
>      >         WDYT? Have I overlooked something? How about we set up
>     vendored
>      >         versions of guava, protobuf, grpc, and publish them. We don't
>      >         have to actually start using them yet, and can do it
>     incrementally.
>      >
>      >
>      > Currently we are relocating code depending on the version string.
>     If the
>      > major version is >= 1, we use only the major version within the
>     package
>      > string and rely on semantic versioning provided by the dependency
>     to not
>      > break people. If the major version is 0, we assume the dependency is
>      > unstable and use the full version as part of the package string
>     during
>      > relocation.
>      >
>      > The downside of using the full version string for relocated packages:
>      > 1) Users will end up with multiple copies of dependencies that
>     differ
>      > only by the minor or patch version increasing the size of their
>     application.
>      > 2) Bumping up the version of a dependency now requires the import
>      > statement in all java files to be updated (not too difficult with
>     some
>      > sed/grep skills)
>      >
>      > The upside of using the full version string in the relocated package:
>      > 1) We don't have to worry about whether a dependency maintains
>     semantic
>      > versioning which means our users won't have to worry about that
>     either.
>      > 2) This increases the odds that a user will load multiple slightly
>      > different versions of the same dependency which is known to be
>      > incompatible in certain situations (e.g. Netty 4.1.25 can't be on
>     the
>      > classpath with Netty 4.1.28 even though they are both shaded due to
>      > issues of how JNI with tcnative works).
>      >
>      >
>      >         (side note: what do other projects like Flink do?)
>      >
>      >         Kenn
>      >
>      >         *for generated proto classes, they need to be altered after
>      >         being generated so shading happens there, but actually only
>      >         relocation and the shared vendored dep should work
>     elsewhere in
>      >         the project
>      >
>      >
>      > We could publish the protos and treat them as "external"
>     dependencies
>      > within the Java projects which would also remove this pain point.
> 

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Kenneth Knowles <ke...@apache.org>.
OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to collect
sub-tasks. This has enough upsides throughout lots of areas of the project
that even though it is not glamorous it seems pretty valuable to start on
immediately. And I want to find out if there's a pitfall lurking.

Max: what is the story behind having a separate flink-shaded repo? Did that
make it easier to manage in some way?

Kenn

On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <mx...@apache.org> wrote:

> +1 for publishing vendored Jars independently. It will improve build
> time and ease IntelliJ integration.
>
> Flink also publishes shaded dependencies separately:
>
> - https://github.com/apache/flink-shaded
> - https://issues.apache.org/jira/browse/FLINK-6529
>
> AFAIK their main motivation was to get rid of duplicate shaded classes
> on the classpath. We don't appear to have that problem because we
> already have a separate "vendor" project.
>
> >  - With shading, it is hard (impossible?) to step into dependency code
> in IntelliJ's debugger, because the actual symbol at runtime does not match
> what is in the external jars
>
> This would be solved by releasing the sources of the shaded jars. From a
> legal perspective, this could be problematic as alluded to here:
> https://github.com/apache/flink-shaded/issues/25
>
> -Max
>
> On 20.10.18 01:11, Lukasz Cwik wrote:
> > I have tried several times to improve the build system and intellij
> > integration and each attempt ended with little progress when dealing
> > with vendored code. My latest attempt has been the most promising where
> > I take the vendored classes/jars and decompile them generating the
> > source that Intellij can then use. I have a branch[1] that demonstrates
> > the idea. It works pretty well (and up until a change where we started
> > vendoring gRPC, was impractical to do. Instructions to try it out are:
> >
> > // Clean up any remnants of prior builds/intellij projects
> > git clean -fdx
> > // Generated the source for vendored/shaded modules
> > ./gradlew decompile
> >
> > // Remove the "generated" Java sources for protos so they don't conflict
> with the decompiled sources.
> > rm -rf model/pipeline/build/generated/source/proto
> > rm -rf model/job-management/build/generated/source/proto
> > rm -rf model/fn-execution/build/generated/source/proto
> > // Import the project into Intellij, most code completion now works
> still some issues with a few classes.
> > // Note that the Java decompiler doesn't generate valid source so still
> need to delegate to Gradle for build/run/test actions
> > // Other decompilers may do a better/worse job but haven't tried them.
> >
> >
> > The problems that I face are that the generated Java source from the
> > protos and the decompiled source from the compiled version of that
> > source post shading are both being imported as content roots and then
> > conflict. Also, the CFR decompiler isn't producing valid source, if
> > people could try others and report their mileage, we may find one that
> > works and then we would be able to use intellij to build/run our code
> > and not need to delegate all our build/run/test actions to Gradle.
> >
> > After all these attempts I have done, vendoring the dependencies outside
> > of the project seems like a sane approach and unless someone wants to
> > take a stab at the best progress I have made above, I would go with what
> > Kenn is suggesting even though it will mean that we will need to perform
> > releases every time we want to change the version of one of our vendored
> > dependencies.
> >
> > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
> >
> >
> > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <kenn@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Another reason to push on this is to get build times down. Once only
> >     generated proto classes use the shadow plugin we'll cut the build
> >     time in ~half? And there is no reason to constantly re-vendor.
> >
> >     Kenn
> >
> >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <klk@google.com
> >     <ma...@google.com>> wrote:
> >
> >         Hi all,
> >
> >         A while ago we had pretty good consensus that we should vendor
> >         ("pre-shade") specific dependencies, and there's start on it
> >         here: https://github.com/apache/beam/tree/master/vendor
> >
> >         IntelliJ notes:
> >
> >           - With shading, it is hard (impossible?) to step into
> >         dependency code in IntelliJ's debugger, because the actual
> >         symbol at runtime does not match what is in the external jars
> >
> >
> > Intellij can step through the classes if they were published outside the
> > project since it can decompile them. The source won't be legible.
> > Decompiling the source as in my example effectively shows the same issue.
> >
> >           - With vendoring, if the vendored dependencies are part of the
> >         project then IntelliJ gets confused because it operates on
> >         source, not the produced jars
> >
> >
> > Yes, I tried several ways to get intellij to ignore the source and use
> > the output jars but no luck.
> >
> >         The second one has a quick fix for most cases*: don't make the
> >         vendored dep a subproject, but just separately build and publish
> >         it. Since a vendored dependency should change much more
> >         infrequently (or if we bake the version into the name, ~never)
> >         this means we publish once and save headache and build time
> forever.
> >
> >         WDYT? Have I overlooked something? How about we set up vendored
> >         versions of guava, protobuf, grpc, and publish them. We don't
> >         have to actually start using them yet, and can do it
> incrementally.
> >
> >
> > Currently we are relocating code depending on the version string. If the
> > major version is >= 1, we use only the major version within the package
> > string and rely on semantic versioning provided by the dependency to not
> > break people. If the major version is 0, we assume the dependency is
> > unstable and use the full version as part of the package string during
> > relocation.
> >
> > The downside of using the full version string for relocated packages:
> > 1) Users will end up with multiple copies of dependencies that differ
> > only by the minor or patch version increasing the size of their
> application.
> > 2) Bumping up the version of a dependency now requires the import
> > statement in all java files to be updated (not too difficult with some
> > sed/grep skills)
> >
> > The upside of using the full version string in the relocated package:
> > 1) We don't have to worry about whether a dependency maintains semantic
> > versioning which means our users won't have to worry about that either.
> > 2) This increases the odds that a user will load multiple slightly
> > different versions of the same dependency which is known to be
> > incompatible in certain situations (e.g. Netty 4.1.25 can't be on the
> > classpath with Netty 4.1.28 even though they are both shaded due to
> > issues of how JNI with tcnative works).
> >
> >
> >         (side note: what do other projects like Flink do?)
> >
> >         Kenn
> >
> >         *for generated proto classes, they need to be altered after
> >         being generated so shading happens there, but actually only
> >         relocation and the shared vendored dep should work elsewhere in
> >         the project
> >
> >
> > We could publish the protos and treat them as "external" dependencies
> > within the Java projects which would also remove this pain point.
>

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Maximilian Michels <mx...@apache.org>.
+1 for publishing vendored Jars independently. It will improve build 
time and ease IntelliJ integration.

Flink also publishes shaded dependencies separately:

- https://github.com/apache/flink-shaded
- https://issues.apache.org/jira/browse/FLINK-6529

AFAIK their main motivation was to get rid of duplicate shaded classes 
on the classpath. We don't appear to have that problem because we 
already have a separate "vendor" project.

>  - With shading, it is hard (impossible?) to step into dependency code in IntelliJ's debugger, because the actual symbol at runtime does not match what is in the external jars

This would be solved by releasing the sources of the shaded jars. From a 
legal perspective, this could be problematic as alluded to here: 
https://github.com/apache/flink-shaded/issues/25

-Max

On 20.10.18 01:11, Lukasz Cwik wrote:
> I have tried several times to improve the build system and intellij 
> integration and each attempt ended with little progress when dealing 
> with vendored code. My latest attempt has been the most promising where 
> I take the vendored classes/jars and decompile them generating the 
> source that Intellij can then use. I have a branch[1] that demonstrates 
> the idea. It works pretty well (and up until a change where we started 
> vendoring gRPC, was impractical to do. Instructions to try it out are:
> 
> // Clean up any remnants of prior builds/intellij projects
> git clean -fdx
> // Generated the source for vendored/shaded modules
> ./gradlew decompile
> 
> // Remove the "generated" Java sources for protos so they don't conflict with the decompiled sources.
> rm -rf model/pipeline/build/generated/source/proto
> rm -rf model/job-management/build/generated/source/proto
> rm -rf model/fn-execution/build/generated/source/proto
> // Import the project into Intellij, most code completion now works still some issues with a few classes.
> // Note that the Java decompiler doesn't generate valid source so still need to delegate to Gradle for build/run/test actions
> // Other decompilers may do a better/worse job but haven't tried them.
> 
> 
> The problems that I face are that the generated Java source from the 
> protos and the decompiled source from the compiled version of that 
> source post shading are both being imported as content roots and then 
> conflict. Also, the CFR decompiler isn't producing valid source, if 
> people could try others and report their mileage, we may find one that 
> works and then we would be able to use intellij to build/run our code 
> and not need to delegate all our build/run/test actions to Gradle.
> 
> After all these attempts I have done, vendoring the dependencies outside 
> of the project seems like a sane approach and unless someone wants to 
> take a stab at the best progress I have made above, I would go with what 
> Kenn is suggesting even though it will mean that we will need to perform 
> releases every time we want to change the version of one of our vendored 
> dependencies.
> 
> 1: https://github.com/lukecwik/incubator-beam/tree/intellij
> 
> 
> On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <kenn@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Another reason to push on this is to get build times down. Once only
>     generated proto classes use the shadow plugin we'll cut the build
>     time in ~half? And there is no reason to constantly re-vendor.
> 
>     Kenn
> 
>     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <klk@google.com
>     <ma...@google.com>> wrote:
> 
>         Hi all,
> 
>         A while ago we had pretty good consensus that we should vendor
>         ("pre-shade") specific dependencies, and there's start on it
>         here: https://github.com/apache/beam/tree/master/vendor
> 
>         IntelliJ notes:
> 
>           - With shading, it is hard (impossible?) to step into
>         dependency code in IntelliJ's debugger, because the actual
>         symbol at runtime does not match what is in the external jars
> 
> 
> Intellij can step through the classes if they were published outside the 
> project since it can decompile them. The source won't be legible. 
> Decompiling the source as in my example effectively shows the same issue.
> 
>           - With vendoring, if the vendored dependencies are part of the
>         project then IntelliJ gets confused because it operates on
>         source, not the produced jars
> 
> 
> Yes, I tried several ways to get intellij to ignore the source and use 
> the output jars but no luck.
> 
>         The second one has a quick fix for most cases*: don't make the
>         vendored dep a subproject, but just separately build and publish
>         it. Since a vendored dependency should change much more
>         infrequently (or if we bake the version into the name, ~never)
>         this means we publish once and save headache and build time forever.
> 
>         WDYT? Have I overlooked something? How about we set up vendored
>         versions of guava, protobuf, grpc, and publish them. We don't
>         have to actually start using them yet, and can do it incrementally.
> 
> 
> Currently we are relocating code depending on the version string. If the 
> major version is >= 1, we use only the major version within the package 
> string and rely on semantic versioning provided by the dependency to not 
> break people. If the major version is 0, we assume the dependency is 
> unstable and use the full version as part of the package string during 
> relocation.
> 
> The downside of using the full version string for relocated packages:
> 1) Users will end up with multiple copies of dependencies that differ 
> only by the minor or patch version increasing the size of their application.
> 2) Bumping up the version of a dependency now requires the import 
> statement in all java files to be updated (not too difficult with some 
> sed/grep skills)
> 
> The upside of using the full version string in the relocated package:
> 1) We don't have to worry about whether a dependency maintains semantic 
> versioning which means our users won't have to worry about that either.
> 2) This increases the odds that a user will load multiple slightly 
> different versions of the same dependency which is known to be 
> incompatible in certain situations (e.g. Netty 4.1.25 can't be on the 
> classpath with Netty 4.1.28 even though they are both shaded due to 
> issues of how JNI with tcnative works).
> 
> 
>         (side note: what do other projects like Flink do?)
> 
>         Kenn
> 
>         *for generated proto classes, they need to be altered after
>         being generated so shading happens there, but actually only
>         relocation and the shared vendored dep should work elsewhere in
>         the project
> 
> 
> We could publish the protos and treat them as "external" dependencies 
> within the Java projects which would also remove this pain point.

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Lukasz Cwik <lc...@google.com>.
I have tried several times to improve the build system and intellij
integration and each attempt ended with little progress when dealing with
vendored code. My latest attempt has been the most promising where I take
the vendored classes/jars and decompile them generating the source that
Intellij can then use. I have a branch[1] that demonstrates the idea. It
works pretty well (and up until a change where we started vendoring gRPC,
was impractical to do. Instructions to try it out are:

// Clean up any remnants of prior builds/intellij projects
git clean -fdx
// Generated the source for vendored/shaded modules
./gradlew decompile

// Remove the "generated" Java sources for protos so they don't
conflict with the decompiled sources.
rm -rf model/pipeline/build/generated/source/proto
rm -rf model/job-management/build/generated/source/proto
rm -rf model/fn-execution/build/generated/source/proto
// Import the project into Intellij, most code completion now works
still some issues with a few classes.
// Note that the Java decompiler doesn't generate valid source so
still need to delegate to Gradle for build/run/test actions
// Other decompilers may do a better/worse job but haven't tried them.


The problems that I face are that the generated Java source from the protos
and the decompiled source from the compiled version of that source post
shading are both being imported as content roots and then conflict. Also,
the CFR decompiler isn't producing valid source, if people could try others
and report their mileage, we may find one that works and then we would be
able to use intellij to build/run our code and not need to delegate all our
build/run/test actions to Gradle.

After all these attempts I have done, vendoring the dependencies outside of
the project seems like a sane approach and unless someone wants to take a
stab at the best progress I have made above, I would go with what Kenn is
suggesting even though it will mean that we will need to perform releases
every time we want to change the version of one of our vendored
dependencies.

1: https://github.com/lukecwik/incubator-beam/tree/intellij


On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <ke...@apache.org> wrote:

> Another reason to push on this is to get build times down. Once only
> generated proto classes use the shadow plugin we'll cut the build time in
> ~half? And there is no reason to constantly re-vendor.
>
> Kenn
>
> On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <kl...@google.com> wrote:
>
>> Hi all,
>>
>> A while ago we had pretty good consensus that we should vendor
>> ("pre-shade") specific dependencies, and there's start on it here:
>> https://github.com/apache/beam/tree/master/vendor
>>
>> IntelliJ notes:
>>
>>  - With shading, it is hard (impossible?) to step into dependency code in
>> IntelliJ's debugger, because the actual symbol at runtime does not match
>> what is in the external jars
>>
>
Intellij can step through the classes if they were published outside the
project since it can decompile them. The source won't be legible.
Decompiling the source as in my example effectively shows the same issue.


>  - With vendoring, if the vendored dependencies are part of the project
>> then IntelliJ gets confused because it operates on source, not the produced
>> jars
>>
>
Yes, I tried several ways to get intellij to ignore the source and use the
output jars but no luck.


> The second one has a quick fix for most cases*: don't make the vendored
>> dep a subproject, but just separately build and publish it. Since a
>> vendored dependency should change much more infrequently (or if we bake the
>> version into the name, ~never) this means we publish once and save headache
>> and build time forever.
>>
>> WDYT? Have I overlooked something? How about we set up vendored versions
>> of guava, protobuf, grpc, and publish them. We don't have to actually start
>> using them yet, and can do it incrementally.
>>
>
Currently we are relocating code depending on the version string. If the
major version is >= 1, we use only the major version within the package
string and rely on semantic versioning provided by the dependency to not
break people. If the major version is 0, we assume the dependency is
unstable and use the full version as part of the package string during
relocation.

The downside of using the full version string for relocated packages:
1) Users will end up with multiple copies of dependencies that differ only
by the minor or patch version increasing the size of their application.
2) Bumping up the version of a dependency now requires the import statement
in all java files to be updated (not too difficult with some sed/grep
skills)

The upside of using the full version string in the relocated package:
1) We don't have to worry about whether a dependency maintains semantic
versioning which means our users won't have to worry about that either.
2) This increases the odds that a user will load multiple slightly
different versions of the same dependency which is known to be incompatible
in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
Netty 4.1.28 even though they are both shaded due to issues of how JNI with
tcnative works).


>
>> (side note: what do other projects like Flink do?)
>>
>> Kenn
>>
>> *for generated proto classes, they need to be altered after being
>> generated so shading happens there, but actually only relocation and the
>> shared vendored dep should work elsewhere in the project
>>
>
We could publish the protos and treat them as "external" dependencies
within the Java projects which would also remove this pain point.

Re: [DISCUSS] Publish vendored dependencies independently

Posted by Kenneth Knowles <ke...@apache.org>.
Another reason to push on this is to get build times down. Once only
generated proto classes use the shadow plugin we'll cut the build time in
~half? And there is no reason to constantly re-vendor.

Kenn

On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <kl...@google.com> wrote:

> Hi all,
>
> A while ago we had pretty good consensus that we should vendor
> ("pre-shade") specific dependencies, and there's start on it here:
> https://github.com/apache/beam/tree/master/vendor
>
> IntelliJ notes:
>
>  - With shading, it is hard (impossible?) to step into dependency code in
> IntelliJ's debugger, because the actual symbol at runtime does not match
> what is in the external jars
>  - With vendoring, if the vendored dependencies are part of the project
> then IntelliJ gets confused because it operates on source, not the produced
> jars
>
> The second one has a quick fix for most cases*: don't make the vendored
> dep a subproject, but just separately build and publish it. Since a
> vendored dependency should change much more infrequently (or if we bake the
> version into the name, ~never) this means we publish once and save headache
> and build time forever.
>
> WDYT? Have I overlooked something? How about we set up vendored versions
> of guava, protobuf, grpc, and publish them. We don't have to actually start
> using them yet, and can do it incrementally.
>
> (side note: what do other projects like Flink do?)
>
> Kenn
>
> *for generated proto classes, they need to be altered after being
> generated so shading happens there, but actually only relocation and the
> shared vendored dep should work elsewhere in the project
>