Posted to dev@beam.apache.org by Evan Galpin <eg...@apache.org> on 2022/07/21 17:15:09 UTC

[Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

Hi all,

I have a change that works locally, and I'd like to validate it on
Dataflow.  I've tried a few different approaches to module substitution in
the build.gradle config file for the pipeline I'm trying to deploy, but I
haven't had any success yet.

How can I replace the beam-sdks-java-io-google-cloud-platform module, which
is normally resolved from Maven, with a local jar generated by running:

"./gradlew :sdk:java:io:google-cloud-platform:jar"

Thanks,
Evan

Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

Posted by Evan Galpin <eg...@apache.org>.
One final note of clarification: the pom file needs to be in the same
directory as the jar.

On Fri, Jul 22, 2022 at 11:01 Evan Galpin <eg...@apache.org> wrote:

> It's working! Huge thank you to Steve Niemitz who pointed out the need for
> "--experiments=enable_custom_pubsub_sink" to prevent dataflow override
> for the module that I wanted to use custom source.
>
> Here is my full process in case it's helpful to anyone in the future (note
> one might need to change the version identifiers):
>
>
>    1. Modify files in sdks/java/io/google-cloud-platform
>    2. Add id 'com.github.johnrengelman.shadow' to plugins in
>    sdks/java/io/google-cloud-platform/build.gradle
>    3. Build a shadowJar via "./gradlew
>    :sdk:java:io:google-cloud-platform:shadowJar"
>    4. Copy the shadowJar from
>    my/path/to/beam/sdks/java/io/google-cloud-platform/build/libs/beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT-all.jar
>    to
>    my/path/to/user/pipeline/top-level/libs/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.40.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT.jar
>    5. Add a pom file for the shadowJar (to emulate local maven repo):
>
>    <?xml version="1.0" encoding="utf-8"?>
>    <project xmlns="http://maven.apache.org/POM/4.0.0"
>             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
>        <modelVersion>4.0.0</modelVersion>
>        <groupId>org.apache.beam</groupId>
>        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
>        <version>2.40.0-SNAPSHOT</version>
>    </project>
>
>    6. In user code pipeline "build.gradle", add a local maven repo (note
>    "./libs" is from "my/path/to/user/pipeline/top-level/libs")
>
>    repositories {
>        maven {
>            url = uri('./libs')
>        }
>        // ... other repos ...
>    }
>
>    7. In user code pipeline "build.gradle", implement dependency
>    replacement of the SDK version of beam-sdks-java-io-google-cloud-platform
>
>    configurations {
>        all {
>            resolutionStrategy.dependencySubstitution {
>                substitute(module("org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0"))
>                    .using(module("org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0-SNAPSHOT"))
>            }
>        }
>    }
>
>    8. Deploy the user code pipeline including the flag:
>    --experiments=enable_custom_pubsub_sink
>
>
>
> On Thu, Jul 21, 2022 at 4:42 PM Evan Galpin <ev...@gmail.com> wrote:
>
>> Thanks Tomo, I'll check that out too as a good safeguard!  Are you
>> familiar with any process to build pre-release artifacts?  I suppose that's
>> really what I'm after is building a pre-release version of pubsubIO to
>> validate in Dataflow.
>>
>> - Evan
>>
>>
>> On Thu, Jul 21, 2022 at 4:21 PM Tomo Suzuki via dev <de...@beam.apache.org>
>> wrote:
>>
>>> I don't come up with a solution (I'm not familiar with the method
>>> you're using). However I often use "getProtectionDomain()"
>>> https://stackoverflow.com/a/56000383/975074 to find the JAR file from a
>>> class. This ensures the class you modified is actually used.
>>>
>>> On Thu, Jul 21, 2022 at 3:35 PM Evan Galpin <eg...@apache.org> wrote:
>>>
>>>> Spoke too soon... still can't seem to get the new behaviour to appear
>>>> in dataflow, possibly something is being overridden?
>>>>
>>>> On Thu, Jul 21, 2022 at 3:15 PM Evan Galpin <eg...@apache.org> wrote:
>>>>
>>>>> Making a shadowJar from "beam-sdks-java-io-google-cloud-platform"
>>>>> looks to be working. Added `  id 'com.github.johnrengelman.shadow'` to
>>>>> `build.gradle` for "beam-sdks-java-io-google-cloud-platform" in the beam
>>>>> source and used the resulting jar as a dependency replacement when
>>>>> deploying the job to dataflow.  Looks ok.
>>>>>
>>>>> On Thu, Jul 21, 2022 at 3:02 PM Evan Galpin <eg...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I believe I have the dependencySubstitution working, but it seems as
>>>>>> though the substitution is removing transitive deps of
>>>>>> "beam-sdks-java-io-google-cloud-platform", hmm...
>>>>>>
>>>>>> On Thu, Jul 21, 2022 at 1:15 PM Evan Galpin <eg...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm trying to test a change I've made locally, but by validating it
>>>>>>> on Dataflow.  It works locally, but I want to validate on Dataflow.  I've
>>>>>>> tried a few different attempts at module substitution in the build.gradle
>>>>>>> config file for the pipeline I'm trying to deploy, but I haven't had any
>>>>>>> success yet.
>>>>>>>
>>>>>>> How might I be able to replace the
>>>>>>> beam-sdks-java-io-google-cloud-platform module usually installed from maven
>>>>>>> with a local jar generated from running:
>>>>>>>
>>>>>>> "./gradlew :sdk:java:io:google-cloud-platform:jar"
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Evan
>>>>>>>
>>>>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>

Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

Posted by Evan Galpin <eg...@apache.org>.
It's working! Huge thank you to Steve Niemitz, who pointed out the need for
"--experiments=enable_custom_pubsub_sink" to prevent Dataflow from
overriding the module for which I wanted to use my custom source.

Here is my full process in case it's helpful to anyone in the future (note
one might need to change the version identifiers):


   1. Modify files in sdks/java/io/google-cloud-platform
   2. Add id 'com.github.johnrengelman.shadow' to plugins in
   sdks/java/io/google-cloud-platform/build.gradle
   3. Build a shadowJar via "./gradlew
   :sdk:java:io:google-cloud-platform:shadowJar"
   4. Copy the shadowJar from
   my/path/to/beam/sdks/java/io/google-cloud-platform/build/libs/beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT-all.jar
   to
   my/path/to/user/pipeline/top-level/libs/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.40.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT.jar
   5. Add a pom file for the shadowJar (to emulate local maven repo):

   <?xml version="1.0" encoding="utf-8"?>
   <project xmlns="http://maven.apache.org/POM/4.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>org.apache.beam</groupId>
       <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
       <version>2.40.0-SNAPSHOT</version>
   </project>

   6. In user code pipeline "build.gradle", add a local maven repo (note
   "./libs" is from "my/path/to/user/pipeline/top-level/libs")

   repositories {
       maven {
           url = uri('./libs')
       }
       // ... other repos ...
   }

   7. In user code pipeline "build.gradle", implement dependency
   replacement of the SDK version of beam-sdks-java-io-google-cloud-platform

   configurations {
       all {
           resolutionStrategy.dependencySubstitution {
               substitute(module("org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0"))
                   .using(module("org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0-SNAPSHOT"))
           }
       }
   }

   8. Deploy the user code pipeline including the flag:
   --experiments=enable_custom_pubsub_sink
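
Steps 4 and 5 above rely on the standard Maven repository layout, where the
jar and its pom sit side by side under groupId/artifactId/version. As a
rough sketch of that layout (the class and method names here are
hypothetical helpers, not part of Beam or Gradle, and it assumes the pom
follows the usual artifactId-version.pom naming), the expected paths can be
computed like this:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalRepoLayout {

    /** Directory of an artifact version: root/group/as/dirs/artifactId/version. */
    public static Path artifactDir(Path repoRoot, String groupId, String artifactId, String version) {
        // Dots in the groupId become directory separators in a Maven repository.
        return repoRoot.resolve(groupId.replace('.', '/')).resolve(artifactId).resolve(version);
    }

    /** File inside that directory: artifactId-version.extension (e.g. "jar" or "pom"). */
    public static Path artifactFile(
            Path repoRoot, String groupId, String artifactId, String version, String extension) {
        return artifactDir(repoRoot, groupId, artifactId, version)
                .resolve(artifactId + "-" + version + "." + extension);
    }

    public static void main(String[] args) {
        Path libs = Paths.get("libs"); // the local repo root used in step 6
        String group = "org.apache.beam";
        String artifact = "beam-sdks-java-io-google-cloud-platform";
        String version = "2.40.0-SNAPSHOT";

        // The shadowJar is copied to the first path; the pom goes next to it.
        System.out.println(artifactFile(libs, group, artifact, version, "jar"));
        System.out.println(artifactFile(libs, group, artifact, version, "pom"));
    }
}
```

Both printed paths share the same parent directory, which is the point of
the note about keeping the pom in the same directory as the jar.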




Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

Posted by Evan Galpin <ev...@gmail.com>.
Thanks Tomo, I'll check that out too as a good safeguard!  Are you familiar
with any process to build pre-release artifacts?  I suppose what I'm really
after is building a pre-release version of PubsubIO to validate in
Dataflow.

- Evan



Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

Posted by Tomo Suzuki via dev <de...@beam.apache.org>.
I can't come up with a solution (I'm not familiar with the method
you're using). However, I often use "getProtectionDomain()"
https://stackoverflow.com/a/56000383/975074 to find the JAR file that a
class was loaded from. This lets you confirm that the class you modified is
actually the one being used.
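
As a minimal sketch of that check (the JarLocator class and locationOf
method are made-up names for illustration; only the
getProtectionDomain()/getCodeSource() calls are standard JDK API), something
like the following prints where a class was loaded from:

```java
import java.security.CodeSource;

public class JarLocator {

    /** Returns the jar or directory a class was loaded from, or a placeholder if unknown. */
    public static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap class loader (e.g. java.lang.String) have no CodeSource.
        if (source == null || source.getLocation() == null) {
            return "(unknown: no code source available)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Swap in the class you modified to see which jar actually provides it.
        System.out.println(locationOf(JarLocator.class));
        System.out.println(locationOf(String.class));
    }
}
```

Logging this from inside the pipeline, with the modified class in place of
JarLocator.class, should show whether the locally built jar or the released
one is on the classpath.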


-- 
Regards,
Tomo

Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

Posted by Evan Galpin <eg...@apache.org>.
Spoke too soon... I still can't get the new behaviour to appear in
Dataflow. Possibly something is being overridden?


Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

Posted by Evan Galpin <eg...@apache.org>.
Making a shadowJar from "beam-sdks-java-io-google-cloud-platform" looks to
be working. I added `id 'com.github.johnrengelman.shadow'` to the plugins in
`build.gradle` for "beam-sdks-java-io-google-cloud-platform" in the Beam
source and used the resulting jar as a dependency replacement when
deploying the job to Dataflow.  Looks ok.


Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

Posted by Evan Galpin <eg...@apache.org>.
I believe I have the dependencySubstitution working, but it seems as though
the substitution is removing transitive deps of
"beam-sdks-java-io-google-cloud-platform", hmm...
