Posted to dev@beam.apache.org by Evan Galpin <eg...@apache.org> on 2022/07/21 17:15:09 UTC
[Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar
Hi all,
I've made a change to Beam locally and would like to validate it on
Dataflow. It works locally, but I haven't been able to get it running on
Dataflow yet: I've tried a few different approaches to module substitution
in the build.gradle config file for the pipeline I'm trying to deploy,
without success so far.
How might I replace the beam-sdks-java-io-google-cloud-platform module,
which is usually resolved from Maven, with a local jar generated by
running:
"./gradlew :sdks:java:io:google-cloud-platform:jar"
Thanks,
Evan
Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar
Posted by Evan Galpin <eg...@apache.org>.
One final note of clarification: the pom file from step 5 needs to be in
the same directory as the jar copied in step 4.
On Fri, Jul 22, 2022 at 11:01 Evan Galpin <eg...@apache.org> wrote:
Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar
Posted by Evan Galpin <eg...@apache.org>.
It's working! Huge thank you to Steve Niemitz, who pointed out the need for
"--experiments=enable_custom_pubsub_sink" to prevent Dataflow from
overriding the module for which I wanted to use my custom source.
Here is my full process in case it's helpful to anyone in the future (note
one might need to change the version identifiers):
1. Modify files in sdks/java/io/google-cloud-platform
2. Add id 'com.github.johnrengelman.shadow' to plugins in
sdks/java/io/google-cloud-platform/build.gradle
3. Build a shadowJar via
"./gradlew :sdks:java:io:google-cloud-platform:shadowJar"
4. Copy the shadowJar from
my/path/to/beam/sdks/java/io/google-cloud-platform/build/libs/beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT-all.jar
to
my/path/to/user/pipeline/top-level/libs/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.40.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT.jar
5. Add a pom file for the shadowJar (to emulate local maven repo):
<?xml version="1.0" encoding="utf-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>2.40.0-SNAPSHOT</version>
</project>
6. In user code pipeline "build.gradle", add a local maven repo (note
"./libs" is from "my/path/to/user/pipeline/top-level/libs")
repositories {
  maven {
    url = uri('./libs')
  }
  ... other repos ...
}
7. In user code pipeline "build.gradle", implement dependency
replacement of the SDK version of beam-sdks-java-io-google-cloud-platform
configurations {
  all {
    resolutionStrategy.dependencySubstitution {
      substitute module("org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0") using module("org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0-SNAPSHOT")
    }
  }
}
8. Deploy the user code pipeline including the flag:
--experiments=enable_custom_pubsub_sink
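For anyone repeating steps 4 and 5, here is a minimal Java sketch of the directory layout those steps rely on. It assumes the standard Maven repository convention of <repo>/<groupId as directories>/<artifactId>/<version>/<artifactId>-<version>.<ext>. The class and method names (LocalMavenLayout, artifactDir, fileName) are hypothetical and only illustrate the path arithmetic; they are not part of Beam, Gradle, or Maven.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalMavenLayout {
    /** Directory holding an artifact's files: repo/group-as-dirs/artifact/version. */
    public static Path artifactDir(Path repoRoot, String groupId, String artifactId, String version) {
        // Dots in the groupId become directory separators (org.apache.beam -> org/apache/beam).
        return repoRoot.resolve(groupId.replace('.', '/')).resolve(artifactId).resolve(version);
    }

    /** File name inside that directory, for example with extension "jar" or "pom". */
    public static String fileName(String artifactId, String version, String extension) {
        return artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // Values taken from the steps in this thread; adjust the version as needed.
        String group = "org.apache.beam";
        String artifact = "beam-sdks-java-io-google-cloud-platform";
        String version = "2.40.0-SNAPSHOT";

        Path dir = artifactDir(Paths.get("libs"), group, artifact, version);
        // The jar and the pom share the same directory and differ only in extension.
        System.out.println(dir.resolve(fileName(artifact, version, "jar")));
        System.out.println(dir.resolve(fileName(artifact, version, "pom")));
    }
}
```

With the values from this thread, both files land in the same 2.40.0-SNAPSHOT directory under ./libs, which is the location the "maven { url = uri('./libs') }" repository in step 6 is expected to search.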
On Thu, Jul 21, 2022 at 4:42 PM Evan Galpin <ev...@gmail.com> wrote:
Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar
Posted by Evan Galpin <ev...@gmail.com>.
Thanks Tomo, I'll check that out too as a good safeguard! Are you familiar
with any process for building pre-release artifacts? I suppose what I'm
really after is building a pre-release version of PubsubIO to validate on
Dataflow.
- Evan
On Thu, Jul 21, 2022 at 4:21 PM Tomo Suzuki via dev <de...@beam.apache.org>
wrote:
Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar
Posted by Tomo Suzuki via dev <de...@beam.apache.org>.
I can't come up with a solution (I'm not familiar with the method
you're using). However, I often use "getProtectionDomain()"
(https://stackoverflow.com/a/56000383/975074) to find the JAR file that a
class was loaded from. This lets you confirm that the class you modified is
the one actually being used.
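A minimal sketch of that check, using only standard JDK calls (Class.getProtectionDomain(), ProtectionDomain.getCodeSource(), CodeSource.getLocation()). The class name JarLocator and the method locationOf are hypothetical, chosen for illustration:

```java
import java.net.URL;
import java.security.CodeSource;

public class JarLocator {
    /**
     * Returns the location (usually a jar or a classes directory) that the
     * given class was loaded from, or null when the JVM does not report one,
     * as is typical for core classes such as java.lang.String.
     */
    public static URL locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) {
        // Replace JarLocator.class with the class you modified.
        System.out.println("Loaded from: " + locationOf(JarLocator.class));
    }
}
```

In a pipeline, one could log the result for the modified class (for example org.apache.beam.sdk.io.gcp.pubsub.PubsubIO.class) from code that runs on the worker, and compare the reported jar with the locally built one.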
On Thu, Jul 21, 2022 at 3:35 PM Evan Galpin <eg...@apache.org> wrote:
--
Regards,
Tomo
Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar
Posted by Evan Galpin <eg...@apache.org>.
Spoke too soon... I still can't get the new behaviour to appear in
Dataflow. Possibly something is being overridden?
On Thu, Jul 21, 2022 at 3:15 PM Evan Galpin <eg...@apache.org> wrote:
Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar
Posted by Evan Galpin <eg...@apache.org>.
Making a shadowJar from "beam-sdks-java-io-google-cloud-platform" looks to
be working. I added `id 'com.github.johnrengelman.shadow'` to the plugins in
`build.gradle` for "beam-sdks-java-io-google-cloud-platform" in the Beam
source and used the resulting jar as a dependency replacement when
deploying the job to Dataflow. Looks ok.
On Thu, Jul 21, 2022 at 3:02 PM Evan Galpin <eg...@apache.org> wrote:
Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar
Posted by Evan Galpin <eg...@apache.org>.
I believe I have the dependencySubstitution working, but it seems as though
the substitution is removing transitive deps of
"beam-sdks-java-io-google-cloud-platform", hmm...