Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/06/05 15:12:41 UTC

[GitHub] [beam] tkram01 opened a new pull request #14953: jackson needed to run under EMR to avoid class not found exceptions

tkram01 opened a new pull request #14953:
URL: https://github.com/apache/beam/pull/14953


   Fix for BEAM-10430


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] aaltay commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
aaltay commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-872566789


   What is the next step on this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] aaltay commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
aaltay commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1062468982


   /cc @KevinGG @ibzib 



[GitHub] [beam] VictorPlusC edited a comment on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
VictorPlusC edited a comment on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1072699025


   Just to follow up - I did a test by building my Flink shadowJar without the `implementation group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-jsr310', version: '2.12.3'` dependency and I was able to send a Flink pipeline successfully to Dataproc. If the dependency issue related to the `jackson-module-jaxb-annotations-2.10.5.jar` can be resolved, then 2.0 Dataproc images shouldn't have any other issues when it comes to running Flink.



[GitHub] [beam] iemejia commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-876540227


   I somehow forgot about this one. I still do not understand why the jackson dependencies that come from `beam-java-sdk-core` are not resolved here, and why they should be defined explicitly in the runner even if it is not using them. Maybe @je-ik or @dmvk can have an intuition on this, maybe it is because of some weird classloading detail on Flink?
   
   Also, the requested update to use the default Beam version of jackson is still missing. That's minor, but it would be good to align.



[GitHub] [beam] github-actions[bot] closed pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #14953:
URL: https://github.com/apache/beam/pull/14953


   



[GitHub] [beam] VictorPlusC commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
VictorPlusC commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1069400463


   Hi @je-ik, after some investigation through SSHing into my cluster nodes, it appears that these dependencies are being introduced by: `/usr/lib/hadoop-yarn/lib/jackson-module-jaxb-annotations-2.10.5.jar`. I was not able to find a dependency using the missing datatype, so I tried downloading the other dependency under /usr/lib/hadoop-yarn/jackson-datatype-jsr310-2.12.3.jar on each of the nodes. I also copied over the existing dependency into /usr/lib/hadoop-yarn/jackson-module-jaxb-annotations-2.10.5.jar for all nodes, then started a new Yarn session, but that did not seem to resolve the issue.
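
A check like the one described above can also be scripted instead of inspecting jars by hand. The following is a minimal sketch, not something from this PR: the class name `JarServiceScanner` is hypothetical, and it assumes that the offending jar declares its module through a `META-INF/services/com.fasterxml.jackson.databind.Module` entry. It lists every jar in a directory that contains such an entry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: finds jars in a directory that register providers for a service.
final class JarServiceScanner {

    /** Returns the jars directly inside {@code dir} that contain META-INF/services/{@code serviceName}. */
    static List<Path> jarsDeclaring(Path dir, String serviceName) {
        String entryName = "META-INF/services/" + serviceName;
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    // A provider-configuration file is an ordinary jar entry.
                    if (jarFile.getEntry(entryName) != null) {
                        hits.add(jar);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        // The default directory is the one mentioned in the comment above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/lib/hadoop-yarn/lib");
        for (Path jar : jarsDeclaring(dir, "com.fasterxml.jackson.databind.Module")) {
            System.out.println(jar);
        }
    }
}
```

Running it on a cluster node against each library directory should show which jars, if any, register a Jackson module that the job then fails to load.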



[GitHub] [beam] VictorPlusC commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
VictorPlusC commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1075417774


   @je-ik, I did some further investigation, and it seems that just deleting the jackson-module-jaxb-annotations jar is enough to successfully run Flink jobs on Dataproc without the dependencies being locally built. Since the dependency is deprecated, let's first see if this is something we can resolve just on the Dataproc side of things.
   
   For now, I made a workaround for my use-case by setting the image version used by Dataproc to be the default image, which gets updated as new images come out. If a future image contains the fix, we don't have to do anything.



[GitHub] [beam] tkram01 commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
tkram01 commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-855950399


   EMR does include jackson 2.9.10. I don't know if it is a version issue or a classpath issue but the only way I could get it to work was to include the jackson jars in the uber jar.
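
One way to tell a version issue from a classpath issue is to ask the JVM which jar a class was actually loaded from. The following is a minimal sketch under stated assumptions, not part of this PR: the class name `ClassOrigin` is hypothetical, and the two default class names are simply the ones that appear elsewhere in this thread.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a class on the current classpath comes from.
final class ClassOrigin {

    /** Returns the location of the jar or directory that provides the class, or a short marker. */
    static String describe(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes loaded by the bootstrap loader have no code source.
                return "<bootstrap or unknown>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable: " + e + ">";
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0
            ? args
            : new String[] {
                "com.fasterxml.jackson.databind.ObjectMapper",
                "com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule"
            };
        for (String name : names) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

If it is run with the same classpath as the job, a location that points at a cluster-provided jar rather than the uber jar would suggest a classpath-ordering problem rather than a missing dependency.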



[GitHub] [beam] anguillanneuf edited a comment on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
anguillanneuf edited a comment on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887809841


   @je-ik Which one's version? Beam? Never heard of "corner version", what is that?
   
   ---
   Dataproc has fixed Flink versions.
   Dataproc 2.0 [maps](https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.0) to Flink 1.12.
   Dataproc 1.5 [maps](https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-1.5) to Flink 1.9.
   



[GitHub] [beam] anguillanneuf commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
anguillanneuf commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887877137


   From the stack trace,
   ```
   at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1054)
   
   at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471)
   ```
   it looks like Beam's `PipelineOptionsFactory` needs this dependency.  https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java#L25



[GitHub] [beam] je-ik commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
je-ik commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887814177


   I meant that we should try to isolate which versions work and which do not:
    a) fix the versions of Beam and Flink and try Dataproc 1.5 and 2.0
    b) fix the versions of Dataproc and Beam and try Flink 1.9 and 1.12
    c) fix the versions of Flink and Dataproc and try Beam 2.29 and 2.31
    
   That would help a lot.



[GitHub] [beam] github-actions[bot] commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1030616429


   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.



[GitHub] [beam] ibzib commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
ibzib commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887935720


   > From the stack trace,
   > 
   > ```
   > at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1054)
   > 
   > at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471)
   > ```
   > 
   > it looks like Beam's `PipelineOptionsFactory` needs this dependency.
   > 
   > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java#L25
   
   It's not that simple unfortunately. While Beam depends on some Jackson artifacts, it does not and should not depend on the artifact `jackson-module-jaxb-annotations` (containing package `com.fasterxml.jackson.module.jaxb`). The problem is that somehow `com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule` is erroneously being registered as a [service provider](https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html?is-external=true) for `com.fasterxml.jackson.databind.Module` (which is part of Jackson core, which is a real Beam dependency). So our best guess so far is that the JaxbAnnotationModule service is being registered by some dependency which is common to Dataproc and EMR.
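
For context, `java.util.ServiceLoader` discovers providers by reading classpath resources named `META-INF/services/<fully qualified service name>`, so any jar on the classpath can register a provider without Beam depending on it. The following is a minimal sketch for locating those registrations, not something from this PR; the class name `ServiceDeclarationFinder` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the provider-configuration files visible to a class loader.
final class ServiceDeclarationFinder {

    static final String JACKSON_MODULE_SERVICE = "com.fasterxml.jackson.databind.Module";

    /** Returns every resource that declares providers for {@code serviceName}. */
    static List<URL> findDeclarations(String serviceName, ClassLoader loader) {
        try {
            // ServiceLoader reads exactly these resources to find provider class names.
            return Collections.list(loader.getResources("META-INF/services/" + serviceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ServiceDeclarationFinder.class.getClassLoader();
        for (URL url : findDeclarations(JACKSON_MODULE_SERVICE, loader)) {
            // Each URL names the jar (or directory) that contains the registration.
            System.out.println(url);
        }
    }
}
```

Each printed URL identifies a jar that registers at least one Jackson module. Run with the job's classpath on Dataproc or EMR, this should point at whichever dependency registers `JaxbAnnotationModule`.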



[GitHub] [beam] anguillanneuf commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
anguillanneuf commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887840112


   I can optionally install a Flink component when creating a Dataproc cluster, just like other Dataproc optional [components](https://cloud.google.com/dataproc/docs/concepts/components/overview#available_optional_components). But I don't have many details on Dataproc 1.5 vs. 2.0.
   
   You may be onto something. I tried the following while using [Beam Flink compatibility](https://beam.apache.org/documentation/runners/flink/#flink-version-compatibility) to guide myself. The issues seem to concentrate in Dataproc 2.0.
   
   | Dataproc | Beam | Flink | I can try | Worked|
   |---|---|---|---|---|
   |2.0|2.31|1.12|Yes|No - missing dep|
   |2.0|2.30|1.12|Yes|No - missing dep|
   |1.5|2.29|1.9|Yes|Yes|
   |1.5|2.28|1.9|Yes|Yes|
   |1.5|2.27|1.9|Yes|Yes|
   |1.5|2.26|1.9|Yes|Yes|



[GitHub] [beam] zhangandyx edited a comment on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
zhangandyx edited a comment on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-883882953


   Running into this on a non-example project on EMR as well (Beam 2.30, EMR 5.33 and 6.3). @iemejia do you know where the jackson 2.9 deps come from? I bundled the exact versions across the board, but if EMR is injecting a different one, that could be the problem.



[GitHub] [beam] zhangandyx commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
zhangandyx commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887990947


   Is it possibly the hadoop jar? Looks like it pulls in a shaded jackson
   module, but that could very well be it.



[GitHub] [beam] github-actions[bot] commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1037214127


   This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.



[GitHub] [beam] iemejia commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-855848499


   Now that I think more about this, if the runner does not use these dependencies at all, we probably should not add them. I wonder if these dependencies are missing from the EMR side (and we should document this) or if they are somehow misconfigured because of the classpath priorities being unaligned :S



[GitHub] [beam] je-ik commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
je-ik commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887816052


   Ah, ok. What is the main difference between Dataproc 1.5 and 2.0? I am fairly sure that this is neither a Beam issue nor a Flink issue.



[GitHub] [beam] iemejia commented on a change in pull request #14953: jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
iemejia commented on a change in pull request #14953:
URL: https://github.com/apache/beam/pull/14953#discussion_r646454566



##########
File path: runners/flink/flink_runner.gradle
##########
@@ -34,6 +34,11 @@ applyJavaNature(
 
 description = "Apache Beam :: Runners :: Flink $flink_version"
 
+dependencies {
+  implementation group: 'com.fasterxml.jackson.module', name: 'jackson-module-jaxb-annotations', version: '2.12.3'

Review comment:
       Hello, thanks for contributing this fix! Which version of Flink (EMR) were you able to run with this fix? (Just out of curiosity.)
   
   Can you put this in the main dependencies block?
   https://github.com/apache/beam/blob/92386d781b5d502c4ea47a0894b72ca57854553d/runners/flink/flink_runner.gradle#L176
   
   And can you use the default library definitions (and add the jsr310 one there)?
   https://github.com/apache/beam/blob/92386d781b5d502c4ea47a0894b72ca57854553d/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L580





[GitHub] [beam] tkram01 commented on a change in pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
tkram01 commented on a change in pull request #14953:
URL: https://github.com/apache/beam/pull/14953#discussion_r646608472



##########
File path: runners/flink/flink_runner.gradle
##########
@@ -34,6 +34,11 @@ applyJavaNature(
 
 description = "Apache Beam :: Runners :: Flink $flink_version"
 
+dependencies {
+  implementation group: 'com.fasterxml.jackson.module', name: 'jackson-module-jaxb-annotations', version: '2.12.3'

Review comment:
       I tested this on EMR 6.3.0 and 5.33





[GitHub] [beam] VictorPlusC commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
VictorPlusC commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1062074876


   Hi folks,
   
   I am currently working on enabling a feature that relies on a 2.0 Dataproc image ([BEAM-13973](https://issues.apache.org/jira/browse/BEAM-13973)). I am looking to enable Interactive Beam to have the capability of creating a Dataproc cluster and sending a Flink job to it. For such a job to run successfully though, the dependencies listed in this PR are necessary. For this feature, I am using a 2.0 image because the 1.5 Dataproc images all use Flink 1.9.3, and it appears that Flink 1.9 has been deprecated for nearly a year now.
   
   Would there be any other potential workarounds that we can add into Beam to have Flink work on Dataproc? Would it be suitable to add these dependencies for now and label them with a ticket addressing this behavior with Dataproc and EMR?



[GitHub] [beam] VictorPlusC commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
VictorPlusC commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1074152397


   I believe so. That said, the versioning does not seem to have much of an effect here: I was able to successfully execute a job on Dataproc with both versions.
   
   As this dependency is necessary for me to fully enable an automatic process to send Flink pipelines to Dataproc, without needing users to locally build the shadowJar with it included, would it be possible to include only the `jackson-module-jaxb-annotations` dependency and have a Jira ticket with a to-do to remove it after it has been resolved on the Dataproc side? This way, we can guarantee that this dependency issue does not show up in a future version of Beam. Doing so will also make it possible for users to follow use-cases such as the content covered in the [Dataproc Flink component documentation](https://cloud.google.com/dataproc/docs/concepts/components/flink) using a version of Flink that has not been deprecated on the Beam side (the working example uses a Dataproc 1.5 image and Flink 1.9, but we no longer support that Flink version). Additionally, it does not appear that the `jackson-datatype-jsr310` dependency is needed for me to run Flink pipelines on Dataproc, so only adding the jaxb-annotations should suffice.
   
   +Dagang Wei (@functicons), who helped me investigate the dependencies on the Dataproc side.



[GitHub] [beam] je-ik commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
je-ik commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-877417054


   > @anguillanneuf left some interesting comments on #15151.
   > 
   >     1. The exception also happens on Dataproc, so it's not just EMR.
   > 
   >     2. The Spark runner includes the same dependencies, likely for the same reason.
   
   I'd say we should investigate this to find the actual cause. FlinkRunner itself is not (as far as I was able to verify) declaring or importing the JAXB annotations. Also, it works outside of EMR / Dataproc. Is it possible that this really relates to the examples only? Can the issue be there?



[GitHub] [beam] anguillanneuf commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
anguillanneuf commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-910839631


   Would Beam be open to switching to gson from jackson?
   Here's a `googleapis` example where we made the change: https://github.com/googleapis/java-pubsublite-spark/pull/25





[GitHub] [beam] ibzib commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
ibzib commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-877389925


   @anguillanneuf left some interesting comments on #15151.
   
   1. The exception also happens on Dataproc, so it's not just EMR.
   2. The Spark runner includes the same dependencies, likely for the same reason.





[GitHub] [beam] zhangandyx edited a comment on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
zhangandyx edited a comment on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-883882953


   I'm running into this on a non-example project on EMR as well (Beam 2.30, EMR 5.33/6.3). @iemejia do you know where the jackson 2.9 deps come from? I bundled the exact versions across the board, but if EMR is injecting a different one, that could be the problem.
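
One way to check whether the cluster injects a different Jackson jar is to ask the JVM where a class was actually loaded from. The following is only a sketch: the class name `CodeSourceProbe` is made up for illustration, and the two Jackson class names in `main` are assumed to be the ones of interest. It would need to run with the same classpath as the failing job to be meaningful.

```java
import java.security.CodeSource;

public class CodeSourceProbe {

    /** Describes which jar (or directory) the JVM loaded the given class from. */
    public static String describe(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (JDK classes) usually have no code source.
        String location = (source == null || source.getLocation() == null)
                ? "<no code source, e.g. a JDK class>"
                : source.getLocation().toString();
        Package pkg = cls.getPackage();
        // Implementation-Version is read from the jar manifest and may be absent.
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return cls.getName() + " <- " + location + " (version: " + version + ")";
    }

    /** Same as describe, but resolves the class by name so nothing is needed at compile time. */
    public static String describeByName(String className) {
        try {
            return describe(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumed class names of interest; adjust to whatever the stack trace mentions.
        System.out.println(describeByName("com.fasterxml.jackson.databind.ObjectMapper"));
        System.out.println(describeByName("com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule"));
    }
}
```

If the printed location points into a cluster directory rather than the bundled job jar, that would suggest the platform's copy is taking precedence.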





[GitHub] [beam] je-ik commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
je-ik commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1074863728


   Alright, can we:
    a) create a tracking issue to remove the conflicting dependency from Dataproc (probably `hadoop-yarn` somehow)?
    b) add the JAXB annotations with a tracking Jira in Beam to remove them once the upstream Dataproc issue is resolved?
    
   I fully understand the need to run the examples on a recent runner. If we cannot simply fix Dataproc (and EMR), then this might be the way to go. It seems that adding the jar with the annotations should not break anything.





[GitHub] [beam] je-ik commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
je-ik commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-876990149


   It looks to me like this is not a Beam issue. It is probably in either YARN or EMR (or a combination of the two). I think we should not add the dependencies.





[GitHub] [beam] VictorPlusC edited a comment on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
VictorPlusC edited a comment on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1069400463


   Hi @je-ik, after SSHing into my cluster nodes and investigating, it appears that these dependencies are being introduced by `/usr/lib/hadoop-yarn/lib/jackson-module-jaxb-annotations-2.10.5.jar`. I was not able to find a dependency that uses the missing datatype, so I tried downloading the other dependency to `/usr/lib/hadoop-yarn/jackson-datatype-jsr310-2.12.3.jar` on each of the nodes. I also copied the existing dependency to `/usr/lib/hadoop-yarn/jackson-module-jaxb-annotations-2.10.5.jar` on all nodes and then started a new YARN session, but that did not seem to resolve the issue.





[GitHub] [beam] je-ik commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
je-ik commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1073664626


   I'm not sure I understand correctly, so let me recap: the dependency that brings in JAXB is `hadoop-yarn`, correct? If that is a dependency of Dataproc, then it seems to me that the missing dependency should be added there. Could it be a version clash? Versions `2.10.5` and `2.12.3` both appear to be involved.





[GitHub] [beam] je-ik commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
je-ik commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887807980


   @anguillanneuf If you have access to the failing environment(s), could you try to narrow down the version that is failing? Maybe using the corner versions of Beam, Flink and Dataproc?





[GitHub] [beam] anguillanneuf commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
anguillanneuf commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887809841


   @je-ik The version of which component? Beam? I have never heard of a "corner version"; what is that?





[GitHub] [beam] VictorPlusC edited a comment on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
VictorPlusC edited a comment on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1062074876


   Hi folks,
   
   I am currently working on enabling a feature that relies on a 2.0 Dataproc image ([BEAM-13973](https://issues.apache.org/jira/browse/BEAM-13973)). I am looking to enable Interactive Beam to have the capability of creating a Dataproc cluster and sending a Flink job to it. For such a job to run successfully though, the dependencies listed in this PR are necessary. For this feature, I am using a 2.0 image because the 1.5 Dataproc images all use Flink 1.9.3, and it appears that Flink 1.9 has been deprecated for nearly a year now.
   
   Would there be any other potential workarounds that we can add into Beam to have Flink work on Dataproc? Would it be suitable to add these dependencies for now and label them with a ticket addressing this behavior with Dataproc and EMR?
   
   Thanks in advance!





[GitHub] [beam] je-ik commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
je-ik commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1062724725


   @VictorPlusC can you please inspect the classpath of the job being submitted to Dataproc and see which dependency brings the `com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule` service provider into `META-INF/services`?
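
A rough sketch of such an inspection is below. It lists every `META-INF/services/com.fasterxml.jackson.databind.Module` resource visible to a class loader, together with the provider class names each file declares. The class name `ServiceProviderScan` and its helper methods are hypothetical, and using the thread context class loader is an assumption; the scan only reflects the job's real classpath if it runs inside the same JVM or with the same classpath as the submitted job.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ServiceProviderScan {

    /** Parses a provider-configuration file: one class name per line, '#' starts a comment. */
    public static List<String> parseProviders(String content) {
        List<String> providers = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                providers.add(line);
            }
        }
        return providers;
    }

    /** Maps each META-INF/services/<service> resource URL to the providers it declares. */
    public static Map<String, List<String>> scan(String service, ClassLoader loader) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try {
            for (URL url : Collections.list(loader.getResources("META-INF/services/" + service))) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String content = reader.lines().collect(Collectors.joining("\n"));
                    // The URL shows which jar the provider file lives in.
                    result.put(url.toString(), parseProviders(content));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String service = "com.fasterxml.jackson.databind.Module";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        scan(service, loader).forEach((url, providers) -> System.out.println(url + " -> " + providers));
    }
}
```

Any entry whose provider list contains `com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule` identifies the jar that registers the module.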





[GitHub] [beam] VictorPlusC commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
VictorPlusC commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1072699025


   Just to follow up: I ran a test without the `implementation group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-jsr310', version: '2.12.3'` dependency and was able to send a Flink pipeline to Dataproc successfully. If the dependency issue related to `jackson-module-jaxb-annotations-2.10.5.jar` can be resolved, then Dataproc 2.0 images shouldn't have any other issues when it comes to running Flink.





[GitHub] [beam] anguillanneuf commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
anguillanneuf commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-887801298


   Beam's self-generated word count [example](https://beam.apache.org/get-started/quickstart-java/) fails on Dataproc==2.0, Beam==2.31.0, Flink==1.12 without this dependency. But it works with Dataproc==1.5, Beam==2.29, and Flink==1.9. 





[GitHub] [beam] ibzib commented on pull request #14953: [BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions

Posted by GitBox <gi...@apache.org>.
ibzib commented on pull request #14953:
URL: https://github.com/apache/beam/pull/14953#issuecomment-1074271639


   jackson-module-jaxb-annotations is deprecated. https://github.com/FasterXML/jackson-module-jaxb-annotations
   
   > NOTE: This module has become part of [Jackson Base Modules](https://github.com/FasterXML/jackson-modules-base) repo. as of Jackson 2.9
   >
   > This repo still exists to allow release of patch versions of older versions; it will be hidden (made private) in near future.
   
   Looks like Dataproc is now on Jackson 2.10. https://cloud.google.com/dataproc/docs/release-notes#November_09_2020 
   
   So jackson-module-jaxb-annotations shouldn't be listed as a service provider at all. Dataproc (or whichever of its dependencies is responsible) should remove it.
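
To illustrate why a stray service-provider entry matters, here is a minimal sketch that uses only the JDK. It assumes the failure in this thread comes from a `java.util.ServiceLoader` lookup that finds a provider entry whose class cannot be loaded. `java.lang.Runnable` stands in for the real service interface, and `example.missing.Provider` is a made-up class name that deliberately does not exist.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class MissingProviderDemo {

    /** Returns true if a provider entry naming an absent class makes the lookup fail. */
    public static boolean danglingEntryFails() {
        try {
            // Build a throwaway classpath root that contains only a provider-configuration file.
            Path root = Files.createTempDirectory("dangling-provider");
            Path services = Files.createDirectories(root.resolve("META-INF/services"));
            // The entry names a class that is nowhere on the classpath.
            Files.write(services.resolve("java.lang.Runnable"),
                    "example.missing.Provider\n".getBytes(StandardCharsets.UTF_8));

            try (URLClassLoader loader =
                    new URLClassLoader(new URL[] {root.toUri().toURL()}, null)) {
                Iterator<Runnable> providers =
                        ServiceLoader.load(Runnable.class, loader).iterator();
                try {
                    // Iterating forces ServiceLoader to resolve each declared provider class.
                    while (providers.hasNext()) {
                        providers.next();
                    }
                    return false;
                } catch (ServiceConfigurationError e) {
                    System.out.println("Lookup failed: " + e.getMessage());
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Dangling entry fails lookup: " + danglingEntryFails());
    }
}
```

Under that assumption, there are two ways out: remove the entry from the platform's classpath, or make the named class available to the job, which is what adding the dependency in this PR would do.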
   

