Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2020/08/21 17:23:00 UTC

[jira] [Comment Edited] (BEAM-10776) Unwanted JDK jars staged when running cross-language pipelines

    [ https://issues.apache.org/jira/browse/BEAM-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182033#comment-17182033 ] 

Luke Cwik edited comment on BEAM-10776 at 8/21/20, 5:22 PM:
------------------------------------------------------------

Typically all jars on the classpath are included, since we have no way to know whether a jar is needed during execution.
What Gradle configuration is being used for the classpath (./gradlew :path:to:project:dependencies)?
Which JDK version are you using? (If Java 11, is JPMS enabled?)
How is the JDK being launched?
Is it a separate process?

The default is controlled by [ClasspathScanningResourcesDetector|https://github.com/apache/beam/blob/6b472e1de8ba5769127f6c330a23cc7c0af80527/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetector.java#L31] and can be configured through [PipelineResourcesOptions|https://github.com/apache/beam/blob/26f6dd58b9fe608476ccc33601b2e26fc0343080/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java#L29] if you need to change it.

I looked at some of the jars, and they appear not to be the core JDK but additional libraries such as Nashorn (a JavaScript engine for the JDK).
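To help answer the questions above, it can be useful to list what a naive scan of the running JVM would consider for staging. The sketch below is a simplified diagnostic, not Beam's ClasspathScanningResourcesDetector: the class name ClasspathCandidates is hypothetical, and it assumes the candidates are the entries of the java.class.path property plus any jars in the java.ext.dirs extension directories (that property exists only on JDK 8 and earlier).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic, not Beam code: lists every entry a naive scan would stage. */
final class ClasspathCandidates {

  /** Splits a path-separator-delimited list, keeping every non-empty entry as-is. */
  static List<String> entries(String pathList) {
    List<String> result = new ArrayList<>();
    if (pathList == null) {
      return result; // e.g. java.ext.dirs on Java 9+
    }
    for (String entry : pathList.split(File.pathSeparator)) {
      if (!entry.isEmpty()) {
        result.add(entry); // no way to know whether this entry is needed, so keep it
      }
    }
    return result;
  }

  /** Lists the .jar files directly inside each of the given directories. */
  static List<String> jarsIn(List<String> dirs) {
    List<String> jars = new ArrayList<>();
    for (String dir : dirs) {
      File[] files = new File(dir).listFiles();
      if (files == null) {
        continue; // missing or unreadable directory
      }
      for (File f : files) {
        if (f.isFile() && f.getName().endsWith(".jar")) {
          jars.add(f.getAbsolutePath());
        }
      }
    }
    return jars;
  }

  public static void main(String[] args) {
    System.out.println("java.class.path entries:");
    entries(System.getProperty("java.class.path")).forEach(e -> System.out.println("  " + e));
    System.out.println("extension-directory jars (JDK 8 and earlier only):");
    jarsIn(entries(System.getProperty("java.ext.dirs"))).forEach(j -> System.out.println("  " + j));
  }
}
```

Running this inside the same JVM that hosts the expansion service would show whether jars such as nashorn.jar come from the classpath itself or from an extension directory.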




> Unwanted JDK jars staged when running cross-language pipelines
> --------------------------------------------------------------
>
>                 Key: BEAM-10776
>                 URL: https://issues.apache.org/jira/browse/BEAM-10776
>             Project: Beam
>          Issue Type: Bug
>          Components: cross-language
>            Reporter: Chamikara Madhusanka Jayalath
>            Priority: P2
>
> When running cross-language Kafka on Dataflow, I see the following jars being staged.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/nashorn-BJZNQ7N8Lsfq-WSM0IMsRCwFMC3RIxBOEjrlB1YwKOw.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/nashorn-BJZNQ7N8Lsfq-WSM0IMsRCwFMC3RIxBOEjrlB1YwKOw.jar in 40 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/cldrdata-aZ6XIS6LfPilqVFbS_bWm1wMWGm3jxtjh0vjlRuqp5M.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/cldrdata-aZ6XIS6LfPilqVFbS_bWm1wMWGm3jxtjh0vjlRuqp5M.jar in 177 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/jfxrt-B2UJQqvuEI-15FPV1mcdw80YRUIDMg1Kr82FxWK_DZ8.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/jfxrt-B2UJQqvuEI-15FPV1mcdw80YRUIDMg1Kr82FxWK_DZ8.jar in 285 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/dnsns-zNxWyUaaHIkUFJRt-aNZudjc3eroySNUeRkxdxidGbY.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/dnsns-zNxWyUaaHIkUFJRt-aNZudjc3eroySNUeRkxdxidGbY.jar in 0 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/localedata-Wt0bN9j6XmIH4BaRLouHZX6p6iIoQsbZ2AkomxZTOYM.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/localedata-Wt0bN9j6XmIH4BaRLouHZX6p6iIoQsbZ2AkomxZTOYM.jar in 16 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/jaccess-5wlKULhaKWM_gmKVtH_QBwVqH4awlxxRdNNfz0z0Imw.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/jaccess-5wlKULhaKWM_gmKVtH_QBwVqH4awlxxRdNNfz0z0Imw.jar in 0 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/MRJToolkit-jU5qhDBc0cNjn7g3yrGHYO78BRC09T-sE8Syqo9mRjg.jar...
> INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/MRJToolkit-jU5qhDBc0cNjn7g3yrGHYO78BRC09T-sE8Syqo9mRjg.jar in 0 seconds.
> INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://clouddfe-chamikara/temp/kafka-taxi-20200820-132559.1597955225.717180/beam-sdks-java-io-expansion-service-2.24.0-SNAPSHOT-A94br32q87Prj7b_mG4_kPEdz9NSJ-0NwgHWEwwU4Qc.jar...
>  
> Of these, we only need 'beam-sdks-java-io-expansion-service-2.24.0-SNAPSHOT-A94br32q87Prj7b_mG4_kPEdz9NSJ-0NwgHWEwwU4Qc.jar'. The rest seem to be staged because we include all jars from the classpath in the expansion service response.
>  
> [https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L407]
>  
> We should figure out a way to filter out these additional jars.
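One possible way to filter out the additional jars is to drop any candidate located under a JDK-owned directory. The sketch below is a hypothetical helper, not existing Beam code: JdkJarFilter and its methods are illustrative names, and it assumes the unwanted jars are exactly those under java.home or (on JDK 8) a java.ext.dirs directory. The names in the log (nashorn, cldrdata, jfxrt, dnsns, localedata, jaccess) match jars found in jre/lib/ext of a typical Oracle JDK 8 installation, which is consistent with that assumption but does not prove it.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Beam code: drops staging candidates under JDK-owned directories. */
final class JdkJarFilter {

  /** Returns java.home plus, on JDK 8 and earlier, each java.ext.dirs entry. */
  static List<String> jdkRoots() {
    List<String> roots = new ArrayList<>();
    roots.add(System.getProperty("java.home"));
    String extDirs = System.getProperty("java.ext.dirs"); // unset on Java 9+
    if (extDirs != null) {
      for (String dir : extDirs.split(File.pathSeparator)) {
        if (!dir.isEmpty()) {
          roots.add(dir);
        }
      }
    }
    return roots;
  }

  /** Keeps only the candidates that are not located inside any of the excluded roots. */
  static List<String> filterOut(List<String> candidates, List<String> excludedRoots) {
    List<Path> roots = new ArrayList<>();
    for (String root : excludedRoots) {
      roots.add(Paths.get(root).toAbsolutePath().normalize());
    }
    List<String> kept = new ArrayList<>();
    for (String candidate : candidates) {
      Path path = Paths.get(candidate).toAbsolutePath().normalize();
      // Path.startsWith compares whole name elements, so /opt/jdk2 is not inside /opt/jdk.
      boolean underJdk = roots.stream().anyMatch(path::startsWith);
      if (!underJdk) {
        kept.add(candidate);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> candidates = new ArrayList<>();
    for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
      if (!entry.isEmpty()) {
        candidates.add(entry);
      }
    }
    // Print the classpath entries that would still be staged after filtering.
    filterOut(candidates, jdkRoots()).forEach(System.out::println);
  }
}
```

Filtering by location rather than by file name avoids maintaining a list of JDK jar names, which differs between vendors and versions; the trade-off is that a jar deliberately placed in an extension directory would also be dropped.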



--
This message was sent by Atlassian Jira
(v8.3.4#803005)