Posted to github@beam.apache.org by "Abacn (via GitHub)" <gi...@apache.org> on 2024/02/13 15:46:59 UTC

[I] [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka [beam]

Abacn opened a new issue, #30297:
URL: https://github.com/apache/beam/issues/30297

   ### What happened?
   
   There is a dependency chain of runners-google-cloud-dataflow-java -> sdks-java-io-kafka -> io.confluent dependencies. The io.confluent dependencies do not exist on Maven Central.
   
   The io-kafka classes are only used here: https://github.com/apache/beam/blob/b5cfd9523cde3fee8610956682541d4e90ee967b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L568
   
   - For a batch pipeline that does not use KafkaIO, this is an unnecessary dependency, and it causes multiple artifacts to be downloaded from the io.confluent repository and staged.
   - For a streaming pipeline that does not use KafkaIO, it is still unnecessary.
   - A streaming pipeline that uses KafkaIO should explicitly add the beam-sdks-java-io-kafka dependency.
   
   We should change it to a provided dependency.
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka [beam]

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #30297:
URL: https://github.com/apache/beam/issues/30297#issuecomment-1942204711

   Yes, this is already a problem. I found this: https://github.com/apache/beam/issues/21096#issuecomment-1228233492. After #30300, this use case should be resolved.




Re: [I] [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka [beam]

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #30297:
URL: https://github.com/apache/beam/issues/30297#issuecomment-1942170115

   Are the `io.confluent` dependencies a problem already? Are they publicly available?




Re: [I] [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka [beam]

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #30297:
URL: https://github.com/apache/beam/issues/30297#issuecomment-1942475353

   Yea :-( https://discuss.gradle.org/t/gradle-transitive-dependencies-with-repository/44232/4




Re: [I] [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka [beam]

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn closed issue #30297: [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka
URL: https://github.com/apache/beam/issues/30297




Re: [I] [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka [beam]

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #30297:
URL: https://github.com/apache/beam/issues/30297#issuecomment-1941887137

   A workaround:
   
   ```
   try {
     // Register the Kafka read override only if KafkaIO can be loaded.
     overridesBuilder.add(KafkaIO.Read.KAFKA_READ_OVERRIDE);
   } catch (NoClassDefFoundError e) {
     LOG.info("Class KafkaIO was not found on classpath");
   }
   ```
   However, it is generally an anti-pattern to catch an Error.
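
One way to avoid catching an `Error` is to probe the classpath first and catch the checked `ClassNotFoundException` instead. The following is only a minimal sketch of that idea, not the change made in Beam: the `OptionalClassCheck` class, its `isPresent` method, and the string standing in for the override are all hypothetical names used for illustration. The only real identifier is the fully qualified `KafkaIO` class name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Beam): probes the classpath without catching an Error. */
final class OptionalClassCheck {

  private OptionalClassCheck() {}

  /** Returns true if the named class can be located by the given class loader. */
  static boolean isPresent(String className, ClassLoader loader) {
    try {
      // initialize=false: locate the class without running its static initializers.
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException e) {
      // A checked exception, so no Error needs to be caught for the common "missing jar" case.
      return false;
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = OptionalClassCheck.class.getClassLoader();
    // Stand-in for the runner's list of transform overrides (assumption for this sketch).
    List<String> overrides = new ArrayList<>();
    if (isPresent("org.apache.beam.sdk.io.kafka.KafkaIO", loader)) {
      overrides.add("KAFKA_READ_OVERRIDE");
    }
    System.out.println("overrides = " + overrides);
  }
}
```

This check only confirms that the named class itself can be found. The code that actually references `KafkaIO` would still need to sit behind the check, and classes that `KafkaIO` depends on could still be missing at link time, so this narrows the problem rather than removing it.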




Re: [I] [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka [beam]

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #30297:
URL: https://github.com/apache/beam/issues/30297#issuecomment-1942207761

   > I see that the KafkaIO module has
   > 
   > ```
   >   mavenRepositories: [
   >     [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
   >   ],
   > ```
   > 
   > It is actually also a problem if that does not work for transitive dependencies.
   
   Yes, exactly. I found this because I am setting up a raw Gradle project to test some non-GCP IO (JmsIO specifically) pipelines on Dataflow, and I found that I have to declare
   ```
   maven {
     url "https://packages.confluent.io/maven"
   }
   ```
   even though I use neither GCP IO nor KafkaIO.




Re: [I] [unnecessary dependency]: runners-google-cloud-dataflow-java depends on kafka [beam]

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #30297:
URL: https://github.com/apache/beam/issues/30297#issuecomment-1942174308

   I see that the KafkaIO module has
   
   ```
     mavenRepositories: [
       [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
     ],
   ```
   
   It is actually also a problem if that does not work for transitive dependencies.

