You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/19 15:58:05 UTC

[GitHub] [pulsar] lhotari opened a new pull request #9638: Reduce Pulsar IO Connectors size

lhotari opened a new pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638


   Fixes #9572
   
   ### Motivation
   
   See #9572 for problem description. The total size of Pulsar IO files is currently 1952MB. The changes in #9246 made the problem worse, but the root cause of the problem is not #9246 . The Pulsar IO Connectors contained unnecessary files already before that change. This PR doesn't intend to fix the problems of leaked dependencies that is an issue caused by #9246 . 
   
   It is safe to exclude all dependencies that are part of Pulsar Functions Worker's system classloader. The reason for this is that classloaders use parent-first lookups (by default, and also in Pulsar Functions Worker). The simplest way to do this exclusion is to use the `provided` scope when applicable.
   
   Reducing the Pulsar IO files size is necessary to get the size of the pulsar-test-latest-version docker image size reduced so that it is feasible to transfer the docker image over the network in order to build the image once in the Pulsar CI GitHub Actions workflow and reuse the image in the integration tests that use that image.
   
   ### Modifications
   
   - Use provided scope when applicable to exclude the dependencies which are part of Pulsar Functions Worker's system classloader.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-783318008


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-791216170


   > However, because of the existing user base of pulsar-all image, it would be a breaking change from the user's perspective if support for pulsar-all is suddenly stopped without a migration path.
   
   > There is a lot of usage for pulsar-all image. For example, the pulsar-helm-chart uses the pulsar-all image in the default configuration:
   https://github.com/apache/pulsar-helm-chart/blob/67818a48cb41b3e4b6d685fb598332a9d9a320bb/charts/pulsar/values.yaml#L150-L172
   
   We don't have to stop building the pulsar-all image that contains all the connectors.  At the same time, we also don't need to be keep using that image for our integration tests.  We can also introduce different flavors of images that contain different sets for connectors.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-784244466


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari closed pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari closed pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-783708700


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
merlimat commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-791247021


   One more: #9817


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790847094


   @lhotari if the primary purpose if to reduce the test image size, perhaps a simple solution would be just to build separate pulsar images with different connectors or sets of connectors and the integration tests can be modified to use the correct one when testing a connector.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-785268314


   btw. I noticed in some local tests that pulsar-io/flume connector is most likely broken. There aren't any integration tests to verify it's behavior. The Flume libraries depend on older Avro version which isn't compatible with the one used in Pulsar. That's why I reverted some changes I had made in pulsar-io/flume/pom.xml to mark Avro dependencies with provided . T  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-784328017


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790809949


   @lhotari `functionInstanceClsLoader ` defined in `JavaInstanceMain`:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L87
   
   Does NOT load the user's function JARs. It is suppose to load the Pulsar Function framework JARs thus it does load all of the Pulsar platform dependencies.
   
   The root classloader that contains only the interfaces in which the user defined function interacts with framework is defined here: 
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L99 
   
   and passed into the `JavaInstaceStarter` here
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L99
   
   The root classloader is subsequently pass into the `ThreadRuntimeFactory` here:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/JavaInstanceStarter.java#L213
   
   The `ThreadRuntime` will use the root classloader, which only contains those few interfaces, as the parent classloader of the user code function classloader.  Thus, the classloader that loads the user function JARs will not contain all the dependencies of Pulsar.  By the way `ThreadRuntime` is used by both `ProcessRuntime` and `KubernetesRuntime` underneath but don't be confused by this with actually configuring the worker to use `ThreadRuntime`.
   
   When `ThreadRuntime` is configured as the runtime to be used by the worker, this root classloader will not be set and and default to `Thread.currentThread().getContextClassLoader()` which contains all of pulsar's dependencies:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntimeFactory.java#L91 
   
   However, this is not the case for   `ProcessRuntime` and `KubernetesRuntime` and this is the difference between `ThreadRuntime` and the other two runtimes. Again, this is something the that `ThreadRuntime` is doing incorrectly, but I / we haven't gotten around to fix it.
   
   In general, for platform that supports third party plugins or executing user submitted code, it is best if classpaths are isolated and transitive dependencies are not shared across platform and user code.  This will cause a lot of dependency versioning issues and limit what versions dependencies user submitted code can use.
   
   @lhotari I appreciate your effort to solve the issues with testing and to understand the Pulsar Function code.  Perhaps we can find another solution here? Looking forward to working with you in the Pulsar community!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790809949


   @lhotari `functionInstanceClsLoader ` defined in `JavaInstanceMain`:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L87
   
   Does NOT load the user's function JARs. It is suppose to load the Pulsar Function framework JARs thus it does load all of the Pulsar platform dependencies.
   
   The root classloader that contains only the interfaces in which the user defined function interacts with framework is defined here: 
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L99 
   
   and passed into the `JavaInstaceStarter` here
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L99
   
   The root classloader is subsequently pass into the `ThreadRuntimeFactory` here:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/JavaInstanceStarter.java#L213
   
   The `ThreadRuntime` will use the root classloader, which only contains those few interfaces, as the parent classloader of the user code function classloader.  Thus, the classloader that loads the user function JARs will not contain all the dependencies of Pulsar.  By the way `ThreadRuntime` is used by both `ProcessRuntime` and `KubernetesRuntime` underneath but don't be confused by this with actually configuring the worker to use `ThreadRuntime`.
   
   When `ThreadRuntime` is configured as the runtime to be used by the worker, this root classloader will not be set and and default to `Thread.currentThread().getContextClassLoader()` which contains all of pulsar's dependencies:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntimeFactory.java#L91 
   
   However, this is not the case for   `ProcessRuntime` and `KubernetesRuntime` and this is the difference between `ThreadRuntime` and the other two runtimes.
   
   In general, for platform that support plugins or executing user submitted code, it is best if classpaths are isolated and transitive dependencies are not shared across platform and user code.  This will cause a lot of dependency versioning issues and limit what versions dependencies user submitted code can use.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-791202132


   > It seems that the Pulsar dependencies are in the parent classloader of the NAR classloader in all configurations of Pulsar Functions: thread, process and k8s runtime.  Because of this, any library that is part of Pulsar dependencies will get loaded from the parent classloader and not from the jar files embedded in the NAR file. 
   
   Again, this not true or not suppose to be true.  I wrote and designed the classloading mechanism in functions so at least i know the intention if for user code to be loaded in a separate classloader than doesn't contain all of pulsar dependencies.  However, currently because of issue such as  #9246 , the classloader for user code maybe polluted with unnecessary deps.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790875396


   > @lhotari if the primary purpose if to reduce the test image size, perhaps a simple solution would be just to build separate pulsar images with different connectors or sets of connectors and the integration tests can be modified to use the correct one when testing a connector.
   
   For my CI case it's the primary purpose. @merlimat suggested to write a spec as a GH issue for what you are proposing. However that doesn't resolve the problem that Pulsar users have. The pulsar-all image size is huge. Atm 3GB, but in 2.7 it's 2.25GB which is also too large. IIRC, There's about 400MB coming from duplicate connector jars (which are Pulsar core deps) also in 2.7 pulsar-all docker image.
   
   Isn't there a simple solution to get rid of the unnecessary jar files in the .nar files?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-791202132


   > It seems that the Pulsar dependencies are in the parent classloader of the NAR classloader in all configurations of Pulsar Functions: thread, process and k8s runtime.
   
   Again, this not true or not suppose to be true.  If this is true there is not point really even having the classloading mechanism I pointed out in the code and there will also be no point in using NARs to begin with.  Maybe true for now because of the issue #9246...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-785232699


   > This is true for ThreadRuntime currently but not for ProcessRuntime / kubernetes runtijme. ThreadRuntime is also not doing the right thing since Ideally user provided code is isolated on completely separate classloader that only share common interfaces with the framework classloader.
   
   @jerrypeng  Thanks for pointing this out. I spent a few hours investigating this to get a better understanding how the classloaders are configured in Pulsar Functions. 
   
   I found the documentation of the classloader hierarchy for Process Runtime in JavaInstanceMain class: https://github.com/apache/pulsar/blob/00463adfd5c59958479b9b3a8e26140ab07fd8c1/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L34-L47
   
   There's also logic that impacts how the classpaths get configured:
   https://github.com/apache/pulsar/blob/64971778d88a8dab6e6e50563e9c30d5425c8cc1/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/RuntimeUtils.java#L286-L297
   
   By default `functionInstanceClassPath` is always null/empty, for both Process Runtime and also for Kubernetes Runtime. [Therefore `pulsar.functions.instance.classpath` will always be set to `System.getProperty("java.class.path")` by default](https://github.com/apache/pulsar/blob/64971778d88a8dab6e6e50563e9c30d5425c8cc1/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/RuntimeUtils.java#L286-L297).
   
   Because of the above logic where `pulsar.functions.instance.classpath` will always contain the classpath of the process that launches the processes, it leads to the same behavior for thread, process and kubernetes runtimes, where the **Pulsar provided dependencies will always get loaded from the parent classloader of the "user classloader"**. I don't see any actual classloading differences between the thread runtime, process runtime and kubernetes runtime when the default configuration is used (`functionInstanceClassPath` is left unconfigured in `conf/functions_worker.yml`) . I might be missing something, so I'd be happy to learn more about this area in Pulsar.
   
   > 
   > While I am all for decreasing the sizes of connectors. This will have other consequences.
   
   There are integration tests which [test both the process runtime](https://github.com/apache/pulsar/blob/master/tests/integration/src/test/java/org/apache/pulsar/tests/integration/functions/PulsarFunctionsProcessTest.java) and the thread runtime for [some connectors](https://github.com/apache/pulsar/blob/a5b3cf6fe694b57adffc7e9a69c8520f3d85b2e3/tests/integration/src/test/java/org/apache/pulsar/tests/integration/functions/PulsarFunctionsTest.java#L79-L90). It seems like a major gap that not all connectors are tested with the thread runtime and the process runtime. You are right that there will be unknown consequences as long as there isn't test coverage for both scenarios.
   
   As explained in the previous observations of the thread vs. process/k8s runtime (no actual difference in classloading with default settings), the changes made in this PR are safe in general.
   @jerrypeng WDYT?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-793741618


   > I liked the motivation. I am also good with making `pulsar-io-core` as `provided` dependency. However, I see you also turn some of the dependencies shared between the Pulsar runtime and connector implementation to the `provided` scope. I am not comfortable with that change. Because even they are the same dependency, they will be loaded via different class loaders. I am not sure turning them into `provided` will not break any connector implementation.
   
   @sijie Thank you for reviewing. Yes you are right that it's not fine to change the dependencies. I verified some details and wrote a report about that in the previous comment. It's indeed that  #9246 (open issue #9640) is causing the issues and it also pollutes the classes that are available in the current parent classloader of the function user code classloader. 
   
   It seems that it would be fine to make the dependencies `provided` which are part of pulsar-functions/runtime-all/pom.xml since that creates the java-instance.jar . The size of these dependencies is very low (in 2.7.0, it's about 2MB) so it wouldn't save much in the overall size which was the goal of this PR. 
   Since that's the case, I'll close this PR. 
   
   I hope #9640 gets addressed asap since I think it could be considered as a blocker for the 2.8.0 release.
   It seems that there's a PR #9842 in WIP state currently. Great!
   
   
   
    


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-793754009


   > I already checked in #9807 and #9808 that are including only the connectors that are actually tested and squeezed some extra 100s of MB
   
   > One more: #9817
   
   Thank you @merlimat for putting effort in this. Just what I needed. These will unblock me in the refactored Pulsar CI experiment. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-785105304


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-785184545


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on a change in pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on a change in pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#discussion_r580187738



##########
File path: pulsar-io/aerospike/pom.xml
##########
@@ -36,16 +36,19 @@
       <groupId>${project.groupId}</groupId>
       <artifactId>pulsar-io-core</artifactId>
       <version>${project.version}</version>
+      <scope>provided</scope>
     </dependency>
 
     <dependency>
       <groupId>com.fasterxml.jackson.core</groupId>
       <artifactId>jackson-databind</artifactId>
+      <scope>provided</scope>

Review comment:
       Yes, there's direct usage of Jackson in https://github.com/apache/pulsar/blob/65fed8b7fc258a2980b522f32db0f37f017bf84d/pulsar-io/aerospike/src/main/java/org/apache/pulsar/io/aerospike/AerospikeSinkConfig.java#L22-L23 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on a change in pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
merlimat commented on a change in pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#discussion_r581300349



##########
File path: pulsar-io/aerospike/pom.xml
##########
@@ -36,16 +36,19 @@
       <groupId>${project.groupId}</groupId>
       <artifactId>pulsar-io-core</artifactId>
       <version>${project.version}</version>
+      <scope>provided</scope>
     </dependency>
 
     <dependency>
       <groupId>com.fasterxml.jackson.core</groupId>
       <artifactId>jackson-databind</artifactId>
+      <scope>provided</scope>

Review comment:
       Gotcha. As a side note, we could provide here a standard method to perform the yaml ser/de since most (all?) connectors are going to do this. 
   
   This way, the connector don't need to depend on that and the serialization code is already embedded in the runtime.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
merlimat commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-791233972


   I already checked in #9807 and #9808 that are including only the connectors that are actually tested and squeezed some extra 100s of MB


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790809949


   @lhotari `functionInstanceClsLoader ` defined in `JavaInstanceMain`:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L87
   
   Does NOT load the user's function JARs. It is suppose to load the Pulsar Function framework JARs thus it does load all of the Pulsar platform dependencies.
   
   The root classloader that contains only the interfaces in which the user defined function interacts with framework is defined here: 
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L99 
   
   and passed into the `JavaInstaceStarter` here
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime-all/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceMain.java#L99
   
   The root classloader is subsequently pass into the `ThreadRuntimeFactory` here:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/JavaInstanceStarter.java#L213
   
   The `ThreadRuntime` will use the root classloader, which only contains those few interfaces, as the parent classloader of the user code function classloader.  Thus, the classloader that loads the user function JARs will not contain all the dependencies of Pulsar.  By the way `ThreadRuntime` is used by both `ProcessRuntime` and `KubernetesRuntime` underneath but don't be confused by this with actually configuring the worker to use `ThreadRuntime`.
   
   When `ThreadRuntime` is configured as the runtime to be used by the worker, this root classloader will not be set and and default to `Thread.currentThread().getContextClassLoader()` which contains all of pulsar's dependencies:
   
   https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntimeFactory.java#L91 
   
   However, this is not the case for   `ProcessRuntime` and `KubernetesRuntime` and this is the difference between `ThreadRuntime` and the other two runtimes.
   
   In general, for platform that supports third party plugins or executing user submitted code, it is best if classpaths are isolated and transitive dependencies are not shared across platform and user code.  This will cause a lot of dependency versioning issues and limit what versions dependencies user submitted code can use.
   
   @lhotari I appreciate your effort to solve the issues with testing and to understand the Pulsar Function code.  Perhaps we can find another solution here? Looking forward to working with you in the Pulsar community!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng removed a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng removed a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-784568756


   > It is safe to exclude all dependencies that are part of Pulsar Functions Worker's system classloader.
   
   This is true for ThreadRuntime currently but not for ProcessRuntime / kubernetes runtijme.  ThreadRuntime is also not doing the right thing since Ideally user provided code is isolated on completely separate classloader that only share common instances with the framework classloader


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-785268314


   btw. I noticed in some local tests that pulsar-io/flume connector is most likely broken. There aren't any integration tests to verify it's behavior. The Flume libraries depend on older Avro version which isn't compatible with the one used in Pulsar. That's why I reverted some changes I had made in pulsar-io/flume/pom.xml to mark Avro dependencies with provided .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-791202132


   > It seems that the Pulsar dependencies are in the parent classloader of the NAR classloader in all configurations of Pulsar Functions: thread, process and k8s runtime.
   
   Again, this not true or not suppose to be true.  I wrote and designed the classloading mechanism in functions so at least i know the intention if for user code to be loaded in a separate classloader than doesn't contain all of pulsar dependencies.  However, currently because of issue such as  #9246 , the classloader for user code maybe polluted with unnecessary deps.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on a change in pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
merlimat commented on a change in pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#discussion_r579332487



##########
File path: pulsar-io/aerospike/pom.xml
##########
@@ -36,16 +36,19 @@
       <groupId>${project.groupId}</groupId>
       <artifactId>pulsar-io-core</artifactId>
       <version>${project.version}</version>
+      <scope>provided</scope>
     </dependency>
 
     <dependency>
       <groupId>com.fasterxml.jackson.core</groupId>
       <artifactId>jackson-databind</artifactId>
+      <scope>provided</scope>

Review comment:
       Why do we even need these dependencies here (other than `pulsar-io-core`)? 
   Is the connector making direct use of Jackson? or is this used for tests?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
merlimat edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-782202349


   > Reducing the Pulsar IO files size is necessary to get the size of the pulsar-test-latest-version docker image size reduced so that it is feasible to transfer the docker image over the network in order to build the image once in the Pulsar CI GitHub Actions workflow and reuse the image in the integration tests that use that image.
   
   Not withstanding that reducing size is good, one thing we discussed yesterday, around moving connectors to separate repo, was to have the integration tests to just build a minimal image (not like pulsar-all) and use that for tests (as you already did in #9625). That image will be able to get cached within GH actions, because it won't contain all the connectors.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790819422


   @jerrypeng Thanks for your reply. I'll take a closer look tomorrow and go through the details.
   
   As long as we find a solution to get reduce the Pulsar IO Connectors size, I'm fine with that.
   It seems that there should be a way to get rid of the dependencies that are Pulsar dependencies. The total size of Pulsar Connectors is about 2GB at the moment and that makes the pulsar-all image huge. StreamNative's [streamnative/pulsar-all:2.8.0-rc-202103022223](https://hub.docker.com/layers/streamnative/pulsar-all/2.8.0-rc-202103022223/images/sha256-f73c3ef16277b7c75958c4333e50f1ca762768ca498bd8098821a2f005ad844e?context=explore) is 2.92GB (compressed image size, I assume).
   Do you have another solution in mind?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-785637817


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-783369484


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-784568756


   > It is safe to exclude all dependencies that are part of Pulsar Functions Worker's system classloader.
   
   This is true for ThreadRuntime currently but not for ProcessRuntime / kubernetes runtijme.  ThreadRuntime is also not doing the right thing since Ideally user provided code is isolated on completely separate classloader that only share common instances with the framework classloader


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790819422


   @jerrypeng Thanks for your reply. I'll take a more detailed look tomorrow to go through the details.
   
   As long as we find a solution to get reduce the Pulsar IO Connectors size, I'm fine with that.
   It seems that there should be a way to get rid of the dependencies that are Pulsar dependencies.
   Do you have another solution in mind?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
merlimat commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-782202349


   > Reducing the Pulsar IO files size is necessary to get the size of the pulsar-test-latest-version docker image size reduced so that it is feasible to transfer the docker image over the network in order to build the image once in the Pulsar CI GitHub Actions workflow and reuse the image in the integration tests that use that image.
   
   Not withstanding that reducing size is good, one thing we discussed yesterday, around moving connectors to separate repo, was to have the integration tests to just build a minimal image (not like pulsar-all) and use that for tests (as you already did in #9625). That image will be able to get cached within GH actions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-791160397


   > Why do we have to package all the connectors into one image even for users? I doubt a user will need to use all connectors Pulsar has to offer. Perhaps we can have a strategy in which a user can choose which connectors, he or she as actually needs and that gets baked into the image. And for users that just want to use pub sub, I think there is already an image that doesn't contain any connectors.
   
   I agree on this point. 
   
   However, because of the existing user base of pulsar-all image, it would be a breaking change from the user's perspective if support for pulsar-all is suddenly stopped without a migration path.
   
   There is a lot of usage for pulsar-all image. For example, the pulsar-helm-chart uses the pulsar-all image in the default configuration:
   https://github.com/apache/pulsar-helm-chart/blob/67818a48cb41b3e4b6d685fb598332a9d9a320bb/charts/pulsar/values.yaml#L150-L172
   
   > The thing is NARs are suppose to be self contained which means all the transitive dependencies of the connector needs to be packaged in it as well. 
   
   NARs might supposed to be self contained, but the technical implementation doesn't support this. It seems that the Pulsar dependencies are in the parent classloader of the NAR classloader in all configurations of Pulsar Functions: thread, process and k8s runtime. **Because of this, any library that is part of Pulsar dependencies will get loaded from the parent classloader and not from the jar files embedded in the NAR file.**  That is the reason why adding any classes that are part of Pulsar dependencies is unnecessary duplication in the Pulsar connector .nar files.
   This might not have been the original intention of the Pulsar Functions design.
   
   I'll spend some more time to verify with experimenting and testing whether it is the case or not.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-783431720


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790834459


   Also because this PR:
   
   https://github.com/apache/pulsar/pull/9246/
   
   That separated pulsar-client-admin into two modules pulsar-client-admin and pulsar-client-admin-api.  pulsar-cient-admin-api module pulls in a bunch of dependencies which doesn't make sense for a "api" module.  The java-instance that gets loaded by the root classloader should only contain those few interfaces but now contain
   
   ` org.apache.pulsar:pulsar-functions-runtime-all:jar:2.7.0
   [INFO] +- org.apache.pulsar:pulsar-io-core:jar:2.7.0:compile
   [INFO] |  \- (org.apache.pulsar:pulsar-functions-api:jar:2.7.0:compile - omitted for duplicate)
   [INFO] +- org.apache.pulsar:pulsar-functions-api:jar:2.7.0:compile
   [INFO] |  +- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.7.30; omitted for duplicate)
   [INFO] |  +- (org.apache.pulsar:pulsar-client-api:jar:2.7.0:compile - omitted for duplicate)
   [INFO] |  \- org.apache.pulsar:pulsar-client-admin-api:jar:2.7.0:compile
   [INFO] |     +- org.apache.pulsar:pulsar-common:jar:2.7.0:compile
   [INFO] |     |  +- (org.apache.pulsar:pulsar-client-api:jar:2.7.0:compile - omitted for duplicate)
   [INFO] |     |  +- io.swagger:swagger-annotations:jar:1.6.2:compile
   [INFO] |     |  +- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.7.30; omitted for duplicate)
   [INFO] |     |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.11.1:compile (version managed from 2.9.9.3)
   [INFO] |     |  |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.11.1:compile
   [INFO] |     |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.11.1:compile (version managed from 2.9.9)
   [INFO] |     |  +- com.google.guava:guava:jar:30.1-jre:compile (version managed from 29.0-android)
   [INFO] |     |  |  +- com.google.guava:failureaccess:jar:1.0.1:compile
   [INFO] |     |  |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
   [INFO] |     |  |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
   [INFO] |     |  |  +- org.checkerframework:checker-qual:jar:3.5.0:compile
   [INFO] |     |  |  +- com.google.errorprone:error_prone_annotations:jar:2.3.4:compile
   [INFO] |     |  |  \- com.google.j2objc:j2objc-annotations:jar:1.3:compile
   [INFO] |     |  +- io.netty:netty-handler:jar:4.1.51.Final:compile
   [INFO] |     |  |  +- io.netty:netty-common:jar:4.1.51.Final:compile
   [INFO] |     |  |  +- io.netty:netty-resolver:jar:4.1.51.Final:compile
   [INFO] |     |  |  |  \- (io.netty:netty-common:jar:4.1.51.Final:compile - omitted for duplicate)
   [INFO] |     |  |  +- io.netty:netty-buffer:jar:4.1.51.Final:compile
   [INFO] |     |  |  |  \- (io.netty:netty-common:jar:4.1.51.Final:compile - omitted for duplicate)
   [INFO] |     |  |  +- io.netty:netty-transport:jar:4.1.51.Final:compile
   [INFO] |     |  |  |  +- (io.netty:netty-common:jar:4.1.51.Final:compile - omitted for duplicate)
   [INFO] |     |  |  |  +- (io.netty:netty-buffer:jar:4.1.51.Final:compile - omitted for duplicate)
   ...`
   
   This is also a problem.  @lhotari this is why the tests pass for the connectors even though dependencies are not packaged explicitly.  @sijie that PR has become very problematic for many reasons now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790819422


   @jerrypeng Thanks for your reply. I'll take a closer look tomorrow and go through the details.
   
   As long as we find a solution to get reduce the Pulsar IO Connectors size, I'm fine with that.
   It seems that there should be a way to get rid of the dependencies that are Pulsar dependencies.
   Do you have another solution in mind?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng edited a comment on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng edited a comment on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790834459


   Also because this PR:
   
   https://github.com/apache/pulsar/pull/9246/
   
   That separated pulsar-client-admin into two modules pulsar-client-admin and pulsar-client-admin-api.  pulsar-cient-admin-api module pulls in a bunch of dependencies which doesn't make sense for a "api" module.  The java-instance that gets loaded by the root classloader should only contain those few interfaces but now contain
   
    org.apache.pulsar:pulsar-functions-runtime-all:jar:2.7.0
   [INFO] +- org.apache.pulsar:pulsar-io-core:jar:2.7.0:compile
   [INFO] |  \- (org.apache.pulsar:pulsar-functions-api:jar:2.7.0:compile - omitted for duplicate)
   [INFO] +- org.apache.pulsar:pulsar-functions-api:jar:2.7.0:compile
   [INFO] |  +- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.7.30; omitted for duplicate)
   [INFO] |  +- (org.apache.pulsar:pulsar-client-api:jar:2.7.0:compile - omitted for duplicate)
   [INFO] |  \- org.apache.pulsar:pulsar-client-admin-api:jar:2.7.0:compile
   [INFO] |     +- org.apache.pulsar:pulsar-common:jar:2.7.0:compile
   [INFO] |     |  +- (org.apache.pulsar:pulsar-client-api:jar:2.7.0:compile - omitted for duplicate)
   [INFO] |     |  +- io.swagger:swagger-annotations:jar:1.6.2:compile
   [INFO] |     |  +- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.7.30; omitted for duplicate)
   [INFO] |     |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.11.1:compile (version managed from 2.9.9.3)
   [INFO] |     |  |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.11.1:compile
   [INFO] |     |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.11.1:compile (version managed from 2.9.9)
   [INFO] |     |  +- com.google.guava:guava:jar:30.1-jre:compile (version managed from 29.0-android)
   [INFO] |     |  |  +- com.google.guava:failureaccess:jar:1.0.1:compile
   [INFO] |     |  |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
   [INFO] |     |  |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
   [INFO] |     |  |  +- org.checkerframework:checker-qual:jar:3.5.0:compile
   [INFO] |     |  |  +- com.google.errorprone:error_prone_annotations:jar:2.3.4:compile
   [INFO] |     |  |  \- com.google.j2objc:j2objc-annotations:jar:1.3:compile
   [INFO] |     |  +- io.netty:netty-handler:jar:4.1.51.Final:compile
   [INFO] |     |  |  +- io.netty:netty-common:jar:4.1.51.Final:compile
   [INFO] |     |  |  +- io.netty:netty-resolver:jar:4.1.51.Final:compile
   [INFO] |     |  |  |  \- (io.netty:netty-common:jar:4.1.51.Final:compile - omitted for duplicate)
   [INFO] |     |  |  +- io.netty:netty-buffer:jar:4.1.51.Final:compile
   [INFO] |     |  |  |  \- (io.netty:netty-common:jar:4.1.51.Final:compile - omitted for duplicate)
   [INFO] |     |  |  +- io.netty:netty-transport:jar:4.1.51.Final:compile
   [INFO] |     |  |  |  +- (io.netty:netty-common:jar:4.1.51.Final:compile - omitted for duplicate)
   [INFO] |     |  |  |  +- (io.netty:netty-buffer:jar:4.1.51.Final:compile - omitted for duplicate)
   ...
   
   This is also a problem.  @lhotari this is why the tests pass for the connectors even though dependencies are not packaged explicitly.  @sijie that PR has become very problematic for many reasons now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790876507


   > Also because this PR:
   > 
   > https://github.com/apache/pulsar/pull/9246/
   > 
   >
   > ...
   > 
   > This is also a problem.  @lhotari this is why the tests pass for the connectors even though dependencies are not packaged explicitly.  @sijie that PR has become very problematic for many reasons now.
   
   #9640 has been reported about this issue.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-790892094


   @lhotari thanks for filling the issue. 
   
   > However that doesn't resolve the problem that Pulsar users have.
   
   Why do we have to package all the connectors into one image even for users?  I doubt a user will need to use all connectors Pulsar has to offer.  Perhaps we can have a strategy in which a user can choose which connectors, he or she as actually needs and that gets baked into the image.  And for users that just want to use pub sub, I think there is already an image that doesn't contain any connectors.
   
   > Isn't there a simple solution to get rid of the unnecessary jar files in the .nar files?
   
   The thing is NARs are suppose to be self contained which means all the transitive dependencies of the connector needs to be packaged in it as well.  There are advantages to this as I mentioned before, however the downside is will duplicate dependencies across all connectors. However, this allows connectors evolve in a more of a vacuum than be dependent versions of libraries used by other connectors or pulsar.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-784556444


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-784837067


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-784918115


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9638: Reduce Pulsar IO Connectors size

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9638:
URL: https://github.com/apache/pulsar/pull/9638#issuecomment-784306660


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org