Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/07/19 09:54:02 UTC

[GitHub] [incubator-seatunnel] zhangyuge1 opened a new issue, #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

zhangyuge1 opened a new issue, #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   In the previous version, Spark used the `--jars` parameter to specify the connector paths when submitting tasks:
   22/07/19 17:48:23 INFO SparkContainer: Execute SeaTunnel Spark Job: ${SPARK_HOME}/bin/spark-submit --class "org.apache.seatunnel.core.spark.SeatunnelSpark" --name "SeaTunnel" --master "local" --deploy-mode "client" **--jars "/tmp/spark/seatunnel/connectors/spark/seatunnel-connector-spark-console-2.1.3-SNAPSHOT.jar,/tmp/spark/seatunnel/connectors/spark/seatunnel-connector-spark-fake-2.1.3-SNAPSHOT.jar"** --conf "spark.executor.memory=1g" --conf "spark.master=local" --conf "spark.executor.cores=1" --conf "spark.app.name=SeaTunnel" --conf "spark.executor.instances=2" /tmp/spark/seatunnel/lib/seatunnel-core-spark.jar --master local --deploy-mode client --config /tmp/fake/fakesource_to_console.conf
   
   But the dev branch no longer does this:
   Execute SeaTunnel Spark Job: ${SPARK_HOME}/bin/spark-submit --class "org.apache.seatunnel.core.starter.spark.SeatunnelSpark" --name "SeaTunnel" --master "local" --deploy-mode "client" --conf "spark.executor.memory=1g" --conf "spark.master=local" --conf "spark.executor.cores=1" --conf "spark.app.name=SeaTunnel" --conf "spark.executor.instances=2" /tmp/spark/seatunnel/lib/seatunnel-spark-starter.jar --master local --deploy-mode client --config /tmp/fake/fakesource_to_console.conf
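   
   For illustration, here is a minimal sketch of how a `--jars` value like the one in the first command could be assembled from a connector directory. `JarsArgument`, `listJars`, and `jarsOption` are hypothetical names, not SeaTunnel APIs; the only fact relied on is that `spark-submit --jars` takes a comma-separated list of jar paths.
   
   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   import java.util.stream.Collectors;
   import java.util.stream.Stream;
   
   // Hypothetical helper, not part of SeaTunnel.
   class JarsArgument {
   
       /** Lists the *.jar files directly under dir as sorted absolute paths; empty if dir is missing. */
       static List<String> listJars(Path dir) {
           if (!Files.isDirectory(dir)) {
               return Collections.emptyList();
           }
           try (Stream<Path> files = Files.list(dir)) {
               return files
                       .filter(p -> p.getFileName().toString().endsWith(".jar"))
                       .map(p -> p.toAbsolutePath().toString())
                       .sorted()
                       .collect(Collectors.toList());
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
       }
   
       /** Returns the extra spark-submit arguments, or nothing when there are no jars to ship. */
       static List<String> jarsOption(List<String> jars) {
           if (jars.isEmpty()) {
               return Collections.emptyList();
           }
           // spark-submit expects one comma-separated value after --jars.
           return Arrays.asList("--jars", String.join(",", jars));
       }
   }
   ```
   
   With this sketch, `jarsOption(listJars(dir))` yields nothing when the directory is empty or missing, which would produce a command like the second one above.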
   
   Now, when I run the `org.apache.seatunnel.e2e.spark.v2.fake.FakeSourceToConsoleIT` test, an error occurs:
   ![image](https://user-images.githubusercontent.com/49311144/179722721-b7bfa41d-3210-48e3-8267-f06ad16a8693.png)
   
   
   
   ### SeaTunnel Version
   
   dev
   
   ### SeaTunnel Config
   
   ```conf
   env {
     # You can set spark configuration here
     # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
     spark.app.name = "SeaTunnel"
     spark.executor.instances = 2
     spark.executor.cores = 1
     spark.executor.memory = "1g"
     spark.master = local
   }
   
   source {
     # This is a example input plugin **only for test and demonstrate the feature input plugin**
     Fake {
       result_table_name = "my_dataset"
     }
   }
   ```
   
   
   ### Running Command
   
   ```shell
   e2e test
   ```
   
   
   ### Error Exception
   
   ```log
   java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource
   ```
   
   
   ### Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] CalvinKirs commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1189017364

   > > If so, why does our CI always pass? @CalvinKirs @ruanwenjun 😂
   > 
   > We need to use the Assert sink plugin to assert the data. I remember that in the v1 e2e, if the job failed, the Linux exit code was non-zero.
   
   ![image](https://user-images.githubusercontent.com/16631152/179754966-472501c0-2805-4705-976b-ab8755e3dcbb.png)
   Yup.




[GitHub] [incubator-seatunnel] ruanwenjun commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1189000882

   > If so, why does our CI always pass? @CalvinKirs @ruanwenjun 😂
   
   We need to use the Assert sink plugin to assert the data. I remember that in the v1 e2e, if the job failed, the Linux exit code was non-zero.
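   
   A minimal sketch of the exit-code check described above, assuming the e2e test has some way to obtain the job's exit code. `ExitCodeCheck`, `run`, and `requireSuccess` are hypothetical helpers, not part of the SeaTunnel e2e module:
   
   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.util.List;
   
   // Hypothetical helper, not part of SeaTunnel.
   class ExitCodeCheck {
   
       /** Throws if the exit code signals failure; 0 is the conventional success value. */
       static void requireSuccess(int exitCode, String description) {
           if (exitCode != 0) {
               throw new IllegalStateException(description + " failed with exit code " + exitCode);
           }
       }
   
       /** Runs a command, forwards its stdout/stderr to this process, and returns its exit code. */
       static int run(List<String> command) {
           try {
               Process process = new ProcessBuilder(command).inheritIO().start();
               return process.waitFor();
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               throw new IllegalStateException("Interrupted while waiting for " + command, e);
           }
       }
   }
   ```
   
   A test that calls `requireSuccess(run(command), "SeaTunnel Spark job")` would fail as soon as the submitted job exits with a non-zero code, even without an Assert sink.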




[GitHub] [incubator-seatunnel] CalvinKirs commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1188993771

   > If so, why does our CI always pass? @CalvinKirs @ruanwenjun 😂
   
   I will take a look




[GitHub] [incubator-seatunnel] Hisoka-X commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
Hisoka-X commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1188888325

   > I found that we look for the v2 connectors in the `spark` dir:
   > 
   > https://github.com/apache/incubator-seatunnel/blob/c61bbdd78cea0e2dc34a18a806ba07329b2254c9/seatunnel-core/seatunnel-spark-starter/src/main/java/org/apache/seatunnel/core/starter/spark/SparkStarter.java#L224-L236
   > 
   > 
   > But we copy the v2 connectors into the `seatunnel` dir:
   > https://github.com/apache/incubator-seatunnel/blob/c61bbdd78cea0e2dc34a18a806ba07329b2254c9/seatunnel-e2e/seatunnel-spark-connector-v2-e2e/src/test/java/org/apache/seatunnel/e2e/spark/SparkContainer.java#L145-L148
   > 
   > 
   > So we cannot find the connectors in e2e. Are you willing to fix this?
   
   Yep, we should look for the jars in the `connectors/seatunnel` dir, not the `connectors/spark` dir.




[GitHub] [incubator-seatunnel] kayleyang commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by "kayleyang (via GitHub)" <gi...@apache.org>.
kayleyang commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1413254409

   Why should the jars be looked up in the `connectors/seatunnel` dir rather than the `connectors/spark` dir? `install-plugin.sh` installs the Spark connectors into the `connectors/spark` dir, and this argument means that the Spark engine is used.




[GitHub] [incubator-seatunnel] Hisoka-X closed issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
Hisoka-X closed issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource
URL: https://github.com/apache/incubator-seatunnel/issues/2212




[GitHub] [incubator-seatunnel] Hisoka-X commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
Hisoka-X commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1190151034

   Hi @lhyundeadsoul, this is already fixed in #2221.




[GitHub] [incubator-seatunnel] ruanwenjun commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1188852373

   @Hisoka-X Please take a look.




[GitHub] [incubator-seatunnel] Hisoka-X commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
Hisoka-X commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1189803968

   I found that this bug isn't caused by jar loading (I tested with the code from #2193, and it still doesn't work). The problem is that `SerializationUtils.deserialize` doesn't work correctly on Spark. See https://www.mail-archive.com/user@commons.apache.org/msg11765.html for details. So I think we should either write our own deserialization tool or modify the commons-lang3 source code.
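   
   A minimal sketch of what such a deserialization tool could look like, assuming the connector classes are reachable through a class loader that we can pass in explicitly. `LoaderAwareSerialization` and its methods are hypothetical names, not existing SeaTunnel or commons-lang3 code:
   
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutputStream;
   import java.io.ObjectStreamClass;
   import java.io.Serializable;
   import java.io.UncheckedIOException;
   
   // Hypothetical helper, not part of SeaTunnel or commons-lang3.
   class LoaderAwareSerialization {
   
       /** ObjectInputStream that resolves classes through an explicitly supplied class loader. */
       static final class LoaderAwareObjectInputStream extends ObjectInputStream {
           private final ClassLoader loader;
   
           LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
               super(in);
               this.loader = loader;
           }
   
           @Override
           protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
               try {
                   // Look the class up in the supplied loader first.
                   return Class.forName(desc.getName(), false, loader);
               } catch (ClassNotFoundException e) {
                   // Fall back to the default lookup, which also handles primitive types.
                   return super.resolveClass(desc);
               }
           }
       }
   
       static byte[] serialize(Serializable value) {
           ByteArrayOutputStream bytes = new ByteArrayOutputStream();
           try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
               out.writeObject(value);
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
           return bytes.toByteArray();
       }
   
       @SuppressWarnings("unchecked")
       static <T> T deserialize(byte[] data, ClassLoader loader) {
           try (ObjectInputStream in = new LoaderAwareObjectInputStream(new ByteArrayInputStream(data), loader)) {
               return (T) in.readObject();
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           } catch (ClassNotFoundException e) {
               throw new IllegalStateException("Class not visible to the supplied class loader", e);
           }
       }
   }
   ```
   
   Passing `Thread.currentThread().getContextClassLoader()` as the loader would be one option, assuming the plugin jars are visible to the thread's context class loader at that point.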




[GitHub] [incubator-seatunnel] ruanwenjun commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1188883283

   I found that we look for the v2 connectors in the `spark` dir:
   ```java
   private List<Path> getConnectorJarDependencies() {
       Path pluginRootDir = Common.connectorJarDir("spark");
       if (!Files.exists(pluginRootDir) || !Files.isDirectory(pluginRootDir)) {
           return Collections.emptyList();
       }
       Config config = new ConfigBuilder(Paths.get(commandArgs.getConfigFile())).getConfig();
       Set<URL> pluginJars = new HashSet<>();
       SparkSourcePluginDiscovery sparkSourcePluginDiscovery = new SparkSourcePluginDiscovery();
       SparkSinkPluginDiscovery sparkSinkPluginDiscovery = new SparkSinkPluginDiscovery();
       pluginJars.addAll(sparkSourcePluginDiscovery.getPluginJarPaths(getPluginIdentifiers(config, PluginType.SOURCE)));
       pluginJars.addAll(sparkSinkPluginDiscovery.getPluginJarPaths(getPluginIdentifiers(config, PluginType.SINK)));
       return pluginJars.stream().map(url -> new File(url.getPath()).toPath()).collect(Collectors.toList());
   }
   ```
   But we copy the v2 connectors into the `seatunnel` dir, so we cannot find the connectors in e2e. Are you willing to fix this?




[GitHub] [incubator-seatunnel] ruanwenjun commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1189160233

   I changed the jar submission logic in `SparkStarter`, and now the jars are submitted to Spark via `--jars`:
   ```
   Execute SeaTunnel Spark Job: ${SPARK_HOME}/bin/spark-submit --class "org.apache.seatunnel.core.starter.spark.SeatunnelSpark" --name "SeaTunnel" --master "local" --deploy-mode "client" --jars "/tmp/spark/seatunnel/connectors/seatunnel/connector-fake-2.1.3-SNAPSHOT.jar,/tmp/spark/seatunnel/connectors/seatunnel/connector-console-2.1.3-SNAPSHOT.jar" --conf "spark.executor.memory=1g" --conf "spark.master=local" --conf "job.mode=BATCH" --conf "spark.executor.cores=1" --conf "spark.app.name=SeaTunnel" --conf "spark.executor.instances=2" /tmp/spark/seatunnel/lib/seatunnel-spark-starter.jar --master local --deploy-mode client --config /tmp/fake/fakesource_to_console.conf
   ```
   But I still get the error:
   ```java
   t(SparkSubmit.scala:924)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   	at java.lang.Class.forName0(Native Method)
   	at java.lang.Class.forName(Class.java:348)
   	at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:686)
   	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
   	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
   	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
   	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
   	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
   	at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:223)
   	... 26 more
   ```
   This happens because we find the plugin jar by path and load its classes with a `URLClassLoader`. The plugin class is therefore visible only to that `URLClassLoader`, so it cannot be deserialized. This can be fixed by #2193.
   
   BTW, we need to change `getConnectorJarDependencies` in `SparkStarter`: since the new connectors are placed under `seatunnel`, we need to look for them there and use `SeaTunnelSourcePluginDiscovery` rather than `SparkSourcePluginDiscovery`.
   ```java
       private List<Path> getConnectorJarDependencies() {
           Path pluginRootDir = Common.connectorJarDir("seatunnel");
           LOGGER.info("Connector plugin dir is: {}", pluginRootDir);
           if (!Files.exists(pluginRootDir) || !Files.isDirectory(pluginRootDir)) {
               LOGGER.warn("Cannot find connector plugin from {}", pluginRootDir);
               return Collections.emptyList();
           }
           Config config = new ConfigBuilder(Paths.get(commandArgs.getConfigFile())).getConfig();
           Set<URL> pluginJars = new HashSet<>();
           SeaTunnelSourcePluginDiscovery sparkSourcePluginDiscovery = new SeaTunnelSourcePluginDiscovery();
           SeaTunnelSinkPluginDiscovery sparkSinkPluginDiscovery = new SeaTunnelSinkPluginDiscovery();
           pluginJars.addAll(sparkSourcePluginDiscovery.getPluginJarPaths(getPluginIdentifiers(config, PluginType.SOURCE)));
           pluginJars.addAll(sparkSinkPluginDiscovery.getPluginJarPaths(getPluginIdentifiers(config, PluginType.SINK)));
           return pluginJars.stream().map(url -> new File(url.getPath()).toPath()).collect(Collectors.toList());
       }
   
       private List<PluginIdentifier> getPluginIdentifiers(Config config, PluginType... pluginTypes) {
           return Arrays.stream(pluginTypes).flatMap((Function<PluginType, Stream<PluginIdentifier>>) pluginType -> {
               List<? extends Config> configList = config.getConfigList(pluginType.getType());
               return configList.stream()
                       .map(pluginConfig -> PluginIdentifier
                               .of("seatunnel",
                                       pluginType.getType(),
                                       pluginConfig.getString("plugin_name")));
           }).collect(Collectors.toList());
       }
   ```
   @zhangyuge1 Could you please help fix this? cc @CalvinKirs @Hisoka-X
   




[GitHub] [incubator-seatunnel] lhyundeadsoul commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
lhyundeadsoul commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1189882830

   I hit a similar error, a `ClassNotFoundException` for the LocalFile sink: https://github.com/apache/incubator-seatunnel/pull/2214#issuecomment-1189873125
   This blocks my PR's CI, so if you fix this, please let me know. Thanks.




[GitHub] [incubator-seatunnel] lhyundeadsoul commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
lhyundeadsoul commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1190989764

   > > If so, why does our CI always pass? @CalvinKirs @ruanwenjun 😂
   > 
   > We need to use the Assert sink plugin to assert the data. I remember that in the v1 e2e, if the job failed, the Linux exit code was non-zero.
   
   The Assert sink plugin is already supported in the v2 connectors. @ruanwenjun




[GitHub] [incubator-seatunnel] Hisoka-X commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
Hisoka-X commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1188936638

   If so, why does our CI always pass? @CalvinKirs @ruanwenjun 😂




[GitHub] [incubator-seatunnel] CalvinKirs commented on issue #2212: [Bug] [org.apache.seatunnel.e2e.spark.v2.fake] java.lang.ClassNotFoundException: org.apache.seatunnel.connectors.seatunnel.fake.source.FakeSource

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on issue #2212:
URL: https://github.com/apache/incubator-seatunnel/issues/2212#issuecomment-1189808594

   > I found that this bug isn't caused by jar loading (I tested with the code from #2193, and it still doesn't work). The problem is that `SerializationUtils.deserialize` doesn't work correctly on Spark. See https://www.mail-archive.com/user@commons.apache.org/msg11765.html for details. So I think we should either write our own deserialization tool or modify the commons-lang3 source code.
   
   Thanks for your feedback. Modifying the source code isn't an ideal solution. Is there another version of `commons-lang` that solves this problem? Otherwise, we can go with the first solution.
   

