Posted to user@spark.apache.org by José Luis Pedrosa <jl...@gmail.com> on 2019/07/04 17:13:08 UTC

Spark 2.4.3 with Hadoop 3.2 Docker image.

Hi All

I'm trying to create Docker images that can access Azure services using
the ABFS Hadoop driver, which is only available in Hadoop 3.2.

So I downloaded the Spark distribution built without Hadoop and generated
the Spark images using docker-image-tool.sh itself.
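
For reference, this is roughly how I built the base image (a sketch: the
download mirror, repository name and tag are placeholders for whatever
you use):

# Build the base image from the "without hadoop" Spark distribution.
wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop.tgz
tar -xzf spark-2.4.3-bin-without-hadoop.tgz
cd spark-2.4.3-bin-without-hadoop
./bin/docker-image-tool.sh -r myrepo -t 2.4.3-without-hadoop build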

In a new image that uses the resulting image as its FROM, I've added the
Hadoop 3.2 binary distro and, following
https://spark.apache.org/docs/2.2.0/hadoop-provided.html, I've set:

export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
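
For concreteness, the derived image looks roughly like this. This is only
a sketch: the image names, Hadoop version and paths are placeholders, and
I'm assuming the stock image keeps Spark under /opt/spark:

# Sketch of the derived image; names, versions and paths are placeholders.
cat > Dockerfile <<'EOF'
FROM myrepo/spark:2.4.3-without-hadoop
ADD hadoop-3.2.0.tar.gz /opt/
ENV HADOOP_HOME=/opt/hadoop-3.2.0
# spark-env.sh is sourced by the spark-submit/spark-class launcher
# scripts, so the driver picks up the Hadoop jars this way.
RUN mkdir -p /opt/spark/conf && \
    echo 'export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)' \
      >> /opt/spark/conf/spark-env.sh
EOF
docker build -t myrepo/spark:2.4.3-hadoop3.2 .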


Then, when launching jobs on K8s, it turns out that the container
entrypoint uses spark-submit internally for the driver
<https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L89>
but seems to launch the executor with java directly
<https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L111>
The result is that drivers run correctly, but executors fail due to a
missing slf4j class:

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
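
As far as I can tell, the asymmetry comes from how the two processes are
started: spark-submit goes through the launcher scripts, which source
spark-env.sh and pick up SPARK_DIST_CLASSPATH, while the bare java launch
for the executor only gets whatever entrypoint.sh itself puts on -cp
(${SPARK_HOME}/jars/* plus, as I read it, SPARK_EXTRA_CLASSPATH if set).
One workaround sketch, assuming the stock entrypoint lives at
/opt/entrypoint.sh and does honour SPARK_EXTRA_CLASSPATH, is a thin
wrapper that exports both variables before handing off (the Hadoop path
is a placeholder):

#!/bin/bash
# wrapper-entrypoint.sh (hypothetical name): make the Hadoop jars visible
# to both launch paths, then exec the stock Spark entrypoint.
# Picked up by spark-submit via the launcher's classpath builder:
export SPARK_DIST_CLASSPATH="$(/opt/hadoop-3.2.0/bin/hadoop classpath)"
# Appended by entrypoint.sh to the -cp of the directly-launched executor
# JVM (assumption based on my reading of the 2.4 entrypoint):
export SPARK_EXTRA_CLASSPATH="$SPARK_DIST_CLASSPATH"
exec /opt/entrypoint.sh "$@"

The derived Dockerfile would then point its ENTRYPOINT at this wrapper
instead of /opt/entrypoint.sh.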


If I add the slf4j jar manually to the classpath, then another Hadoop
class turns up missing, and so on.
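
A blunter alternative sketch (paths again placeholders) is to bake every
Hadoop jar into the directory the executor launch already scans,
accepting that this flattens Hadoop's layout and may pull in duplicate or
conflicting jars:

# Copy all Hadoop jars into ${SPARK_HOME}/jars, which entrypoint.sh
# already puts on the executor's -cp; -n avoids clobbering Spark's jars.
find /opt/hadoop-3.2.0/share/hadoop -name '*.jar' \
  -exec cp -n {} /opt/spark/jars/ \;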

What is the right way to generate a Docker image for Spark 2.4 with a
custom Hadoop distribution?


Thanks and regards
JL

Re: Spark 2.4.3 with Hadoop 3.2 Docker image.

Posted by Julien Laurenceau <ju...@pepitedata.com>.
Hi,
Did you try using the images built by Mesosphere?
I am not sure they already build the 2.4 / 3.2 combo, but they provide a
project on GitHub that can be used to generate your custom combo. It is
named mesosphere/spark-build.
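For example (just a pointer; I have not checked which Spark/Hadoop combos
it currently supports):

git clone https://github.com/mesosphere/spark-build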
Regards
