Posted to issues@spark.apache.org by "Niels Pardon (Jira)" <ji...@apache.org> on 2023/10/16 17:22:00 UTC

[jira] [Created] (SPARK-45557) Spark Connect cannot be started because of missing user home dir in Docker container

Niels Pardon created SPARK-45557:
------------------------------------

             Summary: Spark Connect cannot be started because of missing user home dir in Docker container
                 Key: SPARK-45557
                 URL: https://issues.apache.org/jira/browse/SPARK-45557
             Project: Spark
          Issue Type: Bug
          Components: Spark Docker
    Affects Versions: 3.5.0, 3.4.1, 3.4.0
            Reporter: Niels Pardon


I was trying to start Spark Connect inside a container using the Spark Docker images and ran into an issue where Ivy could not pull the Spark Connect JAR because the user home directory /home/spark does not exist.
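For context (an addition of mine, not from the original report): Ivy builds its default cache path from the JVM's user.home system property, which the JVM reads from the container user's passwd entry. That entry declares /home/spark as the home directory, but nothing ever creates it. A quick way to confirm this inside the container:
{code:java}
# Illustrative check, not from the report; exact passwd fields may vary
# between image versions:
docker run -it --rm apache/spark:3.5.0 /bin/bash -c \
  'getent passwd spark && ls -ld /home/spark'
# getent prints an entry whose home field is /home/spark, while ls fails
# with "No such file or directory" -- consistent with the Ivy error below. {code}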

Steps to reproduce:

1. Start the Spark container with `/bin/bash` as the command:
{code:java}
docker run -it --rm apache/spark:3.5.0 /bin/bash {code}
2. Try to start Spark Connect within the container:
{code:java}
/opt/spark/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.0 {code}
which led to this output:
{code:java}
starting org.apache.spark.sql.connect.service.SparkConnectServer, logging to /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out
failed to launch: nice -n 0 bash /opt/spark/bin/spark-submit --class org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect server --packages org.apache.spark:spark-connect_2.12:3.5.0
  	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535)
  	at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
  	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
  	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
  	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
  	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
  	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
  	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
  	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
full log in /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out {code}
The full log file then looks like this:
{code:java}
Spark Command: /opt/java/openjdk/bin/java -cp /opt/spark/conf:/opt/spark/jars/* -Xmx1g -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect server --packages org.apache.spark:spark-connect_2.12:3.5.0 spark-internal
========================================
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/spark/.ivy2/cache
The jars for the packages stored in: /home/spark/.ivy2/jars
org.apache.spark#spark-connect_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5;1.0
	confs: [default]
Exception in thread "main" java.io.FileNotFoundException: /home/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5-1.0.xml (No such file or directory)
	at java.base/java.io.FileOutputStream.open0(Native Method)
	at java.base/java.io.FileOutputStream.open(Unknown Source)
	at java.base/java.io.FileOutputStream.<init>(Unknown Source)
	at java.base/java.io.FileOutputStream.<init>(Unknown Source)
	at org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:71)
	at org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:63)
	at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.toIvyFile(DefaultModuleDescriptor.java:553)
	at org.apache.ivy.core.cache.DefaultResolutionCacheManager.saveResolvedModuleDescriptor(DefaultResolutionCacheManager.java:184)
	at org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:259)
	at org.apache.ivy.Ivy.resolve(Ivy.java:522)
	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535)
	at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code}

The issue is that the user home directory /home/spark does not exist:
{code:java}
$ ls -l /home
total 0 {code}
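As a possible workaround (my suggestion, not part of the original report), Ivy can be pointed at a writable directory via Spark's documented spark.jars.ivy setting, sidestepping the missing home directory entirely:
{code:java}
# spark.jars.ivy overrides the Ivy user directory (cache and downloaded
# jars), so Ivy never touches the non-existent /home/spark. /tmp is chosen
# here because it is writable by the container user; any writable path works.
/opt/spark/sbin/start-connect-server.sh \
  --conf spark.jars.ivy=/tmp/.ivy2 \
  --packages org.apache.spark:spark-connect_2.12:3.5.0 {code}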
It seems there is an easy fix: simply switching from useradd to adduser in the Dockerfile should create the user home directory.
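A hedged sketch of what that change could look like (the user name spark and UID 185 match the published images; everything else here is illustrative, not the actual Dockerfile):
{code:java}
# Hypothetical Dockerfile excerpt -- the real file may differ.
# Current style: useradd declares /home/spark as home but never creates it:
#   RUN useradd --uid 185 --gid spark spark
# Proposed: adduser creates the home directory by default
# (useradd --create-home would achieve the same effect):
RUN adduser --uid 185 --home /home/spark --disabled-password --gecos "" spark {code}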