Posted to user@livy.apache.org by "Rabe, Jens" <je...@iwes.fraunhofer.de> on 2018/10/08 10:31:20 UTC
Submitting a PySpark batch job ignores jars sent with it
Hello,
I defined a custom format to read data into Spark. It works from Scala Spark, from Zeppelin, and also with PySpark.
I am now trying to use it from Livy. I POST something like this to http://mylivy:8998/batches:
{
"file":"/path/to/myjob.py",
"args":["foo", "bar"],
"jars":"/path/to/myformat-assembly.jar"
}
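For reference, here is a minimal sketch of building that batch payload in Python (the paths are the placeholders from the message above, not real ones); in practice you would POST the resulting body to the Livy endpoint with curl or an HTTP client:

```python
import json

# Batch request body for Livy's POST /batches endpoint.
# Note: the Livy REST docs describe "jars" as a list of strings,
# so the array form is used here rather than a bare string.
payload = {
    "file": "/path/to/myjob.py",
    "args": ["foo", "bar"],
    "jars": ["/path/to/myformat-assembly.jar"],
}

body = json.dumps(payload)
print(body)
```

Sending this body with `Content-Type: application/json` to http://mylivy:8998/batches should be equivalent to the request shown above.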
In the log I see the jar gets loaded and added:
2018-10-08 12:23:28 INFO SparkContext:54 - Added JAR file:///path/to/myformat-assembly.jar at spark://172.30.10.10:45613/jars/myformat-assembly.jar with timestamp 1538994208755
But my PySpark job doesn't find the format:
Traceback (most recent call last):
  File "/path/to/myjob.py", line 13, in <module>
    data = spark.read.format("my.custom.format").load(path)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 166, in load
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o29.load.
: java.lang.ClassNotFoundException: Failed to find data source: my.custom.format. Please find packages at http://spark.apache.org/third-party-projects.html
Opening an interactive session (which loads the same library jar) and running the same read command fails in the same way.
However, a simple object I added to this library can be called without problems (e.g. via sc._jvm.somepackage.Foo.bar()).
What am I missing?
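For context on the error above: to my understanding (a simplified assumption based on Spark 2.x behavior, not the actual implementation), `spark.read.format(name)` first matches `name` against short names registered via `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister`, and only then tries to load classes by name. A rough sketch of that class-name fallback order:

```python
# Simplified sketch of the fully-qualified class names Spark tries for
# spark.read.format(name) when no registered short name matches.
# (Short-name lookup via the DataSourceRegister service file is not
# modeled here.)
def candidate_classes(name):
    # Spark tries the name itself as a class, then name + ".DefaultSource".
    return [name, name + ".DefaultSource"]

print(candidate_classes("my.custom.format"))
# ['my.custom.format', 'my.custom.format.DefaultSource']
```

A ClassNotFoundException like the one above means none of these candidates were found on the executor/driver classpath.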
RE: Submitting a PySpark batch job ignores jars sent with it
Posted by "Rabe, Jens" <je...@iwes.fraunhofer.de>.
Please disregard; I was using an obsolete version of the jar, which indeed did not contain the classes.
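A quick way to catch a stale assembly like this is to check that the expected class file is actually present inside the jar (a jar is just a zip archive). A minimal sketch using Python's zipfile module; the jar path and class name below are illustrative placeholders:

```python
import io
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the given class.

    class_name uses dotted notation, e.g. "my.custom.format.DefaultSource".
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Demo with a throwaway jar containing one (empty) class entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("my/custom/format/DefaultSource.class", b"")
with open("demo.jar", "wb") as f:
    f.write(buf.getvalue())

print(jar_contains_class("demo.jar", "my.custom.format.DefaultSource"))  # True
print(jar_contains_class("demo.jar", "my.custom.format.Missing"))        # False
```

Running the same check against the real assembly jar (e.g. `jar tf myformat-assembly.jar` on the command line) would have revealed the missing classes immediately.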