Posted to users@zeppelin.apache.org by Sourav Mazumder <so...@gmail.com> on 2015/10/20 15:54:32 UTC
Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Looks like right now there is no way to pass additional jar files to
spark_submit from Zeppelin. The same works fine if I am not using the
spark_submit option (i.e., by not specifying spark_home).
When I checked the code in interpreter.sh, I found that for the classpath it
only passes the zeppelin-spark*.jar available in the
zeppelin_home/interpreter/spark directory.
I suggest filing this as a bug/enhancement. The fix should be fairly easy,
requiring only small changes to interpreter.sh (I have done this myself and
made it work with an external_lib folder under the
zeppelin_home/interpreter/spark directory).
Regards,
Sourav
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by Sourav Mazumder <so...@gmail.com>.
Thanks a lot Moon. This works.
Regards,
Sourav
On Tue, Oct 20, 2015 at 6:52 PM, moon soo Lee <mo...@apache.org> wrote:
> Sourav,
>
> There are a couple of ways to add an external jar when Zeppelin (0.6.0-SNAPSHOT)
> uses spark-submit command.
>
> 1. Using %dep interpreter, as Vinay mentioned.
> eg)
> %dep
> z.load("group:artifact:version")
>
> %spark
> import ....
>
> 2. By adding spark.files property at SPARK_HOME/conf/spark-defaults.conf
> eg)
> spark.files /path/to/my.jar
>
> 3. By exporting SPARK_SUBMIT_OPTIONS env variable in
> ZEPPELIN_HOME/conf/zeppelin-env.sh
> eg)
> export SPARK_SUBMIT_OPTIONS="--packages group:artifact:version"
> note) does not work for pyspark yet.
> https://issues.apache.org/jira/browse/ZEPPELIN-339
>
> Hope this helps.
>
> Best,
> moon
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by lisak <li...@gmail.com>.
Hi,
the problems with %dep were varying error messages about the order in which it
must be executed relative to the %spark interpreter, and it also seemed to
work only the first time; when I ran it a second time I got weird errors about
differing content in JVM classes.
Anyway, what works for me is using `--jars` for the libraries that the
executors need, and doing only:
z.addRepo("sonatype-snapshots",
"https://oss.sonatype.org/content/repositories/snapshots", true)
z.load("com.example:spark-extensions_2.10:0.09-SNAPSHOT")
without %dep, for dependencies that are used in the driver/notebook.
This combination seems to work without causing any problems.
By the way, for using hadoop-aws the best thing one can do is:
--jars=file:/some/path/aws-java-sdk-1.7.14.jar,file:/some/path/hadoop-aws-2.6.0.jar
Otherwise one is always fighting clashes between transitive dependencies.
--
View this message in context: http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Cannot-use-arbitrary-external-jar-files-in-spark-submit-through-Zeppelin-should-be-fixed-tp1277p1797.html
Sent from the Apache Zeppelin Users (incubating) mailing list archive at Nabble.com.
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by Felix Cheung <fe...@hotmail.com>.
There might be some difference, since the former does its own dependency
resolution. What is the issue you have with %dep? Which dependency does
hadoop-aws conflict with?
_____________________________
From: lisak <li...@gmail.com>
Sent: Monday, December 14, 2015 12:03 PM
Subject: Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed
To: <us...@zeppelin.incubator.apache.org>
I guess that :
%deps
z.reset()
z.load("org.apache.hadoop:hadoop-aws:2.6.0")
isn't the same as doing :
SPARK_SUBMIT_OPTIONS="--packages org.apache.hadoop:hadoop-aws:2.6.0"
because the latter leads to weird dependency conflicts
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by lisak <li...@gmail.com>.
I guess that:
%dep
z.reset()
z.load("org.apache.hadoop:hadoop-aws:2.6.0")
isn't the same as doing:
SPARK_SUBMIT_OPTIONS="--packages org.apache.hadoop:hadoop-aws:2.6.0"
because the latter leads to weird dependency conflicts.
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by lisak <li...@gmail.com>.
Hey,
I'm still struggling with %dep, so I tried:
SPARK_SUBMIT_OPTIONS="--repositories
https://oss.sonatype.org/content/repositories/snapshots --packages
com.example:spark-extensions_2.10:0.09-SNAPSHOT,org.apache.hadoop:hadoop-aws:2.6.0"
When I first run a notebook I can see log entries about downloading the
dependencies and all their transitive dependencies, but in the meantime I get
this error in the notebook:
java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:589)
  at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
  at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
  at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
  at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
  at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
  at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
  at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:139)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:266)
  at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
  at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:199)
  at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
  at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:320)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
It never recovers from it; if I run it again I get:
org.apache.thrift.TApplicationException: Internal error processing getFormType
  at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
  at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getFormType(RemoteInterpreterService.java:288)
  at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getFormType(RemoteInterpreterService.java:275)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:281)
  at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
  at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:199)
  at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
  at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:320)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by moon soo Lee <mo...@apache.org>.
If you're using 0.5.0-incubating, you can check "2 Loading Spark
Properties" on the
http://zeppelin.incubator.apache.org/docs/interpreter/spark.html page.
Best,
moon
On Sat, Nov 14, 2015 at 4:37 PM Girish Reddy <gi...@springml.com> wrote:
> Does the spark.files option only work in 0.6? Any workaround for earlier
> versions to load external jars?
>
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by Girish Reddy <gi...@springml.com>.
Does the spark.files option only work in 0.6? Any workaround for earlier
versions to load external jars?
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by moon soo Lee <mo...@apache.org>.
Sourav,
There are a couple of ways to add an external jar when Zeppelin (0.6.0-SNAPSHOT)
uses the spark-submit command.
1. Using %dep interpreter, as Vinay mentioned.
eg)
%dep
z.load("group:artifact:version")
%spark
import ....
2. By adding spark.files property at SPARK_HOME/conf/spark-defaults.conf
eg)
spark.files /path/to/my.jar
3. By exporting SPARK_SUBMIT_OPTIONS env variable in
ZEPPELIN_HOME/conf/zeppelin-env.sh
eg)
export SPARK_SUBMIT_OPTIONS="--packages group:artifact:version"
note) does not work for pyspark yet.
https://issues.apache.org/jira/browse/ZEPPELIN-339
Hope this helps.
Best,
moon
Re: Cannot use arbitrary external jar files in spark_submit through
Zeppelin - should be fixed
Posted by Vinay Shukla <vi...@gmail.com>.
Saurav,
Agree, this would be a useful feature. Right now Zeppelin can import
dependencies from Maven with the %dep interpreter, but it does not understand
spark-packages.
-Vinay