Posted to users@zeppelin.apache.org by Sourav Mazumder <so...@gmail.com> on 2015/10/20 15:54:32 UTC

Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Looks like there is currently no way to pass additional jar files to
spark-submit from Zeppelin. The same works fine if I am not using the
spark-submit option (i.e., when SPARK_HOME is not specified).

When I checked interpreter.sh, I found that it only puts the
zeppelin-spark*.jar from the ZEPPELIN_HOME/interpreter/spark directory on
the classpath.

I suggest filing this as a bug/enhancement. The fix should be fairly easy
with some small changes in interpreter.sh (I made such a change myself and
got it working with an external_lib folder under
ZEPPELIN_HOME/interpreter/spark).
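
A minimal sketch of that interpreter.sh change, assuming a hypothetical
external_lib folder (the paths and variable names here are illustrative,
not the actual Zeppelin code):

```shell
# Hypothetical sketch: collect every jar in an external_lib folder under the
# Spark interpreter directory into a classpath fragment.
ZEPPELIN_HOME="${ZEPPELIN_HOME:-/opt/zeppelin}"
EXTERNAL_LIB="${ZEPPELIN_HOME}/interpreter/spark/external_lib"
EXTRA_CLASSPATH=""
for jar in "${EXTERNAL_LIB}"/*.jar; do
  [ -e "$jar" ] || continue            # glob matched nothing; folder empty or missing
  EXTRA_CLASSPATH="${EXTRA_CLASSPATH}:${jar}"
done
# EXTRA_CLASSPATH would then be appended to the interpreter's existing classpath.
echo "extra classpath:${EXTRA_CLASSPATH}"
```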

Regards,
Sourav

Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by Sourav Mazumder <so...@gmail.com>.
Thanks a lot Moon. This works.

Regards,
Sourav


Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by lisak <li...@gmail.com>.
Hi,

the problems with %dep were varying error messages saying it has to be run
before the %spark interpreter, and it also seemed to work only the first
time; if I ran it a second time I got weird errors about mismatched
content in JVM classes...

Anyway, what works for me is: using `--jars` for libraries that the
executors need, and doing only:

z.addRepo("sonatype-snapshots",
"https://oss.sonatype.org/content/repositories/snapshots", true)
z.load("com.example:spark-extensions_2.10:0.09-SNAPSHOT")

without %dep, for dependencies that are used in the driver/notebook ...

^^ this combination seems to work without causing any problems at all

Btw, for using hadoop-aws the best thing one can do is:

--jars=file:/some/path/aws-java-sdk-1.7.14.jar,file:/some/path/hadoop-aws-2.6.0.jar

Otherwise one is always fighting clashes between transitive dependencies.
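
That workaround can be sketched as a single spark-submit options string
(the jar paths below are the placeholder paths from the message above, not
real locations):

```shell
# Sketch: pass the exact aws-java-sdk and hadoop-aws jars as file: URIs so
# no transitive-dependency resolution happens for them.
JARS="file:/some/path/aws-java-sdk-1.7.14.jar,file:/some/path/hadoop-aws-2.6.0.jar"
export SPARK_SUBMIT_OPTIONS="--jars ${JARS}"
echo "$SPARK_SUBMIT_OPTIONS"
```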



--
View this message in context: http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Cannot-use-arbitrary-external-jar-files-in-spark-submit-through-Zeppelin-should-be-fixed-tp1277p1797.html
Sent from the Apache Zeppelin Users (incubating) mailing list archive at Nabble.com.

Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by Felix Cheung <fe...@hotmail.com>.
There might be some difference, since the former does its own dependency
resolution. What is the issue you have with %dep? What is the dependency
conflict with hadoop-aws?


Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by lisak <li...@gmail.com>.
I guess that:

%dep
z.reset()
z.load("org.apache.hadoop:hadoop-aws:2.6.0")

isn't the same as doing:

SPARK_SUBMIT_OPTIONS="--packages org.apache.hadoop:hadoop-aws:2.6.0"

because the latter leads to weird dependency conflicts.





--
View this message in context: http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Cannot-use-arbitrary-external-jar-files-in-spark-submit-through-Zeppelin-should-be-fixed-tp1277p1784.html
Sent from the Apache Zeppelin Users (incubating) mailing list archive at Nabble.com.

Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by lisak <li...@gmail.com>.
Hey,

I'm still struggling with %dep, so I tried:

SPARK_SUBMIT_OPTIONS="--repositories
https://oss.sonatype.org/content/repositories/snapshots --packages
com.example:spark-extensions_2.10:0.09-SNAPSHOT,org.apache.hadoop:hadoop-aws:2.6.0"

and when I first run a notebook I can see log entries about downloading the
dependencies and all their transitive dependencies, BUT in the meantime I
get this error in the notebook:

java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
    at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:139)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:266)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:199)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:320)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

and it never recovers from it; if I run it again I get:

org.apache.thrift.TApplicationException: Internal error processing getFormType
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getFormType(RemoteInterpreterService.java:288)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getFormType(RemoteInterpreterService.java:275)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:281)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:199)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:320)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)





--
View this message in context: http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Cannot-use-arbitrary-external-jar-files-in-spark-submit-through-Zeppelin-should-be-fixed-tp1277p1783.html
Sent from the Apache Zeppelin Users (incubating) mailing list archive at Nabble.com.

Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by moon soo Lee <mo...@apache.org>.
If you're using 0.5.0-incubating, check "2 Loading Spark Properties" on the
http://zeppelin.incubator.apache.org/docs/interpreter/spark.html page.

Best,
moon

On Sat, Nov 14, 2015 at 4:37 PM Girish Reddy <gi...@springml.com> wrote:

> Does the spark.files option only work in 0.6?  Any workaround for earlier
> versions to load external jars?
>

Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by Girish Reddy <gi...@springml.com>.
Does the spark.files option only work in 0.6?  Any workaround for earlier
versions to load external jars?


Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by moon soo Lee <mo...@apache.org>.
Sourav,

There are a couple of ways to add an external jar when Zeppelin
(0.6.0-SNAPSHOT) uses the spark-submit command.

1. Use the %dep interpreter, as Vinay mentioned.
    eg)
       %dep
        z.load("group:artifact:version")

        %spark
        import ....

2. Add the spark.files property to SPARK_HOME/conf/spark-defaults.conf
     eg)
        spark.files  /path/to/my.jar

3. Export the SPARK_SUBMIT_OPTIONS environment variable in
ZEPPELIN_HOME/conf/zeppelin-env.sh
     eg)
        export SPARK_SUBMIT_OPTIONS="--packages group:artifact:version"
     note) this does not work for pyspark yet:
https://issues.apache.org/jira/browse/ZEPPELIN-339
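
For example, option 3 could look like the following fragment of
ZEPPELIN_HOME/conf/zeppelin-env.sh (a sketch; the jar path and Maven
coordinate are placeholders, and --jars and --packages can be combined):

```shell
# Hypothetical zeppelin-env.sh fragment: combine a local jar (--jars) with a
# Maven coordinate (--packages) in one SPARK_SUBMIT_OPTIONS value.
export SPARK_SUBMIT_OPTIONS="--jars /path/to/my.jar --packages group:artifact:version"
echo "$SPARK_SUBMIT_OPTIONS"
```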

Hope this helps.

Best,
moon


Re: Cannot use arbitrary external jar files in spark_submit through Zeppelin - should be fixed

Posted by Vinay Shukla <vi...@gmail.com>.
Sourav,


Agree this would be a useful feature. Right now Zeppelin can import
dependencies from Maven with the %dep interpreter, but it does not
understand spark-packages.


-Vinay

