Posted to user@spark.apache.org by Roger Hoover <ro...@gmail.com> on 2014/06/05 00:03:21 UTC

Re: Running a spark-submit compatible app in spark-shell

It took me a little while to get back to this but it works now!!

I'm invoking the shell like this:

spark-shell --jars target/scala-2.10/spark-etl_2.10-1.0.jar

Once inside, I can invoke a method in my package to run the job.

> val result = etl.IP2IncomeJob.job(sc)
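
[ A sketch of what the rest of the shell workflow can look like; the return type of
  etl.IP2IncomeJob.job isn't shown in this thread, so the last line assumes it
  returns an RDD and is purely illustrative. ]

scala> import etl.IP2IncomeJob

scala> val result = IP2IncomeJob.job(sc)

scala> result.take(5).foreach(println)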


On Tue, May 27, 2014 at 8:42 AM, Roger Hoover <ro...@gmail.com>
wrote:

> Thanks, Andrew.  I'll give it a try.
>
>
> On Mon, May 26, 2014 at 2:22 PM, Andrew Or <an...@databricks.com> wrote:
>
>> Hi Roger,
>>
>> This was due to a bug in the Spark shell code, and is fixed in the latest
>> master (and RC11). Here is the commit that fixed it:
>> https://github.com/apache/spark/commit/8edbee7d1b4afc192d97ba192a5526affc464205.
>> Try it now and it should work. :)
>>
>> Andrew
>>
>>
>> 2014-05-26 10:35 GMT+02:00 Perttu Ranta-aho <ra...@iki.fi>:
>>
>> Hi Roger,
>>>
>>> Were you able to solve this?
>>>
>>> -Perttu
>>>
>>>
>>> On Tue, Apr 29, 2014 at 8:11 AM, Roger Hoover <ro...@gmail.com>
>>> wrote:
>>>
>>>> Patrick,
>>>>
>>>> Thank you for replying.  That didn't seem to work either.  I see the
>>>> option parsed using verbose mode.
>>>>
>>>> Parsed arguments:
>>>>  ...
>>>>   driverExtraClassPath
>>>>  /Users/rhoover/Work/spark-etl/target/scala-2.10/spark-etl_2.10-1.0.jar
>>>>
>>>> But the jar still doesn't show up if I run ":cp" in the repl and the
>>>> import still fails.
>>>>
>>>> scala> import etl._
>>>> <console>:7: error: not found: value etl
>>>>        import etl._
>>>>
>>>> Not sure if this helps, but I noticed with Spark 0.9.1 that the import
>>>> only seems to work when I add the -usejavacp option to the spark-shell
>>>> command.  I don't really understand why.
>>>>
>>>> With the latest code, I tried adding these options to the spark-shell
>>>> command without success: -usejavacp -Dscala.usejavacp=true
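
[ Not part of the original thread: one way to check, from inside the running shell,
  whether the jar actually landed on the JVM's system classpath, which is where
  --driver-class-path should put it (jars passed via --jars may instead be added
  through a separate classloader). Assumes only that the jar file name contains
  "spark-etl". ]

scala> System.getProperty("java.class.path").split(java.io.File.pathSeparator).filter(_.contains("spark-etl")).foreach(println)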
>>>>
>>>>
>>>> On Mon, Apr 28, 2014 at 6:30 PM, Patrick Wendell <pw...@gmail.com>
>>>> wrote:
>>>>
>>>>> What about if you run ./bin/spark-shell
>>>>> --driver-class-path=/path/to/your/jar.jar
>>>>>
>>>>> I think either this or the --jars flag should work, but it's possible
>>>>> there is a bug with the --jars flag when calling the Repl.
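
[ Not from the original thread: another quick sanity check inside spark-shell is to
  look at what the launcher recorded in the SparkConf. The keys below (spark.jars,
  spark.driver.extraClassPath) are standard Spark config properties, but whether the
  shell populates them may depend on the Spark version. ]

scala> sc.getConf.getOption("spark.jars").foreach(println)

scala> sc.getConf.getOption("spark.driver.extraClassPath").foreach(println)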
>>>>>
>>>>>
>>>>> On Mon, Apr 28, 2014 at 4:30 PM, Roger Hoover <ro...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> A couple of issues:
>>>>>> 1) The jar doesn't show up on the classpath even though SparkSubmit
>>>>>> had it in the --jars option.  I tested this by running :cp in spark-shell.
>>>>>> 2) After adding it to the classpath using (:cp
>>>>>> /Users/rhoover/Work/spark-etl/target/scala-2.10/spark-etl_2.10-1.0.jar), it
>>>>>> still fails.  When I do that in the plain Scala REPL, it works.
>>>>>>
>>>>>> BTW, I'm using the latest code from the master branch
>>>>>> (8421034e793c0960373a0a1d694ce334ad36e747)
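
[ Not from the original thread: a way to ask the REPL's own classloader whether it
  can see the class at all, which helps distinguish "jar not on any classpath" from
  "jar visible to a different classloader than the one compiling shell input". ]

scala> try {
     |   val cls = Class.forName("etl.IP2IncomeJob", false, getClass.getClassLoader)
     |   println("loaded from: " + cls.getProtectionDomain.getCodeSource)
     | } catch {
     |   case _: ClassNotFoundException => println("etl.IP2IncomeJob is not visible here")
     | }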
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 28, 2014 at 3:40 PM, Roger Hoover <roger.hoover@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Matei,  thank you.  That seemed to work but I'm not able to import a
>>>>>>> class from my jar.
>>>>>>>
>>>>>>> Using the verbose options, I can see that my jar should be included
>>>>>>>
>>>>>>> Parsed arguments:
>>>>>>> ...
>>>>>>>   jars
>>>>>>>  /Users/rhoover/Work/spark-etl/target/scala-2.10/spark-etl_2.10-1.0.jar
>>>>>>>
>>>>>>> And I see the class I want to load in the jar:
>>>>>>>
>>>>>>> jar -tf
>>>>>>> /Users/rhoover/Work/spark-etl/target/scala-2.10/spark-etl_2.10-1.0.jar |
>>>>>>> grep IP2IncomeJob
>>>>>>> etl/IP2IncomeJob$$anonfun$1.class
>>>>>>> etl/IP2IncomeJob$$anonfun$4.class
>>>>>>> etl/IP2IncomeJob$.class
>>>>>>> etl/IP2IncomeJob$$anonfun$splitOverlappingRange$1.class
>>>>>>> etl/IP2IncomeJob.class
>>>>>>> etl/IP2IncomeJob$$anonfun$3.class
>>>>>>> etl/IP2IncomeJob$$anonfun$2.class
>>>>>>>
>>>>>>> But the import fails
>>>>>>>
>>>>>>> scala> import etl.IP2IncomeJob
>>>>>>> <console>:10: error: not found: value etl
>>>>>>>        import etl.IP2IncomeJob
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Apr 27, 2014 at 3:46 PM, Matei Zaharia <
>>>>>>> matei.zaharia@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Roger,
>>>>>>>>
>>>>>>>> You should be able to use the --jars argument of spark-shell to add
>>>>>>>> JARs onto the classpath and then work with those classes in the shell. (A
>>>>>>>> recent patch, https://github.com/apache/spark/pull/542, made
>>>>>>>> spark-shell use the same command-line arguments as spark-submit). But this
>>>>>>>> is a great question, we should test it out and see whether anything else
>>>>>>>> would make development easier.
>>>>>>>>
>>>>>>>> SBT also has an interactive shell where you can run classes in your
>>>>>>>> project, but unfortunately Spark can’t deal with closures typed directly
>>>>>>>> into that shell the right way. However, if you write your Spark logic in a
>>>>>>>> method and just call that method from the SBT shell, that should work.
>>>>>>>>
>>>>>>>> Matei
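
[ Not from the original thread: a minimal sketch of the pattern Matei describes, with
  the Spark logic in a method that takes an existing SparkContext (callable from
  spark-shell or the SBT console) plus a small main() for spark-submit. The names and
  the job body are illustrative only. ]

package etl

import org.apache.spark.{SparkConf, SparkContext}

object IP2IncomeJob {
  // Keep all Spark logic in a method so nothing has to be typed as a
  // closure directly into a shell; the shell just calls job(sc).
  def job(sc: SparkContext): Long =
    sc.textFile("ip_income.csv").count()   // placeholder logic

  // Entry point for spark-submit, which builds its own context.
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("IP2IncomeJob"))
    try {
      println("record count: " + job(sc))
    } finally {
      sc.stop()
    }
  }
}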
>>>>>>>>
>>>>>>>> On Apr 27, 2014, at 3:14 PM, Roger Hoover <ro...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> > Hi,
>>>>>>>> >
>>>>>>>> > From the meetup talk about the 1.0 release, I saw that
>>>>>>>> spark-submit will be the preferred way to launch apps going forward.
>>>>>>>> >
>>>>>>>> > How do you recommend launching such jobs in a development cycle?
>>>>>>>>  For example, how can I load an app that's expecting to be given to
>>>>>>>> spark-submit into spark-shell?
>>>>>>>> >
>>>>>>>> > Also, can anyone recommend other tricks for rapid development?
>>>>>>>>  I'm new to Scala, sbt, etc.  I think sbt can watch for changes in source
>>>>>>>> files and compile them automatically.
>>>>>>>> >
>>>>>>>> > I want to be able to make code changes and quickly get into a
>>>>>>>> spark-shell to play around with them.
>>>>>>>> >
>>>>>>>> > I appreciate any advice.  Thanks,
>>>>>>>> >
>>>>>>>> > Roger
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>