Posted to user@spark.apache.org by Akshat Aranya <aa...@gmail.com> on 2014/12/18 17:36:39 UTC

Standalone Spark program

Hi,

I am building a Spark-based service which requires initialization of a
SparkContext in a main():

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
    val conf = new SparkConf(false)
      .setMaster("spark://foo.example.com:7077")
      .setAppName("foobar")

    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(0 until 255)
    val res =  rdd.mapPartitions(it => it).take(1)
    println(s"res=$res")
    sc.stop()
}

This code works fine via the REPL, but not as a standalone program; it causes a
ClassNotFoundException.  This has me confused about how code is shipped out
to executors.  When run via the REPL, does the mapPartitions closure, it => it,
get sent out when the REPL statement is executed?  When this code is run as
a standalone program (not via spark-submit), is the compiled code expected
to be present on the executor?

Thanks,
Akshat

Re: Standalone Spark program

Posted by Andrew Or <an...@databricks.com>.
Hey Akshat,

What is the class that is not found? Is it a Spark class, or a class that
you define in your own application? If the latter, then Akhil's solution
should work (alternatively, you can also pass the jar through the --jars
command line option of spark-submit).

If it's a Spark class, however, it's likely that the Spark assembly jar is
not present on the worker nodes. When you build Spark on the cluster, you
will need to rsync it to the same path on all the nodes in your cluster.
For more information, see
http://spark.apache.org/docs/latest/spark-standalone.html.
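
If it is one of your own classes and you are not going through spark-submit,
the equivalent of --jars from code is to list your application jar in the
SparkConf so the executors can fetch it. A rough sketch (the object name and
jar path are just placeholders for your own build):

import org.apache.spark.{SparkConf, SparkContext}

object FooBar {
  def main(args: Array[String]) {
    val conf = new SparkConf(false)
      .setMaster("spark://foo.example.com:7077")
      .setAppName("foobar")
      // Placeholder path: the jar that contains your main() and its closures.
      .setJars(Seq("/path/to/your/project.jar"))

    val sc = new SparkContext(conf)
    // ... run your job as before ...
    sc.stop()
  }
}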

-Andrew

2014-12-18 10:29 GMT-08:00 Akhil Das <ak...@sigmoidanalytics.com>:
>
> You can build a jar of your project and add it to the SparkContext
> (sc.addJar("/path/to/your/project.jar")); it will then get shipped to the
> workers, and hence no ClassNotFoundException!
>
> Thanks
> Best Regards
>
> On Thu, Dec 18, 2014 at 10:06 PM, Akshat Aranya <aa...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am building a Spark-based service which requires initialization of a
>> SparkContext in a main():
>>
>> def main(args: Array[String]) {
>>     val conf = new SparkConf(false)
>>       .setMaster("spark://foo.example.com:7077")
>>       .setAppName("foobar")
>>
>>     val sc = new SparkContext(conf)
>>     val rdd = sc.parallelize(0 until 255)
>>     val res =  rdd.mapPartitions(it => it).take(1)
>>     println(s"res=$res")
>>     sc.stop()
>> }
>>
>> This code works fine via the REPL, but not as a standalone program; it causes
>> a ClassNotFoundException.  This has me confused about how code is shipped
>> out to executors.  When run via the REPL, does the mapPartitions closure,
>> it => it, get sent out when the REPL statement is executed?  When this code
>> is run as a standalone program (not via spark-submit), is the compiled code
>> expected to be present on the executor?
>>
>> Thanks,
>> Akshat
>>
>>

Re: Standalone Spark program

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can build a jar of your project and add it to the SparkContext
(sc.addJar("/path/to/your/project.jar")); it will then get shipped to the
workers, and hence no ClassNotFoundException!
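
For example, in your main() it would look roughly like this (the jar path is a
placeholder for wherever your build writes the jar):

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val conf = new SparkConf(false)
    .setMaster("spark://foo.example.com:7077")
    .setAppName("foobar")

  val sc = new SparkContext(conf)
  // Placeholder path: the jar built from your project, containing the it => it closure.
  sc.addJar("/path/to/your/project.jar")

  val rdd = sc.parallelize(0 until 255)
  val res = rdd.mapPartitions(it => it).take(1)
  println(s"res=${res.mkString(",")}")
  sc.stop()
}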

Thanks
Best Regards

On Thu, Dec 18, 2014 at 10:06 PM, Akshat Aranya <aa...@gmail.com> wrote:
>
> Hi,
>
> I am building a Spark-based service which requires initialization of a
> SparkContext in a main():
>
> def main(args: Array[String]) {
>     val conf = new SparkConf(false)
>       .setMaster("spark://foo.example.com:7077")
>       .setAppName("foobar")
>
>     val sc = new SparkContext(conf)
>     val rdd = sc.parallelize(0 until 255)
>     val res =  rdd.mapPartitions(it => it).take(1)
>     println(s"res=$res")
>     sc.stop()
> }
>
> This code works fine via the REPL, but not as a standalone program; it causes
> a ClassNotFoundException.  This has me confused about how code is shipped
> out to executors.  When run via the REPL, does the mapPartitions closure,
> it => it, get sent out when the REPL statement is executed?  When this code
> is run as a standalone program (not via spark-submit), is the compiled code
> expected to be present on the executor?
>
> Thanks,
> Akshat
>
>