Posted to user@spark.apache.org by Tao Xiao <xi...@gmail.com> on 2014/02/20 16:13:15 UTC

How to submit a job to Spark cluster?

My application source file,  *SimpleDistributedApp.scala*, is as  follows:

__________________________________________________________________
import org.apache.spark.{SparkConf, SparkContext}

object SimpleDistributedApp {
    def main(args: Array[String]) = {
        val filepath = "hdfs://hadoop-1.certus.com:54310/user/root/samples/data"

        val conf = new SparkConf()
                    .setMaster("spark://hadoop-1.certus.com:7077")
                    .setAppName("SimpleDistributedApp")
                    .setSparkHome("/home/xt/soft/spark-0.9.0-incubating-bin-hadoop1")
                    .setJars(Array("target/scala-2.10/simple-distributed-app_2.10-1.0.jar"))
                    .set("spark.executor.memory", "1g")

        val sc = new SparkContext(conf)
        val text = sc.textFile(filepath, 3)

        val numOfHello = text.filter(line => line.contains("hello")).count()

        println("number of lines containing 'hello' is " + numOfHello)
        println("down")
    }
}
______________________________________________________________________



The corresponding sbt file, *$SPARK_HOME/simple.sbt*,  is as follows:
_________________________________________________________________

name := "Simple Distributed App"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
_________________________________________________________________


I built the application into
*$SPARK_HOME/target/scala-2.10/simple-distributed-app_2.10-1.0.jar*, using
the command
        SPARK_HADOOP_VERSION=1.2.1   sbt/sbt   package

I ran it using the command "sbt/sbt run" and it finished running
successfully.

But I'm not sure what the correct and general way to submit and run a job on
a Spark cluster is. To be specific, after having built a job into a JAR file,
say *simpleApp.jar*, where should I put it and how should I submit it to the
Spark cluster?

Re: How to submit a job to Spark cluster?

Posted by Tao Xiao <xi...@gmail.com>.
Nan & Mayur,

Thanks, I got it.


Best,




Re: How to submit a job to Spark cluster?

Posted by Mayur Rustagi <ma...@gmail.com>.
You need a driver to manage execution of the jar. You can use the Spark shell
to launch the jar, and it'll manage the execution for you: start the Spark
shell, add your jar to the classpath, and call your function with sc as the
Spark context.
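
As a rough sketch of that workflow (not from the thread itself; it assumes
Spark 0.9's MASTER and ADD_JARS environment variables for the shell, and a
hypothetical SimpleDistributedApp.run(sc: SparkContext) method holding the job
logic instead of the app creating its own SparkContext):

    # start the shell against the standalone master, with the job jar on its classpath
    MASTER=spark://hadoop-1.certus.com:7077 \
    ADD_JARS=target/scala-2.10/simple-distributed-app_2.10-1.0.jar \
      bin/spark-shell

Inside the shell, the ready-made SparkContext is bound to sc, so the job could
then be invoked as, e.g., SimpleDistributedApp.run(sc).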

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi




Re: How to submit a job to Spark cluster?

Posted by Nan Zhu <zh...@gmail.com>.
I think this is a confusing aspect of the current web UI: even when your
standalone app finishes without any error, the executor status is still shown
as KILLED.

In Spark, in most cases you don't need to rely on a script to submit jobs; you
only need to specify the master address when constructing a SparkContext
object.

But if you want to submit an in-cluster driver, you will need bin/spark-class;
see
http://spark.incubator.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
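
The launch command from that page looks roughly like this (a sketch with this
thread's hostnames filled in; the HDFS jar path is hypothetical and must be
reachable from every worker node):

    ./bin/spark-class org.apache.spark.deploy.Client launch \
       spark://hadoop-1.certus.com:7077 \
       hdfs://hadoop-1.certus.com:54310/user/root/jars/simple-distributed-app_2.10-1.0.jar \
       SimpleDistributedApp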

Best,  

--  
Nan Zhu




Re: How to submit a job to Spark cluster?

Posted by Tao Xiao <xi...@gmail.com>.
In a Hadoop cluster, the following command is the general way to submit a
job:
       bin/hadoop jar <job-jar> <arguments>


Is there such a general way to submit a job to a Spark cluster?

Besides, my job finished successfully, and the Spark Web UI shows that this
application's state is *FINISHED*, but each executor's state is *KILLED*. I
can see that this application has produced the expected result, so why is each
executor's state reported as *KILLED*?

Completed Applications

  ID:              app-20140220173957-0001 <http://hadoop-1.certus.com:8080/app?appId=app-20140220173957-0001>
  Name:            SimpleDistributedApp <http://hadoop-1.certus.com:4040/>
  Cores:           12
  Memory per Node: 1024.0 MB
  Submitted Time:  2014/02/20 17:39:57
  User:            root
  State:           FINISHED
  Duration:        13 s

Executor Summary

  ExecutorID: 2
  Worker:     worker-20140220162542-hadoop-2.certus.com-49805 <http://hadoop-2.certus.com:8081/>
  Cores:      4
  Memory:     1024 MB
  State:      KILLED
  Logs:       stdout <http://hadoop-2.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=2&logType=stdout>
              stderr <http://hadoop-2.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=2&logType=stderr>

  ExecutorID: 1
  Worker:     worker-20140220162542-hadoop-4.certus.com-40528 <http://hadoop-4.certus.com:8081/>
  Cores:      4
  Memory:     1024 MB
  State:      KILLED
  Logs:       stdout <http://hadoop-4.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=1&logType=stdout>
              stderr <http://hadoop-4.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=1&logType=stderr>

  ExecutorID: 0
  Worker:     worker-20140220162542-hadoop-3.certus.com-47386 <http://hadoop-3.certus.com:8081/>
  Cores:      4
  Memory:     1024 MB
  State:      KILLED
  Logs:       stdout <http://hadoop-3.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=0&logType=stdout>
              stderr <http://hadoop-3.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=0&logType=stderr>


Thanks
Tao



Re: How to submit a job to Spark cluster?

Posted by Mayur Rustagi <ma...@gmail.com>.
You are specifying the Spark master in the jar:
 .setMaster("spark://hadoop-1.certus.com:7077")
so sbt run is deploying the jar to that cluster and running it there.
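
A hypothetical variant (not from the thread) is to take the master URL from
the command line instead of hard-coding it, so the same jar can run locally or
against the standalone cluster:

    // sketch: fall back to local mode unless a master URL is passed as args(0)
    val master = if (args.nonEmpty) args(0) else "local[2]"
    val conf = new SparkConf()
                .setMaster(master)
                .setAppName("SimpleDistributedApp")
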
Regards
Mayur

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi




Re: How to submit a job to Spark cluster?

Posted by Nan Zhu <zh...@gmail.com>.
I'm not sure if I understand your question correctly.

Do you mean you didn't see the application information in the Spark Web UI
even though it generates the expected results?

Best,  

--  
Nan Zhu

