Posted to dev@spark.apache.org by Arijit <ar...@live.com> on 2015/10/08 01:47:14 UTC

Understanding code/closure shipment to Spark workers

Hi,

I want to understand the code flow starting from the Spark jar that I submit through spark-submit: how does Spark identify and extract the closures, clean and serialize them, and ship them to the workers to execute as tasks? Can someone point me to documentation, or to the relevant source code paths, to help me understand this?

Thanks, Arijit

Re: Understanding code/closure shipment to Spark workers

Posted by Lars Francke <la...@gmail.com>.
Hi Arijit,

my understanding is the following:

RDD actions will at some point call the runJob method of the SparkContext.
That runJob method calls the clean method, which in turn calls
ClosureCleaner.clean; this strips unneeded references from closures and
also checks that they are serializable.
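That serializability check can be sketched in plain Java, since the underlying machinery is standard Java serialization. This is a hypothetical stand-in for illustration only, not Spark's actual ClosureCleaner (which additionally nulls out unused references in the enclosing scope before checking):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class SerializableCheckDemo {
    // A lambda is only serializable if its target type is Serializable;
    // Spark's Scala closures implement Serializable for the same reason.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Hypothetical stand-in for the kind of check ClosureCleaner performs:
    // try to serialize the closure and fail fast if a capture is not serializable.
    static void ensureSerializable(Object closure) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(closure);
        } catch (IOException e) {
            throw new IllegalArgumentException("Task not serializable", e);
        }
    }

    public static void main(String[] args) {
        int factor = 2;                          // serializable capture
        SerFn<Integer, Integer> good = x -> x * factor;
        ensureSerializable(good);
        System.out.println("good closure: ok");

        Thread bad = new Thread();               // Thread is not serializable
        SerFn<Integer, String> broken = x -> x + bad.getName();
        try {
            ensureSerializable(broken);
        } catch (IllegalArgumentException e) {
            System.out.println("bad closure rejected");
        }
    }
}
```

This fails fast on the driver, which is why a non-serializable capture surfaces at job submission rather than on the workers.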
The next step is to submit the job to the DAGScheduler, which has a
closureSerializer, just an instance of a
normal org.apache.spark.serializer.Serializer:

val closureSerializer =
  instantiateClassFromConf[Serializer]("spark.closure.serializer",
    "org.apache.spark.serializer.JavaSerializer")

This serializes the closure to a byte array which can then be shipped.
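That round trip can be sketched in plain Java; this is a simplified, hypothetical illustration of what a JavaSerializer-style serialization amounts to, not Spark's actual implementation (which adds buffering and class-loader handling on top):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureRoundTrip {
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    public static void main(String[] args) throws Exception {
        int factor = 3;  // captured variable travels inside the serialized closure
        SerFn<Integer, Integer> f = x -> x * factor;

        // Driver side: closure -> byte array (this is what gets shipped in a task)
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(f);
        }
        byte[] taskBytes = bos.toByteArray();

        // Executor side: byte array -> closure, then apply it to an element
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(taskBytes))) {
            @SuppressWarnings("unchecked")
            SerFn<Integer, Integer> shipped = (SerFn<Integer, Integer>) ois.readObject();
            System.out.println(shipped.apply(14));  // prints 42
        }
    }
}
```

Note that the captured variable travels inside the byte array; the class definitions themselves are not shipped this way but are made available to executors separately (via the application jars).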

I might be wrong here but I hope it helps :)

Cheers,
Lars


On Wed, Oct 14, 2015 at 11:23 PM, Arijit <ar...@live.com> wrote:

> Hi Xiao,
>
> Thank you very much for the pointers. I looked into that part of the
> code and now understand how the main method is invoked. It is still not
> clear to me how the code is distributed to the executors: is it the
> whole jar, or some serialized object? I was expecting to see the part
> of the code where the closures are serialized and shipped. Maybe I am
> missing something.
>
> Thanks again, Arijit
>
> ------------------------------
> Date: Thu, 8 Oct 2015 10:26:55 -0700
> Subject: Re: Understanding code/closure shipment to Spark workers‏
> From: gatorsmile@gmail.com
> To: arijitt@live.com
> CC: dev@spark.apache.org
>
>
> Hi, Arijit,
>
> The code flow of spark-submit is simple.
>
> Enter the main function of SparkSubmit.scala
>     --> case SparkSubmitAction.SUBMIT => submit(appArgs)
>     --> doRunMain() in function submit() in the same file
>     --> runMain(childArgs,...) in the same file
>     --> mainMethod.invoke(null, childArgs.toArray)  in the same file
>
> The invoke() call is provided by Java reflection and runs the main
> function of your JAR.
>
> Hopefully, it can help you understand the problem.
>
> Thanks,
>
> Xiao Li
>
>
> 2015-10-07 16:47 GMT-07:00 Arijit <ar...@live.com>:
>
>  Hi,
>
> I want to understand the code flow starting from the Spark jar that I
> submit through spark-submit, how does Spark identify and extract the
> closures, clean and serialize them and ship them to workers to execute as
> tasks. Can someone point me to any documentation or a pointer to the source
> code path to help me understand this.
>
> Thanks, Arijit

RE: Understanding code/closure shipment to Spark workers

Posted by Arijit <ar...@live.com>.
Hi Xiao,
 
Thank you very much for the pointers. I looked into that part of the code and now understand how the main method is invoked. It is still not clear to me how the code is distributed to the executors: is it the whole jar, or some serialized object? I was expecting to see the part of the code where the closures are serialized and shipped. Maybe I am missing something.
 
Thanks again, Arijit
 
Date: Thu, 8 Oct 2015 10:26:55 -0700
Subject: Re: Understanding code/closure shipment to Spark workers‏
From: gatorsmile@gmail.com
To: arijitt@live.com
CC: dev@spark.apache.org

Hi, Arijit,

The code flow of spark-submit is simple.

Enter the main function of SparkSubmit.scala
    --> case SparkSubmitAction.SUBMIT => submit(appArgs)
    --> doRunMain() in function submit() in the same file
    --> runMain(childArgs,...) in the same file
    --> mainMethod.invoke(null, childArgs.toArray)  in the same file

The invoke() call is provided by Java reflection and runs the main function of your JAR.

Hopefully, it can help you understand the problem.

Thanks,

Xiao Li

2015-10-07 16:47 GMT-07:00 Arijit <ar...@live.com>:



 Hi,
 
I want to understand the code flow starting from the Spark jar that I submit through spark-submit, how does Spark identify and extract the closures, clean and serialize them and ship them to workers to execute as tasks. Can someone point me to any documentation or a pointer to the source code path to help me understand this.
 
Thanks, Arijit

Re: Understanding code/closure shipment to Spark workers

Posted by Xiao Li <ga...@gmail.com>.
Hi, Arijit,

The code flow of spark-submit is simple.

Enter the main function of SparkSubmit.scala
    --> case SparkSubmitAction.SUBMIT => submit(appArgs)
    --> doRunMain() in function submit() in the same file
    --> runMain(childArgs,...) in the same file
    --> mainMethod.invoke(null, childArgs.toArray)  in the same file

The invoke() call is provided by Java reflection and runs the main
function of your JAR.
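That last reflective step can be sketched in plain Java. The class name and argument below are made up for illustration (a nested class stands in for a class loaded from the submitted JAR; the real SparkSubmit resolves the user's main class from the submit arguments):

```java
import java.lang.reflect.Method;

public class SubmitSketch {
    // Hypothetical stand-in for the user's application class inside the JAR.
    public static class UserApp {
        public static void main(String[] args) {
            System.out.println("running with arg: " + args[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        // What runMain boils down to: resolve the main class by name,
        // look up its static main(String[]), and invoke it reflectively.
        // A null receiver is passed because main is static.
        Class<?> mainClass = Class.forName("SubmitSketch$UserApp");
        Method mainMethod = mainClass.getMethod("main", String[].class);
        mainMethod.invoke(null, (Object) new String[] {"hello"});  // prints "running with arg: hello"
    }
}
```

The cast to Object matters: without it, the String[] would be spread across the varargs of invoke instead of being passed as the single args parameter.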

Hopefully, it can help you understand the problem.

Thanks,

Xiao Li


2015-10-07 16:47 GMT-07:00 Arijit <ar...@live.com>:

>  Hi,
>
> I want to understand the code flow starting from the Spark jar that I
> submit through spark-submit, how does Spark identify and extract the
> closures, clean and serialize them and ship them to workers to execute as
> tasks. Can someone point me to any documentation or a pointer to the source
> code path to help me understand this.
>
> Thanks, Arijit
>