Posted to user@spark.apache.org by canan chen <cc...@gmail.com> on 2015/10/14 04:50:47 UTC

When is the Python program started in PySpark?

I looked at the Spark source code but couldn't find where the Python
program is started in PySpark.

It seems spark-submit calls PythonGatewayServer, but where is the Python
program itself started?

Thanks

Re: When is the Python program started in PySpark?

Posted by canan chen <cc...@gmail.com>.
I think PythonRunner is launched when executing a Python script, while
PythonGatewayServer is the entry point for the Python Spark shell:


if (args.isPython && deployMode == CLIENT) {
  if (args.primaryResource == PYSPARK_SHELL) {
    args.mainClass = "org.apache.spark.api.python.PythonGatewayServer"
  } else {
    // If a python file is provided, add it to the child arguments and list of files to deploy.
    // Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
    args.mainClass = "org.apache.spark.deploy.PythonRunner"
    args.childArgs = ArrayBuffer(args.primaryResource, args.pyFiles) ++ args.childArgs
    if (clusterManager != YARN) {
      // The YARN backend distributes the primary file differently, so don't merge it.
      args.files = mergeFileLists(args.files, args.primaryResource)
    }
  }
}
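
The branch above can be modeled in a few lines of Python. This is only a
sketch for illustration: choose_main_class and its parameters are
hypothetical names, not part of Spark, and the value "pyspark-shell" for
PYSPARK_SHELL is an assumption based on my reading of the Spark source.

```python
# Illustrative model of the spark-submit branch quoted above.
# Nothing here is Spark's API; all names are hypothetical.

# Assumed value of the PYSPARK_SHELL constant in SparkSubmit.
PYSPARK_SHELL = "pyspark-shell"


def choose_main_class(primary_resource, is_python=True, deploy_mode="client"):
    """Return the JVM main class chosen for a Python job in client mode."""
    if not (is_python and deploy_mode == "client"):
        # Other combinations are outside the quoted snippet.
        return None
    if primary_resource == PYSPARK_SHELL:
        # Interactive shell: the JVM side only serves as a gateway.
        return "org.apache.spark.api.python.PythonGatewayServer"
    # A .py file was given: PythonRunner is the class that runs it.
    return "org.apache.spark.deploy.PythonRunner"


if __name__ == "__main__":
    print(choose_main_class("pyspark-shell"))
    print(choose_main_class("my_app.py"))
```

So a submitted script such as my_app.py goes through PythonRunner, and only
the interactive shell goes through PythonGatewayServer.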


On Wed, Oct 14, 2015 at 12:46 PM, skaarthik oss <sk...@gmail.com>
wrote:

> See PythonRunner @
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
>
> On Tue, Oct 13, 2015 at 7:50 PM, canan chen <cc...@gmail.com> wrote:
>
>> I looked at the Spark source code but couldn't find where the Python
>> program is started in PySpark.
>>
>> It seems spark-submit calls PythonGatewayServer, but where is the Python
>> program itself started?
>>
>> Thanks
>>
>
>

Re: When is the Python program started in PySpark?

Posted by skaarthik oss <sk...@gmail.com>.
See PythonRunner @
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
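
As far as I can tell from that file, PythonRunner starts a Py4J gateway in
the JVM and then launches the user's script as a child Python process,
passing the gateway port through an environment variable. The Python sketch
below imitates only that launch step. It is not Spark code:
launch_python_program is a hypothetical helper, and the variable names
PYSPARK_GATEWAY_PORT and PYTHONUNBUFFERED are my recollection of the source
and should be checked against the link above.

```python
import os
import subprocess
import sys


def launch_python_program(script_path, app_args, gateway_port):
    """Run a Python script as a child process, the way a launcher might.

    Hypothetical helper: imitates the launch step only, not the gateway.
    """
    env = dict(os.environ)
    # Assumed variable name; it tells the child where the JVM gateway listens.
    env["PYSPARK_GATEWAY_PORT"] = str(gateway_port)
    # Assumed: unbuffered output so the child's logs show up promptly.
    env["PYTHONUNBUFFERED"] = "YES"
    result = subprocess.run(
        [sys.executable, script_path, *app_args],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

The child script can then read the port from its environment and connect
back to the JVM, which is how the Python program and the JVM find each
other in this sketch.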

On Tue, Oct 13, 2015 at 7:50 PM, canan chen <cc...@gmail.com> wrote:

> I looked at the Spark source code but couldn't find where the Python
> program is started in PySpark.
>
> It seems spark-submit calls PythonGatewayServer, but where is the Python
> program itself started?
>
> Thanks
>