Posted to user@spark.apache.org by Virgil Palanciuc <vi...@palanciuc.eu> on 2015/08/21 16:42:45 UTC

Finding the number of executors.

Is there any reliable way to find out the number of executors
programmatically, regardless of how the job is run? A method that
preferably works for Spark standalone, YARN, and Mesos, regardless of
whether the code runs from the shell or not?

Things that I tried that don't work (a combined sketch follows the list):
- sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell,
does not work if the task is submitted via spark-submit
- sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't work
unless explicitly configured
- a call to http://master:8080/json (this used to work, but doesn't anymore?)
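
A minimal combined sketch of those three attempts (the "master" host and
port 8080 in the last one are placeholders for a standalone master's web UI;
none of these is guaranteed to be accurate, per the above):

import scala.io.Source
import org.apache.spark.SparkContext

def attemptedExecutorCounts(sc: SparkContext): Unit = {
  // 1) Registered block managers minus the driver; may see only the driver
  //    if no executors have registered yet.
  val fromMemoryStatus = sc.getExecutorMemoryStatus.size - 1

  // 2) Configured executor count; falls back to the default (here 1) unless
  //    spark.executor.instances was set explicitly.
  val fromConf = sc.getConf.getInt("spark.executor.instances", 1)

  // 3) Standalone master's JSON endpoint (standalone mode only).
  val masterJson = Source.fromURL("http://master:8080/json").mkString

  println(s"memory status: $fromMemoryStatus, conf: $fromConf")
  println(masterJson.take(200))
}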

I guess I could parse the output HTML from the Spark UI... but that seems
dumb. Is there really no better way?

Thanks,
Virgil.

Re: Finding the number of executors.

Posted by Virgil Palanciuc <vi...@gmail.com>.
As I was writing a long-ish message to explain how it doesn't work, it
dawned on me that maybe the driver connects to the executors only after
there's some work to do (while I was trying to find the number of executors
BEFORE starting the actual work).

So the solution was to simply execute a dummy task (
sparkContext.parallelize(1 until 1000, 200).reduce(_+_) ) before attempting
to retrieve the executors. It works now :)
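
For anyone hitting this later, a minimal self-contained sketch of that
workaround (the app name and the size of the dummy job are arbitrary; the
count can still lag if executors are slow to register):

import org.apache.spark.{SparkConf, SparkContext}

object ExecutorCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("executor-count"))

    // Run a small dummy job first so the executors actually connect to the
    // driver; before any work is scheduled, getExecutorMemoryStatus may only
    // know about the driver itself.
    sc.parallelize(1 until 1000, 200).reduce(_ + _)

    // Every entry in getExecutorMemoryStatus is a block manager (host:port);
    // one of them is the driver, hence the "- 1".
    val numExecutors = sc.getExecutorMemoryStatus.size - 1
    println(s"registered executors: $numExecutors")

    sc.stop()
  }
}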

Virgil.

On Sat, Aug 22, 2015 at 12:44 AM, Du Li <li...@yahoo-inc.com> wrote:

> Following is a method that retrieves the list of executors registered to a
> spark context. It worked perfectly with spark-submit in standalone mode for
> my project.
>
> /**
>    * A simplified method that just returns the current active/registered
>    * executors, excluding the driver.
>    * @param sc
>    *           The spark context to retrieve registered executors.
>    * @return
>    *         A list of executors each in the form of host:port.
>    */
>   def currentActiveExecutors(sc: SparkContext): Seq[String] = {
>     val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
>     val driverHost: String = sc.getConf.get("spark.driver.host")
>     allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
>   }
>
>
>
>
> On Friday, August 21, 2015 1:53 PM, Virgil Palanciuc <vi...@gmail.com>
> wrote:
>
>
> Hi Akhil,
>
> I'm using Spark 1.4.1.
> The number of executors is not on the command line, nor in
> getExecutorMemoryStatus (I already mentioned that I tried that; it works in
> spark-shell but not when executed via spark-submit). I tried looking at
> "defaultParallelism" too: it's 112 (7 executors * 16 cores) when run via
> spark-shell, but just 2 when run via spark-submit.
>
> But the scheduler obviously knows this information. It *must* know it. How
> can I access it? Other than parsing the HTML of the WebUI, that is...
> that's pretty much guaranteed to work, and maybe I'll do that, but it's
> extremely convoluted.
>
> Regards,
> Virgil.
>
> On Fri, Aug 21, 2015 at 11:35 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
> Which version of Spark are you using? There was a discussion about this
> here:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Determine-number-of-running-executors-td19453.html
>
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3CCACBYxK+yA1RBbNKWJHEekPnbSbH10RYkuzt-LAqGpDANVHmF_Q@mail.gmail.com%3E
> On Aug 21, 2015 7:42 AM, "Virgil Palanciuc" <vi...@palanciuc.eu> wrote:
>
> Is there any reliable way to find out the number of executors
> programmatically, regardless of how the job is run? A method that
> preferably works for Spark standalone, YARN, and Mesos, regardless of
> whether the code runs from the shell or not?
>
> Things that I tried that don't work:
> - sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell,
> does not work if the task is submitted via spark-submit
> - sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't
> work unless explicitly configured
> - a call to http://master:8080/json (this used to work, but doesn't
> anymore?)
>
> I guess I could parse the output HTML from the Spark UI... but that seems
> dumb. Is there really no better way?
>
> Thanks,
> Virgil.
>
>
>
>
>
>

Re: Finding the number of executors.

Posted by Du Li <li...@yahoo-inc.com.INVALID>.
Following is a method that retrieves the list of executors registered to a
spark context. It worked perfectly with spark-submit in standalone mode for
my project.

/**
   * A simplified method that just returns the current active/registered
   * executors, excluding the driver.
   * @param sc
   *           The spark context to retrieve registered executors.
   * @return
   *         A list of executors each in the form of host:port.
   */
  def currentActiveExecutors(sc: SparkContext): Seq[String] = {
    val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
    val driverHost: String = sc.getConf.get("spark.driver.host")
    allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
  }
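
A quick usage sketch (the helper name and the warm-up job are illustrative
only; it assumes the method above is in scope together with an active
SparkContext):

import org.apache.spark.SparkContext

// Illustrative helper: run a tiny job so executors register with the driver,
// then list them using the currentActiveExecutors method defined above.
def printActiveExecutors(sc: SparkContext): Unit = {
  sc.parallelize(1 to 1000, 100).count()   // force executors to connect
  val executors = currentActiveExecutors(sc)
  println(s"${executors.size} executors registered: ${executors.mkString(", ")}")
}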
 


On Friday, August 21, 2015 1:53 PM, Virgil Palanciuc <vi...@gmail.com> wrote:

Hi Akhil,

I'm using Spark 1.4.1. The number of executors is not on the command line,
nor in getExecutorMemoryStatus (I already mentioned that I tried that; it
works in spark-shell but not when executed via spark-submit). I tried looking
at "defaultParallelism" too: it's 112 (7 executors * 16 cores) when run via
spark-shell, but just 2 when run via spark-submit.

But the scheduler obviously knows this information. It *must* know it. How
can I access it? Other than parsing the HTML of the WebUI, that is... that's
pretty much guaranteed to work, and maybe I'll do that, but it's extremely
convoluted.

Regards,
Virgil.

On Fri, Aug 21, 2015 at 11:35 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

Which version of Spark are you using? There was a discussion about this here:

http://apache-spark-user-list.1001560.n3.nabble.com/Determine-number-of-running-executors-td19453.html

http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3CCACBYxK+yA1RBbNKWJHEekPnbSbH10RYkuzt-LAqGpDANVHmF_Q@mail.gmail.com%3E

On Aug 21, 2015 7:42 AM, "Virgil Palanciuc" <vi...@palanciuc.eu> wrote:

Is there any reliable way to find out the number of executors
programmatically, regardless of how the job is run? A method that preferably
works for Spark standalone, YARN, and Mesos, regardless of whether the code
runs from the shell or not?

Things that I tried that don't work:
- sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell,
does not work if the task is submitted via spark-submit
- sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't work
unless explicitly configured
- a call to http://master:8080/json (this used to work, but doesn't anymore?)

I guess I could parse the output HTML from the Spark UI... but that seems
dumb. Is there really no better way?

Thanks,
Virgil.

Re: Finding the number of executors.

Posted by Virgil Palanciuc <vi...@gmail.com>.
Hi Akhil,

I'm using Spark 1.4.1.
The number of executors is not on the command line, nor in
getExecutorMemoryStatus (I already mentioned that I tried that; it works in
spark-shell but not when executed via spark-submit). I tried looking at
"defaultParallelism" too: it's 112 (7 executors * 16 cores) when run via
spark-shell, but just 2 when run via spark-submit.

But the scheduler obviously knows this information. It *must* know it. How
can I access it? Other than parsing the HTML of the WebUI, that is...
that's pretty much guaranteed to work, and maybe I'll do that, but it's
extremely convoluted.
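
For what it's worth, one more route that avoids scraping HTML would be the
driver UI's JSON monitoring REST API (available since Spark 1.4). A rough
sketch, assuming the UI is reachable on localhost:4040 and using crude
substring counting instead of a proper JSON parser:

import scala.io.Source
import org.apache.spark.SparkContext

// Query the executors endpoint of the driver's web UI. The host "localhost"
// and port 4040 are assumptions; use the actual driver host and UI port.
def executorCountFromRestApi(sc: SparkContext): Int = {
  val url = s"http://localhost:4040/api/v1/applications/${sc.applicationId}/executors"
  val json = Source.fromURL(url).mkString
  // Each entry of the returned array has exactly one "hostPort" field, and
  // one entry is normally the driver itself, hence the "- 1". A real
  // implementation would parse the JSON instead of counting substrings.
  "\"hostPort\"".r.findAllIn(json).length - 1
}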

Regards,
Virgil.

On Fri, Aug 21, 2015 at 11:35 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Which version of Spark are you using? There was a discussion about this
> here:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Determine-number-of-running-executors-td19453.html
>
>
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3CCACBYxK+yA1RBbNKWJHEekPnbSbH10RYkuzt-LAqGpDANVHmF_Q@mail.gmail.com%3E
> On Aug 21, 2015 7:42 AM, "Virgil Palanciuc" <vi...@palanciuc.eu> wrote:
>
>> Is there any reliable way to find out the number of executors
>> programmatically, regardless of how the job is run? A method that
>> preferably works for Spark standalone, YARN, and Mesos, regardless of
>> whether the code runs from the shell or not?
>>
>> Things that I tried that don't work:
>> - sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell,
>> does not work if the task is submitted via spark-submit
>> - sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't
>> work unless explicitly configured
>> - a call to http://master:8080/json (this used to work, but doesn't
>> anymore?)
>>
>> I guess I could parse the output HTML from the Spark UI... but that seems
>> dumb. Is there really no better way?
>>
>> Thanks,
>> Virgil.
>>
>>
>>

Re: Finding the number of executors.

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Which version of Spark are you using? There was a discussion about this
here:
http://apache-spark-user-list.1001560.n3.nabble.com/Determine-number-of-running-executors-td19453.html

http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3CCACBYxK+yA1RBbNKWJHEekPnbSbH10RYkuzt-LAqGpDANVHmF_Q@mail.gmail.com%3E
On Aug 21, 2015 7:42 AM, "Virgil Palanciuc" <vi...@palanciuc.eu> wrote:

> Is there any reliable way to find out the number of executors
> programmatically, regardless of how the job is run? A method that
> preferably works for Spark standalone, YARN, and Mesos, regardless of
> whether the code runs from the shell or not?
>
> Things that I tried that don't work:
> - sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell,
> does not work if the task is submitted via spark-submit
> - sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't
> work unless explicitly configured
> - a call to http://master:8080/json (this used to work, but doesn't
> anymore?)
>
> I guess I could parse the output HTML from the Spark UI... but that seems
> dumb. Is there really no better way?
>
> Thanks,
> Virgil.
>
>
>