Posted to user@spark.apache.org by SRK <sw...@gmail.com> on 2016/05/22 03:31:44 UTC

How to set the degree of parallelism in Spark SQL?

Hi,

How do I set the degree of parallelism in Spark SQL? I am using the
following, but it somehow seems to allocate only two executors at a time.

 sqlContext.sql(" set spark.sql.shuffle.partitions  200  ")

Thanks,
Swetha

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-the-degree-of-parallelism-in-Spark-SQL-tp26996.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: How to set the degree of parallelism in Spark SQL?

Posted by Xinh Huynh <xi...@gmail.com>.
To the original question of parallelism and executors: you can have a
parallelism of 200, even with 2 executors. In the Spark UI, you should see
that the number of _tasks_ is 200 when your job involves shuffling.

Executors vs. tasks:
http://spark.apache.org/docs/latest/cluster-overview.html
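For instance, with a sketch like the one below (run in spark-shell; the
row count and the modulo grouping are made up for illustration), the
shuffle stage of the groupBy should show 200 tasks in the Spark UI even
if only 2 executors are running:

    // Minimal sketch for spark-shell (Spark 1.x, matching the sqlContext above).
    sqlContext.setConf("spark.sql.shuffle.partitions", "200")

    val df = sqlContext.range(0, 1000000L)
    // groupBy triggers a shuffle; its stage should run with 200 tasks,
    // spread across however many executors you happen to have.
    df.groupBy(df("id") % 100).count().collect()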

Xinh

On Mon, May 23, 2016 at 5:48 AM, Mathieu Longtin <ma...@closetwork.org>
wrote:

> Since the default is 200, I would guess you're only running 2 executors.
> Try to verify how many executors you are actually running with the web
> interface (port 8080, where the master is running).
>
> On Sat, May 21, 2016 at 11:42 PM Ted Yu <yu...@gmail.com> wrote:
>
>> Looks like an equal sign is missing between partitions and 200.

Re: How to set the degree of parallelism in Spark SQL?

Posted by Mathieu Longtin <ma...@closetwork.org>.
Since the default is 200, I would guess you're only running 2 executors.
Try to verify how many executors you are actually running with the web
interface (port 8080, where the master is running).
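If you'd rather check from the shell, something along these lines should
work too (a rough sketch: getExecutorMemoryStatus lists one entry per
executor plus one for the driver, hence the -1):

    // Count live executors from spark-shell; sc is the predefined SparkContext.
    val executorCount = sc.getExecutorMemoryStatus.size - 1
    println(s"Running with $executorCount executors")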

On Sat, May 21, 2016 at 11:42 PM Ted Yu <yu...@gmail.com> wrote:

> Looks like an equal sign is missing between partitions and 200.
--
Mathieu Longtin
1-514-803-8977

Re: How to set the degree of parallelism in Spark SQL?

Posted by Ted Yu <yu...@gmail.com>.
Looks like an equal sign is missing between partitions and 200.
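Something like this should do it (either form; the second is the
programmatic equivalent):

    sqlContext.sql("SET spark.sql.shuffle.partitions=200")
    // or, equivalently:
    sqlContext.setConf("spark.sql.shuffle.partitions", "200")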

On Sat, May 21, 2016 at 8:31 PM, SRK <sw...@gmail.com> wrote:

> Hi,
>
> How do I set the degree of parallelism in Spark SQL? I am using the
> following, but it somehow seems to allocate only two executors at a time.
>
>  sqlContext.sql(" set spark.sql.shuffle.partitions  200  ")
>
> Thanks,
> Swetha

Re: How to set the degree of parallelism in Spark SQL?

Posted by Mich Talebzadeh <mi...@gmail.com>.
Also worth adding that in standalone mode there is only one executor per
worker for each spark-submit job.

In standalone cluster mode Spark allocates resources based on cores. By
default, an application will grab all the cores in the cluster.

You only have one worker, which lives within the driver JVM process that
you start when you launch the application with spark-shell or spark-submit
on the host where the cluster manager is running.

The Driver runs on the same host as the cluster manager. The Driver
requests resources from the Cluster Manager to run tasks. That worker is
tasked to create the executor (in this case there is only one executor)
for the Driver. The Executor runs tasks for the Driver. Only one executor
can be allocated on each worker per application.
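If you want to cap what an application grabs in standalone mode, a sketch
like this should work (the app name and the numbers are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp")                 // placeholder name
      .set("spark.cores.max", "8")         // cap total cores across the cluster
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)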

thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: How to set the degree of parallelism in Spark SQL?

Posted by Ian <ps...@gmail.com>.
The number of executors is set when you launch the shell or an application
with spark-submit. It's controlled by the --num-executors parameter:
https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/.

It's also important to note that cranking up the number may not make your
queries run faster. If you set it to, say, 200, but you only have 10 cores
spread over 5 nodes, then you may not see a significant speed-up beyond
5-10 executors.

You may want to check out Cloudera's tuning guide:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
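As a concrete example (a sketch; the class and jar names are placeholders,
and note that --num-executors only applies on YARN, while standalone mode
uses --total-executor-cores instead):

    spark-submit \
      --class com.example.MyApp \
      --num-executors 10 \
      --executor-cores 2 \
      my-app.jar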




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org