Posted to user@spark.apache.org by Hu...@Dell.com on 2014/01/14 02:17:44 UTC
question on using spark parallelism vs using num partitions in spark api
Hi,
Using Spark 0.8.1 ... Java code running on a single node with 8 CPUs and 16 GB RAM.
It looks like setting Spark parallelism with System.setProperty("spark.default.parallelism", "24") before creating my SparkContext, as described in http://spark.incubator.apache.org/docs/latest/tuning.html#level-of-parallelism, has no effect on the default number of partitions that Spark uses in its APIs like saveAsTextFile().
For example, if I set spark.default.parallelism to 24, I was expecting 24 tasks to be invoked upon calling saveAsTextFile(), but that is not the case: I see only 1 task get invoked.
If I create my RDD with parallelize() and 2 slices, as in
dataSetRDD = SparkDriver.getSparkContext().parallelize(mydata, 2);
and then invoke
dataSetRDD.saveAsTextFile(JavaRddFilePath);
I see 2 tasks get invoked even though spark.default.parallelism was set to 24.
Can someone explain the above behavior?
Thanks,
Hussam
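The behavior described above is consistent with saveAsTextFile() launching one task per RDD partition: an explicit numSlices argument to parallelize() fixes the partition count, while spark.default.parallelism is only a fallback default (mainly for shuffle operations). As a rough illustration, the following plain-Java sketch mimics how a collection of size n might be divided into numSlices ranges; SliceDemo and its slice() method are illustrative names for this sketch, not Spark internals:

```java
import java.util.ArrayList;
import java.util.List;

public class SliceDemo {
    // Approximates how a driver-side collection of size n could be divided
    // into numSlices partitions: slice i covers indices [i*n/k, (i+1)*n/k).
    static <T> List<List<T>> slice(List<T> data, int numSlices) {
        List<List<T>> slices = new ArrayList<>();
        int n = data.size();
        for (int i = 0; i < numSlices; i++) {
            int start = (int) ((long) i * n / numSlices);
            int end = (int) ((long) (i + 1) * n / numSlices);
            slices.add(new ArrayList<>(data.subList(start, end)));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) data.add(i);
        // With numSlices = 2 there are exactly 2 partitions, hence 2 save
        // tasks, regardless of what spark.default.parallelism is set to.
        List<List<Integer>> slices = slice(data, 2);
        System.out.println(slices.size());   // 2
        System.out.println(slices.get(0));   // [0, 1, 2, 3, 4]
    }
}
```

To get 24 output tasks, the RDD itself would need 24 partitions (e.g. parallelize(mydata, 24)).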
Re: question on using spark parallelism vs using num partitions in spark api
Posted by huangjay <ja...@live.cn>.
Please use local[24].
Sent from my iPad.
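With a plain "local" master Spark runs on a single worker thread; "local[24]" requests 24. A minimal configuration sketch of where the master URL goes when constructing the context (the class and app name here are illustrative, and exact constructors may differ across Spark versions):

```java
import org.apache.spark.api.java.JavaSparkContext;

public class Main {
    public static void main(String[] args) {
        // "local[24]" asks Spark's local mode for 24 worker threads;
        // plain "local" uses only one. "MyApp" is an arbitrary app name.
        JavaSparkContext sc = new JavaSparkContext("local[24]", "MyApp");
    }
}
```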
RE: question on using spark parallelism vs using num partitions in spark api
Posted by Hu...@Dell.com.
I am using local
Thanks,
Hussam
Re: question on using spark parallelism vs using num partitions in spark api
Posted by Huangguowei <hu...@huawei.com>.
"Using Spark 0.8.1 … Java code running on a single node with 8 CPUs and 16 GB RAM"
Local or standalone (single node)?
Re: question on using spark parallelism vs using num partitions in spark api
Posted by "leosandylh@gmail.com" <le...@gmail.com>.
I think the parallelism param just controls how many tasks can run together on each worker.
It can't control how many tasks the job should be split into.
leosandylh@gmail.com