Posted to user@spark.apache.org by 喜之郎 <25...@qq.com> on 2016/05/20 11:17:27 UTC
spark.default.parallelism does not set the reduce number
Hi all.
I set spark.default.parallelism to 20 in spark-defaults.conf and sent this file to all nodes.
But I found that the reduce number is still the default value, 200.
Has anyone else encountered this problem? Can anyone give some advice?
############
[Stage 9:> (0 + 0) / 200]
[Stage 9:> (0 + 2) / 200]
[Stage 9:> (1 + 2) / 200]
[Stage 9:> (2 + 2) / 200]
#######
And this results in many empty files. Because my data is small, only some of the 200 output files contain data.
#######
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00000
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00001
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00002
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00003
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00004
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00005
########
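For reference, the entry in spark-defaults.conf presumably looks like the sketch below (keys and values are whitespace-separated):

# spark-defaults.conf
spark.default.parallelism    20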
Re: spark.default.parallelism does not set the reduce number
Posted by Takeshi Yamamuro <li...@gmail.com>.
You need to use `spark.sql.shuffle.partitions`.
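`spark.default.parallelism` only sets the default partition count for RDD shuffle operations; shuffles triggered by Spark SQL (joins, aggregations) use `spark.sql.shuffle.partitions` instead. A minimal sketch of setting it, assuming a Spark 1.x spark-shell where `sqlContext` is predefined:

// In spark-shell, where sqlContext is predefined.
// Either form takes effect for subsequent SQL shuffles:
sqlContext.setConf("spark.sql.shuffle.partitions", "20")  // programmatic
sqlContext.sql("SET spark.sql.shuffle.partitions=20")     // via SQL
// Or persistently, alongside spark.default.parallelism in spark-defaults.conf:
//   spark.sql.shuffle.partitions    20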
// maropu
--
Takeshi Yamamuro
Re: spark.default.parallelism does not set the reduce number
Posted by Ovidiu-Cristian MARCU <ov...@inria.fr>.
You can check org.apache.spark.sql.internal.SQLConf for other default settings as well.
val SHUFFLE_PARTITIONS = SQLConfigBuilder("spark.sql.shuffle.partitions")
  .doc("The default number of partitions to use when shuffling data for joins or aggregations.")
  .intConf
  .createWithDefault(200)
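A quick way to confirm which value is in effect, again assuming the spark-shell `sqlContext` (on Spark 2.0 the equivalent would be `spark.conf.get`):

// Prints "200" unless overridden in spark-defaults.conf or at runtime.
println(sqlContext.getConf("spark.sql.shuffle.partitions"))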