Posted to user@spark.apache.org by 喜之郎 <25...@qq.com> on 2016/05/20 11:17:27 UTC
spark.default.parallelism does not set the reduce number
Hi all.
I set spark.default.parallelism to 20 in spark-defaults.conf and sent this file to all nodes.
But I found that the reduce number is still the default value, 200.
Has anyone else encountered this problem? Can anyone give some advice?
############
[Stage 9:> (0 + 0) / 200]
[Stage 9:> (0 + 2) / 200]
[Stage 9:> (1 + 2) / 200]
[Stage 9:> (2 + 2) / 200]
#######
And this results in many empty files. Because my data is small, only some of the 200 output files contain data.
#######
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00000
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00001
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00002
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00003
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00004
2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00005
########
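For reference, the entry in spark-defaults.conf presumably looks like the sketch below (keys and values are whitespace-separated):

# spark-defaults.conf
spark.default.parallelism    20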
Re: spark.default.parallelism does not set the reduce number
Posted by Takeshi Yamamuro <li...@gmail.com>.
You need to use `spark.sql.shuffle.partitions`.
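`spark.default.parallelism` only sets the default partition count for RDD shuffle operations; shuffles triggered by Spark SQL (joins, aggregations) use `spark.sql.shuffle.partitions` instead. A minimal sketch of setting it, assuming a Spark 1.x spark-shell where `sqlContext` is predefined:

// In spark-shell, where sqlContext is predefined.
// Either form takes effect for subsequent SQL shuffles:
sqlContext.setConf("spark.sql.shuffle.partitions", "20")  // programmatic
sqlContext.sql("SET spark.sql.shuffle.partitions=20")     // via SQL
// Or persistently, alongside spark.default.parallelism in spark-defaults.conf:
//   spark.sql.shuffle.partitions    20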
// maropu
--
Takeshi Yamamuro
Re: spark.default.parallelism does not set the reduce number
Posted by Ovidiu-Cristian MARCU <ov...@inria.fr>.
You can check org.apache.spark.sql.internal.SQLConf for other default settings as well.
val SHUFFLE_PARTITIONS = SQLConfigBuilder("spark.sql.shuffle.partitions")
  .doc("The default number of partitions to use when shuffling data for joins or aggregations.")
  .intConf
  .createWithDefault(200)
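A quick way to confirm which value is in effect, again assuming the spark-shell `sqlContext` (on Spark 2.0 the equivalent would be `spark.conf.get`):

// Prints "200" unless overridden in spark-defaults.conf or at runtime.
println(sqlContext.getConf("spark.sql.shuffle.partitions"))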