Posted to user@carbondata.apache.org by suzzy <su...@hotmail.com> on 2017/06/20 03:41:53 UTC

how to add RDD partition?

Hi,
I am running the query 'select count(1) from sunzy.datatest'.
The job had 16 blocks and 16 tasks, but only 4 partitions.
How can I increase the number of RDD partitions?
Thanks

CarbonData ThriftServer Log: 

INFO 16-06 16:14:34,039 -  
 Identified no.of.blocks: 16, 
 no.of.tasks: 16, 
 no.of.nodes: 0, 
 parallelism: 4 
INFO 16-06 16:14:34,059 - Starting job: run at AccessController.java:-2
INFO 16-06 16:14:34,060 - Registering RDD 12 (run at AccessController.java:-2)
INFO 16-06 16:14:34,061 - Got job 1 (run at AccessController.java:-2) with 1 output partitions
INFO 16-06 16:14:34,061 - Final stage: ResultStage 3 (run at AccessController.java:-2)
INFO 16-06 16:14:34,061 - Parents of final stage: List(ShuffleMapStage 2)
INFO 16-06 16:14:34,061 - Missing parents: List(ShuffleMapStage 2)
INFO 16-06 16:14:34,062 - Submitting ShuffleMapStage 2 (MapPartitionsRDD[12] at run at AccessController.java:-2), which has no missing parents
INFO 16-06 16:14:34,065 - Block broadcast_2 stored as values in memory (estimated size 15.4 KB, free 62.2 KB)
INFO 16-06 16:14:34,068 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 7.6 KB, free 69.8 KB)
INFO 16-06 16:14:34,069 - Added broadcast_2_piece0 in memory on 192.168.1.41:57617 (size: 7.6 KB, free: 71.7 GB)
INFO 16-06 16:14:34,069 - Created broadcast 2 from broadcast at DAGScheduler.scala:1006
INFO 16-06 16:14:34,070 - Submitting 16 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[12] at run at AccessController.java:-2)
INFO 16-06 16:14:34,070 - Adding task set 2.0 with 16 tasks
INFO 16-06 16:14:34,072 - Starting task 2.0 in stage 2.0 (TID 16, H4, partition 2,NODE_LOCAL, 2376 bytes)
INFO 16-06 16:14:34,073 - Starting task 0.0 in stage 2.0 (TID 17, H3, partition 0,NODE_LOCAL, 2376 bytes)
INFO 16-06 16:14:34,073 - Starting task 1.0 in stage 2.0 (TID 18, H1, partition 1,NODE_LOCAL, 2376 bytes)
INFO 16-06 16:14:34,074 - Starting task 4.0 in stage 2.0 (TID 19, H2, partition 4,NODE_LOCAL, 2376 bytes)
INFO 16-06 16:14:34,089 - Added broadcast_2_piece0 in memory on H1:57002 (size: 7.6 KB, free: 57.3 GB)
INFO 16-06 16:14:34,096 - Added broadcast_2_piece0 in memory on H4:33086 (size: 7.6 KB, free: 57.3 GB)
INFO 16-06 16:14:34,116 - Added broadcast_2_piece0 in memory on H2:45618 (size: 7.6 KB, free: 57.3 GB)
INFO 16-06 16:14:34,117 - Added broadcast_2_piece0 in memory on H3:56719 (size: 7.6 KB, free: 57.3 GB)



--
View this message in context: http://apache-carbondata-user-mailing-list.3231.n8.nabble.com/how-to-add-RDD-partition-tp31.html
Sent from the Apache CarbonData User Mailing List mailing list archive at Nabble.com.

Reply: how to add RDD partition?

Posted by sun suzzy <su...@hotmail.com>.
Yes, thanks, it works now.

________________________________
From: Liang Chen <ch...@apache.org>
Sent: 2017-06-26 14:44:32
To: user@carbondata.apache.org
Subject: Re: how to add RDD partition?

Hi

I don't fully understand your question. Do you want to increase parallelism?
If so, you can set Spark's parallelism parameter.

Regards
Liang



Re: how to add RDD partition?

Posted by Liang Chen <ch...@apache.org>.
Hi

I don't fully understand your question. Do you want to increase parallelism?
If so, you can set Spark's parallelism parameter.

Regards
Liang


Re: Reply: how to add RDD partition?

Posted by Erlu Chen <ch...@gmail.com>.
You are welcome!
: )



--
View this message in context: http://apache-carbondata-user-mailing-list.3231.n8.nabble.com/how-to-add-RDD-partition-tp31p36.html
Sent from the Apache CarbonData User Mailing List mailing list archive at Nabble.com.

Reply: how to add RDD partition?

Posted by sun suzzy <su...@hotmail.com>.
thanks

________________________________
From: Erlu Chen <ch...@gmail.com>
Sent: 2017-06-26 14:42:20
To: user@carbondata.apache.org
Subject: Re: how to add RDD partition?

Hi

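Please try setting spark.sql.shuffle.partitions or spark.default.parallelism,
which control the number of tasks.

spark.sql.shuffle.partitions applies to Spark SQL: it sets the number of
partitions used when shuffling data for joins and aggregations.

spark.default.parallelism applies to Spark RDDs: it sets the default number of
partitions for operations such as parallelize, join, and reduceByKey when none
is given explicitly.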

Regards.
Chenerlu.






Re: how to add RDD partition?

Posted by Erlu Chen <ch...@gmail.com>.
Hi

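Please try setting spark.sql.shuffle.partitions or spark.default.parallelism,
which control the number of tasks.

spark.sql.shuffle.partitions applies to Spark SQL: it sets the number of
partitions used when shuffling data for joins and aggregations.

spark.default.parallelism applies to Spark RDDs: it sets the default number of
partitions for operations such as parallelize, join, and reduceByKey when none
is given explicitly.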

Regards.
Chenerlu.
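
For readers who want to try the two settings named in this reply, here is a
minimal Scala sketch. The object name, the app name, the local[4] master, and
the value 16 are assumptions chosen only to echo the 16 tasks in the original
log; they are not taken from the poster's cluster. It needs Spark on the
classpath and only shows where the keys go, not how the ThriftServer in this
thread was actually configured.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the name is illustrative only.
object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("parallelism-sketch")          // assumed app name
      .setMaster("local[4]")                     // assumed: local mode with 4 cores
      .set("spark.default.parallelism", "16")    // default partition count for RDD operations
      .set("spark.sql.shuffle.partitions", "16") // shuffle partition count for Spark SQL

    val sc = new SparkContext(conf)
    try {
      // Default parallelism as the running context reports it.
      println(s"defaultParallelism = ${sc.defaultParallelism}")
      // An RDD created without an explicit partition count uses that default.
      println(s"partitions = ${sc.parallelize(1 to 1000).getNumPartitions}")
    } finally {
      sc.stop()
    }
  }
}
```

spark.default.parallelism has to be set before the SparkContext is created,
for example in spark-defaults.conf or with --conf at startup.
spark.sql.shuffle.partitions can also be changed per session with a SQL SET
statement.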





--
View this message in context: http://apache-carbondata-user-mailing-list.3231.n8.nabble.com/how-to-add-RDD-partition-tp31p32.html
Sent from the Apache CarbonData User Mailing List mailing list archive at Nabble.com.