Posted to user@spark.apache.org by prateek arora <pr...@gmail.com> on 2016/05/10 04:58:43 UTC
spark 1.6 : RDD Partitions not distributed evenly to executors
Hi
My Spark Streaming application receives data from one Kafka topic (one
partition), and the RDD has 30 partitions.
However, the scheduler assigns all tasks to executors running on the same host
(the one where the Kafka topic partition lives), at NODE_LOCAL locality level.
Below are the logs:
16/05/06 11:21:38 INFO YarnScheduler: Adding task set 1.0 with 30 tasks
16/05/06 11:21:38 DEBUG TaskSetManager: Epoch for TaskSet 1.0: 1
16/05/06 11:21:38 DEBUG TaskSetManager: Valid locality levels for TaskSet
1.0: NODE_LOCAL, RACK_LOCAL, ANY
16/05/06 11:21:38 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
1, ivcp-m04.novalocal, partition 0,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID
2, ivcp-m04.novalocal, partition 1,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID
3, ivcp-m04.novalocal, partition 2,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID
4, ivcp-m04.novalocal, partition 3,NODE_LOCAL, 2248 bytes)
16/05/06 11:21:38 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID
5, ivcp-m04.novalocal, partition 4,NODE_LOCAL, 2248 bytes)
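For reference, here is a minimal sketch of the kind of setup described above,
assuming the direct Kafka stream API from spark-streaming-kafka 1.6. The
topic name, broker address, and processing logic are placeholders, not taken
from the original application:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumes an existing SparkContext `sc`.
val ssc = new StreamingContext(sc, Seconds(10))

// Placeholder broker and topic; the real topic has a single partition.
val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("my-topic"))

// With the direct stream, one Kafka partition yields one RDD partition per
// batch, so the RDD is repartitioned to 30 to spread work across executors
// (this introduces a shuffle, whose reduce-side locality is what the
// scheduler decisions below are about).
stream.foreachRDD { rdd =>
  rdd.repartition(30).foreachPartition { records =>
    // process records here
  }
}
```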
I started seeing this behavior after upgrading Spark from 1.5 to 1.6; the
same application distributed RDD partitions evenly across executors in Spark 1.5.
As suggested on some Spark developer blogs, I tried setting
spark.shuffle.reduceLocality.enabled=false, and after that the RDD partitions
are distributed across executors on all hosts, at PROCESS_LOCAL locality
level.
Below are the logs:
16/05/06 11:24:46 INFO YarnScheduler: Adding task set 1.0 with 30 tasks
16/05/06 11:24:46 DEBUG TaskSetManager: Valid locality levels for TaskSet
1.0: NO_PREF, ANY
16/05/06 11:24:46 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
1, ivcp-m02.novalocal, partition 0,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID
2, ivcp-m01.novalocal, partition 1,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID
3, ivcp-m06.novalocal, partition 2,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID
4, ivcp-m04.novalocal, partition 3,PROCESS_LOCAL, 2248 bytes)
16/05/06 11:24:46 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID
5, ivcp-m04.novalocal, partition 4,PROCESS_LOCAL, 2248 bytes)
--------
--------
--------
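For reference, this is how I am setting the flag (shown as a sketch; the app
name is a placeholder, and the same setting can instead be passed as
--conf spark.shuffle.reduceLocality.enabled=false to spark-submit):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// The flag must be set before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("kafka-streaming-app") // placeholder name
  .set("spark.shuffle.reduceLocality.enabled", "false")
val sc = new SparkContext(conf)
```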
Is the above configuration the correct solution for this problem? And why is
spark.shuffle.reduceLocality.enabled not mentioned in the Spark configuration
documentation?
Regards
Prateek
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-RDD-Partitions-not-distributed-evenly-to-executors-tp26911.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.