Posted to user@spark.apache.org by cheez <11...@seecs.edu.pk> on 2015/08/07 20:32:16 UTC

Get bucket details created in shuffle phase

Hey all.
I was trying to understand Spark internals by looking into (and hacking)
the code.

I was trying to explore the buckets that are generated
when the output of each map task is partitioned, which the reduce side then
fetches on the basis of partitionId. I went into the write() method of
SortShuffleWriter, which takes an Iterator named records as an argument.
I thought the key-value pairs in this iterator represented the
buckets. But upon exploring its contents I realized that I was wrong, because
pairs with the same key were showing up in different buckets, which should not
have been the case.
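
To make concrete what I mean by a bucket, here is a minimal sketch of the
behavior I was expecting. It is plain Python, not Spark code. The names
bucket_for and partition_map_output are made up for illustration and are not
part of Spark's API, and the CRC32 hash is only an assumed stand-in for
whatever the configured partitioner actually does.

```python
import zlib
from collections import defaultdict


def bucket_for(key, num_partitions):
    """Hypothetical stand-in for a hash partitioner.

    Uses a stable hash of the key modulo the number of reduce partitions,
    so the same key always maps to the same bucket id.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_map_output(records, num_partitions):
    """Group one map task's (key, value) pairs by their bucket id."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[bucket_for(key, num_partitions)].append((key, value))
    return dict(buckets)


# Assumed sample input: the raw, not yet partitioned output of one map task.
records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
buckets = partition_map_output(records, num_partitions=3)

for partition_id in sorted(buckets):
    print(partition_id, buckets[partition_id])
```

Under this assumption, every pair with a given key ends up under a single
bucket id, which is what I expected to see and did not.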

I'd really appreciate it if someone could help me find where these buckets
originate.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Get-bucket-details-created-in-shuffle-phase-tp24175.html
