Posted to user@hadoop.apache.org by Rupert Mazzucco <ru...@gmail.com> on 2021/05/18 08:51:14 UTC

streaming in 2.10.0 : sequence of parts

Without a reducer, does the sequence of output part-XXXXX
files correspond to the sequence of input records, or can some
shuffling occur? If the order does not match, how can I find the part
corresponding to a given input record? I had some hopes for
mapreduce_task_partition, but it seems the number of distinct partition
values does not necessarily match the number of requested map tasks
(e.g., -D mapreduce.job.maps=1271 gave me 1271 parts for 1271
input records, but only 1219 unique mapreduce_task_partition values ...)
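
To make the question concrete, here is a minimal sketch of a streaming mapper that prefixes every input record with the value of mapreduce_task_partition. It relies on Hadoop streaming exporting job configuration to the task environment with dots replaced by underscores. The function name tag_records and the "unknown" fallback are illustrative assumptions, and whether the recorded value matches the part file suffix is exactly what is being asked here.

```python
#!/usr/bin/env python3
"""Sketch of a map-only streaming mapper that tags records with the task partition."""
import os
import sys


def tag_records(lines, env):
    """Yield each input line prefixed with the task partition and a tab.

    `env` is any mapping that looks like os.environ. The fallback value
    "unknown" is an assumption for runs outside of Hadoop.
    """
    # Streaming exposes mapreduce.task.partition as mapreduce_task_partition.
    partition = env.get("mapreduce_task_partition", "unknown")
    for line in lines:
        # Strip only the trailing newline so the record itself is unchanged.
        yield f"{partition}\t{line.rstrip(chr(10))}"


if __name__ == "__main__":
    for tagged in tag_records(sys.stdin, os.environ):
        print(tagged)
```

Run as the -mapper of a job with -D mapreduce.job.reduces=0, each output line then carries the partition value its task saw, which can be compared against the name of the part file it ended up in.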

Many thanks
Rupert