Posted to mapreduce-dev@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2011/05/11 18:52:51 UTC

Where does the map task use the set of locations?

Hi,

I was looking through the mapred code, searching for the point where the
split location is passed to the MapTask, and I found this line in the
TaskInProgress class.
[code]
t = new MapTask(jobFile, taskid, partition, splitClass, split,
                rawSplit.getFileName(), rawSplit.getLocations());
[/code]

The split variable holds the split bytes, and is set like this:

[code]
BytesWritable split;
if (!jobSetup && !jobCleanup) {
    splitClass = rawSplit.getClassName();
    split = rawSplit.getBytes();
} else {
    split = new BytesWritable();
}
[/code]

The value of "rawSplit.getFileName()" is the full URL of the split file
(hdfs://chicon-7.fr:54310/user/xxx/gutenberg/A.txt), and the locations
are the servers where the split is stored ([chicon-7.fr,
chinqchint-21.fr, chinqchint-38.fr]).
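
To make the three pieces concrete, here is a minimal sketch in plain Java.
It is not the Hadoop code: RawSplitSketch and isLocal are hypothetical
names, and the class name "ExampleSplit" and the byte values are made up.
It assumes the locations are kept next to the opaque split bytes, so a
caller can check them without deserializing the split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the split metadata described above.
// This is NOT Hadoop's RawSplit; it only mirrors the getters named in the post.
final class RawSplitSketch {
    private final String className;       // name of the split class
    private final byte[] bytes;           // opaque, serialized split
    private final List<String> locations; // hosts where the split data is stored

    RawSplitSketch(String className, byte[] bytes, List<String> locations) {
        this.className = className;
        this.bytes = bytes.clone();
        this.locations = Collections.unmodifiableList(new ArrayList<>(locations));
    }

    String getClassName() { return className; }

    byte[] getBytes() { return bytes.clone(); }

    List<String> getLocations() { return locations; }

    // Assumption for illustration: a caller can test locality by host name
    // alone, without ever looking inside the serialized bytes.
    static boolean isLocal(RawSplitSketch split, String host) {
        return split.getLocations().contains(host);
    }

    public static void main(String[] args) {
        RawSplitSketch split = new RawSplitSketch(
                "ExampleSplit", // made-up class name
                new byte[] {1, 2, 3}, // made-up payload
                Arrays.asList("chicon-7.fr", "chinqchint-21.fr", "chinqchint-38.fr"));

        System.out.println("local to chicon-7.fr: " + isLocal(split, "chicon-7.fr"));
        System.out.println("payload size: " + split.getBytes().length);
    }
}
```

In this sketch the bytes stay opaque and only the host list is consulted,
which is one possible reason to carry the locations separately.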


1 - Why are the split, the filename, and the set of locations all passed
when a MapTask is created? If the split is passed, I deduce that the map
task already holds the split bytes that it will use. So why not pass
just the split, and ignore the filename and the set of locations?



Thanks

-- 
---------------------------
PSC

Re: Where does the map task use the set of locations?

Posted by Pedro Costa <ps...@gmail.com>.

Please disregard my question. I was looking at the wrong code.



-- 
---------------------------
Pedro Sá da Costa

@: pcosta@lasige.di.fc.ul.pt
@: psdc1978@gmail.com