Posted to mapreduce-user@hadoop.apache.org by psdc1978 <ps...@gmail.com> on 2010/05/18 19:01:22 UTC

Trying to relate a split file to a input file

Hi,

I'm studying the MapReduce code, and I have the following questions:

1 - I'm running the wordcount example. I have 3 txt files as input. Each txt
file is about 120 MB.

During execution, a number of map tasks will read the txt files. Each file
is divided into splits. I would like to know which txt file each split
corresponds to.
For example, for the A.txt file, 2 splits will be created (split0 and
split1) of at most 64 MB each. I would like to know that split0 and split1
belong to A.txt.
Is it possible? If I have to write some code, is there any object that
contains this data?

2 -
The job uses a job.split file. What does this file contain, and what is its
purpose?

Thanks,

-- 
PSC

Re: Trying to relate a split file to a input file

Posted by psdc1978 <ps...@gmail.com>.

I don't think the wordcount example uses the FileSplit class. Only the
MultithreadedMapper class uses FileSplit, and I can't find an example where
it's invoked.

Where is the setup() method?



On Tue, May 18, 2010 at 6:50 PM, Wilkes, Chris <cw...@gmail.com> wrote:

> In your setup() method, look at context.getInputSplit(); this will be a
> FileSplit in your case. From there you can call getPath() to see both the
> directory structure and the split value.


-- 
Pedro

Re: Trying to relate a split file to a input file

Posted by "Wilkes, Chris" <cw...@gmail.com>.

In your setup() method, look at context.getInputSplit(); this will be a
FileSplit in your case. From there you can call getPath() to see both the
directory structure and the split value.
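
To illustrate what a split carries, here is a minimal, Hadoop-free sketch in
Java. SplitSketch, Split, and computeSplits are hypothetical names invented
for this illustration, and the fixed-size chunking is a simplification, not
Hadoop's actual split computation (which also depends on things like the
HDFS block size and the configured minimum and maximum split sizes). It only
shows how a 120 MB A.txt with a 64 MB split size yields two
(path, start, length) records, which is the kind of information a FileSplit
exposes through getPath(), getStart(), and getLength().

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /** Hypothetical stand-in for the (path, start, length) data a file split carries. */
    public static final class Split {
        public final String path;   // file the split belongs to
        public final long start;    // byte offset of the split within the file
        public final long length;   // number of bytes in the split

        public Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    /**
     * Cuts a file of fileLength bytes into consecutive chunks of at most
     * splitSize bytes. Simplified assumption: real split computation in
     * Hadoop applies additional rules that are not modelled here.
     */
    public static List<Split> computeSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new Split(path, offset, length));
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A.txt from the question: about 120 MB, split size of 64 MB.
        for (Split s : computeSplits("A.txt", 120 * mb, 64 * mb)) {
            System.out.println(s);
        }
        // prints:
        // A.txt:0+67108864
        // A.txt:67108864+58720256
    }
}
```

Every record keeps the path of the file it came from, so relating split0 and
split1 back to A.txt is a matter of reading that field. In a real job using
the new API (org.apache.hadoop.mapreduce), the equivalent lookup would be a
cast of context.getInputSplit() to FileSplit inside a setup(Context) method
that you override in your own Mapper subclass. As far as I know, the
wordcount mapper does not override setup(), so you would have to add it
yourself.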
