Posted to mapreduce-user@hadoop.apache.org by psdc1978 <ps...@gmail.com> on 2010/05/18 19:01:22 UTC
Trying to relate a split file to a input file
Hi,
I'm studying the MapReduce code, and I have the following questions:
1 - I'm running the wordcount example with 3 txt files as input. Each txt
file is about 120 MB.
During execution, a number of map tasks will read the txt files. Each file
is divided into splits. I would like to know which txt file each split
corresponds to.
For example, for the A.txt file, 2 splits (split0 and split1) of 64 MB each
will be created. I would like to know that split0 and split1 belong to
A.txt.
Is it possible? If I have to write some code, is there any object that
contains this data?
2 - The job task uses a job.split file. What does this file contain, and
what is its purpose?
Thanks,
--
PSC
Re: Trying to relate a split file to a input file
Posted by psdc1978 <ps...@gmail.com>.
I don't think that the wordcount example uses the FileSplit class. Only the
MultithreadedMapper class uses FileSplit, and I can't find an example where
it's invoked.
Where is the setup() method?
On Tue, May 18, 2010 at 6:50 PM, Wilkes, Chris <cw...@gmail.com> wrote:
> In your setup(), look at context.getInputSplit(); this will be a FileSplit
> in your case. From there you can call getPath() to see the file's path,
> including its directory structure, and getStart() and getLength() to see
> the split's offset and size.
--
Pedro
Re: Trying to relate a split file to a input file
Posted by "Wilkes, Chris" <cw...@gmail.com>.
In your setup(), look at context.getInputSplit(); this will be a
FileSplit in your case. From there you can call getPath() to see the
file's path, including its directory structure, and getStart() and
getLength() to see the split's offset and size.
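
To make the file-to-split relationship concrete, here is a rough,
self-contained sketch of how one input file could be cut into
(path, start, length) pieces. It is only a model and does not use Hadoop:
the SplitSketch and Split names are hypothetical stand-ins for the kind of
data a FileSplit exposes, and the 10% slack constant is an assumption about
how the tail of a file is handled, not a statement of what your cluster
does.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; this is NOT Hadoop's FileSplit or FileInputFormat.
class SplitSketch {

    // Hypothetical stand-in for the three facts a file split carries.
    static final class Split {
        final String path;  // input file the split belongs to
        final long start;   // byte offset of the split within that file
        final long length;  // number of bytes in the split

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    // Assumed 10% slack so that a small tail is folded into the last split.
    static final double SPLIT_SLOP = 1.1;

    // Cut a file of fileLength bytes into splits of at most splitSize bytes
    // (the last split may be slightly larger, up to the slack above).
    static List<Split> computeSplits(String path, long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(path, fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits.add(new Split(path, fileLength - remaining, remaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 120 MB file with a 64 MB split size, as in the question.
        for (Split s : computeSplits("A.txt", 120 * mb, 64 * mb)) {
            System.out.println(s);
        }
    }
}
```

Under these assumptions, a 120 MB A.txt yields two pieces: one of 64 MB
starting at offset 0 and one of 56 MB starting at offset 64 MB. Both carry
the same path, which is what lets you tie a split back to its input file.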