Posted to common-user@hadoop.apache.org by Saptarshi Guha <sa...@gmail.com> on 2009/11/26 08:05:12 UTC
The name of the current input file during a map
Hello,
I have a set of input files part-r-* which I will pass through another
map (no reduce). The part-r-* files consist of key/value pairs, the keys
being small and the values fairly large (MBs).
I would like to index these, i.e. run a map whose output is the key and
the /filename/, i.e. which part-r-* file the particular key belongs to, so
that if I need them again I can just access that file.
Q: In the map stage, how do I retrieve the name of the file being
processed? I'd rather not use the MapFileOutputFormat.
Hadoop 0.21
Regards
Saptarshi
Re: The name of the current input file during a map
Posted by Amogh Vasekar <am...@yahoo-inc.com>.
-"mapred.input.file"
+"map.input.file"
Should work
Amogh
On 11/26/09 12:57 PM, "Saptarshi Guha" <sa...@gmail.com> wrote:
Hello again,
I'm using Hadoop 0.21 and its Context object, e.g.
    public void setup(Context context) {
        Configuration cfg = context.getConfiguration();
        System.out.println("mapred.input.file=" + cfg.get("mapred.input.file"));
    }
displays null, so maybe this fell out by mistake in the API change?
Regards
Saptarshi
Re: The name of the current input file during a map
Posted by Owen O'Malley <om...@apache.org>.
On Nov 25, 2009, at 11:27 PM, Saptarshi Guha wrote:
> I'm using Hadoop 0.21 and its context object
In the new API you can re-write that as:
((FileSplit) context.getInputSplit()).getPath()
-- Owen
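A minimal sketch of how the one-liner above could be used for the indexing job described in the original question. It assumes the part-r-* files are read through a FileInputFormat subclass (so the split really is a FileSplit) and that the keys are Text. The class name FileIndexMapper and the helper indexValue are hypothetical and not part of Hadoop.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper: emits (key, part-file name) for every input record.
// Assumption: keys are Text; values can be any Writable and are ignored.
public class FileIndexMapper extends Mapper<Text, Writable, Text, Text> {

    private final Text fileName = new Text();

    // Hypothetical helper: keep only the last path component, e.g. "part-r-00003".
    static String indexValue(Path path) {
        return path.getName();
    }

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();
        // The cast only holds for file-based input formats, so check first.
        if (!(split instanceof FileSplit)) {
            throw new IOException("Expected a FileSplit but got " + split.getClass().getName());
        }
        // One split belongs to one file, so the name is resolved once per task.
        fileName.set(indexValue(((FileSplit) split).getPath()));
    }

    @Override
    protected void map(Text key, Writable value, Context context)
            throws IOException, InterruptedException {
        // The large value is dropped; only the key and its file name are written.
        context.write(key, fileName);
    }
}
```

Run as a map-only job (for example with job.setNumReduceTasks(0)), this would write one (key, file name) pair per input record. Use getPath().toString() instead of getName() if the full path is needed.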
Re: The name of the current input file during a map
Posted by Saptarshi Guha <sa...@gmail.com>.
Hello again,
I'm using Hadoop 0.21 and its Context object, e.g.
    public void setup(Context context) {
        Configuration cfg = context.getConfiguration();
        System.out.println("mapred.input.file=" + cfg.get("mapred.input.file"));
    }
displays null, so maybe this fell out by mistake in the API change?
Regards
Saptarshi
Re: The name of the current input file during a map
Posted by Saptarshi Guha <sa...@gmail.com>.
Thank you.
Regards
Saptarshi
Re: The name of the current input file during a map
Posted by Amogh Vasekar <am...@yahoo-inc.com>.
conf.get("map.input.file") is what you need.
Amogh
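For the old org.apache.hadoop.mapred API, the property lookup suggested above might look like the sketch below. It assumes the framework sets map.input.file for file-based splits before configure() is called, and that the keys are Text. The class name OldApiFileIndexMapper and the helper inputFileOf are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical old-API mapper: emits (key, input file path) for every record.
// Assumption: keys are Text; values can be any Writable and are ignored.
public class OldApiFileIndexMapper extends MapReduceBase
        implements Mapper<Text, Writable, Text, Text> {

    private final Text fileName = new Text();

    // Hypothetical helper: read the property, falling back to "" when it is unset.
    static String inputFileOf(JobConf job) {
        String file = job.get("map.input.file");
        return file == null ? "" : file;
    }

    @Override
    public void configure(JobConf job) {
        // Called once per task, before any call to map().
        fileName.set(inputFileOf(job));
    }

    @Override
    public void map(Text key, Writable value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // The large value is dropped; only the key and its file path are written.
        output.collect(key, fileName);
    }
}
```

Note that the property holds the full path of the input file, not just the part-r-* name.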