Posted to common-user@hadoop.apache.org by Saptarshi Guha <sa...@gmail.com> on 2009/11/26 08:05:12 UTC

The name of the current input file during a map

Hello,
I have a set of input files, part-r-*, which I will pass through another
map (no reduce). The part-r-* files consist of key/value pairs; the keys are
small and the values fairly large (MBs).

I would like to index these, i.e. run a map whose output is the key and
the /filename/, i.e. the part-r-* file to which that key belongs, so
that if I need them again I can just access that file.

Q: In the map stage, how do I retrieve the name of the file being
processed? I'd rather not use the MapFileOutputFormat.

Hadoop 0.21

Regards
Saptarshi

Re: The name of the current input file during a map

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
-"mapred.input.file"
+"map.input.file"
Should work

Amogh

On 11/26/09 12:57 PM, "Saptarshi Guha" <sa...@gmail.com> wrote:

Hello again,
I'm using Hadoop 0.21 and its Context object, e.g.

 public void setup(Context context) {
     Configuration cfg = context.getConfiguration();
     System.out.println("mapred.input.file=" + cfg.get("mapred.input.file"));
 }

This displays null, so maybe this fell out by mistake in the API change?
Regards
Saptarshi



Re: The name of the current input file during a map

Posted by Owen O'Malley <om...@apache.org>.
On Nov 25, 2009, at 11:27 PM, Saptarshi Guha wrote:

> I'm using Hadoop 0.21 and its context object

In the new API you can re-write that as:

((FileSplit) context.getInputSplit()).getPath()

-- Owen
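
A minimal sketch of how the cast above might sit inside a complete new-API mapper that builds the key-to-file index from the original question. The class name KeyToFileMapper and the helper fileNameOf are hypothetical, and the Text / BytesWritable key and value types are assumptions (the thread does not say how the part-r-* files are encoded):

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper; the Text/BytesWritable input types are assumptions.
class KeyToFileMapper extends Mapper<Text, BytesWritable, Text, Text> {

    private final Text fileName = new Text();

    // Returns the last path component of the file behind a split.
    // Throws ClassCastException if the split is not a FileSplit.
    static String fileNameOf(InputSplit split) {
        Path path = ((FileSplit) split).getPath();
        return path.getName();
    }

    @Override
    protected void setup(Context context) {
        // A map task processes a single split, so resolve the name once.
        fileName.set(fileNameOf(context.getInputSplit()));
    }

    @Override
    protected void map(Text key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // Emit the key and the file it came from; the large value is dropped.
        context.write(key, fileName);
    }
}
```

The cast assumes the job uses a FileInputFormat-based input format; other input formats may hand the mapper a split that is not a FileSplit, in which case the cast fails with a ClassCastException. getName() yields only the last path component; getPath().toString() gives the full path if that is what the index should store.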

Re: The name of the current input file during a map

Posted by Saptarshi Guha <sa...@gmail.com>.
Hello again,
I'm using Hadoop 0.21 and its Context object, e.g.

 public void setup(Context context) {
     Configuration cfg = context.getConfiguration();
     System.out.println("mapred.input.file=" + cfg.get("mapred.input.file"));
 }

This displays null, so maybe this fell out by mistake in the API change?
Regards
Saptarshi



Re: The name of the current input file during a map

Posted by Saptarshi Guha <sa...@gmail.com>.
Thank you.
Regards
Saptarshi


Re: The name of the current input file during a map

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
conf.get("map.input.file") is what you need.

Amogh
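
For the old (org.apache.hadoop.mapred) API, the lookup above might be placed in configure(), as in the sketch below. The class name OldApiKeyToFileMapper is hypothetical, and the Text / BytesWritable key and value types are assumptions. The sketch covers only the old API and makes no claim about whether the property is populated under the new Context-based API:

```java
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical old-API mapper; the input types are assumptions.
class OldApiKeyToFileMapper extends MapReduceBase
        implements Mapper<Text, BytesWritable, Text, Text> {

    private final Text inputFile = new Text();

    @Override
    public void configure(JobConf job) {
        // Read the per-task property once; fall back to "" if it is not set.
        String path = job.get("map.input.file");
        inputFile.set(path == null ? "" : path);
    }

    @Override
    public void map(Text key, BytesWritable value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Emit the key and the input file path; the large value is dropped.
        output.collect(key, inputFile);
    }
}
```

Reading the property in configure() rather than in map() avoids one configuration lookup per record.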

