Posted to hdfs-user@hadoop.apache.org by Florin P <fl...@yahoo.com> on 2011/07/22 14:34:46 UTC

Obtain the filename that is processed by the Map class when CombineFileInputFormat is used

Hello!
  How can I obtain the names of the files that are processed by the Map class when CombineFileInputFormat is used?
  As far as I know, when CombineFileInputFormat is used, multiple files are processed by the same mapper. In my case, I would like to know how to obtain these file names.

I look forward to your answers. Thank you.
  Regards,
  Florin

Re: Obtain the filename that is processed by the Map class when CombineFileInputFormat is used

Posted by Harsh J <ha...@cloudera.com>.
Florin,

I believe you answered yourself accidentally?

On Thu, Jul 28, 2011 at 4:10 PM, Florin P <fl...@yahoo.com> wrote:
> --- On Fri, 7/22/11, Florin P <fl...@yahoo.com> wrote:
>
> From: Florin P <fl...@yahoo.com>
> Subject: Obtain the filename that is processed by the Map class when CombineFileInputFormat is used
> To: hdfs-user@hadoop.apache.org
> Date: Friday, July 22, 2011, 8:34 AM
>
> Hello!
>   How can I obtain the names of the files that are processed by the Map class when CombineFileInputFormat is used?
>   As far as I know, when CombineFileInputFormat is used, multiple files are processed by the same mapper. In my case, I would like to know how to obtain these file names.

Depending on how you have implemented your per-FileSplit record
readers in the CombineFileInputFormat, you can set "map.input.file" in
the Configuration instance when each reader is initialized. This is
somewhat self-managed, since several record readers may be initialized
within a single map task. Let me know if you would like to see a simple
example as well.
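
For illustration, here is a rough, Hadoop-free sketch of that idea. It is a model of the pattern, not actual Hadoop code: a plain Map stands in for the Configuration instance, and PerFileReader, map and runTask are hypothetical names made up for this sketch. The point it shows is that each per-file reader publishes its own path under "map.input.file" when it is initialized, so the value seen by the map function changes as the combined split moves from one file to the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hadoop-free model of the pattern. All class and method names are
 * hypothetical stand-ins; a Map<String, String> plays the role of the
 * Hadoop Configuration instance.
 */
public class CombinedSplitSketch {

    /** Stand-in for the record reader created for each file in the split. */
    public static class PerFileReader {
        private final List<String> lines;
        private int next = 0;

        public PerFileReader(String path, List<String> lines, Map<String, String> conf) {
            this.lines = lines;
            // The key step: record the current file when the reader is initialized.
            conf.put("map.input.file", path);
        }

        public boolean hasNext() { return next < lines.size(); }

        public String next() { return lines.get(next++); }
    }

    /** Stand-in for the map function: tags each record with its source file. */
    public static String map(String record, Map<String, String> conf) {
        return conf.get("map.input.file") + "\t" + record;
    }

    /** Stand-in for one map task that works through a combined split. */
    public static List<String> runTask(Map<String, List<String>> filesInSplit) {
        Map<String, String> conf = new HashMap<>();
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : filesInSplit.entrySet()) {
            // A fresh reader is initialized for every file, overwriting the property.
            PerFileReader reader = new PerFileReader(file.getKey(), file.getValue(), conf);
            while (reader.hasNext()) {
                output.add(map(reader.next(), conf));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Two made-up files combined into a single split.
        Map<String, List<String>> split = new LinkedHashMap<>();
        split.put("/in/a.txt", Arrays.asList("x", "y"));
        split.put("/in/b.txt", Arrays.asList("z"));
        for (String line : runTask(split)) {
            System.out.println(line);
        }
    }
}
```

In a real job the property would be set on the task's Configuration from inside your own per-file record reader; where exactly that happens depends on how the reader is written.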

--
Harsh J

Re: Obtain the filename that is processed by the Map class when CombineFileInputFormat is used

Posted by Florin P <fl...@yahoo.com>.
Hello!
 In Hadoop 0.20, you can do the following in the mapper class:
1. Create a field "job" of type JobConf.
2. In the "configure" method of the mapper class, initialize the field with the received argument.
3. In the map function, get the name of the file being processed from the property map.input.file (for example, job.get("map.input.file")).
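
The three steps can be sketched roughly as below. To keep the sketch runnable without Hadoop on the classpath, JobConfStandIn is a hypothetical stand-in for org.apache.hadoop.mapred.JobConf, and FileAwareMapper and baseName are names made up for illustration. It also assumes that map.input.file holds a full path or URI, so the last path segment is taken to get the bare file name.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class InputFileNameSketch {

    /** Hypothetical stand-in for org.apache.hadoop.mapred.JobConf. */
    public static class JobConfStandIn {
        private final Map<String, String> props = new HashMap<>();

        public void set(String key, String value) { props.put(key, value); }

        public String get(String key) { return props.get(key); }
    }

    /** Mirrors the three steps: a field, configure(), and a lookup in map(). */
    public static class FileAwareMapper {
        private JobConfStandIn job;                  // step 1: the "job" field

        public void configure(JobConfStandIn job) {  // step 2: keep the argument
            this.job = job;
        }

        public String map(String value) {            // step 3: read the property
            String inputFile = job.get("map.input.file");
            // The property is null if nothing has set it for this task.
            String name = (inputFile == null) ? "(unknown)" : baseName(inputFile);
            return name + "\t" + value;
        }
    }

    /** Reduces a full path or URI to its last segment (assumes a valid URI). */
    public static String baseName(String pathOrUri) {
        String path = URI.create(pathOrUri).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        JobConfStandIn job = new JobConfStandIn();
        // Made-up value of the kind the property is assumed to hold.
        job.set("map.input.file", "hdfs://namenode:8020/user/florin/input/part-0001.txt");

        FileAwareMapper mapper = new FileAwareMapper();
        mapper.configure(job);
        System.out.println(mapper.map("hello"));
    }
}
```

In a real mapper the class would extend MapReduceBase and implement the old-API Mapper interface, with configure(JobConf) and map(...) taking the Hadoop types instead of the stand-ins used here.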

I hope this helps.
  Regards,
 Florin

