Posted to common-user@hadoop.apache.org by Sandhya E <sa...@gmail.com> on 2007/07/10 11:22:56 UTC

pattern for input files in MapReduce

Hi

I'm using the latest version of Hadoop. Does it support specifying a pattern
for input file names, in addition to specifying an input path through
jobConf.setInputPath()? In my case, log files for over a month are stored in
a single folder with date+hour embedded in their names, and I want MapReduce
to run on one day's logs at a time.

TIA
Sandhya

RE: pattern for input files in MapReduce

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Yes, it does support patterns in file names.
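
The reply above does not say what such a pattern looks like, so here is a minimal sketch of the idea in plain Java. It uses java.nio glob matching, not any Hadoop API, and only illustrates which names a day-level pattern would select. The file names and the yyyyMMddHH layout are hypothetical, and whether your Hadoop version accepts a pattern like this in the input path (and with exactly this glob syntax) is an assumption to check against its documentation.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrates glob semantics only; it does not call any Hadoop API.
final class DayGlobDemo {

    // Returns the names (in input order) that match the given glob pattern.
    static List<String> select(String glob, List<String> names) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical log names with date+hour (yyyyMMddHH) embedded.
        List<String> names = Arrays.asList(
            "access-2007070923.log",
            "access-2007071000.log",
            "access-2007071013.log",
            "access-2007071100.log");
        // Fix the day and wildcard the two hour digits with '?'.
        System.out.println(select("access-20070710??.log", names));
    }
}
```

Fixing the day and leaving only the hour digits open keeps the pattern from also matching neighbouring days, which a looser pattern such as `*0710*` could do.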

RE: pattern for input files in MapReduce

Posted by "Mahajan, Neeraj" <ne...@ebay.com>.
You can extend FileInputFormat and override listPaths(). Depending on
your requirements, you might be able to reuse almost the same code that is
in FileInputFormat#listPaths() and only define a new filter in place of
hiddenFileFilter.
You will then have to set this new input format when you create the job
conf:
		conf.setInputFormat(YOURInputFormat.class);

~ Neeraj
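
As an illustration of the "new filter" mentioned above, the following is a rough sketch of the accept/reject test such a filter might apply for one day's logs. It is plain Java so that it can be run on its own: DayLogFilter and forDay are hypothetical names, the yyyyMMddHH layout is an assumption about the log names, and in Hadoop the same check would sit inside a PathFilter used by the overridden listPaths(). It also assumes that the hidden files skipped by hiddenFileFilter are those whose names start with "." or "_".

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Stand-in for the filter a custom input format could apply in listPaths().
// Plain Java only; no Hadoop classes are used here.
final class DayLogFilter {

    // Builds a predicate that accepts file names containing the given day
    // (yyyyMMdd) immediately followed by a two-digit hour.
    static Predicate<String> forDay(String day) {
        if (!day.matches("\\d{8}")) {
            throw new IllegalArgumentException("day must be yyyyMMdd: " + day);
        }
        // Lookarounds keep the 10-digit stamp from matching inside a longer number.
        Pattern stamp = Pattern.compile("(?<!\\d)" + day + "\\d{2}(?!\\d)");
        return name ->
            // Assumption: hidden files are those starting with '.' or '_'.
            !name.startsWith(".")
                && !name.startsWith("_")
                && stamp.matcher(name).find();
    }

    public static void main(String[] args) {
        Predicate<String> july10 = forDay("20070710");
        // Hypothetical file names.
        System.out.println(july10.test("access-2007071013.log"));
        System.out.println(july10.test("access-2007071100.log"));
    }
}
```

Keeping the hidden-file check inside the new filter matters because the suggestion is to use it in place of hiddenFileFilter rather than alongside it.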
