Posted to general@hadoop.apache.org by Chandan Tamrakar <ch...@nepasoft.com> on 2009/10/13 14:56:42 UTC

Reading files from local file system


We are trying to read files from the local file system, but when running the
MapReduce job it cannot read files from the input location (the input
location is also on the local file system).

To do this we changed hadoop-site.xml as shown below:

/etc/conf/hadoop/hadoop-site.xml

<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>


 [admin@localhost ~]$ hadoop jar Test.jar /home/admin/input/test.txt output1

Suppose test.txt is a plain text file that contains:
Test1
Test2
Test3


While running this simple MapReduce job we get the following
FileNotFoundException; we are using TextInputFormat in our job configuration:


09/10/13 17:26:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/10/13 17:26:35 INFO mapred.FileInputFormat: Total input paths to process : 1
09/10/13 17:26:35 INFO mapred.FileInputFormat: Total input paths to process : 1
09/10/13 17:26:37 INFO mapred.JobClient: Running job: job_200910131447_0033
09/10/13 17:26:38 INFO mapred.JobClient:  map 0% reduce 0%
09/10/13 17:27:00 INFO mapred.JobClient: Task Id : attempt_200910131447_0033_m_000000_0, Status : FAILED
java.io.FileNotFoundException: File file:/home/admin/Desktop/input/test.txt does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:259)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:117)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:275)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:364)
	at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:206)
	at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

However, the following code does work when run as a separate main method.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws IOException {

    Configuration conf = new Configuration();
    // Uses whatever fs.default.name resolves to (file:/// here).
    FileSystem fs = FileSystem.get(conf);

    // "abc.txt" is a relative path, so the file lands in the running
    // user's working directory (/home/admin/abc.txt for the admin user).
    FSDataOutputStream out = fs.create(new Path("abc.txt"));
    out.writeUTF("abc");
    out.close();

}

The above code works fine when run as a jar in Hadoop; it successfully
creates the file /home/admin/abc.txt when run as the admin user.


Re: Reading files from local file system

Posted by Chandan Tamrakar <ch...@nepasoft.com>.
Jason, I think it worked after changing these two parameters in
hadoop-site.xml:

  1. fs.default.name
     used the local file system as the default: file:///
  2. mapred.job.tracker
     we were previously using an HDFS location and have removed it now

Thanks,
chandan
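For reference, a hadoop-site.xml along these lines should give fully local
operation (a sketch: the value "local" for mapred.job.tracker, which makes
jobs run in-process, is an assumption, since above I only say the HDFS
location was removed):

<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>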



-- 
Chandan Tamrakar

Re: Reading files from local file system

Posted by Jason Venner <ja...@gmail.com>.
If you want to open a local file in hadoop you have three simple ways:

1: use a file:///path URI
2: get a LocalFileSystem object from the FileSystem:
/**
   * Get the local file system.
   * @param conf the configuration to configure the file system with
   * @return a LocalFileSystem
   */
  public static LocalFileSystem getLocal(Configuration conf)
    throws IOException {
    return (LocalFileSystem)get(LocalFileSystem.NAME, conf);
  }

3: use the java.io File* classes.
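As a rough sketch, here are all three approaches side by side (the path is a
placeholder, and a vanilla Configuration is assumed):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class LocalReadExamples {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path p = new Path("file:///home/admin/input/test.txt"); // placeholder path

    // 1: a file:/// URI always resolves to the local file system,
    //    whatever fs.default.name is set to.
    FSDataInputStream in1 = p.getFileSystem(conf).open(p);
    in1.close();

    // 2: ask for the LocalFileSystem explicitly.
    LocalFileSystem local = FileSystem.getLocal(conf);
    FSDataInputStream in2 = local.open(new Path("/home/admin/input/test.txt"));
    in2.close();

    // 3: plain java.io, bypassing Hadoop's FileSystem API entirely.
    BufferedReader r = new BufferedReader(
        new FileReader(new File("/home/admin/input/test.txt")));
    r.close();
  }
}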




-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Re: Reading files from local file system

Posted by Chandan Tamrakar <ch...@nepasoft.com>.
Do I need to change any configuration besides changing the default file
system to the local file system?
I am trying to feed, for example, input.txt to the map job.

input.txt will contain file locations like the following:

file://path/abc1.doc
file://path/abc2.doc
..
...

The map program will read each line from input.txt and process the file it
names, as in the sketch below.
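(Roughly, the mapper I have in mind would look like this, using the old
mapred API; the class name, the one-URL-per-line assumption, and the output
record are placeholders:)

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileListMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf conf;

  @Override
  public void configure(JobConf job) {
    this.conf = job;
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Each input line is assumed to hold one file:// URL.
    Path doc = new Path(value.toString().trim());
    FileSystem fs = doc.getFileSystem(conf);
    FSDataInputStream in = fs.open(doc);
    try {
      // ... process the document's contents here ...
      output.collect(new Text(doc.toString()), new Text("processed")); // placeholder output
    } finally {
      in.close();
    }
  }
}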

Do I need to change any configuration? This is similar to how Nutch crawls.

Any feedback would be appreciated.

thanks






-- 
Chandan Tamrakar

Re: Reading files from local file system

Posted by Jeff Zhang <zj...@gmail.com>.
Maybe you could debug your MapReduce job in Eclipse, since you run it in
local mode.
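For example, a driver along these lines forces the local job runner so the
whole job runs inside the JVM that Eclipse launches, and breakpoints in the
mapper are hit (a sketch; the class name and paths are placeholders, and the
default identity map/reduce is used):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class LocalDebugDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LocalDebugDriver.class);
    // Run map and reduce in this JVM instead of submitting to a cluster.
    conf.set("mapred.job.tracker", "local");
    conf.set("fs.default.name", "file:///");
    conf.setInputFormat(TextInputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path("/home/admin/input/test.txt"));
    FileOutputFormat.setOutputPath(conf, new Path("output1"));
    JobClient.runJob(conf);
  }
}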


