Posted to common-user@hadoop.apache.org by JOAQUIN GUANTER GONZALBEZ <xi...@tid.es> on 2012/08/31 10:11:16 UTC

Reading a directory in standalone Hadoop

Hello Hadoopers,

I am trying to write a test that runs some MR jobs. One of these jobs needs to read a file produced by another job using the SequenceFile.Reader class. The job that produces this file has its output path set to "/folder/timezone". After running it in my standalone Hadoop environment, it produces four files in this folder:


- .SUCCESS.crc
- .part-r-00000.crc
- _SUCCESS
- part-r-00000

So far, so good. The problem is that when the next job tries to read "/folder/timezone" with SequenceFile.Reader, it gets a FileNotFoundException, presumably because it is trying to open the directory as a "file" on my local filesystem. Here's the stack trace I get:

java.io.FileNotFoundException: /folder/timezone (Access denied)
                at java.io.FileInputStream.open(Native Method)
                at java.io.FileInputStream.<init>(FileInputStream.java:138)
                at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:72)
                at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:108)
                at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:178)
                at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:127)
                at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:284)
                at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1508)
                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1487)
                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1480)
                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
                at es.tid.smartsteps.footfalls.lookups.CartesianConverterMapperBase.setup(CartesianConverterMapperBase.java:41)
                at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
                at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)

Any idea how to solve this issue? Is this scenario supported by SequenceFile.Reader, or am I out of luck?

Many thanks!
Ximo.

________________________________
Este mensaje se dirige exclusivamente a su destinatario. Puede consultar nuestra política de envío y recepción de correo electrónico en el enlace situado más abajo.
This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at:
http://www.tid.es/ES/PAGINAS/disclaimer.aspx

Re: Reading a directory in standalone Hadoop

Posted by Hemanth Yamijala <yh...@gmail.com>.
Hi,

The stack trace shows an access-denied error rather than a missing file. Could
you check the permissions of the directory /folder/timezone? Also,
are you using the local job tracker, and not a cluster?

In general, please ensure your configuration points to the right
cluster when launching the MapReduce job - be it local mode, a cluster
set up in pseudo-distributed mode, or a real cluster.
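
For reference, a sketch of what "local" mode looks like in configuration. These are the Hadoop 1.x-era property names (matching the version implied by the stack trace above); in Hadoop 2+ the equivalents are fs.defaultFS and the YARN settings, so verify against your version:

```xml
<!-- core-site.xml / mapred-site.xml: standalone (local) mode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
</configuration>
```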

Thanks
Hemanth
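
Beyond permissions, SequenceFile.Reader opens a single file, not a directory, so the usual approach is to point it at the part-r-NNNNN file(s) inside the job's output directory rather than at the directory itself. A minimal stdlib-only sketch of that idea (no Hadoop dependency; PartFileDemo and partFiles are illustrative names, not Hadoop APIs):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;

public class PartFileDemo {

    // Select the data files a job left in its output directory,
    // skipping the _SUCCESS marker and the hidden .crc checksum files.
    static File[] partFiles(File outputDir) {
        return outputDir.listFiles(f -> f.isFile() && f.getName().startsWith("part-"));
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from the post in a temp directory.
        File dir = Files.createTempDirectory("job-output").toFile();
        for (String name : new String[] {"_SUCCESS", ".part-r-00000.crc", "part-r-00000"}) {
            new File(dir, name).createNewFile();
        }

        // Opening the directory itself fails, as in the stack trace above.
        try (FileInputStream in = new FileInputStream(dir)) {
            System.out.println("unexpectedly opened a directory");
        } catch (IOException e) {
            System.out.println("directory open failed: " + e.getClass().getSimpleName());
        }

        // The fix: open each part file individually.
        for (File part : partFiles(dir)) {
            System.out.println("would open: " + part.getName());
        }
    }
}
```

With the real SequenceFile.Reader you would pass each part file's path (for example, a Path built from the directory plus "part-r-00000") instead of the directory; the checksum filesystem should then pick up the matching hidden .crc file on its own.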

