Posted to common-user@hadoop.apache.org by Dmitriy Ivanov <iv...@rsc.org> on 2013/03/07 04:57:57 UTC

Reading partitioned sequence file from HDFS throws FileNotFoundException

Hello,

I'm using Hadoop 1.1.1 and have run into an unexpected complication with a partitioned file. The file itself is the output of a map-reduce job.

Here is the code I'm using to read the file:

        try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
                // skipped code.
        }
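
The skipped code is essentially the standard key/value read loop. For reference, a rough, self-contained version of what I'm doing looks like this (the class name and the println are placeholders; my real code exports the records elsewhere):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.SequenceFile;
        import org.apache.hadoop.io.Writable;
        import org.apache.hadoop.util.ReflectionUtils;

        public class DumpSequenceFile {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);
                Path path = new Path(args[0]); // the path I pass is the .cvsp path shown below

                try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
                    // Instantiate key/value objects from the types recorded in the file header.
                    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                    while (reader.next(key, value)) {
                        System.out.println(key + "\t" + value);
                    }
                }
            }
        }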

This is the exception:

java.io.FileNotFoundException: File does not exist: /users/ivanovd/1.624be3e5-5932-468d-9ce4-f73078836936.cvsp
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:1975)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1944)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1936)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:731)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
        at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1499)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
        at HadoopTask.exportResults(HadoopTask.java:163)

The file itself exists (in partitioned form):
./hadoop fs -ls /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/
-rw-r--r--   3 ivanovd supergroup          0 2013-03-06 22:17 /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/_SUCCESS
drwxr-xr-x   - ivanovd supergroup          0 2013-03-06 22:16 /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/_logs
-rw-r--r--   3 ivanovd supergroup      63301 2013-03-06 22:17 /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/part-r-00000

Also, hadoop fs -getmerge works fine.

Has anyone encountered this problem with SequenceFile.Reader on HDFS? What am I doing wrong?

Thanks,
/DI



Re: Reading partitioned sequence file from HDFS throws FileNotFoundException

Posted by Abdelrhman Shettia <as...@hortonworks.com>.
Hi All,

Try giving the full path to the file, such as:

> /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/part-r-00000

If the job produces lots of files and you need more than one mapper in a downstream job, a file-crusher utility run between the first and second map-reduce jobs may be the best option here.
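
SequenceFile.Reader opens a single file, so it needs to be pointed at a part file rather than at the job output directory. If you want to read every reducer output from your own code, one option is to open each part file in turn; here is a rough, untested sketch (the class name, argument handling, and record handling are illustrative):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.SequenceFile;
        import org.apache.hadoop.io.Writable;
        import org.apache.hadoop.util.ReflectionUtils;

        public class ReadJobOutput {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);
                Path outputDir = new Path(args[0]); // the job output directory (the .cvsp path)

                // Glob only the reducer outputs so that _SUCCESS and _logs are skipped.
                for (FileStatus status : fs.globStatus(new Path(outputDir, "part-r-*"))) {
                    try (SequenceFile.Reader reader =
                            new SequenceFile.Reader(fs, status.getPath(), conf)) {
                        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                        while (reader.next(key, value)) {
                            // handle each record here
                        }
                    }
                }
            }
        }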


Thanks 
- Abdelrahman 

On Mar 6, 2013, at 7:57 PM, Dmitriy Ivanov <iv...@rsc.org> wrote:

> Hello,
>  
> I’m using hadoop 1.1.1 and run into unexpected complication with partitioned file. The file itself is the result of map-reduce task.
>  
> Here is code I’m using to read the file:
>  
>         try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
>                 // skipped code.
>         }
>  
> This is exception:
>  
> java.io.FileNotFoundException: File does not exist: /users/ivanovd/1.624be3e5-5932-468d-9ce4-f73078836936.cvsp
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:1975)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1944)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1936)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:731)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
>         at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1499)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
>         at HadoopTask.exportResults(HadoopTask.java:163)
>  
> The file itself exists (in partitioned form):
> ./hadoop fs -ls /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/
> -rw-r--r--   3 ivanovd supergroup          0 2013-03-06 22:17 /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/_SUCCESS
> drwxr-xr-x   - ivanovd supergroup          0 2013-03-06 22:16 /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/_logs
> -rw-r--r--   3 ivanovd supergroup      63301 2013-03-06 22:17 /users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/part-r-00000
>  
> Also hadoop fs -getmerge works fine.
>  
> Did anyone encounter this problem with hdfs SequenceFile.Reader? What am I doing wrong?
>  
> Thanks,
> /DI

