Posted to common-user@hadoop.apache.org by Kayla Jay <ka...@yahoo.com> on 2008/07/08 19:48:30 UTC

HDFS files

Hi

I am using code for a reader that must be passed a filename in order to create a FileInputStream instance, which then uses its getChannel() method to read the file.  I have to use FileInputStream because it processes image files, and reading through the channel is faster than a plain InputStream.

I can run this code locally, but when I move my job to files that are in HDFS, it fails.  I assume that because the file is within HDFS, the JVM doesn't recognize the filename that gets passed to the FileInputStream.  I get a NullPointerException whenever it creates the FileInputStream.

In my custom reader, I have

Reader(Configuration job, FileSplit split) throws IOException
{

// open the filesystem that owns the split's path
FileSystem fs = split.getPath().getFileSystem(job);
imagereader = new ImageReader(split.getPath().toString());

....

}

When I print out split.getPath().toString(), it produces the exact file location on HDFS that I want:

hdfs://myip:port//usr/hadoop/Image1

Is there a way to pass the name of a file that is in HDFS to FileInputStream?

I thought that since this is wrapped in a Hadoop job, the JVM would know where to find that file regardless.
What's my problem?  Looking at the API for FileInputStream, I have no choice but to pass in a string of the filename/path in order to create the FileInputStream and use its getChannel() method.

I was looking for a way to just create an FSDataInputStream with fs.open(split.getPath()) and then pass in that input stream.  But the FileInputStream API does not accept an InputStream.
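
To show what I mean, here is roughly what I tried (a sketch only; ImageReader is my own class):

import java.io.FileInputStream;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;

FileSystem fs = split.getPath().getFileSystem(job);
FSDataInputStream in = fs.open(split.getPath());  // opening the HDFS file works
// ...but FileInputStream's only constructors take a File, a String path,
// or a FileDescriptor, so there is no way to hand it the stream:
// FileInputStream fis = new FileInputStream(in);  // does not compile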


Thanks.




Re: HDFS files

Posted by Bryan Duxbury <br...@rapleaf.com>.
Unfortunately you have to use FSDataInputStream, not FileInputStream,  
to interact with HDFS files. Does this image processing code have a  
constructor that accepts InputStreams? If so, just pass that.
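
For example, something like this rough sketch (it assumes ImageReader can be given a constructor that takes an InputStream; that constructor is hypothetical):

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// inside the custom reader's constructor:
Path path = split.getPath();
FileSystem fs = path.getFileSystem(job);
// FSDataInputStream extends InputStream, so the result of open()
// can be passed anywhere an InputStream is accepted.
InputStream in = fs.open(path);
imagereader = new ImageReader(in);  // hypothetical InputStream overload

If the reader really needs a channel rather than a stream, java.nio.channels.Channels.newChannel(in) will wrap the stream in a ReadableByteChannel (though not a seekable FileChannel).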

-Bryan
