Posted to common-user@hadoop.apache.org by Bin YANG <ya...@gmail.com> on 2007/12/05 11:25:35 UTC

Does hadoop just provide reading by line

hi colleague,

Does Hadoop only provide reading files line by line?
How can I read multiple lines from a file?

thanks

-- 
Bin YANG
Department of Computer Science and Engineering
Fudan University
Shanghai, P. R. China
EMail: yangbinisme82@gmail.com

Re: Does hadoop just provide reading by line

Posted by Fabrice Medio <fm...@discoverymining.com>.
Are you talking about reading arbitrary files from HDFS?
You can just get a regular InputStream to a Path:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(SomeJob.class);
FileSystem hdfs = FileSystem.get(conf);
FSDataInputStream inputStream = hdfs.open(new Path("/my/path"));
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
String s = reader.readLine(); // call readLine() in a loop to read as many lines as you need
reader.close();

Fabrice

Bin YANG wrote:
> hi colleague,
> 
> Does Hadoop only provide reading files line by line?
> How can I read multiple lines from a file?
> 
> thanks
> 


Re: Does hadoop just provide reading by line

Posted by André Martin <ma...@andremartin.de>.
Hi Bin YANG,
It can read as many lines/bytes as you want - but you need to implement 
your own RecordReader 
(http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/RecordReader.html) 
for this (and an InputFormat in order to make it usable in your jobs...)
Just take a look at LineRecordReader as a starting point for how to 
implement a RecordReader.
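To make the suggestion above concrete, here is a minimal sketch of the grouping logic that such a multi-line RecordReader would wrap: each "record" is up to N consecutive lines joined together. It is deliberately written against plain java.io rather than the org.apache.hadoop.mapred.RecordReader interface, so the class and method names (MultiLineReader, nextRecord) are illustrative, not part of any Hadoop API; a real implementation would additionally have to handle record boundaries at InputSplit edges, as LineRecordReader does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch of the core logic of a hypothetical multi-line record reader:
// group up to n consecutive lines of input into a single record.
public class MultiLineReader {

    // Read up to n lines from the reader and join them with '\n'.
    // Returns null once the input is exhausted.
    public static String nextRecord(BufferedReader reader, int n) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String line = reader.readLine();
            if (line == null) {
                break; // end of input; return whatever we collected so far
            }
            lines.add(line);
        }
        return lines.isEmpty() ? null : String.join("\n", lines);
    }

    public static void main(String[] args) throws IOException {
        // Five input lines, grouped two at a time: the last record is short.
        BufferedReader r = new BufferedReader(new StringReader("a\nb\nc\nd\ne"));
        String rec;
        while ((rec = nextRecord(r, 2)) != null) {
            System.out.println(rec.replace("\n", "|")); // prints a|b, then c|d, then e
        }
    }
}
```

In an actual RecordReader, this loop would live in next(key, value), with the byte offset of the first line as the key and the joined lines as the value.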

Cu on the 'net,
                        Bye - bye,

                                   <<<<< André <<<< >>>> èrbnA >>>>>

...you wrote:
> hi colleague,
>
> Does Hadoop only provide reading files line by line?
> How can I read multiple lines from a file?
>
> thanks