Posted to hdfs-user@hadoop.apache.org by Guillaume Perrot <gp...@ubikod.com> on 2013/01/17 12:36:34 UTC

Read sequence files that are being written

Hi everyone,

I am using Hadoop 1.0.3.

I write logs to a Hadoop sequence file in HDFS. I call syncFs() after
each batch of logs, but I never close the file (except when I perform
the daily roll).

What I want to guarantee is that readers can read the file while it is
still being written.

I can read the bytes of the sequence file via FSDataInputStream, but if I
try to use SequenceFile.Reader.next(key, val), it returns false on the
first call.

I know the data is in the file, since I can read it with FSDataInputStream
or with the cat command, and I am 100% sure that syncFs() is called.
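
To make the raw-byte check above concrete, here is a minimal sketch (not the
exact code used here) that confirms the visible bytes start with a
SequenceFile header, i.e. the three bytes 'S', 'E', 'Q' followed by a version
byte. The class and method names are hypothetical. It only assumes the stream
is a plain java.io.InputStream, which an FSDataInputStream is.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only.
public class SeqHeaderCheck {

    /**
     * Returns the SequenceFile version byte if the stream starts with
     * 'S', 'E', 'Q', or -1 if the header is missing or incomplete.
     */
    public static int sequenceFileVersion(InputStream in) throws IOException {
        byte[] header = new byte[4];
        try {
            // readFully blocks until all 4 bytes are read or EOF is hit
            new DataInputStream(in).readFully(header);
        } catch (EOFException e) {
            return -1; // fewer than 4 bytes are visible to this reader
        }
        if (header[0] != 'S' || header[1] != 'E' || header[2] != 'Q') {
            return -1; // not a SequenceFile header
        }
        return header[3] & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the first bytes of a real file.
        byte[] sample = {'S', 'E', 'Q', 6, 0, 0};
        System.out.println(sequenceFileVersion(new ByteArrayInputStream(sample)));
    }
}
```

A non-negative result only shows that the header bytes are visible to a raw
reader. It says nothing about what SequenceFile.Reader itself will do.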

I checked the namenode and datanode logs and found no errors or warnings.
fsck shows no corruption.

Why is SequenceFile.Reader unable to read a file that is still being written?