Posted to common-dev@hadoop.apache.org by Christopher Ng <cn...@gmail.com> on 2013/06/24 11:20:48 UTC
bug in SequenceFile.sync()?
Cross-posting this from the cdh-users group, where it received little interest:
Is there a bug in SequenceFile.sync()? This is from cdh4.3.0:
    /** Seek to the next sync mark past a given position.*/
    public synchronized void sync(long position) throws IOException {
      if (position+SYNC_SIZE >= end) {
        seek(end);
        return;
      }

      if (position < headerEnd) {
        // seek directly to first record
        in.seek(headerEnd);   // <==== should this not call seek (i.e. this.seek) instead?
        // note the sync marker "seen" in the header
        syncSeen = true;
        return;
      }
The problem is that when you sync to the start of a compressed file,
noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
triggered. When you subsequently call next(), you can get keys from the
buffer, which still contains keys from the previous position in the file.
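
To make the failure mode concrete, here is a small self-contained sketch. It is a toy model, not Hadoop code: the class SyncBufferSketch and its methods rawSeekToStart() and seekToStart() are hypothetical names made up for illustration. rawSeekToStart() stands in for calling in.seek(headerEnd) directly, and seekToStart() stands in for a seek that also discards the buffered keys, which is what I would expect this.seek to do.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model of a block-buffered reader. This is NOT the Hadoop implementation. */
public class SyncBufferSketch {
    /** Each inner list stands in for one compressed block of keys. */
    private final List<List<String>> blocks;
    /** Stands in for the position of the underlying stream. */
    private int nextBlock = 0;
    /** Stands in for the keys already decompressed from the current block. */
    private final Deque<String> buffer = new ArrayDeque<>();

    public SyncBufferSketch(List<List<String>> blocks) {
        this.blocks = blocks;
    }

    /** Returns the next key; a new block is read only when the buffer is empty. */
    public String next() {
        if (buffer.isEmpty()) {
            if (nextBlock >= blocks.size()) {
                return null; // end of data
            }
            buffer.addAll(blocks.get(nextBlock++));
        }
        return buffer.poll();
    }

    /** Models seeking only the underlying stream: buffered keys are left in place. */
    public void rawSeekToStart() {
        nextBlock = 0;
    }

    /** Models a seek that also resets the buffered state, forcing a fresh block read. */
    public void seekToStart() {
        nextBlock = 0;
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"));

        SyncBufferSketch stale = new SyncBufferSketch(data);
        stale.next();            // consumes "a", leaving "b" and "c" buffered
        stale.rawSeekToStart();  // stream moves, buffer is untouched
        System.out.println("raw seek, next key: " + stale.next());

        SyncBufferSketch reset = new SyncBufferSketch(data);
        reset.next();
        reset.seekToStart();     // stream moves and buffer is discarded
        System.out.println("resetting seek, next key: " + reset.next());
    }
}
```

In this model the raw seek leaves the old block's keys in the buffer, so the next key comes from the previous position rather than from the start of the data, while the resetting seek forces the first block to be read again.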
Re: bug in SequenceFile.sync()?
Posted by Christopher Ng <cn...@gmail.com>.
Cool, thanks. Is there an ETA on a fix? Or a workaround for the case where I
want to seek to the start of the file?
On Mon, Jun 24, 2013 at 4:39 PM, Colin McCabe <cm...@alumni.cmu.edu> wrote:
> Hi Chris,
>
> Thanks for the report. I filed
> https://issues.apache.org/jira/browse/HADOOP-9667 for this.
>
> Colin
> Software Engineer, Cloudera
>
Re: bug in SequenceFile.sync()?
Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Hi Chris,
Thanks for the report. I filed
https://issues.apache.org/jira/browse/HADOOP-9667 for this.
Colin
Software Engineer, Cloudera
Re: bug in SequenceFile.sync()?
Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Christopher,
Indeed, I think that noBufferedKeys and valuesDecompressed should be
reset.
Regards
JB
--
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com