Posted to user@avro.apache.org by Venkat <vr...@ymail.com> on 2013/02/20 21:23:30 UTC
Avro file - Seek to specific offset and read
Hi All,
Using DataFileReader, I'm trying to read data from a specific [start-offset] to an [end-offset]. Both the start and end offsets are marked with synchronization markers using DataFileWriter.sync().
The following is the snippet I use to read the data back:
DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(input, reader);
fileReader.seek(startOffset); // set to the start-offset
while (fileReader.hasNext() && !fileReader.pastSync(endOffset)) {
    GenericRecord gr = fileReader.next();
}
However, this reads and returns more records than I wrote between the two offsets.
Appreciate your help regarding this.
Thanks
Re: Avro file - Seek to specific offset and read
Posted by Venkat <vr...@ymail.com>.
Hi Doug,
Adjusting the start and end offsets (returned by DataFileWriter.sync()) back by 16 bytes (DataFileConstants.SYNC_SIZE) fixed the issue.
This assumption is based on looking at the DataFileReader.pastSync() implementation:
public boolean pastSync(long position) throws IOException {
    return ((blockStart >= position+SYNC_SIZE)||(blockStart >= sin.length()));
}
Let me know if this assumption is correct.
Thanks
Venkat
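To make the arithmetic in the pastSync() code quoted above concrete, here is a minimal, self-contained sketch that models only that comparison and does not use Avro itself. It assumes (the thread does not confirm this) that once the reader has finished the last block before the end marker, its blockStart equals the value DataFileWriter.sync() returned for that marker. The class name PastSyncModel and the sample numbers are hypothetical.

```java
public class PastSyncModel {
    // Same value the thread gives for DataFileConstants.SYNC_SIZE (16 bytes).
    public static final int SYNC_SIZE = 16;

    // Mirrors the comparison in the quoted DataFileReader.pastSync();
    // blockStart and fileLength stand in for the reader's internal state.
    public static boolean pastSync(long blockStart, long position, long fileLength) {
        return (blockStart >= position + SYNC_SIZE) || (blockStart >= fileLength);
    }

    public static void main(String[] args) {
        long fileLength = 10_000;    // hypothetical file size
        long endOffset = 4_096;      // hypothetical value returned by DataFileWriter.sync()
        long blockStart = endOffset; // ASSUMPTION: reader state on reaching the end marker

        // Unadjusted end offset: the check is still false, so the read loop continues.
        System.out.println(pastSync(blockStart, endOffset, fileLength));             // false
        // End offset moved back by SYNC_SIZE: the check is true, so the loop stops.
        System.out.println(pastSync(blockStart, endOffset - SYNC_SIZE, fileLength)); // true
    }
}
```

Under that assumption, the comparison only involves the position passed to pastSync(), which is the end offset. How seek() treats the start offset is not modeled here.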
________________________________
From: Doug Cutting <cu...@apache.org>
To: user@avro.apache.org; Venkat <vr...@ymail.com>
Sent: Friday, February 22, 2013 12:01 PM
Subject: Re: Avro file - Seek to specific offset and read
Venkat,
That should work. It's hard for me to guess what's going wrong,
whether there's a bug in Avro, in your program, or perhaps just
unclear documentation. Could you post a complete program that
demonstrates the issue?
Thanks,
Doug
Re: Avro file - Seek to specific offset and read
Posted by Doug Cutting <cu...@apache.org>.
Venkat,
That should work. It's hard for me to guess what's going wrong,
whether there's a bug in Avro, in your program, or perhaps just
unclear documentation. Could you post a complete program that
demonstrates the issue?
Thanks,
Doug