Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@gmail.com> on 2009/02/06 06:41:35 UTC
can't read the SequenceFile correctly
Hi,
I have written binary files to a SequenceFile, seemingly successfully, but
when I read them back with the code below, after the first few reads I get the
same number of bytes for different files. What could be going wrong?
Thank you,
Mark
reader = new SequenceFile.Reader(fs, path, conf);
Writable key = (Writable)
    ReflectionUtils.newInstance(reader.getKeyClass(), conf);
Writable value = (Writable)
    ReflectionUtils.newInstance(reader.getValueClass(), conf);
long position = reader.getPosition();
while (reader.next(key, value)) {
    String syncSeen = reader.syncSeen() ? "*" : "";
    byte[] fileBytes = ((BytesWritable) value).getBytes();
    System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen,
        key, fileBytes.length);
    position = reader.getPosition(); // beginning of next record
}
Re: can't read the SequenceFile correctly
Posted by Mark Kerzner <ma...@gmail.com>.
Indeed, this was the answer!
Thank you,
Mark
On Fri, Feb 6, 2009 at 4:25 AM, Tom White <to...@cloudera.com> wrote:
> Hi Mark,
>
> Not all the bytes stored in a BytesWritable object are necessarily
> valid. Use BytesWritable#getLength() to determine how much of the
> buffer returned by BytesWritable#getBytes() to use.
>
> Tom
Re: can't read the SequenceFile correctly
Posted by Raghu Angadi <ra...@yahoo-inc.com>.
+1 on something like getValidBytes(). Just the existence of this would
warn many programmers about getBytes().
Raghu.
Owen O'Malley wrote:
>
> On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:
>
>> Hey Tom,
>>
>> I also got burned by this. Why does BytesWritable.getBytes() return
>> non-valid bytes? Or should we add a BytesWritable.getValidBytes()
>> kind of function?
>
> It does it because continually resizing the array to the "valid" length
> is very expensive. It would be a reasonable patch to add a
> getValidBytes, but most methods in Java's libraries are aware of this
> and let you pass in byte[], offset, and length. So once you realize what
> the problem is, you can work around it.
>
> -- Owen
Re: can't read the SequenceFile correctly
Posted by Owen O'Malley <om...@apache.org>.
On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:
> Hey Tom,
>
> I also got burned by this. Why does BytesWritable.getBytes() return
> non-valid bytes? Or should we add a BytesWritable.getValidBytes()
> kind of function?
It does it because continually resizing the array to the "valid"
length is very expensive. It would be a reasonable patch to add a
getValidBytes, but most methods in Java's libraries are aware of this
and let you pass in byte[], offset, and length. So once you realize
what the problem is, you can work around it.
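
As a rough illustration of that workaround, here is a minimal sketch using only
standard Java library calls that accept a byte[], an offset, and a length. The
class name OffsetLengthDemo, its helper methods, and the sample buffer are made
up for this example; in real code the buffer and length would come from
BytesWritable's getBytes() and getLength().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Hadoop.
class OffsetLengthDemo {
    // Decode only the valid prefix of the buffer as UTF-8 text.
    static String decode(byte[] buf, int len) {
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }

    // Write only the valid prefix to a stream and return what was written.
    static byte[] writeValid(byte[] buf, int len) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(buf, 0, len);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Simulated buffer: 5 valid bytes followed by stale bytes
        // left over from an earlier, longer record (an assumption
        // made up for this sketch).
        byte[] buf = "helloXXXXXXX".getBytes(StandardCharsets.UTF_8);
        int len = 5;

        System.out.println(decode(buf, len));               // text from the valid prefix only
        System.out.println(writeValid(buf, len).length);    // bytes written to the stream
        System.out.println(Arrays.copyOf(buf, len).length); // explicit copy, only when really needed
    }
}
```

Passing the offset and length avoids allocating a new array per record;
Arrays.copyOf is there for the cases where an exact-size array is required.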
-- Owen
RE: can't read the SequenceFile correctly
Posted by Bhupesh Bansal <bb...@linkedin.com>.
Hey Tom,
I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes? Or should we add a BytesWritable.getValidBytes() kind of function?
Best
Bhupesh
-----Original Message-----
From: Tom White [mailto:tom@cloudera.com]
Sent: Fri 2/6/2009 2:25 AM
To: core-user@hadoop.apache.org
Subject: Re: can't read the SequenceFile correctly
Hi Mark,
Not all the bytes stored in a BytesWritable object are necessarily
valid. Use BytesWritable#getLength() to determine how much of the
buffer returned by BytesWritable#getBytes() to use.
Tom
Re: can't read the SequenceFile correctly
Posted by Tom White <to...@cloudera.com>.
Hi Mark,
Not all the bytes stored in a BytesWritable object are necessarily
valid. Use BytesWritable#getLength() to determine how much of the
buffer returned by BytesWritable#getBytes() to use.
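
To show why the reported sizes stop changing, here is a small self-contained
sketch. ReusableBuffer is a hypothetical stand-in written for this example, not
Hadoop's class: it only assumes that a reused value object keeps a backing
array that grows when needed and never shrinks. The real BytesWritable may grow
its array differently, but the relationship between the backing array and the
valid length is the same idea.

```java
import java.util.Arrays;

// Hypothetical stand-in for a reusable, grow-only value holder.
// This is NOT Hadoop's BytesWritable, only an illustration.
class ReusableBuffer {
    private byte[] bytes = new byte[0]; // backing array, only ever grows
    private int length = 0;             // number of valid bytes

    // Overwrite the contents, growing the backing array only when needed.
    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    byte[] getBytes() { return bytes; }  // whole backing array, may have a stale tail
    int getLength() { return length; }   // length of the valid prefix

    // Copy of just the valid bytes.
    byte[] validBytes() { return Arrays.copyOf(bytes, length); }
}

class BufferLengthDemo {
    public static void main(String[] args) {
        ReusableBuffer value = new ReusableBuffer(); // reused for every record
        int[] recordSizes = {5, 12, 3, 7};           // made-up record sizes
        for (int size : recordSizes) {
            value.set(new byte[size]);
            System.out.printf("record=%d  getBytes().length=%d  getLength()=%d%n",
                    size, value.getBytes().length, value.getLength());
        }
    }
}
```

Once the 12-byte record has been read, getBytes().length stays at 12 for the
smaller records that follow, while getLength() keeps tracking the real size. In
the reading loop from the question, that suggests printing
((BytesWritable) value).getLength() instead of fileBytes.length.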
Tom