Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@gmail.com> on 2009/02/06 06:41:35 UTC

can't read the SequenceFile correctly

Hi,

I have written binary files to a SequenceFile, seemingly successfully, but
when I read them back with the code below, after the first few reads I get the
same number of bytes for different files. What could be going wrong?

Thank you,
Mark

            reader = new SequenceFile.Reader(fs, path, conf);
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            long position = reader.getPosition();
            while (reader.next(key, value)) {
                String syncSeen = reader.syncSeen() ? "*" : "";
                byte[] fileBytes = ((BytesWritable) value).getBytes();
                System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, fileBytes.length);
                position = reader.getPosition(); // beginning of next record
            }

Re: can't read the SequenceFile correctly

Posted by Mark Kerzner <ma...@gmail.com>.
Indeed, this was the answer!

Thank you,
Mark

On Fri, Feb 6, 2009 at 4:25 AM, Tom White <to...@cloudera.com> wrote:

> Hi Mark,
>
> Not all the bytes stored in a BytesWritable object are necessarily
> valid. Use BytesWritable#getLength() to determine how much of the
> buffer returned by BytesWritable#getBytes() to use.
>
> Tom
>

Re: can't read the SequenceFile correctly

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
+1 on something like getValidBytes(). Just the existence of this would 
warn many programmers about getBytes().

Raghu.

Owen O'Malley wrote:
> 
> On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:
> 
>> Hey Tom,
>>
>> I also got burned by this. Why does BytesWritable.getBytes() return
>> invalid bytes? Or should we add a BytesWritable.getValidBytes()
>> kind of function?
> 
> It does it because continually resizing the array to the "valid" length 
> is very expensive. It would be a reasonable patch to add a 
> getValidBytes, but most methods in Java's libraries are aware of this 
> and let you pass in byte[], offset, and length. So once you realize what 
> the problem is, you can work around it.
> 
> -- Owen


Re: can't read the SequenceFile correctly

Posted by Owen O'Malley <om...@apache.org>.
On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:

> Hey Tom,
>
> I also got burned by this. Why does BytesWritable.getBytes() return
> invalid bytes? Or should we add a BytesWritable.getValidBytes()
> kind of function?

It does it because continually resizing the array to the "valid"  
length is very expensive. It would be a reasonable patch to add a  
getValidBytes, but most methods in Java's libraries are aware of this  
and let you pass in byte[], offset, and length. So once you realize  
what the problem is, you can work around it.

-- Owen
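
[Editor's note] The workaround Owen describes, passing byte[], offset, and
length, can be sketched with standard JDK methods. This is only an
illustration: the class name OffsetLengthDemo, its helper methods, and the
sample data are made up for this note. In real code the array would come from
BytesWritable#getBytes() and the length from BytesWritable#getLength().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class OffsetLengthDemo {

    /** Decodes only the first `length` bytes of the buffer as UTF-8 text. */
    static String decode(byte[] buf, int length) {
        return new String(buf, 0, length, StandardCharsets.UTF_8);
    }

    /** Writes only the first `length` bytes to a stream and returns what was written. */
    static byte[] write(byte[] buf, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(buf, 0, length);
        return out.toByteArray();
    }

    /** Hashes only the first `length` bytes; the stale tail does not affect the digest. */
    static byte[] sha256(byte[] buf, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(buf, 0, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
    }

    public static void main(String[] args) {
        // Pretend the buffer holds a 5-byte record followed by leftover bytes
        // from an earlier, longer record.
        byte[] buf = "hello, stale tail".getBytes(StandardCharsets.UTF_8);
        int validLength = 5;

        System.out.println(decode(buf, validLength));       // prints "hello"
        System.out.println(write(buf, validLength).length);  // prints 5
        System.out.println(sha256(buf, validLength).length); // prints 32
    }
}
```

None of these calls copies the array first, which is the point of the design
Owen explains: the buffer can be reused across records without resizing.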

RE: can't read the SequenceFile correctly

Posted by Bhupesh Bansal <bb...@linkedin.com>.
Hey Tom, 

I also got burned by this. Why does BytesWritable.getBytes() return
invalid bytes? Or should we add a BytesWritable.getValidBytes() kind of function?


Best
Bhupesh 



-----Original Message-----
From: Tom White [mailto:tom@cloudera.com]
Sent: Fri 2/6/2009 2:25 AM
To: core-user@hadoop.apache.org
Subject: Re: can't read the SequenceFile correctly
 
Hi Mark,

Not all the bytes stored in a BytesWritable object are necessarily
valid. Use BytesWritable#getLength() to determine how much of the
buffer returned by BytesWritable#getBytes() to use.

Tom



Re: can't read the SequenceFile correctly

Posted by Tom White <to...@cloudera.com>.
Hi Mark,

Not all the bytes stored in a BytesWritable object are necessarily
valid. Use BytesWritable#getLength() to determine how much of the
buffer returned by BytesWritable#getBytes() to use.

Tom
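
[Editor's note] A minimal, self-contained sketch of the effect Tom describes.
ReusableBuffer is a hypothetical stand-in written for this note, not Hadoop's
BytesWritable, and its growth policy is deliberately simplified. It only
mimics the relevant behavior: the backing array is reused between records, so
its length reflects the largest record seen so far rather than the current
one. In Mark's loop, the equivalent fix would be to print
((BytesWritable) value).getLength() instead of fileBytes.length.

```java
import java.util.Arrays;

public class ValidBytesDemo {

    /** Hypothetical stand-in for a reusable value buffer such as BytesWritable. */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Stores a new record, growing the backing array only when it is too small. */
        void set(byte[] src) {
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the whole backing array, including any stale tail. */
        byte[] getBytes() { return bytes; }

        /** Returns how many leading bytes of the backing array are valid. */
        int getLength() { return length; }
    }

    /** Copies only the valid prefix of the backing array. */
    static byte[] validBytes(ReusableBuffer value) {
        return Arrays.copyOf(value.getBytes(), value.getLength());
    }

    public static void main(String[] args) {
        ReusableBuffer value = new ReusableBuffer();
        int[] recordSizes = {5, 12, 3, 7};
        for (int size : recordSizes) {
            value.set(new byte[size]);
            System.out.printf("record=%d backing=%d valid=%d copied=%d%n",
                    size, value.getBytes().length, value.getLength(),
                    validBytes(value).length);
        }
    }
}
```

With these record sizes the backing array length stays at 12 from the second
record onward, while the valid length tracks each record (5, 12, 3, 7). That
matches the symptom in the original post: after the first few reads, every
file appears to have the same number of bytes.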
