Posted to common-user@hadoop.apache.org by Phantom <gh...@gmail.com> on 2007/07/28 05:06:20 UTC
Using the DataOutput/InputBuffer classes
Hi All
I have been trying to use the DataOutputBuffer class for its memory
efficiency. I write some data into the buffer and then write the buffer
into a file (an instance of RandomAccessFile) by passing it the result of
buffer.getData(). However, a lot of garbage is being written into the file,
which shows up as a series of '@' characters on Linux and spaces on Windows.
This is my usage:
    DataOutputBuffer buffer = new DataOutputBuffer();
    RandomAccessFile raf = new RandomAccessFile(file, "rw");
    for ( each data in some data structure )
    {
        buffer.reset();
        serialize data into buffer;
        raf.write(buffer.getData());
    }
When I use a ByteArrayOutputStream and a DataOutputStream to do the same task,
the size of the generated file is 29K. However, when I use the
DataOutputBuffer, the size of the file for the same dataset is 507K. Is my
usage correct?
Please advise.
Thanks
A
Re: Using the DataOutput/InputBuffer classes
Posted by Brian Harrington <br...@yahoo-inc.com>.
From the apidocs for DataOutputBuffer.getData(): "Returns the current
contents of the buffer. Data is only valid to getLength()."
<http://lucene.apache.org/hadoop/api/org/apache/hadoop/io/DataOutputBuffer.html#getLength%28%29>
Try:
raf.write(buffer.getData(), 0, buffer.getLength());
Brian
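To illustrate why the file grows, here is a minimal sketch that runs without Hadoop on the classpath. ExposedBuffer is a hypothetical stand-in for DataOutputBuffer, not the Hadoop class itself: it subclasses ByteArrayOutputStream and exposes the backing array and the valid length through methods named getData() and getLength() to mirror the calls above. The initial capacity of 128 bytes and the in-memory sinks used in place of the RandomAccessFile are assumptions chosen to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BufferLengthDemo {

    /** Hypothetical stand-in for DataOutputBuffer: exposes the backing array. */
    static class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) { super(capacity); }
        byte[] getData() { return buf; }     // whole backing array, including unused tail
        int getLength() { return count; }    // number of valid bytes since the last reset()
    }

    /** Returns { bytes written using the whole array, bytes written using only the valid range }. */
    static int[] writeRecords(int records) {
        ExposedBuffer buffer = new ExposedBuffer(128);
        DataOutputStream out = new DataOutputStream(buffer);
        // In-memory sinks stand in for the RandomAccessFile from the original post.
        ByteArrayOutputStream wholeArray = new ByteArrayOutputStream();
        ByteArrayOutputStream validOnly = new ByteArrayOutputStream();
        try {
            for (int i = 0; i < records; i++) {
                buffer.reset();                  // sets the valid length to 0, keeps the array
                out.writeInt(i);                 // 4 valid bytes per record
                byte[] data = buffer.getData();
                wholeArray.write(data, 0, data.length);           // same effect as write(data)
                validOnly.write(data, 0, buffer.getLength());     // only the valid bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { wholeArray.size(), validOnly.size() };
    }

    public static void main(String[] args) {
        int[] sizes = writeRecords(10);
        System.out.println("whole array: " + sizes[0] + " bytes");
        System.out.println("valid bytes only: " + sizes[1] + " bytes");
    }
}
```

With these assumed sizes, every iteration that writes the whole array emits all 128 bytes of capacity even though only 4 are valid, which is the same kind of inflation as the 29K versus 507K difference reported above. The padding bytes are whatever was left in the array, typically zeros, which some tools display as '@'.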