Posted to common-user@hadoop.apache.org by Phantom <gh...@gmail.com> on 2007/07/28 05:06:20 UTC

Using the DataOutput/InputBuffer classes

Hi All

I have been trying to use the DataOutputBuffer class for its obvious memory
efficiency. I basically write some data into the buffer and then write the
buffer into a file (an instance of RandomAccessFile) by invoking
buffer.getData(). However, I am seeing a lot of garbage written into the
file, which shows up as a series of '@' characters on Linux and spaces on
Windows.

This is my usage :

DataOutputBuffer buffer = new DataOutputBuffer();
RandomAccessFile raf  = new RandomAccessFile(file, "rw");

for ( each data in some data structure )
{
    buffer.reset();
    serialize data into buffer;
    raf.write(buffer.getData());
}

When I use a ByteArrayOutputStream and a DataOutputStream to do the same
task, the size of the generated file is 29K. However, when I use the
DataOutputBuffer, the size of the file for the same dataset is 507K. Is my
usage correct?

Please advise.

Thanks
A

Re: Using the DataOutput/InputBuffer classes

Posted by Brian Harrington <br...@yahoo-inc.com>.
 From the apidocs for DataOutputBuffer: "Returns the current contents of
the buffer. Data is only valid to getLength()."
<http://lucene.apache.org/hadoop/api/org/apache/hadoop/io/DataOutputBuffer.html#getLength%28%29>

Try:

raf.write(buffer.getData(), 0, buffer.getLength());

Brian
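
To illustrate why the two calls produce such different file sizes, here is a minimal sketch that uses only the JDK. ReusableBuffer is a hypothetical stand-in written for this example, not Hadoop's DataOutputBuffer; it assumes only the behaviour quoted from the apidocs above, namely that getData() hands back the whole backing array while getLength() reports how much of it is valid. The 1024-byte capacity, the int records and the temp file are likewise assumptions made for the demonstration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class ValidLengthDemo {

    // Hypothetical stand-in for a reusable buffer; NOT Hadoop's class.
    static final class ReusableBuffer extends ByteArrayOutputStream {
        ReusableBuffer(int capacity) { super(capacity); }
        byte[] getData() { return buf; }   // whole backing array, no copy
        int getLength() { return count; }  // number of valid bytes
    }

    // Writes each record through the reused buffer and returns the file size.
    static long writeRecords(int[] records, boolean validBytesOnly) throws IOException {
        File file = File.createTempFile("buffer-demo", ".bin");
        file.deleteOnExit();
        ReusableBuffer buffer = new ReusableBuffer(1024);
        DataOutputStream out = new DataOutputStream(buffer);
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            for (int record : records) {
                buffer.reset();        // valid length back to 0, capacity unchanged
                out.writeInt(record);  // 4 valid bytes
                out.flush();
                if (validBytesOnly) {
                    // Only the bytes that were actually written.
                    raf.write(buffer.getData(), 0, buffer.getLength());
                } else {
                    // The entire backing array, including the unused tail.
                    raf.write(buffer.getData());
                }
            }
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] records = {1, 2, 3};
        System.out.println("whole array: " + writeRecords(records, false) + " bytes");
        System.out.println("valid bytes: " + writeRecords(records, true) + " bytes");
    }
}
```

With three 4-byte records and a 1024-byte backing array, the first call writes 3072 bytes and the second writes 12. The unused tail of the array is zero-filled, which would be consistent with the padding showing up as '@' characters or blanks, depending on the viewer.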

