Posted to hdfs-dev@hadoop.apache.org by Thanh Do <th...@cs.wisc.edu> on 2010/11/04 22:58:27 UTC

Why dataOut is FileOutputStream?

Hi all,

When a datanode receives a block, it
writes the block to two streams on disk:
- the data stream (dataOut)
- the checksum stream (checksumOut)

The checksumOut stream is created with the following code:
   this.checksumOut = new DataOutputStream(new BufferedOutputStream(
                                          streams.checksumOut,
                                          SMALL_BUFFER_SIZE));
dataOut, however, is a plain FileOutputStream.

So checksumOut is buffered, but dataOut is not.

Is there a particular reason for this?
Or does it not matter, because we flush
both streams afterwards anyway?

Thanks
Thanh

Re: Why dataOut is FileOutputStream?

Posted by Thanh Do <th...@cs.wisc.edu>.
Thanks Eli,

I got it now.

On Fri, Nov 5, 2010 at 10:36 PM, Eli Collins <el...@cloudera.com> wrote:

> Hey Thanh,
>
> Data gets written in 64KB packets so there doesn't seem to be a need
> to buffer it.
>
> Thanks,
> Eli
>
> On Thu, Nov 4, 2010 at 2:58 PM, Thanh Do <th...@cs.wisc.edu> wrote:
> > Hi all,
> >
> > When a datanode receives a block, it
> > writes the block to two streams on disk:
> > - the data stream (dataOut)
> > - the checksum stream (checksumOut)
> >
> > The checksumOut stream is created with the following code:
> >   this.checksumOut = new DataOutputStream(new BufferedOutputStream(
> >                                          streams.checksumOut,
> >                                          SMALL_BUFFER_SIZE));
> > dataOut, however, is a plain FileOutputStream.
> >
> > So checksumOut is buffered, but dataOut is not.
> >
> > Is there a particular reason for this?
> > Or does it not matter, because we flush
> > both streams afterwards anyway?
> >
> > Thanks
> > Thanh
> >
>

Re: Why dataOut is FileOutputStream?

Posted by Eli Collins <el...@cloudera.com>.
Hey Thanh,

Data gets written in 64KB packets so there doesn't seem to be a need
to buffer it.

Thanks,
Eli
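
To illustrate the point about packet-sized writes, here is a minimal sketch in Java. It is not the datanode code: CountingOutputStream, countWrites, and the 512-byte buffer size are hypothetical names and values chosen for the example, and the 4-byte writes only stand in for "many small writes", not for the datanode's real checksum write pattern. The sketch counts how many write calls reach the underlying stream. It relies on the documented behavior of BufferedOutputStream.write(byte[], int, int), which bypasses the buffer and writes directly to the underlying stream when the requested length is at least the buffer size.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class BufferingSketch {

    /** Hypothetical stand-in for a file stream: counts write calls instead of touching disk. */
    static final class CountingOutputStream extends OutputStream {
        int writeCalls = 0;

        @Override
        public void write(int b) {
            writeCalls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
        }
    }

    // Assumed buffer size for the example; not taken from the Hadoop source.
    static final int SMALL_BUFFER = 512;

    /**
     * Writes `chunks` arrays of `chunkSize` bytes, optionally through a
     * BufferedOutputStream, and returns how many write calls reached the sink.
     */
    static int countWrites(int chunkSize, int chunks, boolean buffered) {
        CountingOutputStream sink = new CountingOutputStream();
        OutputStream out = buffered ? new BufferedOutputStream(sink, SMALL_BUFFER) : sink;
        byte[] chunk = new byte[chunkSize];
        try {
            for (int i = 0; i < chunks; i++) {
                out.write(chunk, 0, chunk.length);
            }
            out.flush(); // push out anything still held in the buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        int packet = 64 * 1024;
        System.out.println("small writes, unbuffered: " + countWrites(4, 128, false));
        System.out.println("small writes, buffered:   " + countWrites(4, 128, true));
        System.out.println("64KB writes, unbuffered:  " + countWrites(packet, 4, false));
        System.out.println("64KB writes, buffered:    " + countWrites(packet, 4, true));
    }
}
```

Under these assumptions, buffering collapses many small writes into far fewer calls on the underlying stream, while the 64KB writes reach it once each with or without the wrapper. That is consistent with the reasoning above: when the data already arrives in large chunks, an extra buffer in front of the file stream saves no calls.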

On Thu, Nov 4, 2010 at 2:58 PM, Thanh Do <th...@cs.wisc.edu> wrote:
> Hi all,
>
> When a datanode receives a block, it
> writes the block to two streams on disk:
> - the data stream (dataOut)
> - the checksum stream (checksumOut)
>
> The checksumOut stream is created with the following code:
>   this.checksumOut = new DataOutputStream(new BufferedOutputStream(
>                                          streams.checksumOut,
>                                          SMALL_BUFFER_SIZE));
> dataOut, however, is a plain FileOutputStream.
>
> So checksumOut is buffered, but dataOut is not.
>
> Is there a particular reason for this?
> Or does it not matter, because we flush
> both streams afterwards anyway?
>
> Thanks
> Thanh
>
