Posted to dev@hbase.apache.org by Himanshu Vashishtha <hv...@gmail.com> on 2013/11/05 06:11:13 UTC

Calling o/s.flush() in HLog.sync()?

Looking at the ProtobufLogWriter class, it looks like the call to flush() in
the sync method is a no-op.

https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L134

The underlying output stream is DFSOutputStream, which doesn't override
flush() (so it inherits OutputStream's no-op).

And it calls sync() anyway, which ensures the data is written out to the
DataNodes (into their cache).
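
For reference, the sync method in question reads roughly like this (a
paraphrase of the linked trunk source, not an exact copy):

    // Paraphrased sketch of ProtobufLogWriter.sync(); see the GitHub link
    // above for the exact code.
    @Override
    public void sync() throws IOException {
      FSDataOutputStream fsdos = this.output;
      if (fsdos == null) return;  // presume the writer was closed
      fsdos.flush();              // the call in question; a no-op over DFSOutputStream
      fsdos.sync();               // hflush: pushes buffered data to the DataNodes
    }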

Previously, with SequenceFile$Writer, we wrote data to the output stream
(using Writables#write) and then invoked sync/hflush.
https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L1314
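
For comparison, a minimal sketch of that older path (illustrative only;
fs, conf, path, and the key/edit instances are assumed to be in scope):

    // Illustrative sketch of the pre-protobuf, SequenceFile-based WAL path.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, path, HLogKey.class, WALEdit.class);
    writer.append(key, edit);  // key/value serialize themselves via Writable#write
    writer.syncFs();           // delegates to the underlying stream's sync/hflush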

Is there a reason we have this call here? Please let me know if I'm
missing any context.

Thanks,
Himanshu

Re: Calling o/s.flush() in HLog.sync()?

Posted by Haosong Huang <ha...@gmail.com>.
Sorry, I misunderstood what you asked.
On 2013-11-06 at 9:23 AM, "Himanshu Vashishtha" <hv...@gmail.com> wrote:

> Okay, good to know, but I'm not sure how your response relates to what I
> asked.

Re: Calling o/s.flush() in HLog.sync()?

Posted by Himanshu Vashishtha <hv...@gmail.com>.
Okay, good to know, but I'm not sure how your response relates to what I
asked.


On Tue, Nov 5, 2013 at 1:08 AM, Haosong Huang <ha...@gmail.com> wrote:

> An OS fsync() call takes nearly 10 ms because of the hard disk's IOPS
> bottleneck. An hsync() turns into two OS fsync() calls: one for the
> checksum file and the other for the block file. If you use an SSD, you
> could try using fsync() instead of flush() and mounting the file system
> without write barriers.

Re: Calling o/s.flush() in HLog.sync()?

Posted by Haosong Huang <ha...@gmail.com>.
An OS fsync() call takes nearly 10 ms because of the hard disk's IOPS
bottleneck. An hsync() turns into two OS fsync() calls: one for the
checksum file and the other for the block file. If you use an SSD, you
could try using fsync() instead of flush() and mounting the file system
without write barriers.
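
To make the flush/hflush/hsync distinction concrete, a minimal sketch
against the HDFS client API (semantics as I understand them; fs and edits
are assumed to exist):

    // Assumes: fs is an org.apache.hadoop.fs.FileSystem, edits is a byte[].
    FSDataOutputStream out = fs.create(new Path("/wal/example"));
    out.write(edits);
    out.hflush();  // push buffered data to the DataNodes' memory only
    out.hsync();   // additionally ask each DataNode to fsync to disk
    out.close();
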
On 2013-11-05 at 1:12 PM, "Himanshu Vashishtha" <hv...@gmail.com> wrote:

> Looking at the ProtobufLogWriter class, it looks like the call to flush()
> in the sync method is a no-op.
>
> https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L134
>
> The underlying output stream is DFSOutputStream, which doesn't override
> flush() (so it inherits OutputStream's no-op).
>
> And it calls sync() anyway, which ensures the data is written out to the
> DataNodes (into their cache).
>
> Previously, with SequenceFile$Writer, we wrote data to the output stream
> (using Writables#write) and then invoked sync/hflush.
>
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L1314
>
> Is there a reason we have this call here? Please let me know if I'm
> missing any context.
>
> Thanks,
> Himanshu
>

Re: Calling o/s.flush() in HLog.sync()?

Posted by Jonathan Hsieh <jo...@cloudera.com>.
I found this answer unsatisfying; it could use some elaboration.

I think it's there because of the Java OutputStream convention, not so much
because of Hadoop.

The output object in ProtobufLogWriter is an HDFS FSDataOutputStream. The
HDFS FSDataOutputStream essentially wraps a plain Java OutputStream [1]
(which only has write(byte[]) and write(int) methods), exposing a Java
DataOutputStream [2], which provides convenient writeXxxx methods for
serializing primitive data types (int, float, etc.). For efficiency, you
would usually wrap the OutputStream in a BufferedOutputStream [3], which
adds an in-memory buffer and flushes to the underlying output stream when
a certain size is reached or when flush() is called.

Since the stream comes from the FileSystem object, I bet it could have
implementations other than just the DFSOutputStream you saw -- ones that
do require the flush.
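
To make that convention concrete, here is a self-contained sketch using
plain java.io (nothing HBase-specific; the file path is made up):

    import java.io.*;

    public class FlushDemo {
      public static void main(String[] args) throws IOException {
        // The usual stack: DataOutputStream over BufferedOutputStream
        // over a raw stream.
        OutputStream raw = new FileOutputStream("/tmp/flush-demo.bin");
        DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(raw, 64 * 1024));
        out.writeInt(42);  // sits in the 64 KB buffer, not yet in the file
        out.flush();       // flush() cascades down the chain to the raw stream
        out.close();
      }
    }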

Jon.

[1] http://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html
[2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html
[3] http://docs.oracle.com/javase/7/docs/api/java/io/BufferedOutputStream.html


On Thu, Nov 7, 2013 at 2:56 PM, Ted Yu <yu...@gmail.com> wrote:

> Himanshu:
> See
>
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/DataOutputStream.java#DataOutputStream.flush%28%29
> The flush() call just delegates to the wrapped OutputStream's flush().
>
> Cheers



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: Calling o/s.flush() in HLog.sync()?

Posted by Ted Yu <yu...@gmail.com>.
Himanshu:
See
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/DataOutputStream.java#DataOutputStream.flush%28%29
The flush() call just delegates to the wrapped OutputStream's flush().
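
For the archive, the method body in the linked OpenJDK source is
essentially just:

    // DataOutputStream.flush(): forwards to the wrapped stream.
    public void flush() throws IOException {
        out.flush();
    }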

Cheers


On Mon, Nov 4, 2013 at 9:11 PM, Himanshu Vashishtha <hv...@gmail.com> wrote:

> Looking at the ProtobufLogWriter class, it looks like the call to flush()
> in the sync method is a no-op.
>
> https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L134
>
> The underlying output stream is DFSOutputStream, which doesn't override
> flush() (so it inherits OutputStream's no-op).
>
> And it calls sync() anyway, which ensures the data is written out to the
> DataNodes (into their cache).
>
> Previously, with SequenceFile$Writer, we wrote data to the output stream
> (using Writables#write) and then invoked sync/hflush.
>
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L1314
>
> Is there a reason we have this call here? Please let me know if I'm
> missing any context.
>
> Thanks,
> Himanshu
>