Posted to commits@cassandra.apache.org by "Zhu Han (JIRA)" <ji...@apache.org> on 2011/09/23 14:40:26 UTC
[jira] [Updated] (CASSANDRA-3248) CommitLog writer should call fdatasync instead of fsync
[ https://issues.apache.org/jira/browse/CASSANDRA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhu Han updated CASSANDRA-3248:
-------------------------------
Description:
CommitLogSegment uses SequentialWriter to flush the buffered data to the log device. It depends on FileDescriptor#sync(), which invokes fsync() and so forces the file attributes (metadata) to disk as well.
However, at least on Linux, fdatasync() is good enough for a commit log flush:
bq. fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. For example, changes to st_atime or st_mtime (respectively, time of last access and time of last modification; see stat(2)) do not require flushing because they are not necessary for a subsequent data read to be handled correctly. On the other hand, a change to the file size (st_size, as made by say ftruncate(2)), would require a metadata flush.
The file size is also synced to disk by fdatasync(). The commit log recovery logic does sort the commit log segments by their modification timestamp, but that sorting can be removed safely, IMHO.
I checked the native code of JRE 6. On Linux and Solaris, FileChannel#force(false) invokes fdatasync(). On Windows, the false flag has no effect.
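A minimal sketch of the data-only flush described above, for illustration only. The class name DataSyncSketch, the helper appendAndSync, and the temp-file name are hypothetical and are not taken from the Cassandra code base. It also assumes the Java 7+ java.nio.file API, whereas the report refers to JRE 6. Whether force(false) actually maps to fdatasync() depends on the JVM and platform, as noted above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DataSyncSketch {
    // Hypothetical helper: appends one record, then asks the OS to flush
    // the file's data. Passing false means metadata-only changes (such as
    // timestamps) need not be forced to disk.
    static long appendAndSync(FileChannel channel, byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(false); // force(true) would also force metadata updates
        return channel.size();
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("commitlog-sketch", ".log");
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            long size = appendAndSync(ch, "mutation-1\n".getBytes(StandardCharsets.UTF_8));
            System.out.println("file size after sync: " + size);
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

By contrast, FileDescriptor#sync() takes no such flag, which is why a writer built on it always pays for the full fsync().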
On my log device (commodity SATA HDD, write cache disabled), there is a large performance gap between fsync() and fdatasync():
{quote}
$ sysbench --test=fileio --num-threads=1 --file-num=1 --file-total-size=10G --file-fsync-all=on --file-fsync-mode=fdatasync --file-test-mode=seqwr --max-time=600 --file-block-size=2K --max-requests=0 run
54.90 Requests/sec executed
per-request statistics:
min: 8.29ms
avg: 18.18ms
max: 108.36ms
approx. 95 percentile: 25.02ms
$ sysbench --test=fileio --num-threads=1 --file-num=1 --file-total-size=10G --file-fsync-all=on --file-fsync-mode=fsync --file-test-mode=seqwr --max-time=600 --file-block-size=2K --max-requests=0 run
28.08 Requests/sec executed
per-request statistics:
min: 33.28ms
avg: 35.61ms
max: 911.87ms
approx. 95 percentile: 41.69ms
{quote}
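In this test, fdatasync() roughly doubles throughput (54.90 vs. 28.08 requests/sec) and halves average latency (18.18ms vs. 35.61ms), so I do think this is a critical performance improvement.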
> CommitLog writer should call fdatasync instead of fsync
> -------------------------------------------------------
>
> Key: CASSANDRA-3248
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3248
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.8.6, 1.0.0, 1.1
> Environment: Linux
> Reporter: Zhu Han
> Original Estimate: 48h
> Remaining Estimate: 48h
>