Posted to notifications@couchdb.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/16 00:06:20 UTC
[jira] [Commented] (COUCHDB-3061) Decrease search time for deeply buried headers
[ https://issues.apache.org/jira/browse/COUCHDB-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380335#comment-15380335 ]
ASF GitHub Bot commented on COUCHDB-3061:
-----------------------------------------
GitHub user jaydoane opened a pull request:
https://github.com/apache/couchdb-couch/pull/185
3061 adaptive header search
The current find_header algorithm performs one file read per block,
which can be inefficient when the most recent header is buried deeply in
a large file.
With default config params, this change essentially keeps the current
behavior, reading each block into memory one at a time until a header is
found or the beginning of the file is reached. However, by setting new
config params, it's possible to exponentially increase the number of
blocks read into a "chunk" of memory with a single read, up to a
configurable limit.
For example, the following settings begin with a chunk size of one
block, which then doubles with each additional search step backward, up
to a maximum of 16MB (4096*4096 bytes):
[couchdb]
chunk_max_size = 4096*4096
chunk_exponent_base = 2
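To make the growth schedule concrete, here is a minimal model of it in Python (a sketch only, not the actual couch_file.erl code; the constant names and the assumption that chunk_max_size is in bytes are mine):

```python
# Sketch of the adaptive chunk-size schedule: start at one block,
# multiply by chunk_exponent_base on each backward search step, and
# cap the chunk at chunk_max_size. Hypothetical model of the config
# above; not the actual CouchDB implementation.

BLOCK_SIZE = 4096             # CouchDB's on-disk block size in bytes
CHUNK_MAX_SIZE = 4096 * 4096  # 16MB cap, from the example config
CHUNK_EXPONENT_BASE = 2       # double the chunk each step

def chunk_sizes(steps):
    """Return the chunk size in bytes for the first `steps` backward steps."""
    sizes = []
    size = BLOCK_SIZE
    for _ in range(steps):
        sizes.append(size)
        size = min(size * CHUNK_EXPONENT_BASE, CHUNK_MAX_SIZE)
    return sizes

print(chunk_sizes(5))  # [4096, 8192, 16384, 32768, 65536]
```

With these numbers the chunk reaches the 16MB cap after a dozen steps, so a deeply buried header costs a handful of large reads rather than millions of single-block reads.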
Measurements for a 12GB .couch.meta file with its last header at 4GB
show a speed improvement of 17x (server) to 27x (laptop).
COUCHDB-3061
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloudant/couchdb-couch 3061-adaptive-header-search
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/couchdb-couch/pull/185.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #185
----
commit 1c3db513a029351af6af26b4dd1da1f72f978789
Author: Jay Doane <ja...@gmail.com>
Date: 2016-07-14T02:05:56Z
Rename function and variables:
- Drop "calculate_" prefix since all functions perform calculations
- Change "_len" to "_size" for better consistency
- Change TotalBytes to TotalSize for better consistency
commit abf1e76fa996327bb52ac038b3d4520662cbab15
Author: Jay Doane <ja...@gmail.com>
Date: 2016-07-15T23:29:02Z
Implement adaptive header search
The current find_header algorithm performs one file read per block,
which can be inefficient when the most recent header is buried deeply in
a large file.
With default config params, this change essentially keeps the current
behavior, reading each block into memory one at a time until a header is
found or the beginning of the file is reached. However, by setting new
config params, it's possible to exponentially increase the number of
blocks read into a "chunk" of memory with a single read, up to a
configurable limit.
For example, the following settings begin with a chunk size of one
block, which then doubles with each additional search step backward, up
to a maximum of 16MB (4096*4096 bytes):
[couchdb]
chunk_max_size = 4096*4096
chunk_exponent_base = 2
Measurements for a 12GB .couch.meta file with its last header at 4GB
show a speed improvement of 17x (server) to 27x (laptop).
COUCHDB-3061
----
> Decrease search time for deeply buried headers
> ----------------------------------------------
>
> Key: COUCHDB-3061
> URL: https://issues.apache.org/jira/browse/COUCHDB-3061
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Reporter: Jay Doane
> Assignee: Jay Doane
>
> When a db compaction is interrupted and restarted, it's possible for the most recent header written to the .compact.meta file to be buried many GB from the end of the file. For example, an interrupted compaction left the following files:
> {code}
> -rw-r--r-- 1 dbcore dbcore 47G Jun 27 04:29 /srv/db/shards/40000000-7fffffff/opendi/yellow-ng-loadbalancer.1461156255.couch.compact.data
> -rw-r--r-- 1 dbcore dbcore 12G Jun 27 05:20 /srv/db/shards/40000000-7fffffff/opendi/yellow-ng-loadbalancer.1461156255.couch.compact.meta
> {code}
> but
> {code}
> # grep -abo db_header yellow-ng-loadbalancer.1461156255.couch.compact.meta | tail -1
> 4364894251:db_header
> {code}
> which means the current algorithm must search through about 8GB of file before it gets to a header. I measured how long it takes, both on an SSD laptop and SSD server, getting similar numbers:
> {code}
> (dbcore@db3.testy012.cloudant.net)18> timer:tc(couch_file,test_find_header,["/srv/jdoane/yellow-ng-loadbalancer.1461156255.couch.compact.meta", default]).
> {328335338, ...
> (node1@127.0.0.1)27> timer:tc(couch_file,test_find_header,["/Users/jay/proj/ibm/sample-data/yellow-ng-loadbalancer.1461156255.couch.compact.meta", default]).
> {426650530, ...
> {code}
> which is between 328-427 seconds, or 19-25 MB/s.
> One reason for this relative slowness is that the current algorithm performs a disk read for every block it searches:
> https://github.com/apache/couchdb-couch/blob/master/src/couch_file.erl#L537-L539
> We can improve the speed by loading a "chunk" of many blocks into memory with a single read operation, and then search each block in memory, trading off memory for speed. Ideally, the tradeoff can be made configurable, so that existing speed/memory behavior can be retained by default.
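That chunked approach can be sketched as follows, in Python rather than the project's Erlang, with invented names (find_header_chunked, is_header_block) standing in for the real couch_file internals; it is an illustration of the idea, not the proposed patch:

```python
# Sketch: scan backward from EOF reading many blocks per disk read,
# then check each block of the chunk in memory. Hypothetical model of
# the chunked search idea; not the actual CouchDB implementation.
import os

BLOCK_SIZE = 4096  # CouchDB block size in bytes

def is_header_block(block):
    """Stand-in header test; the real code validates a header term."""
    return block.startswith(b"db_header")

def find_header_chunked(path, chunk_blocks=256):
    """Return the block number of the newest header, or None."""
    with open(path, "rb") as f:
        file_size = f.seek(0, os.SEEK_END)
        block_no = file_size // BLOCK_SIZE  # exclusive upper bound
        while block_no > 0:
            first = max(0, block_no - chunk_blocks)
            f.seek(first * BLOCK_SIZE)
            chunk = f.read((block_no - first) * BLOCK_SIZE)
            # Scan the chunk's blocks from newest to oldest, in memory.
            for b in range(block_no - first - 1, -1, -1):
                block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
                if is_header_block(block):
                    return first + b
            block_no = first
    return None
```

A fixed chunk_blocks already amortizes the read cost; the adaptive variant described above additionally grows the chunk each step so that only the (common) near-EOF case pays single-block prices.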
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)