Posted to dev@couchdb.apache.org by Paul Davis <pa...@gmail.com> on 2009/05/11 03:49:23 UTC

Patch to couch_btree:chunkify

Chris reminded me that I had an optimization patch lying around for
couch_btree:chunkify, and his tests show that it gets a bit of a speed
increase when running some tests with hovercraft. The basic outline of
what I did was to swap a call like size(term_to_binary(ListOfTuples)) for
a sequence like SizeList = lists:map(fun(X) -> size(term_to_binary(X)) end,
ListOfTuples), TotalSize = lists:sum(SizeList), and then when we go through
the list of tuples to split them into chunks I use the pre-calculated sizes.

Anyway, I just wanted to run it across the list before I commit it in
case anyone sees anything subtle I might be missing.

chunkify(_Bt, []) ->
    [];
chunkify(Bt, InList) ->
    ToSize = fun(X) -> size(term_to_binary(X)) end,
    SizeList = lists:map(ToSize, InList),
    TotalSize = lists:sum(SizeList),
    case TotalSize of
    Size when Size > ?CHUNK_THRESHOLD ->
        NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
        ChunkThreshold = Size div NumberOfChunksLikely,
        chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
    _Else ->
        [InList]
    end.

chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
    lists:reverse(Chunks);
chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
    lists:reverse([lists:reverse(OutAcc) | Chunks]);
chunkify(Bt, [InElement | RestInList], [InSize | RestSizes], Threshold, OutAcc,
        OutAccSize, Chunks) ->
    case InSize of
    InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /= [] ->
        chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
            [lists:reverse([InElement | OutAcc]) | Chunks]);
    InSize ->
        chunkify(Bt, RestInList, RestSizes, Threshold, [InElement | OutAcc],
            OutAccSize + InSize, Chunks)
    end.
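
For anyone who wants to poke at it, here's a quick usage sketch (not part
of the patch; it assumes chunkify/2 is exported for testing, and since Bt
is unused any placeholder value works):

%% build ~5KB worth of small {Key, Value} tuples so the chunking path runs
KVs = [{N, lists:duplicate(100, $x)} || N <- lists:seq(1, 50)],
Chunks = couch_btree:chunkify(nil, KVs),
%% each chunk's serialized size should land near
%% TotalSize div NumberOfChunksLikely rather than at ?CHUNK_THRESHOLD
[lists:sum([size(term_to_binary(X)) || X <- C]) || C <- Chunks].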

Re: chunkify profiling (was Re: Patch to couch_btree:chunkify)

Posted by Paul Davis <pa...@gmail.com>.
>
> You know BERT better than I do -- you said the size of a binary is stored in
> its header, correct?
>

I'm not sure now. It may only get length information when being sent
across the wire.

>> I'll put this on the weekend agenda. Until I can show that its
>> consistently faster I'll hold off.
>>
>> For reference, when you say 2K docs in batches of 1K, did you mean 200K?
>
> No, I meant 2k (2 calls to _bulk_docs).  200k would have generated a
> multi-GB trace and I think fprof:profile() would have melted my MacBook
> processing it.  YMMV ;-)

I thought you knew the guys at CERN ;)

Thanks for writing this up, and please do post the code somewhere. This
weekend I'll take a bit of time to see if I can weasel anything better
out of the fprof stuff.

Paul Davis

Re: chunkify profiling (was Re: Patch to couch_btree:chunkify)

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Paul,

On May 13, 2009, at 4:01 PM, Paul Davis wrote:

> Adam,
>
> No worries about the delay. I'd agree that the first graph doesn't
> really show much than *maybe* we can say the patch reduces the
> variability a bit.

Agreed.  The variance on the wallclock measurements is frustrating.  I  
started a thread about CPU time profiling on erlang-questions, as it  
seems to be automatically disabled on Linux and not even implemented  
on OS X.  I think wallclock time is generally the right way to go, but  
if we're testing two implementations of an algorithm that doesn't  
touch the disk then CPU time seems like the better metric to me.
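
For a crude comparison outside fprof, the statistics/1 BIF can report both
clocks around a call. A minimal sketch (not part of the test harness; note
that statistics(runtime) is emulator-wide, so run it on an otherwise idle
node):

bench(Fun) ->
    statistics(runtime),                   % reset the since-last-call counters
    statistics(wall_clock),
    Result = Fun(),
    {_, CpuMs} = statistics(runtime),      % emulator CPU time since reset, ms
    {_, WallMs} = statistics(wall_clock),  % elapsed wallclock since reset, ms
    {Result, {cpu_ms, CpuMs}, {wall_ms, WallMs}}.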

> On the second graph, I haven't the faintest why that'd be as such.
> I'll have to try and setup fprof and see if I can figure out what
> exactly is taking most of the time.

I should clean up and post the code I used to make these  
measurements.  I wanted to just profile the couch_db_updater process,  
so I disabled hovercraft's delete/create DB calls at the start of  
lightning() and did them in my test code (so the updater PID didn't  
change on me).
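
Roughly, the pid-scoped fprof incantation looks like this (a sketch, not
the exact test code; profile_updater/2 and the output filename are
placeholders):

profile_updater(UpdaterPid, InsertFun) ->
    fprof:trace([start, {procs, [UpdaterPid]}]),  % trace only the updater
    InsertFun(),                                  % e.g. the _bulk_docs loop
    fprof:trace(stop),
    fprof:profile(),                              % build the call graph from fprof.trace
    fprof:analyse([{dest, "chunkify.analysis"}]).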

Here's the fprof analysis from a trunk run -- in this case I believe  
it was 10k docs @ 1k bulk:

http://friendpaste.com/16xwiZuqwWrqYXQeaBS0fx

There's a lot of detail there, but if you stare at it awhile, a few
things start to emerge:

- 11218 ms total time spent in the emulator (line 13)
  - 10028 ms in couch_db_updater:update_docs_int/3 and funs called by it (line 47)
    - 3682 ms in couch_btree:add_remove/3 (line 70)
    - 1941 ms in couch_btree:lookup/2 (line 104)
    - 1262 ms in couch_db_updater:flush_trees/3 (line 265)
    -  951 ms in couch_db_updater:merge_rev_trees/7 (line 322)
    -  910 ms in couch_db_updater:commit_data/2 (line 330)

and so on.  Each of those numbers is an ACC that includes all the time  
spent in functions called by that function.  The five functions I  
listed have basically zero overlap, so I'm not double-counting.  You  
can certainly drill deeper and see which functions take the most time  
inside add_remove/3, etc.

> Perhaps we're looking at wrong
> thing by reducing term_to_binary. You did say that most of the time
> was spent in size/1 as opposed to term_to_binary the other day which
> is hard to believe at best.

Agreed, that's pretty difficult to believe.  Here's what I saw -- I  
defined a function

chunkify_sizer(Elem, AccIn) ->
     Size = size(term_to_binary(Elem)),
     {{Elem, Size}, AccIn+Size}.
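
(For context, and not the actual patch code: a sizer with this shape is
the kind of thing you'd drive with lists:mapfoldl/3 from inside chunkify/1,
which is consistent with the '-chunkify/1-fun-0-' caller in the profile
below.)

{SizedKVs, TotalSize} =
    lists:mapfoldl(fun(Elem, Acc) -> chunkify_sizer(Elem, Acc) end,
                   0, InList),
%% SizedKVs is [{Elem, Size}], TotalSize is the running sum.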

Here's the profile for that function

%                                               CNT       ACC       OWN
{[{{couch_btree,'-chunkify/1-fun-0-',2},       22163,  268.312,  154.927}],
 { {couch_btree,chunkify_sizer,2},             22163,  268.312,  154.927},     %
 [{{erlang,term_to_binary,1},                  22163,  108.671,  108.671},
  {garbage_collect,                                9,    4.714,    4.714}]}.

60% of the time is spent in the body of chunkify_sizer, and only 40%  
in term_to_binary.  I've never seen size/1 show up explicitly in the  
profile, but if I define a dummy wrapper

get_size(Bin) ->
     erlang:size(Bin).

then get_size/1 will show up with a large OWN time, so I conclude that  
any time spent sizing a binary gets charged to OWN.

You know BERT better than I do -- you said the size of a binary is  
stored in its header, correct?

> I'll put this on the weekend agenda. Until I can show that its
> consistently faster I'll hold off.
>
> For reference, when you say 2K docs in batches of 1K, did you mean  
> 200K?

No, I meant 2k (2 calls to _bulk_docs).  200k would have generated a  
multi-GB trace and I think fprof:profile() would have melted my  
MacBook processing it.  YMMV ;-)

> Also, note to self, we should check speeds for dealing with uuids too
> to see if the non-ordered mode makes a difference.

Agreed.  At the moment fprof seems much better suited to identifying  
hot spots in code than comparing alternative implementations of a  
function.  Best thing I've come up with so far is comparing ratios of  
time spent in the function (as in Figure 2 above).

> Paul
>
> On Wed, May 13, 2009 at 3:33 PM, Adam Kocoloski  
> <ko...@apache.org> wrote:
>> Sorry for the delay on this front.  I ran hovercraft:lightning 20  
>> times each
>> with and without Paul's patch.  Each run inserted 2k docs in  
>> batches of
>> 1000.  Here are two plots showing the effect of the patch:
>>
>> http://dl.getdropbox.com/u/237885/insert_rate.png
>> http://dl.getdropbox.com/u/237885/chunkify_fraction.png
>>
>> The first plot histograms the insert rate for the two scenarios*.   
>> I don't
>> really see much of a difference.  The second plot uses fprof to  
>> plot the
>> fraction of time the couch_db_updater process spent in chunkify and  
>> any
>> functions called by chunkify.  For those familiar with fprof, it's  
>> the ratio
>> of ACC for couch_btree:chunkify/2 divided by OWN for the updater  
>> pid.  If
>> fprof is to be believed, the trunk code is almost 2x faster.
>>
>> Adam
>>
>> * the reason the insert rate is so low is because fprof is  
>> running.  Turning
>> off the profiler speeds things up by an order of magnitude, in  
>> accord with
>> the numbers Chris has posted.
>>


Re: chunkify profiling (was Re: Patch to couch_btree:chunkify)

Posted by Paul Davis <pa...@gmail.com>.
Adam,

No worries about the delay. I'd agree that the first graph doesn't
really show much, other than that *maybe* the patch reduces the
variability a bit.

On the second graph, I haven't the faintest idea why that would be.
I'll have to try to set up fprof and see if I can figure out what
exactly is taking most of the time. Perhaps we're looking at the wrong
thing by reducing term_to_binary. You did say the other day that most
of the time was spent in size/1 as opposed to term_to_binary, which is
hard to believe at best.

I'll put this on the weekend agenda. Until I can show that it's
consistently faster I'll hold off.

For reference, when you say 2K docs in batches of 1K, did you mean 200K?

Also, note to self: we should check speeds when dealing with uuids too,
to see if the non-ordered mode makes a difference.

Paul

On Wed, May 13, 2009 at 3:33 PM, Adam Kocoloski <ko...@apache.org> wrote:
> Sorry for the delay on this front.  I ran hovercraft:lightning 20 times each
> with and without Paul's patch.  Each run inserted 2k docs in batches of
> 1000.  Here are two plots showing the effect of the patch:
>
> http://dl.getdropbox.com/u/237885/insert_rate.png
> http://dl.getdropbox.com/u/237885/chunkify_fraction.png
>
> The first plot histograms the insert rate for the two scenarios*.  I don't
> really see much of a difference.  The second plot uses fprof to plot the
> fraction of time the couch_db_updater process spent in chunkify and any
> functions called by chunkify.  For those familiar with fprof, it's the ratio
> of ACC for couch_btree:chunkify/2 divided by OWN for the updater pid.  If
> fprof is to be believed, the trunk code is almost 2x faster.
>
> Adam
>
> * the reason the insert rate is so low is because fprof is running.  Turning
> off the profiler speeds things up by an order of magnitude, in accord with
> the numbers Chris has posted.
>

chunkify profiling (was Re: Patch to couch_btree:chunkify)

Posted by Adam Kocoloski <ko...@apache.org>.
Sorry for the delay on this front.  I ran hovercraft:lightning 20  
times each with and without Paul's patch.  Each run inserted 2k docs  
in batches of 1000.  Here are two plots showing the effect of the patch:

http://dl.getdropbox.com/u/237885/insert_rate.png
http://dl.getdropbox.com/u/237885/chunkify_fraction.png

The first plot histograms the insert rate for the two scenarios*.  I  
don't really see much of a difference.  The second plot uses fprof to  
plot the fraction of time the couch_db_updater process spent in  
chunkify and any functions called by chunkify.  For those familiar  
with fprof, it's the ratio of ACC for couch_btree:chunkify/2 divided  
by OWN for the updater pid.  If fprof is to be believed, the trunk  
code is almost 2x faster.

Adam

* The reason the insert rate is so low is that fprof is running.
Turning off the profiler speeds things up by an order of magnitude, in
accord with the numbers Chris has posted.

Re: Patch to couch_btree:chunkify

Posted by Chris Anderson <jc...@apache.org>.
On Mon, May 11, 2009 at 12:57 PM, Adam Kocoloski
<ad...@gmail.com> wrote:
> I'd like to see some more concrete numbers on the performance difference the
> two versions.  I wasn't able to reproduce Chris' 10%+ speedup using
> hovercraft:lightning; in fact, the two versions seem to be compatible within
> measurement variance.
>

All I did was run hovercraft:lightning() with and without the patch.
Without the patch I was seeing 6.9k doc/sec, with the patch I think it
was 7.7k. The fastest I got was with the patch *and* with compression
turned off in couch_btree:append_term, when I saw 10.1k docs/sec.

I'm running on an old white MacBook, and I seem to be IO bound (with
compression turned off). If I turn compression back on, I become CPU
bound and end up writing to the disk at a slower rate. These effects
would be wildly different on other hardware, I'm guessing.

> I tried messing around with fprof for a while today, and if anything it
> indicates that the original version might actually be faster (though I find
> that hard to believe).  Anyway, I think we should get in the habit of having
> some quantitative, reproducible way of evaluating performance-related
> patches.
>
> +1 for Bob's suggestion of stripping out Bt from the arguments, though.
>
> Adam
>
> On May 11, 2009, at 3:28 PM, Damien Katz wrote:
>
>> +1 for committing.
>>
>> -Damien
>>
>>
>> On May 10, 2009, at 9:49 PM, Paul Davis wrote:
>>
>>> Chris reminded me that I had an optimization patch laying around for
>>> couch_btree:chunkify and his tests show that it gets a bit of a speed
>>> increase when running some tests with hovercraft. The basic outline of
>>> what I did was to swap a call like term_to_binary([ListOfTuples]) to a
>>> sequence of ListOfSizes = lists:map(term_to_binary, ListOfTuples),
>>> Size = sum(ListOfSizes), and then when we go through the list of
>>> tuples to split them into chunks I use the pre calculated sizes.
>>>
>>> Anyway, I just wanted to run it across the list before I commit it in
>>> case anyone sees anything subtle I might be missing.
>>>
>>> chunkify(_Bt, []) ->
>>>  [];
>>> chunkify(Bt, InList) ->
>>>  ToSize = fun(X) -> size(term_to_binary(X)) end,
>>>  SizeList = lists:map(ToSize, InList),
>>>  TotalSize = lists:sum(SizeList),
>>>  case TotalSize of
>>>  Size when Size > ?CHUNK_THRESHOLD ->
>>>      NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
>>>      ChunkThreshold = Size div NumberOfChunksLikely,
>>>      chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
>>>  _Else ->
>>>      [InList]
>>>  end.
>>>
>>> chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
>>>  lists:reverse(Chunks);
>>> chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
>>>  lists:reverse([lists:reverse(OutAcc) | Chunks]);
>>> chunkify(Bt, [InElement | RestInList], [InSize | RestSizes], Threshold,
>>> OutAcc,
>>>      OutAccSize, Chunks) ->
>>>  case InSize of
>>>  InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /= [] ->
>>>      chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
>>>          [lists:reverse([InElement | OutAcc]) | Chunks]);
>>>  InSize ->
>>>      chunkify(Bt, RestInList, RestSizes, Threshold, [InElement | OutAcc],
>>>          OutAccSize + InSize, Chunks)
>>>  end.
>>
>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: Patch to couch_btree:chunkify

Posted by Paul Davis <pa...@gmail.com>.
On Mon, May 11, 2009 at 3:57 PM, Adam Kocoloski
<ad...@gmail.com> wrote:
> I'd like to see some more concrete numbers on the performance difference the
> two versions.  I wasn't able to reproduce Chris' 10%+ speedup using
> hovercraft:lightning; in fact, the two versions seem to be compatible within
> measurement variance.
>

Hmm. My first question is whether everyone is using the same parameters
for inserting documents. I'm pretty sure that this patch isn't going to
make too much of a difference unless the size of a _bulk_save is big
enough to make the term_to_binary calls noticeable. Other than that, the
only thing I could point at for differences would be Erlang VM version
or hardware.

> I tried messing around with fprof for a while today, and if anything it
> indicates that the original version might actually be faster (though I find
> that hard to believe).  Anyway, I think we should get in the habit of having
> some quantitative, reproducible way of evaluating performance-related
> patches.
>

We definitely need some standard benchmarks to make sure we're not
getting performance regressions elsewhere.

> +1 for Bob's suggestion of stripping out Bt from the arguments, though.
>

Sounds good.

> Adam
>
> On May 11, 2009, at 3:28 PM, Damien Katz wrote:
>
>> +1 for committing.
>>
>> -Damien
>>
>>
>> On May 10, 2009, at 9:49 PM, Paul Davis wrote:
>>
>>> Chris reminded me that I had an optimization patch laying around for
>>> couch_btree:chunkify and his tests show that it gets a bit of a speed
>>> increase when running some tests with hovercraft. The basic outline of
>>> what I did was to swap a call like term_to_binary([ListOfTuples]) to a
>>> sequence of ListOfSizes = lists:map(term_to_binary, ListOfTuples),
>>> Size = sum(ListOfSizes), and then when we go through the list of
>>> tuples to split them into chunks I use the pre calculated sizes.
>>>
>>> Anyway, I just wanted to run it across the list before I commit it in
>>> case anyone sees anything subtle I might be missing.
>>>
>>> chunkify(_Bt, []) ->
>>>  [];
>>> chunkify(Bt, InList) ->
>>>  ToSize = fun(X) -> size(term_to_binary(X)) end,
>>>  SizeList = lists:map(ToSize, InList),
>>>  TotalSize = lists:sum(SizeList),
>>>  case TotalSize of
>>>  Size when Size > ?CHUNK_THRESHOLD ->
>>>      NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
>>>      ChunkThreshold = Size div NumberOfChunksLikely,
>>>      chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
>>>  _Else ->
>>>      [InList]
>>>  end.
>>>
>>> chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
>>>  lists:reverse(Chunks);
>>> chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
>>>  lists:reverse([lists:reverse(OutAcc) | Chunks]);
>>> chunkify(Bt, [InElement | RestInList], [InSize | RestSizes], Threshold,
>>> OutAcc,
>>>      OutAccSize, Chunks) ->
>>>  case InSize of
>>>  InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /= [] ->
>>>      chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
>>>          [lists:reverse([InElement | OutAcc]) | Chunks]);
>>>  InSize ->
>>>      chunkify(Bt, RestInList, RestSizes, Threshold, [InElement | OutAcc],
>>>          OutAccSize + InSize, Chunks)
>>>  end.
>>
>
>

Re: Patch to couch_btree:chunkify

Posted by Adam Kocoloski <ad...@gmail.com>.
I'd like to see some more concrete numbers on the performance
difference between the two versions.  I wasn't able to reproduce Chris'
10%+ speedup using hovercraft:lightning; in fact, the two versions seem
to be compatible within measurement variance.

I tried messing around with fprof for a while today, and if anything  
it indicates that the original version might actually be faster  
(though I find that hard to believe).  Anyway, I think we should get  
in the habit of having some quantitative, reproducible way of  
evaluating performance-related patches.

+1 for Bob's suggestion of stripping out Bt from the arguments, though.

Adam

On May 11, 2009, at 3:28 PM, Damien Katz wrote:

> +1 for committing.
>
> -Damien
>
>
> On May 10, 2009, at 9:49 PM, Paul Davis wrote:
>
>> Chris reminded me that I had an optimization patch laying around for
>> couch_btree:chunkify and his tests show that it gets a bit of a speed
>> increase when running some tests with hovercraft. The basic outline  
>> of
>> what I did was to swap a call like term_to_binary([ListOfTuples])  
>> to a
>> sequence of ListOfSizes = lists:map(term_to_binary, ListOfTuples),
>> Size = sum(ListOfSizes), and then when we go through the list of
>> tuples to split them into chunks I use the pre calculated sizes.
>>
>> Anyway, I just wanted to run it across the list before I commit it in
>> case anyone sees anything subtle I might be missing.
>>
>> chunkify(_Bt, []) ->
>>   [];
>> chunkify(Bt, InList) ->
>>   ToSize = fun(X) -> size(term_to_binary(X)) end,
>>   SizeList = lists:map(ToSize, InList),
>>   TotalSize = lists:sum(SizeList),
>>   case TotalSize of
>>   Size when Size > ?CHUNK_THRESHOLD ->
>>       NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
>>       ChunkThreshold = Size div NumberOfChunksLikely,
>>       chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
>>   _Else ->
>>       [InList]
>>   end.
>>
>> chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
>>   lists:reverse(Chunks);
>> chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
>>   lists:reverse([lists:reverse(OutAcc) | Chunks]);
>> chunkify(Bt, [InElement | RestInList], [InSize | RestSizes],  
>> Threshold, OutAcc,
>>       OutAccSize, Chunks) ->
>>   case InSize of
>>   InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /=  
>> [] ->
>>       chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
>>           [lists:reverse([InElement | OutAcc]) | Chunks]);
>>   InSize ->
>>       chunkify(Bt, RestInList, RestSizes, Threshold, [InElement |  
>> OutAcc],
>>           OutAccSize + InSize, Chunks)
>>   end.
>


Re: Patch to couch_btree:chunkify

Posted by Damien Katz <da...@apache.org>.
+1 for committing.

-Damien


On May 10, 2009, at 9:49 PM, Paul Davis wrote:

> Chris reminded me that I had an optimization patch laying around for
> couch_btree:chunkify and his tests show that it gets a bit of a speed
> increase when running some tests with hovercraft. The basic outline of
> what I did was to swap a call like term_to_binary([ListOfTuples]) to a
> sequence of ListOfSizes = lists:map(term_to_binary, ListOfTuples),
> Size = sum(ListOfSizes), and then when we go through the list of
> tuples to split them into chunks I use the pre calculated sizes.
>
> Anyway, I just wanted to run it across the list before I commit it in
> case anyone sees anything subtle I might be missing.
>
> chunkify(_Bt, []) ->
>    [];
> chunkify(Bt, InList) ->
>    ToSize = fun(X) -> size(term_to_binary(X)) end,
>    SizeList = lists:map(ToSize, InList),
>    TotalSize = lists:sum(SizeList),
>    case TotalSize of
>    Size when Size > ?CHUNK_THRESHOLD ->
>        NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
>        ChunkThreshold = Size div NumberOfChunksLikely,
>        chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
>    _Else ->
>        [InList]
>    end.
>
> chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
>    lists:reverse(Chunks);
> chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
>    lists:reverse([lists:reverse(OutAcc) | Chunks]);
> chunkify(Bt, [InElement | RestInList], [InSize | RestSizes],  
> Threshold, OutAcc,
>        OutAccSize, Chunks) ->
>    case InSize of
>    InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /=  
> [] ->
>        chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
>            [lists:reverse([InElement | OutAcc]) | Chunks]);
>    InSize ->
>        chunkify(Bt, RestInList, RestSizes, Threshold, [InElement |  
> OutAcc],
>            OutAccSize + InSize, Chunks)
>    end.


Re: Patch to couch_btree:chunkify

Posted by Robert Dionne <di...@dionne-associates.com>.
I had played around with chunkify some time ago and saw about a 4%
improvement; it may be worth looking at [1]. Also note that I don't think
you need the Bt variable -- it's just passed around the recursion but
never used.
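
Concretely, dropping it just shrinks the arities -- something like this
(same bodies as Paul's patch, untested):

chunkify([]) ->
    [];
chunkify(InList) ->
    ToSize = fun(X) -> size(term_to_binary(X)) end,
    SizeList = lists:map(ToSize, InList),
    TotalSize = lists:sum(SizeList),
    case TotalSize of
    Size when Size > ?CHUNK_THRESHOLD ->
        NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
        ChunkThreshold = Size div NumberOfChunksLikely,
        chunkify(InList, SizeList, ChunkThreshold, [], 0, []);
    _Else ->
        [InList]
    end.

chunkify([], [], _Threshold, [], 0, Chunks) ->
    lists:reverse(Chunks);
chunkify([], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
    lists:reverse([lists:reverse(OutAcc) | Chunks]);
chunkify([InElement | RestInList], [InSize | RestSizes], Threshold, OutAcc,
        OutAccSize, Chunks) ->
    case InSize of
    InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /= [] ->
        chunkify(RestInList, RestSizes, Threshold, [], 0,
            [lists:reverse([InElement | OutAcc]) | Chunks]);
    InSize ->
        chunkify(RestInList, RestSizes, Threshold, [InElement | OutAcc],
            OutAccSize + InSize, Chunks)
    end.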

Cheers,

Bob

[1] http://github.com/bdionne/couchdb/tree/btree




On May 10, 2009, at 9:56 PM, Chris Anderson wrote:

> On Sun, May 10, 2009 at 6:49 PM, Paul Davis  
> <pa...@gmail.com> wrote:
>> Chris reminded me that I had an optimization patch laying around for
>> couch_btree:chunkify and his tests show that it gets a bit of a speed
>> increase when running some tests with hovercraft. The basic  
>> outline of
>> what I did was to swap a call like term_to_binary([ListOfTuples])  
>> to a
>> sequence of ListOfSizes = lists:map(term_to_binary, ListOfTuples),
>> Size = sum(ListOfSizes), and then when we go through the list of
>> tuples to split them into chunks I use the pre calculated sizes.
>>
>
> In my tests this gave a more than 10% speed boost both via Hovercraft
> and http benchmarking. I'm +1 on committing it.
>
>> Anyway, I just wanted to run it across the list before I commit it in
>> case anyone sees anything subtle I might be missing.
>>
>> chunkify(_Bt, []) ->
>>    [];
>> chunkify(Bt, InList) ->
>>    ToSize = fun(X) -> size(term_to_binary(X)) end,
>>    SizeList = lists:map(ToSize, InList),
>>    TotalSize = lists:sum(SizeList),
>>    case TotalSize of
>>    Size when Size > ?CHUNK_THRESHOLD ->
>>        NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
>>        ChunkThreshold = Size div NumberOfChunksLikely,
>>        chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
>>    _Else ->
>>        [InList]
>>    end.
>>
>> chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
>>    lists:reverse(Chunks);
>> chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
>>    lists:reverse([lists:reverse(OutAcc) | Chunks]);
>> chunkify(Bt, [InElement | RestInList], [InSize | RestSizes],  
>> Threshold, OutAcc,
>>        OutAccSize, Chunks) ->
>>    case InSize of
>>    InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /=  
>> [] ->
>>        chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
>>            [lists:reverse([InElement | OutAcc]) | Chunks]);
>>    InSize ->
>>        chunkify(Bt, RestInList, RestSizes, Threshold, [InElement |  
>> OutAcc],
>>            OutAccSize + InSize, Chunks)
>>    end.
>>
>
>
>
> -- 
> Chris Anderson
> http://jchrisa.net
> http://couch.io


Re: Patch to couch_btree:chunkify

Posted by Chris Anderson <jc...@apache.org>.
On Sun, May 10, 2009 at 6:49 PM, Paul Davis <pa...@gmail.com> wrote:
> Chris reminded me that I had an optimization patch laying around for
> couch_btree:chunkify and his tests show that it gets a bit of a speed
> increase when running some tests with hovercraft. The basic outline of
> what I did was to swap a call like term_to_binary([ListOfTuples]) to a
> sequence of ListOfSizes = lists:map(term_to_binary, ListOfTuples),
> Size = sum(ListOfSizes), and then when we go through the list of
> tuples to split them into chunks I use the pre calculated sizes.
>

In my tests this gave a more than 10% speed boost, both via Hovercraft
and HTTP benchmarking. I'm +1 on committing it.

> Anyway, I just wanted to run it across the list before I commit it in
> case anyone sees anything subtle I might be missing.
>
> chunkify(_Bt, []) ->
>    [];
> chunkify(Bt, InList) ->
>    ToSize = fun(X) -> size(term_to_binary(X)) end,
>    SizeList = lists:map(ToSize, InList),
>    TotalSize = lists:sum(SizeList),
>    case TotalSize of
>    Size when Size > ?CHUNK_THRESHOLD ->
>        NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
>        ChunkThreshold = Size div NumberOfChunksLikely,
>        chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
>    _Else ->
>        [InList]
>    end.
>
> chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
>    lists:reverse(Chunks);
> chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
>    lists:reverse([lists:reverse(OutAcc) | Chunks]);
> chunkify(Bt, [InElement | RestInList], [InSize | RestSizes], Threshold, OutAcc,
>        OutAccSize, Chunks) ->
>    case InSize of
>    InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /= [] ->
>        chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
>            [lists:reverse([InElement | OutAcc]) | Chunks]);
>    InSize ->
>        chunkify(Bt, RestInList, RestSizes, Threshold, [InElement | OutAcc],
>            OutAccSize + InSize, Chunks)
>    end.
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io