You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2022/06/06 22:20:49 UTC

[GitHub] [couchdb] nickva opened a new issue, #4051: See if it is possible to optimize a couch_util:reorder_results/2

nickva opened a new issue, #4051:
URL: https://github.com/apache/couchdb/issues/4051

   [NOTE]: # ( ^^ Provide a general summary of the request in the title above. ^^ )
   
   ## Summary
   
   [NOTE]: # ( Provide a brief overview of what the new feature is all about. )
   
   ## Desired Behaviour
   
   [NOTE]: # ( Tell us how the new feature should work. Be specific. )
   [TIP]:  # ( Do NOT give us access or passwords to your actual CouchDB! )
   
   ## Possible Solution
   
   [NOTE]: # ( Not required. Suggest how to implement the addition or change. )
   
   ## Additional context
   
   [TIP]:  # ( Why does this feature matter to you? What unique circumstances do you have? )
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva commented on issue #4051: See if it is possible to optimize a couch_util:reorder_results/2

Posted by GitBox <gi...@apache.org>.
nickva commented on issue #4051:
URL: https://github.com/apache/couchdb/issues/4051#issuecomment-1148006831

   The quick and dirty patch:
   
   ```diff 
   diff --git a/src/couch/src/couch_util.erl b/src/couch/src/couch_util.erl
   --- a/src/couch/src/couch_util.erl
   +++ b/src/couch/src/couch_util.erl
   @@ -530,6 +531,17 @@ reorder_results(Keys, SortedResults) ->
   
   +% linear search is faster for small lists, length() is 0.5 ms for 100k list
   +reorder_results2(Keys, SortedResults, Cutoff, _DictType) when length(Keys) < Cutoff ->
   +    [couch_util:get_value(Key, SortedResults) || Key <- Keys];
   +reorder_results2(Keys, SortedResults, _Cutoff, map) ->
   +    Map = maps:from_list(SortedResults),
   +    [maps:get(Key, Map, undefined) || Key <- Keys];
   +reorder_results2(Keys, SortedResults, _Cutoff, dict) ->
   +    KeyDict = dict:from_list(SortedResults),
   +    [dict:fetch(Key, KeyDict) || Key <- Keys].
   +
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva commented on issue #4051: See if it is possible to optimize a couch_util:reorder_results/2

Posted by GitBox <gi...@apache.org>.
nickva commented on issue #4051:
URL: https://github.com/apache/couchdb/issues/4051#issuecomment-1150180223

   Optimization PR merged. Closing issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva commented on issue #4051: See if it is possible to optimize a couch_util:reorder_results/2

Posted by GitBox <gi...@apache.org>.
nickva commented on issue #4051:
URL: https://github.com/apache/couchdb/issues/4051#issuecomment-1148721671

   @iilyak neat ideas!
   
   For [couch_btree.erl](https://github.com/apache/couchdb/blob/0059b8f90e58e10b199a4b768a06a762d12a30d3/src/couch/src/couch_btree.erl#L272:L275) the issue is that we're sorting the input keys, so that's guaranteed, but the response assertion is that the results will be in the same order as the input Keys. We, for instance check if Keys is sorted already and skip the sort and final reorder.
   
   Good point about the length guard, it is O(N). I am thinking since map already falls back to an orddict (but with =:=) comparison semantics which we want to preserve, I think we can try to simply the code and just use map always.
   
   It seems with 2 elements in play, building the map vs fetching with couch_util is not that much different (map is even a tiny bit faster).
   
   ```
   (node1@127.0.0.1)31> f(Keys), f(Res), {Keys, Res} = Gen(2), ok.
   (node1@127.0.0.1)4> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
   4494506
   (node1@127.0.0.1)5> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
   4467353
   (node1@127.0.0.1)6> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
   4216277
   (node1@127.0.0.1)7> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
   5699589
   (node1@127.0.0.1)8> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
   5796950
   (node1@127.0.0.1)9> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
   5816921
   ```
   
   (There changed the cutoff switch from 1 to 5, when it's 1 it should use the map, when 5 it should use couch_util).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva commented on issue #4051: See if it is possible to optimize a couch_util:reorder_results/2

Posted by GitBox <gi...@apache.org>.
nickva commented on issue #4051:
URL: https://github.com/apache/couchdb/issues/4051#issuecomment-1148751719

   With just the map implementation and without the length/1 guard seems to do pretty well in most cases:
   
   ```
    f(Keys), f(Res), {Keys, Res} = Gen(2), ok.
   ok
   (node1@127.0.0.1)5> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
   6670253
   (node1@127.0.0.1)6> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
   6441751
   (node1@127.0.0.1)7> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
   6638078
   ```
   (Compared to 5816921 with the length guard)
   
   ```
   (node1@127.0.0.1)8> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
   ok
   (node1@127.0.0.1)10> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
   12395
   (node1@127.0.0.1)11> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
   12508
   (node1@127.0.0.1)12> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
   12420
   ```
   (Compared to 11639 with the length/1 guard)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] iilyak commented on issue #4051: See if it is possible to optimize a couch_util:reorder_results/2

Posted by GitBox <gi...@apache.org>.
iilyak commented on issue #4051:
URL: https://github.com/apache/couchdb/issues/4051#issuecomment-1148600648

   Another possible optimization. 
   
   If the function is always called in the context when passed results are sorted (and sorting method is compatible with erlang sorting) then we can use `orddict:fetch/2` instead of `couch_util:get_value/2`.
   
   In case of `couch_btree:lookup/2 ` the sorting is guaranteed see [couch_btree.erl#L272:L275](https://github.com/apache/couchdb/blob/0059b8f90e58e10b199a4b768a06a762d12a30d3/src/couch/src/couch_btree.erl#L272:L275). 
   
   In case of `fabric_doc_update` it might be not the case (quick review is not conclusive). Which means our variable name `SortedResults` is misleading.  
   
   I wish erlang had `length_greater_than/2`. The `length/1` in the first clause still concerns me. However it seems like there is no option to remove it. Also IRC the keys are passed via API. It is unlikely we can get above 100k of elements.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva closed issue #4051: See if it is possible to optimize a couch_util:reorder_results/2

Posted by GitBox <gi...@apache.org>.
nickva closed issue #4051: See if it is possible to optimize a couch_util:reorder_results/2
URL: https://github.com/apache/couchdb/issues/4051


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org