Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2022/07/06 16:19:46 UTC
[GitHub] [couchdb] nickva opened a new pull request, #4091: Optimize couch_util:to_hex/1
nickva opened a new pull request, #4091:
URL: https://github.com/apache/couchdb/pull/4091
When profiling [1] a cluster acting as a replication source, I noticed a lot of time spent in `couch_util:to_hex/1`. That function is used to emit revision ids, among other things. When processing a million documents with 1000 revisions each, it ends up in the hot path.
Remembering that Erlang/OTP 24 added a new [`binary:encode_hex/1`](https://www.erlang.org/doc/man/binary.html#encode_hex-1) function, I decided to benchmark ours against the OTP implementation. It turns out ours is slower [2], so let's try to use the OTP one.
One difference from the OTP's version is ours emits lower case hex letters, while the OTP one emits upper case ones.
As a bonus, replaced a few calls to `couch_util:to_hex/1` wrapped in `?l2b/1` or `list_to_binary/1` with just a single call to `couch_util:to_hex_bin/1`.
The existing `couch_util:to_hex/1`, which returns a list, was left in place; it now just calls `to_hex_bin/1` internally and converts the result to a list.
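As a rough illustration of the list/binary split described above, here is a minimal sketch. It is not the code in this PR: the module name `hex_sketch` and the helper `nibble/1` are made up for the example, and the per-nibble binary comprehension is only a stand-in for the faster implementation. It does keep the lower case output (OTP's `binary:encode_hex/1` emits upper case letters).

```erlang
-module(hex_sketch).
-export([to_hex/1, to_hex_bin/1]).

%% Hypothetical sketch: returns the lower case hex encoding as a binary.
to_hex_bin(Bin) when is_binary(Bin) ->
    << <<(nibble(Hi)), (nibble(Lo))>> || <<Hi:4, Lo:4>> <= Bin >>.

%% List-returning wrapper: does the work in to_hex_bin/1 and converts
%% the result once at the end.
to_hex(Bin) when is_binary(Bin) ->
    binary_to_list(to_hex_bin(Bin));
to_hex(List) when is_list(List) ->
    binary_to_list(to_hex_bin(list_to_binary(List))).

%% Maps a 4-bit value to its lower case ASCII hex digit.
nibble(N) when N < 10 -> $0 + N;
nibble(N) -> $a + (N - 10).
```

With this sketch, a caller that previously wrote `?l2b(couch_util:to_hex(Bin))` would get the same bytes from a single `to_hex_bin(Bin)` call, without building an intermediate list.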
[1]
```
> ... eprof:analyze(total,[{sort, time}, {filter, [{time, 1000}, {calls, 100}]}]).
FUNCTION                                  CALLS       %     TIME  [uS / CALLS]
--------                                  -----  ------     ----  [----------]
...
couch_doc:revid_to_str/1                1165641    1.71   402860  [      0.35]
couch_key_tree:get_key_leafs_simple/4   1304102    1.85   435700  [      0.33]
erlang:list_to_integer/2                 873172    2.00   471140  [      0.54]
gen_server:loop/7                         62209    2.11   496932  [      7.99]
couch_key_tree:map_simple/3             1829235    2.36   554429  [      0.30]
couch_util:nibble_to_hex/1             37334650   16.67  3917127  [      0.10]
couch_util:to_hex/1                    19834192   34.37  8077050  [      0.41]
-------------------------------------  --------  ------  -------  [----------]
Total:                                 91072005 100.00% 23503072  [      0.26]
```
[2] `to_hex1/1` is the existing version and `to_hex3/1` is the OTP version.
```
% ~/src/erlperf/erlperf 'hex:to_hex1(<<210,90,95,232,68,185,66,248,160,33,184,103,181,221,158,96>>).' 'hex:to_hex3(<<210,90,95,232,68,185,66,248,160,33,184,103,181,221,158,96>>).'
Code                                                                           ||        QPS       Time   Rel
hex:to_hex3(<<210,90,95,232,68,185,66,248,160,33,184,103,181,221,158,96>>).     1    2746 Ki     364 ns  100%
hex:to_hex1(<<210,90,95,232,68,185,66,248,160,33,184,103,181,221,158,96>>).     1    1593 Ki     627 ns   58%
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [couchdb] davisp commented on a diff in pull request #4091: Optimize couch_util:to_hex/1
Posted by GitBox <gi...@apache.org>.
davisp commented on code in PR #4091:
URL: https://github.com/apache/couchdb/pull/4091#discussion_r915060435
##########
src/couch/src/couch_util.erl:
##########
@@ -212,29 +212,36 @@ validate_utf8_fast(B, O) ->
false
end.
-to_hex(<<Hi:4, Lo:4, Rest/binary>>) ->
- [nibble_to_hex(Hi), nibble_to_hex(Lo) | to_hex(Rest)];
-to_hex(<<>>) ->
- [];
+to_hex(Binary) when is_binary(Binary) ->
+ binary_to_list(to_hex_bin(Binary));
to_hex(List) when is_list(List) ->
to_hex(list_to_binary(List)).
Review Comment:
Do we need this clause to recurse back into to_hex/1 just to have the binary turned back into a list? It'd change the error a bit if List isn't convertible to binary I guess? This is just me thinking out loud, I don't have a direct preference really.
##########
src/couch/src/couch_util.erl:
##########
@@ -792,3 +799,31 @@ version_to_binary(Ver) when is_list(Ver) ->
Ver1 = lists:reverse(lists:dropwhile(IsZero, lists:reverse(Ver))),
Ver2 = [erlang:integer_to_list(N) || N <- Ver1],
?l2b(lists:join(".", Ver2)).
+
+-compile({inline, [hex/1]}).
+
+%% erlfmt-ignore
+hex(X) ->
+ % Table of all pairs of hex characters for a byte:
Review Comment:
Might add a note that we're encoding the hex as ascii directly here. Took me a few minutes to realize what was going on.
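To illustrate what "encoding the hex as ascii directly" can mean, here is a hedged sketch. It is not the table from this PR: the module name `hex_pairs` and the functions `pair/1` and `digit/1` are hypothetical, and the value is computed rather than looked up. The idea it shows is that each input byte maps to one 16-bit integer whose two bytes are already the ASCII codes of the two hex digits.

```erlang
-module(hex_pairs).
-export([to_hex_bin/1, pair/1]).

%% Hypothetical helper: returns a 16-bit integer whose high byte is the
%% ASCII code of the first hex digit of X and whose low byte is the
%% ASCII code of the second. For example pair(16#d2) =:= 16#6432,
%% i.e. the bytes $d (16#64) and $2 (16#32).
pair(X) when is_integer(X), X >= 0, X =< 255 ->
    (digit(X bsr 4) bsl 8) bor digit(X band 16#0F).

%% Maps a 4-bit value to its lower case ASCII hex digit.
digit(N) when N < 10 -> $0 + N;
digit(N) -> $a + (N - 10).

%% Emits one 16-bit (two character) segment per input byte.
to_hex_bin(Bin) when is_binary(Bin) ->
    << <<(pair(B)):16>> || <<B>> <= Bin >>.
```

A precomputed table would presumably hold the 256 results of something like `pair/1` as literals, so that the hot path is a single lookup per byte.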
[GitHub] [couchdb] nickva merged pull request #4091: Optimize couch_util:to_hex/1
Posted by GitBox <gi...@apache.org>.
nickva merged PR #4091:
URL: https://github.com/apache/couchdb/pull/4091
[GitHub] [couchdb] nickva commented on a diff in pull request #4091: Optimize couch_util:to_hex/1
Posted by GitBox <gi...@apache.org>.
nickva commented on code in PR #4091:
URL: https://github.com/apache/couchdb/pull/4091#discussion_r915067040
##########
src/couch/src/couch_util.erl:
##########
@@ -212,29 +212,36 @@ validate_utf8_fast(B, O) ->
false
end.
-to_hex(<<Hi:4, Lo:4, Rest/binary>>) ->
- [nibble_to_hex(Hi), nibble_to_hex(Lo) | to_hex(Rest)];
-to_hex(<<>>) ->
- [];
+to_hex(Binary) when is_binary(Binary) ->
+ binary_to_list(to_hex_bin(Binary));
to_hex(List) when is_list(List) ->
to_hex(list_to_binary(List)).
Review Comment:
Oh, it does look silly this way. I'd just have it call to_hex_bin/1 directly; it would look cleaner that way.
[GitHub] [couchdb] nickva commented on a diff in pull request #4091: Optimize couch_util:to_hex/1
Posted by GitBox <gi...@apache.org>.
nickva commented on code in PR #4091:
URL: https://github.com/apache/couchdb/pull/4091#discussion_r915063424
##########
src/couch/src/couch_util.erl:
##########
@@ -792,3 +799,31 @@ version_to_binary(Ver) when is_list(Ver) ->
Ver1 = lists:reverse(lists:dropwhile(IsZero, lists:reverse(Ver))),
Ver2 = [erlang:integer_to_list(N) || N <- Ver1],
?l2b(lists:join(".", Ver2)).
+
+-compile({inline, [hex/1]}).
+
+%% erlfmt-ignore
+hex(X) ->
+ % Table of all pairs of hex characters for a byte:
Review Comment:
good idea