Posted to dev@couchdb.apache.org by "Adam Kocoloski (JIRA)" <ji...@apache.org> on 2010/05/04 21:19:55 UTC

[jira] Commented: (COUCHDB-757) crypto:md5 vs erlang:md5

    [ https://issues.apache.org/jira/browse/COUCHDB-757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863955#action_12863955 ] 

Adam Kocoloski commented on COUCHDB-757:
----------------------------------------

Cool, nice find, Filipe.  In the interest of not relying too heavily on crypto, we could add couch_util:md5* wrappers that fall back to erlang:md5 when crypto is unavailable, e.g.

md5(Data) ->
    try crypto:md5(Data) catch error:_ -> erlang:md5(Data) end.

I didn't notice any performance hit from the extra function call and try..catch wrapper.  Of course CouchDB still depends on crypto in other places, but at least this patch wouldn't tie us any more closely to it.
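
As a rough sketch of what the rest of the couch_util:md5* family might look like, the same fallback pattern can be applied to the streaming calls. The module name md5_compat is hypothetical, and this is not the attached patch. It assumes the crypto:md5_init/0, crypto:md5_update/2 and crypto:md5_final/1 calls available in the Erlang release shown in this ticket; the erlang:md5_* BIFs are the fallback.

```erlang
%% Hypothetical module name, for illustration only.
-module(md5_compat).
-export([md5/1, md5_init/0, md5_update/2, md5_final/1]).

%% One-shot digest: prefer crypto, fall back to the BIF on any error
%% (for example when crypto is not started or not installed).
md5(Data) ->
    try crypto:md5(Data) catch error:_ -> erlang:md5(Data) end.

%% Streaming variants. Each call tries crypto first and falls back
%% to the matching erlang:md5_* BIF.
md5_init() ->
    try crypto:md5_init() catch error:_ -> erlang:md5_init() end.

md5_update(Ctx, Data) ->
    try crypto:md5_update(Ctx, Data)
    catch error:_ -> erlang:md5_update(Ctx, Data)
    end.

md5_final(Ctx) ->
    try crypto:md5_final(Ctx) catch error:_ -> erlang:md5_final(Ctx) end.
```

One caveat with the streaming variants: a context created by one implementation is not guaranteed to be usable by the other, so this sketch assumes crypto's availability does not change partway through a single init/update/final sequence.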

> crypto:md5 vs erlang:md5
> ------------------------
>
>                 Key: COUCHDB-757
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-757
>             Project: CouchDB
>          Issue Type: Improvement
>         Environment: GNU/Linux
>            Reporter: Filipe Manana
>         Attachments: crypto_md5.patch
>
>
> Just noticed that crypto:md5 is several times faster than erlang:md5 when hashing just 4 KB or 8 KB of data (roughly 3.5x and 5.8x respectively in the single runs below).
> Basically we use md5 hashing when writing and reading documents and attachments through couch_file and couch_stream.
> Eshell V5.8  (abort with ^G)
> 1> crypto:start().
> ok
> 2> Bin1 = crypto:rand_bytes(4 * 1024).
> <<92,239,233,29,1,237,96,193,188,97,4,72,51,90,96,91,187,
>   112,112,198,7,173,105,99,205,65,105,94,144,...>>
> 3>        
> 3> {T1, _} = timer:tc(erlang, md5, [Bin1]).
> {211,
>  <<20,235,111,74,212,254,194,144,49,70,205,105,124,106,
>    131,230>>}
> 4> 
> 4> {T2, _} = timer:tc(crypto, md5, [Bin1]).
> {60,
>  <<20,235,111,74,212,254,194,144,49,70,205,105,124,106,
>    131,230>>}
> 5> 
> 5> Bin2 = crypto:rand_bytes(8 * 1024).     
> <<246,66,158,227,62,127,62,239,202,232,133,244,191,9,136,
>   6,164,179,109,166,253,41,144,185,177,39,177,88,142,...>>
> 6> 
> 6> {T3, _} = timer:tc(erlang, md5, [Bin2]).
> {446,
>  <<7,55,252,42,249,30,58,22,245,12,111,82,131,58,199,51>>}
> 7> 
> 7> {T4, _} = timer:tc(crypto, md5, [Bin2]).
> {77,
>  <<7,55,252,42,249,30,58,22,245,12,111,82,131,58,199,51>>}
> 8> 
> I know there's a ticket aimed at making it possible to remove the dependency on the crypto module, but for environments where that dependency is not a problem this would be a plus.
> I ran a test that wrote 400 attachments of about 60 KB and saw an average response time of 0.16 s with crypto:md5 versus 0.18 s with erlang:md5.
