Posted to commits@cassandra.apache.org by "Elliott Sims (JIRA)" <ji...@apache.org> on 2019/05/14 09:06:00 UTC

[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

    [ https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839229#comment-16839229 ] 

Elliott Sims commented on CASSANDRA-13292:
------------------------------------------

In terms of hash algorithms, a cryptographic hash is one that's expensive to invert; that property doesn't necessarily affect collision probabilities. For digests, I don't think difficulty of inversion matters at all, since the digest is definitely not trying to hide the original data or protect against deliberate corruption.

What does matter is output size and distribution. So, any fast 128-bit hash with good distribution should be equivalent to MD5:
* Murmur3F (faster than MD5 but slower than the rest, well-supported; the greenrobot implementation claims to be much faster than Guava's)
* xxH3 (fast, brand new/unstable, possible collisions)
* Farmhash128
* Spookyhash128
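
To make the point about output size concrete, here is a minimal sketch of how a digest gets used: each replica hashes its data and only the 128-bit results are compared. It uses the JDK's MessageDigest with MD5 as the baseline. The class name DigestSketch and the sample cell values are made up for illustration and are not Cassandra code. Any of the candidates listed above would presumably slot into the same place, as long as it yields 16 well-distributed bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSketch {
    // Hash a sequence of cell values into one 128-bit digest.
    // Illustrative only: real Cassandra digests cover more than raw values.
    static byte[] digest(String... cells) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String cell : cells) {
                // Same entry point that shows up in the profile: update(ByteBuffer)
                md.update(ByteBuffer.wrap(cell.getBytes(StandardCharsets.UTF_8)));
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] replicaA = digest("k1", "v1");
        byte[] replicaB = digest("k1", "v1");
        byte[] replicaC = digest("k1", "v2");

        System.out.println(replicaA.length * 8);                        // 128
        System.out.println(MessageDigest.isEqual(replicaA, replicaB));  // true
        System.out.println(MessageDigest.isEqual(replicaA, replicaC));  // false
    }
}
```

Nothing in this usage depends on the hash being hard to invert; the comparison only needs matching inputs to agree and differing inputs to (almost always) disagree.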

Default/reference implementations all seem to be in C/C++, as are most benchmarks, so "best/fastest" may not be the same as "best/fastest in available Java libraries with compatible licenses".

> Replace MessagingService usage of MD5 with something more modern
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-13292
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>            Priority: Normal
>         Attachments: quorum-concurrency-reads-quorum.svg
>
>
> While profiling C* via multiple profilers, I've consistently seen a significant amount of time being spent calculating MD5 digests.
> {code}
> Stack Trace	Sample Count	Percentage(%)
> sun.security.provider.MD5.implCompress(byte[], int)	264	1.566
>    sun.security.provider.DigestBase.implCompressMultiBlock(byte[], int, int)	200	1.187
>       sun.security.provider.DigestBase.engineUpdate(byte[], int, int)	200	1.187
>          java.security.MessageDigestSpi.engineUpdate(ByteBuffer)	200	1.187
>             java.security.MessageDigest$Delegate.engineUpdate(ByteBuffer)	200	1.187
>                java.security.MessageDigest.update(ByteBuffer)	200	1.187
>                   org.apache.cassandra.db.Column.updateDigest(MessageDigest)	193	1.145
>                      org.apache.cassandra.db.ColumnFamily.updateDigest(MessageDigest)	193	1.145
>                         org.apache.cassandra.db.ColumnFamily.digest(ColumnFamily)	193	1.145
>                            org.apache.cassandra.service.RowDigestResolver.resolve()	106	0.629
>                               org.apache.cassandra.service.RowDigestResolver.resolve()	106	0.629
>                                  org.apache.cassandra.service.ReadCallback.get()	88	0.522
>                                     org.apache.cassandra.service.AbstractReadExecutor.get()	88	0.522
>                                        org.apache.cassandra.service.StorageProxy.fetchRows(List, ConsistencyLevel)	88	0.522
>                                           org.apache.cassandra.service.StorageProxy.read(List, ConsistencyLevel)	88	0.522
>                                              org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(int, ConsistencyLevel, boolean)	88	0.522
>                                                 org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(int)	88	0.522
>                                                    org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(int)	88	0.522
>                                                       org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)	88	0.522
>                                                          org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)	88	0.522
>                                                             org.apache.cassandra.cql3.QueryProcessor.processStatement(CQLStatement, QueryState, QueryOptions)	88	0.522
>                                                                org.apache.cassandra.cql3.QueryProcessor.process(String, QueryState, QueryOptions)	88	0.522
>                                                                   org.apache.cassandra.transport.messages.QueryMessage.execute(QueryState)	88	0.522
>                                                                      org.apache.cassandra.transport.Message$Dispatcher.messageReceived(ChannelHandlerContext, MessageEvent)	88	0.522
>                                                                         org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(ChannelHandlerContext, ChannelEvent)	88	0.522
>                                                                            org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent)	88	0.522
>                                                                               org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent)	88	0.522
>                                                                                  org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun()	88	0.522
>                                                                                     org.jboss.netty.handler.execution.ChannelEventRunnable.run()	88	0.522
>                                                                                        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)	88	0.522
>                                                                                           java.util.concurrent.ThreadPoolExecutor$Worker.run()	88	0.522
>                                                                                              java.lang.Thread.run()	88	0.522
> {code}
> Pending CASSANDRA-13291, it would be pretty easy to:
> # Switch out the hashing implementation from MD5 to implementations such as adler128 and murmur3_128 (but certainly not limited to those) and do some profiling to compare the net improvement on latencies and CPU usage
> # As we can't switch the algorithm from MD5 without breaking things, we could rev the MessagingService protocol version -- like we already do for things like switching from Snappy compression -> LZ4, we could switch to the new hashing implementation once all peers in the cluster are upgraded and support the new MessagingService version.
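
The version-gating idea in the second step could be sketched roughly as below. This is only an illustration under stated assumptions: the class DigestVersionGate, the enum DigestAlgorithm, and the version constants are hypothetical and do not correspond to real MessagingService identifiers or version numbers.

```java
import java.util.Arrays;
import java.util.Collection;

public class DigestVersionGate {
    // Hypothetical protocol versions, not real MessagingService constants.
    static final int VERSION_MD5_ONLY = 1;
    static final int VERSION_NEW_HASH = 2;

    // NEW_HASH stands in for whichever replacement hash is eventually chosen.
    enum DigestAlgorithm { MD5, NEW_HASH }

    // Use the new hash only if every known peer speaks a version that supports it;
    // a single older peer keeps the whole cluster on MD5.
    static DigestAlgorithm choose(Collection<Integer> peerVersions) {
        for (int version : peerVersions) {
            if (version < VERSION_NEW_HASH) {
                return DigestAlgorithm.MD5;
            }
        }
        return DigestAlgorithm.NEW_HASH;
    }

    public static void main(String[] args) {
        // Mixed-version cluster during a rolling upgrade: stay on MD5.
        System.out.println(choose(Arrays.asList(VERSION_NEW_HASH, VERSION_MD5_ONLY)));
        // Fully upgraded cluster: safe to switch.
        System.out.println(choose(Arrays.asList(VERSION_NEW_HASH, VERSION_NEW_HASH)));
    }
}
```

The design choice here is to decide per cluster rather than per peer, so that all replicas answering a given read compute comparable digests.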


