Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2019/05/15 00:02:03 UTC

[GitHub] [bookkeeper] vicaya commented on issue #2069: Use pure python implementation of MurmurHash

vicaya commented on issue #2069: Use pure python implementation of MurmurHash
URL: https://github.com/apache/bookkeeper/pull/2069#issuecomment-492453989
 
 
   @merlimat, where is your pymmh3 source repo? I just tried it. It works on Python 2.7.x but fails on Python 3.7.x:
   ```
   Python 3.7.3 (default, Apr  8 2019, 12:02:14) 
   [Clang 10.0.1 (clang-1001.0.46.3)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import pymmh3
   >>> pymmh3.hash64("foo")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   AttributeError: module 'pymmh3' has no attribute 'hash64'
   >>> 
   ```
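   This is due to a bug in `pymmh3/__init__.py`: `from pymmh3 import *` is missing a leading dot and should be the relative import `from .pymmh3 import *`. Python 3 dropped implicit relative imports, so without the dot the line resolves to the partially initialized package itself, nothing is imported, and `hash64` is never defined.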
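
The failure can be reproduced without pymmh3 itself. This sketch assumes the layout implied above (a `pymmh3/` package containing a `pymmh3.py` submodule) and builds a throwaway stand-in package, hypothetically named `demo_mmh3`, whose submodule defines a dummy `hash64`. It imports the package once with the dot-less line and once with the relative import. On Python 2.7 the dot-less line was treated as an implicit relative import of the submodule, which is why the original works there.

```python
import importlib
import os
import sys
import tempfile

PKG = "demo_mmh3"  # hypothetical stand-in for pymmh3


def package_exposes_hash64(init_line):
    """Build PKG/__init__.py and PKG/PKG.py in a temp dir, import PKG,
    and report whether `hash64` ended up on the package."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, PKG)
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write(init_line + "\n")
        with open(os.path.join(pkg_dir, PKG + ".py"), "w") as f:
            # Dummy body; only the name matters for this demo.
            f.write("def hash64(key):\n    return (0, 0)\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(PKG)
            return hasattr(module, "hash64")
        finally:
            sys.path.remove(root)
            # Forget the package so the next call imports it fresh.
            for name in list(sys.modules):
                if name == PKG or name.startswith(PKG + "."):
                    del sys.modules[name]


# Absolute import: on Python 3 this resolves to the half-initialized
# package itself, so `import *` copies nothing.
print("without dot:", package_exposes_hash64("from %s import *" % PKG))
# Explicit relative import: loads the submodule and re-exports hash64.
print("with dot:   ", package_exposes_hash64("from .%s import *" % PKG))
```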
   
   A quick microbenchmark with `perf` shows that the pure Python implementation is 16x slower than mmh3 for a short message ("foo"), 44x slower for a medium message (42 chars), and 710x slower for a long message (1512 chars, the Gettysburg Address):
   ```
   $ python3 simple.py 
   .....................
   short mmh3: Mean +- std dev: 429 ns +- 26 ns
   .....................
   short pymmh3: Mean +- std dev: 6.85 us +- 0.08 us
   .....................
   medium mmh3: Mean +- std dev: 426 ns +- 8 ns
   .....................
   medium pymmh3: Mean +- std dev: 18.6 us +- 0.4 us
   .....................
   long mmh3: Mean +- std dev: 705 ns +- 14 ns
   .....................
   long pymmh3: Mean +- std dev: 501 us +- 21 us
   ```
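
The slowdown factors quoted above are simply the ratio of the two means for each message size. A small check, using the means from the `perf` output converted to nanoseconds:

```python
# Mean times from the perf run above, in nanoseconds: (mmh3, pymmh3).
MEANS_NS = {
    "short": (429, 6_850),     # "foo"
    "medium": (426, 18_600),   # 42 chars
    "long": (705, 501_000),    # 1512 chars
}


def slowdown(c_ns, py_ns):
    """How many times slower the pure Python version is than mmh3."""
    return py_ns / c_ns


for size, (c_ns, py_ns) in MEANS_NS.items():
    print("%-6s %6.1fx slower" % (size, slowdown(c_ns, py_ns)))
```

The ratios come out at about 16.0, 43.7 and 710.6. The standard deviations are small relative to the means, so these factors are stable.
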
   Since the bk client only affects Pulsar Functions, which are deployed inside the official broker containers that always have mmh3 installed, there should be no performance impact in a typical deployment. The slowdown only occurs when people want to custom-build smaller broker container images without mmh3.
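
A client that wants the fast path when mmh3 is present, but still works in a slimmed-down image, can pick the implementation at import time. The sketch below is hypothetical: `load_murmur` and the candidate order are not taken from the bk client. It relies only on both modules exposing `hash64`, as used above, and it also rejects a module that imports but lacks `hash64`, which is exactly the pymmh3 failure on Python 3.

```python
import importlib

# Hypothetical preference order: C extension first, pure Python fallback second.
CANDIDATES = ("mmh3", "pymmh3")


def load_murmur(candidates=CANDIDATES, required_attr="hash64"):
    """Return the first module in `candidates` that imports and exposes
    `required_attr`; raise ImportError if none qualifies."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed in this image, try the next one
        if hasattr(module, required_attr):
            return module
        # Imported but unusable, e.g. the broken pymmh3 __init__.py on Python 3.
    raise ImportError(
        "no module in %r provides %s" % (tuple(candidates), required_attr)
    )
```

Usage would be `murmur = load_murmur()` followed by `murmur.hash64("foo")`. In an official broker image this picks mmh3; elsewhere it falls back to pymmh3, and the only cost is the slowdown measured above.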

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services