Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2020/10/31 01:25:00 UTC

[jira] [Commented] (BEAM-10920) Investigate python hash libraries

    [ https://issues.apache.org/jira/browse/BEAM-10920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223966#comment-17223966 ] 

Valentyn Tymofieiev commented on BEAM-10920:
--------------------------------------------

mmh3 currently does not release Python wheels [1], which makes it challenging to install on some platforms, since installation needs a compiler (e.g. gcc). I reached out to the maintainer to see if they would consider adding wheels. Keeping it as an optional dependency for now sounds appropriate. Good to know that sklearn also has an implementation of this functionality.

[1] https://pypi.org/project/mmh3/#files
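
For illustration, here is a minimal sketch of what treating mmh3 as an optional dependency can look like. This is not Beam's actual code: the helper name hash32 and the choice to truncate the md5 digest to 32 bits are assumptions made for the example. It relies only on mmh3.hash (which returns a signed 32-bit integer by default) and the standard library's hashlib.

```python
import hashlib

try:
    # Optional C extension; without a prebuilt wheel, installing it
    # needs a compiler, so it may be missing on some platforms.
    import mmh3
except ImportError:
    mmh3 = None


def hash32(data: bytes) -> int:
    """Return an unsigned 32-bit hash of `data` (hypothetical helper)."""
    if mmh3 is not None:
        # mmh3.hash returns a signed 32-bit int by default;
        # mask it into the unsigned range.
        return mmh3.hash(data) & 0xFFFFFFFF
    # Fallback (assumption for this sketch): the first 4 bytes of the
    # md5 digest. Slower, but available everywhere.
    return int.from_bytes(hashlib.md5(data).digest()[:4], "big")
```

Callers use hash32(b"some element") the same way whether or not mmh3 is installed. The two code paths produce different hash values, so results computed with one are not comparable with results computed with the other.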

> Investigate python hash libraries
> ---------------------------------
>
>                 Key: BEAM-10920
>                 URL: https://issues.apache.org/jira/browse/BEAM-10920
>             Project: Beam
>          Issue Type: Bug
>          Components: dependencies, sdk-py-core
>            Reporter: Monica Song
>            Priority: P3
>
> stats.ApproximateUnique has an optional mmh3 dependency [1] (mmh3 is roughly 9x faster than md5), but if that repository is problematic for users, we should look into alternatives.
> Other options: sklearn.utils.murmurhash3_32
>   [1] https://github.com/hajimes/mmh3, https://pypi.org/project/mmh3/2.0/
>  
> cc: [~tvalentyn]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)