Posted to issues@arrow.apache.org by "Alex Hagerman (JIRA)" <ji...@apache.org> on 2018/04/13 17:53:00 UTC

[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing

    [ https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437664#comment-16437664 ] 

Alex Hagerman commented on ARROW-2339:
--------------------------------------

[~pitrou] [~wesmckinn] Sorry I've been absent on this; work has had me tied up day and night. I'm hoping to work on it some more over the weekend. I was wondering if you had any thoughts on using xxHash, MurmurHash, or FNV-1a for this? I was going to do some timing this weekend, as well as testing for collisions on various ints as you mentioned on the original ticket. Do you know if we can use existing C or C++ implementations of these hashes with wrappers? I didn't know what the ASF rules might be on that with regard to licenses (only ASF or MIT/BSD allowed) and adding the Cython wrappers to PyArrow. If it's better to just do a new implementation I'll work on that too, but I didn't want to reinvent the wheel if I didn't need to.
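
As a rough illustration of the kind of timing and collision check mentioned above, here is a minimal pure-Python sketch of 64-bit FNV-1a applied to the 8-byte little-endian encoding of an integer. The names fnv1a_64, hash_int64, and count_collisions are hypothetical and not part of PyArrow, and the choice to hash the little-endian int64 bytes is an assumption; an actual fast path would presumably live in Cython or C++ rather than Python.

```python
import struct

# Standard 64-bit FNV-1a parameters.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of a byte string."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte                        # FNV-1a: XOR first ...
        h = (h * FNV64_PRIME) & MASK64   # ... then multiply, wrapping at 64 bits
    return h


def hash_int64(value: int) -> int:
    """Hash a signed 64-bit integer via its little-endian bytes (assumed layout)."""
    return fnv1a_64(struct.pack("<q", value))


def count_collisions(values, num_buckets: int) -> int:
    """Count values that land in an already-occupied bucket (hypothetical helper)."""
    seen = set()
    collisions = 0
    for v in values:
        bucket = hash_int64(v) % num_buckets
        if bucket in seen:
            collisions += 1
        else:
            seen.add(bucket)
    return collisions
```

Calling count_collisions(range(100000), 1 << 20) and comparing the result against the same loop using the built-in hash() would give a first, very rough comparison; wrapping the calls with timeit would cover the timing side. xxHash and MurmurHash would need an external implementation and are not sketched here.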

> [Python] Add a fast path for int hashing
> ----------------------------------------
>
>                 Key: ARROW-2339
>                 URL: https://issues.apache.org/jira/browse/ARROW-2339
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Alex Hagerman
>            Assignee: Alex Hagerman
>            Priority: Major
>             Fix For: 0.10.0
>
>
> Create a __hash__ fast path for Int scalars that avoids using as_py().
>  
> https://issues.apache.org/jira/browse/ARROW-640
> [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)