Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/08/06 20:52:00 UTC

[jira] [Commented] (ARROW-3002) [Python] Implement better DataType hash function

    [ https://issues.apache.org/jira/browse/ARROW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570760#comment-16570760 ] 

Wes McKinney commented on ARROW-3002:
-------------------------------------

Well, technically it is not a bug: hash functions are allowed to have collisions. It would be better for this not to generate a collision, though.
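
To illustrate the point about collisions, here is a minimal sketch in plain Python (no pyarrow involved). The `Point` class and its deliberately coarse `__hash__` are hypothetical, made up for this example. It shows that equal objects must hash equally, while unequal objects may share a hash and still behave correctly as dict keys, because `__eq__` resolves the collision.

```python
class Point:
    """Hypothetical class with a deliberately coarse hash, for illustration only."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Only x feeds the hash, so points that differ only in y collide.
        return hash(self.x)


a = Point(1, 2)
b = Point(1, 3)  # unequal to a, but same hash
c = Point(1, 2)  # equal to a, so it must have the same hash

# Colliding hashes still yield two distinct dict keys,
# because __eq__ is consulted after the hash matches.
lookup = {a: "first", b: "second"}
```

The collision only costs lookup performance; correctness depends on `__eq__`, which is why a colliding hash is a quality issue rather than a bug.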

> [Python] Implement better DataType hash function
> ------------------------------------------------
>
>                 Key: ARROW-3002
>                 URL: https://issues.apache.org/jira/browse/ARROW-3002
>             Project: Apache Arrow
>          Issue Type: Wish
>            Reporter: Sam Oluwalana
>            Priority: Minor
>
> {code:python}
> >>> x = pa.field('record', pa.struct([pa.field('x', pa.int32(), nullable=False)]))
> >>> y = pa.field('record', pa.struct([pa.field('x', pa.int32(), nullable=True)]))
> >>> z = pa.field('record', pa.struct([pa.field('x', pa.int32(), nullable=True)]))
> >>> x.__hash__()
> -9223372036569171727
> >>> y.__hash__()
> 285604054
> >>> z.__hash__()
> 285604076
> >>> x.type
> StructType(struct<x: int32>)
> >>> x.type.__hash__()
> 429437081997812647
> >>> y.type.__hash__()
> 429437081997812647
> >>> x
> pyarrow.Field<record: struct<x: int32>>
> >>> y
> pyarrow.Field<record: struct<x: int32>>
> {code}
> Expected:
> y.__hash__() should be the same as z.__hash__(), since y and z are constructed identically
> x.type.__hash__() should be different from y.type.__hash__(), since the child field's nullability differs
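
One way to get the expected behaviour is to derive the hash from the same attributes that equality compares, including child nullability. The sketch below is not pyarrow's implementation; `FieldSketch` and `StructSketch` are hypothetical stand-ins built on frozen dataclasses, whose generated `__hash__` hashes the tuple of their attributes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical stand-in for a field: name, type and nullability all feed the hash."""
    name: str
    type: object
    nullable: bool = True


@dataclass(frozen=True)
class StructSketch:
    """Hypothetical stand-in for a struct type holding a tuple of child fields."""
    fields: Tuple[FieldSketch, ...]


def record(nullable):
    # Mirrors the shape of the fields in the report: record: struct<x: int32>.
    child = FieldSketch("x", "int32", nullable=nullable)
    return FieldSketch("record", StructSketch((child,)))


x = record(nullable=False)
y = record(nullable=True)
z = record(nullable=True)

# y and z are equal, so their hashes must match.
same_hash = hash(y) == hash(z)
# x.type and y.type are unequal; their hashes are expected to differ
# because the child's nullable flag is part of the hashed tuple.
types_differ = x.type != y.type
```

Because the hash is computed from the attribute values rather than from object identity, two separately constructed but equal fields hash the same, which is the first expectation in the report.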


