Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/10/30 17:24:01 UTC

[jira] [Resolved] (IMPALA-1159) fnv_hash UDF initialized with 32 bits offset basis

     [ https://issues.apache.org/jira/browse/IMPALA-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-1159.
-----------------------------------
    Resolution: Won't Fix

We've moved away from using FNV anyway, so it doesn't seem worth enhancing.

> fnv_hash UDF initialized with 32 bits offset basis
> --------------------------------------------------
>
>                 Key: IMPALA-1159
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1159
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 1.4
>         Environment: Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Thierry Herrmann
>            Priority: Minor
>              Labels: correctness, downgraded, incompatibility
>
> According to http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_math_functions.html
> the fnv_hash UDF implements the 64-bit FNV-1a variant.
> According to http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
> the algorithm should be seeded with the 64-bit FNV offset basis: 14695981039346656037 (in hex, 0xcbf29ce484222325).
> When I implemented this, I did not obtain the same FNV-1a hashes as Impala.
> For example, with impala-shell I obtain:
> {code}
> +---------------------+
> | fnv_hash('hello')   |
> +---------------------+
> | 6414202926103426347 |
> +---------------------+
> {code}
> whereas it should be -6615550055289275125 (the 64-bit FNV-1a hash of 'hello', displayed as a signed BIGINT).
> By looking at the Impala unit tests:
> https://github.com/cloudera/Impala/blob/8567b51f8c38bd389a338c761242a316d8ffe5c8/be/src/exprs/expr-test.cc
> Excerpt:
> {code}
> // Test fnv_hash
> string s("hello world");
> uint64_t expected = HashUtil::FnvHash64(s.data(), s.size(), HashUtil::FNV_SEED);
> TestValue("fnv_hash('hello world')", TYPE_BIGINT, expected);
> {code} 
> I see that the algorithm is seeded with the 32-bit offset basis (HashUtil::FNV_SEED)
> instead of FNV64_SEED.
> If I seed my own implementation with the 32-bit offset basis, I obtain the same hashes as Impala.
> For backward compatibility reasons, this may not be easy to fix. Alternatively, fnv_hash could be deprecated and replaced with a corrected UDF.
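For readers who want to reproduce the discrepancy described in the report, the following is a minimal sketch of 64-bit FNV-1a with a caller-supplied seed. It is not Impala's HashUtil code: the names Fnv1a64, AsBigint, and the k-prefixed constants are hypothetical, and the constants are the offset bases and prime given on the FNV reference page the reporter cites.

```cpp
#include <cstdint>
#include <string>

// Hypothetical names for illustration; values are the standard FNV constants.
constexpr uint64_t kFnv64OffsetBasis = 14695981039346656037ULL;  // 0xcbf29ce484222325
constexpr uint64_t kFnv32OffsetBasis = 2166136261ULL;            // 0x811c9dc5
constexpr uint64_t kFnv64Prime = 1099511628211ULL;               // 0x100000001b3

// FNV-1a over the bytes of `s`: XOR each byte into the hash, then multiply
// by the 64-bit prime. Unsigned arithmetic wraps modulo 2^64.
uint64_t Fnv1a64(const std::string& s, uint64_t seed) {
  uint64_t hash = seed;
  for (unsigned char byte : s) {
    hash ^= byte;
    hash *= kFnv64Prime;
  }
  return hash;
}

// Reinterpret the unsigned hash as a signed 64-bit value, which is how a
// BIGINT result is displayed (assumes two's complement conversion).
int64_t AsBigint(uint64_t hash) { return static_cast<int64_t>(hash); }
```

Under these assumptions, AsBigint(Fnv1a64("hello", kFnv64OffsetBasis)) yields -6615550055289275125, the value the report expects. Passing kFnv32OffsetBasis as the seed gives a different result, which the report says matches the 6414202926103426347 printed by impala-shell.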


