Posted to issues@kudu.apache.org by "Bankim Bhavsar (Jira)" <ji...@apache.org> on 2021/06/04 13:48:00 UTC

[jira] [Updated] (KUDU-3286) Add special handling for empty strings for Bloom filter predicate push down

     [ https://issues.apache.org/jira/browse/KUDU-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bankim Bhavsar updated KUDU-3286:
---------------------------------
    Docs Text: 
Updated the hash computation for empty strings in the FastHash implementation to conform with the
handling in Apache Impala. For the Bloom filter predicate pushdown feature, which uses FastHash,
this change makes Kudu clients older than version 1.15.0 incompatible with Kudu server version 1.15.0,
and Kudu clients at or newer than version 1.15.0 incompatible with Kudu servers earlier than
1.15.0. Both the client library and the Kudu server need to be updated to version 1.15.0 or above
when using the Bloom filter predicate feature.
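The rule above reduces to "client and server must sit on the same side of 1.15.0". The sketch below encodes that reading. The function names are hypothetical, and it assumes plain dotted numeric versions such as "1.14.0" (suffixes like "-SNAPSHOT" are not handled).

```python
from typing import Tuple

# Release that changed how FastHash treats empty strings (per the docs text).
FIX_VERSION = (1, 15, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted version like '1.15.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def bloom_pushdown_compatible(client: str, server: str) -> bool:
    """True when client and server hash empty strings the same way.

    Both sides must be older than 1.15.0, or both at/after 1.15.0.
    """
    client_has_fix = parse_version(client) >= FIX_VERSION
    server_has_fix = parse_version(server) >= FIX_VERSION
    return client_has_fix == server_has_fix
```

Note that two pre-1.15.0 versions count as "compatible" here only in the sense that they agree with each other. They still share the empty-string behavior described below, which is why the docs text recommends upgrading both sides.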

This incompatibility manifests as the following messages in the logs:

- "Not implemented: call requires unsupported application feature flags: 4".
- "Not implemented: call requires unsupported application feature flags: 5".

> Add special handling for empty strings for Bloom filter predicate push down
> ---------------------------------------------------------------------------
>
>                 Key: KUDU-3286
>                 URL: https://issues.apache.org/jira/browse/KUDU-3286
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Bankim Bhavsar
>            Assignee: Bankim Bhavsar
>            Priority: Major
>             Fix For: 1.15.0
>
>
> The FastHash function used with Bloom filter predicate pushdown has special handling for nullptr:
> [https://github.com/apache/kudu/blob/master/src/kudu/util/hash_util.h#L95]
> However, there isn't any special handling for empty objects/strings. FastHash of an empty string with seed=0 produces a hash value of 0. This doesn't set any bits in the Bloom filter, and as a result empty strings are reported as not present.
> Impala uses the direct Bloom filter approach and includes special handling for empty strings:
> [https://github.com/apache/impala/blob/master/be/src/runtime/raw-value.inline.h#L352]
> This leads to a discrepancy between Impala and Kudu and produces incorrect join results.
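To make the failure mode concrete, here is a minimal Python port of the public fast-hash 64-bit algorithm. It assumes Kudu's FastHash64 follows that algorithm, with little-endian reads and the standard fast-hash multiplier and mix constants. With an empty buffer and seed 0, the initial state is 0, no bytes are mixed in, and the final mix maps 0 to 0. The `hash_string_for_bloom` helper and its `EMPTY_SENTINEL` constant are hypothetical. They illustrate the general idea of special-casing empty strings by hashing a fixed non-empty value, not Kudu's or Impala's actual code or constant.

```python
MASK64 = (1 << 64) - 1
MULTIPLIER = 0x880355F21E6D1965  # fast-hash multiplier (assumed to match Kudu)


def _mix(h: int) -> int:
    """fast-hash finalizer; a bijection on 64-bit values, so _mix(0) == 0."""
    h ^= h >> 23
    h = (h * 0x2127599BF4325C37) & MASK64
    h ^= h >> 47
    return h


def fast_hash64(buf: bytes, seed: int) -> int:
    """Sketch of fast-hash 64 over little-endian 8-byte words plus a tail."""
    length = len(buf)
    h = (seed ^ (length * MULTIPLIER)) & MASK64
    n_words = length // 8
    for i in range(n_words):
        v = int.from_bytes(buf[8 * i:8 * i + 8], "little")
        h ^= _mix(v)
        h = (h * MULTIPLIER) & MASK64
    tail = buf[8 * n_words:]
    if tail:  # remaining 1..7 bytes, assembled little-endian
        v = int.from_bytes(tail, "little")
        h ^= _mix(v)
        h = (h * MULTIPLIER) & MASK64
    return _mix(h)


# Hypothetical sentinel for illustration only; not Impala's constant.
EMPTY_SENTINEL = (0x7A5C3E1D).to_bytes(4, "little")


def hash_string_for_bloom(s: bytes, seed: int = 0) -> int:
    """Hash a string for a Bloom filter, substituting a sentinel for ''."""
    if len(s) == 0:
        return fast_hash64(EMPTY_SENTINEL, seed)
    return fast_hash64(s, seed)


print(fast_hash64(b"", 0))  # 0: the value that leaves the filter untouched
```

Hashing a fixed sentinel gives the empty string a normal, non-zero hash. The cost is that the empty string collides with the literal sentinel bytes, which a Bloom filter tolerates as an ordinary false positive. Because the client builds the filter and the server probes it, both must apply the same rule. That is why this change surfaces as a client/server version incompatibility.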



--
This message was sent by Atlassian Jira
(v8.3.4#803005)