Posted to issues-all@impala.apache.org by "Fang-Yu Rao (JIRA)" <ji...@apache.org> on 2019/06/22 17:16:00 UTC

[jira] [Commented] (IMPALA-8698) test_bloom_filters fails when run on seq/gzip/record table format

    [ https://issues.apache.org/jira/browse/IMPALA-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870312#comment-16870312 ] 

Fang-Yu Rao commented on IMPALA-8698:
-------------------------------------

Thanks for pointing this out, Bikram!

The reported test does not seem to fail often. In my previous six test runs, listed below, TestBloomFilters::test_bloom_filters (which runs the test file testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test) never failed. Otherwise I would have modified "bloom_filters.test" in my proposed patch set ([https://gerrit.cloudera.org/c/12974/]).

1. [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/511/]

2. [https://jenkins.impala.io/job/pre-review-test/378]

3. [https://jenkins.impala.io/job/pre-review-test/379]

4. [https://jenkins.impala.io/job/pre-review-test/380]

5. [https://jenkins.impala.io/job/pre-review-test/381]

6. [https://jenkins.impala.io/job/pre-review-test/383]

One quick workaround might be to add the following statement before the query in the failing test case, which disables the newly added feature:

"SET DISABLE_HDFS_NUM_ROWS_ESTIMATE=1;"
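
As a rough sketch of where that statement could go, here is test case 4 from the issue description quoted below with the extra SET line added. The placement (first among the SET statements) and the added comment are my assumptions; I have not verified that this makes the test pass on functional_seq_record_gzip.

```
---- QUERY
# Assumed placement: disable the HDFS row-count estimate before the other options.
SET DISABLE_HDFS_NUM_ROWS_ESTIMATE=1;
SET RUNTIME_FILTER_MODE=GLOBAL;
SET RUNTIME_FILTER_WAIT_TIME_MS=30000;
SET RUNTIME_FILTER_MIN_SIZE=4KB;
SET RUNTIME_BLOOM_FILTER_SIZE=4KB;
select STRAIGHT_JOIN count(*) from alltypes a join [SHUFFLE] alltypes b on a.id = b.id;
---- RESULTS
7300
---- RUNTIME_PROFILE
row_regex: .*1 of 1 Runtime Filter Published.*
row_regex: .*Filter 0 \(8.00 KB\).*
====
```

The expected results and profile regexes are unchanged from the original test case; only the first SET line is new.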

In the meantime, I will also look into how Impala computes the expected size of a filter.

Please also let me know if you have any other good ideas.

> test_bloom_filters fails when run on seq/gzip/record table format
> -----------------------------------------------------------------
>
>                 Key: IMPALA-8698
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8698
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.3.0
>            Reporter: Bikramjeet Vig
>            Assignee: Fang-Yu Rao
>            Priority: Critical
>              Labels: broken-build
>
> test_bloom_filters seems to fail on the last test case when run on seq/gzip/record table format (database used: functional_seq_record_gzip) during exhaustive test runs.
> test case:
> {noformat}
> ---- QUERY
> ####################################################
> # Test case 4: Filter size is >= the min buffer size that can be allocated by the
> # buffer pool
> ####################################################
> SET RUNTIME_FILTER_MODE=GLOBAL;
> SET RUNTIME_FILTER_WAIT_TIME_MS=30000;
> SET RUNTIME_FILTER_MIN_SIZE=4KB;
> SET RUNTIME_BLOOM_FILTER_SIZE=4KB;
> # The min buffer size is set to 8KB for end to end tests. This query would
> # produce a 4KB filter if the min buffer size limit bound is not enforced.
> select STRAIGHT_JOIN count(*) from alltypes a join [SHUFFLE] alltypes b on a.id = b.id;
> ---- RESULTS
> 7300
> ---- RUNTIME_PROFILE
> row_regex: .*1 of 1 Runtime Filter Published.*
> row_regex: .*Filter 0 \(8.00 KB\).*
> ====
> {noformat}
> Expected size for Filter 0 is 8 KB, but the actual size comes out to 16 KB.
> Bloom filter sizes are based on NDV estimates. Since the previous runs were successful and the failed run included the patch for IMPALA-7608, which affects stats, I suspect that patch might be the reason for the failure.


