You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Fang-Yu Rao (JIRA)" <ji...@apache.org> on 2019/06/28 22:17:00 UTC

[jira] [Commented] (IMPALA-8730) TestExplain.test_explain_validate_cardinality_estimates flakiness due to reliance on exact cardinality numbers.

    [ https://issues.apache.org/jira/browse/IMPALA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875266#comment-16875266 ] 

Fang-Yu Rao commented on IMPALA-8730:
-------------------------------------

Thanks for pointing this out Anurag.

After trying to execute the following SQL statement "SHOW TABLE STATS functional.alltypes;" from the Impala shell, I found that the table functional.alltypes DOES have statistical information, i.e., #Rows is not "-1" for each partition. That is, the cardinality returned by this SQL statement should be deterministic. However, the patch set for IMPALA-7608 ([https://gerrit.cloudera.org/c/12974/]) should not affect the code path when an hdfs table already has the statistical information and thus I suspect IMPALA-7608 might not cause this problem.

For easy reference, I also provide the returned results of the SQL statement in the following.

 

 
{code:java}
+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+

| year  | month | #Rows | #Files | Size     | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                          |

+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+

| 2009  | 1     | 310   | 1      | 19.95KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=1  |

| 2009  | 2     | 280   | 1      | 18.12KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=2  |

| 2009  | 3     | 310   | 1      | 20.06KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=3  |

| 2009  | 4     | 300   | 1      | 19.61KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=4  |

| 2009  | 5     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=5  |

| 2009  | 6     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=6  |

| 2009  | 7     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=7  |

| 2009  | 8     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=8  |

| 2009  | 9     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=9  |

| 2009  | 10    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=10 |

| 2009  | 11    | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11 |

| 2009  | 12    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=12 |

| 2010  | 1     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=1  |

| 2010  | 2     | 280   | 1      | 18.39KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=2  |

| 2010  | 3     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=3  |

| 2010  | 4     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=4  |

| 2010  | 5     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=5  |

| 2010  | 6     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=6  |

| 2010  | 7     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=7  |

| 2010  | 8     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=8  |

| 2010  | 9     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=9  |

| 2010  | 10    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=10 |

| 2010  | 11    | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=11 |

| 2010  | 12    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=12 |

| Total |       | 7300  | 24     | 478.45KB | 0B           |                   |        |                   |                                                                   |

+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+{code}
In addition, I also tried to execute the following SQL statement to get the corresponding query plan, which is the same as the SQL statement issued in that EE test ([https://github.com/apache/impala/blob/master/tests/metadata/test_explain.py#L110-L113]).

 
{code:java}
SET EXPLAIN_LEVEL=3;
EXPLAIN SELECT * FROM functional.alltypes;{code}
The resulting query plan is provided as follows. It can be seen that the cardinality is 7.30K instead of 7.00K. That being said, if we've got '7.00K' in the test, then there seems to be some missing rows/partitions in that table (i.e., functional.alltypes).

 

 
{code:java}
+----------------------------------------------------------------------------------------+

| Explain String                                                                         |

+----------------------------------------------------------------------------------------+

| Max Per-Host Resource Reservation: Memory=32.00KB Threads=3                            |

| Per-Host Resource Estimates: Memory=128MB                                              |

| Codegen disabled by planner                                                            |

| Analyzed query: SELECT * FROM functional.alltypes                                      |

|                                                                                        |

| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                  |

| Per-Host Resources: mem-estimate=490.49KB mem-reservation=0B thread-reservation=1      |

|   PLAN-ROOT SINK                                                                       |

|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                           |

|   |                                                                                    |

|   01:EXCHANGE [UNPARTITIONED]                                                          |

|      mem-estimate=490.49KB mem-reservation=0B thread-reservation=0                     |

|      tuple-ids=0 row-size=89B cardinality=7.30K                                        |

|      in pipelines: 00(GETNEXT)                                                         |

|                                                                                        |

| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                         |

| Per-Host Resources: mem-estimate=128.00MB mem-reservation=32.00KB thread-reservation=2 |

|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED]                           |

|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                           |

|   00:SCAN HDFS [functional.alltypes, RANDOM]                                           |

|      HDFS partitions=24/24 files=24 size=478.45KB                                      |

|      stored statistics:                                                                |

|        table: rows=7.30K size=478.45KB                                                 |

|        partitions: 24/24 rows=7.30K                                                    |

|        columns: all                                                                    |

|      extrapolated-rows=disabled max-scan-range-rows=310                                |

|      mem-estimate=128.00MB mem-reservation=32.00KB thread-reservation=1                |

|      tuple-ids=0 row-size=89B cardinality=7.30K                                        |

|      in pipelines: 00(GETNEXT)                                                         |

+----------------------------------------------------------------------------------------+
{code}
 

 

 

> TestExplain.test_explain_validate_cardinality_estimates flakiness due to reliance on exact cardinality numbers.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8730
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8730
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.3.0
>            Reporter: Anurag Mantripragada
>            Assignee: Fang-Yu Rao
>            Priority: Critical
>              Labels: broken-build
>
> Another flaky test uncovered due to IMPALA-7608. The test relies on exact cardinality numbers but after IMPALA-7608, these could be indeterministic. 
>  
> Below is the error message:
> {code:java}
> metadata/test_explain.py:113: in test_explain_validate_cardinality_estimates     check_cardinality(result.data, '7.30K') metadata/test_explain.py:98: in check_cardinality     query_result, expected_cardinality=expected_cardinality) metadata/test_explain.py:86: in check_row_size_and_cardinality     assert m.groups()[1] == expected_cardinality E   assert '7.00K' == '7.30K' E     - 7.00K E     ?   ^ E     + 7.30K E     ?   ^{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org