You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "Alice Chen (JIRA)" <ji...@apache.org> on 2015/07/22 20:18:20 UTC

[jira] [Created] (TRAFODION-823) LP Bug: 1402031 - Update stats gets wrong row counts with the first run after loading data

Alice Chen created TRAFODION-823:
------------------------------------

             Summary: LP Bug: 1402031 - Update stats gets wrong row counts with the first run after loading data
                 Key: TRAFODION-823
                 URL: https://issues.apache.org/jira/browse/TRAFODION-823
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
            Reporter: Weishiun Tsai
            Assignee: Barry Fritchman
            Priority: Critical
             Fix For: 2.0-incubating


This problem has been seen several times in the past few builds, the latest one being the v1209_0830 build.  When QA populates the g_tpch2x tables on QA clusters, the scripts load the data, do select count(*) on each table, and then run update stats on each table.  The select count(*) prior to the update stats show the correct row counts.  But showstats after update stats show that the row count in the stats for each table is way off.  Some tables get more row counts than the actual row counts and some less.

Rerunning the same set of update stats statements often correct this situation.  But this is causing a huge problem for testing.  When the stats are this bad, some of the larger queries using these tables would lapse back to nested join and would then hang for 20 hours without finishing.  

This problem can’t be reliably reproduced on all clusters, but it does show up frequently enough to cause problems.  This case is created to document this problem.  Investigation needs to be done in the implementation of update stats to see why select count(*) gets the correct counts while update stats afterwards does not.

=======================================================

Here is the execution output of the select count(*) statements and the update stats statements, in that order, after the data loading.

SQL>select count(*) from region;
(EXPR)
--------------------
                   5

--- 1 row(s) selected.

SQL>select count(*) from nation;
(EXPR)
--------------------
                  25

--- 1 row(s) selected.

SQL>select count(*) from supplier;
(EXPR)
--------------------
               20000

--- 1 row(s) selected.

SQL>select count(*) from customer;
(EXPR)
--------------------
              300000

--- 1 row(s) selected.

SQL>select count(*) from part;
(EXPR)
--------------------
              400000

--- 1 row(s) selected.

SQL>select count(*) from partsupp;
(EXPR)
--------------------
             1600000

--- 1 row(s) selected.

SQL>select count(*) from orders;
(EXPR)
--------------------
             3000000

--- 1 row(s) selected.

SQL>select count(*) from lineitem;
(EXPR)
--------------------
            11997996

--- 1 row(s) selected.

-------------------------------------------------------------------------------
== TEST: tcase.test003
-------------------------------------------------------------------------------
SQL>update statistics for table region on every column;
--- SQL operation complete.

SQL>update statistics for table nation on every column;
--- SQL operation complete.

SQL>update statistics for table supplier on every column;
--- SQL operation complete.

SQL>update statistics for table customer on every column;
--- SQL operation complete.

SQL>update statistics for table part on every column;
--- SQL operation complete.

SQL>update statistics for table partsupp on every column;
--- SQL operation complete.

SQL>update statistics for table orders on every column sample random 10 percent;
--- SQL operation complete.

SQL>update statistics for table lineitem on every column sample random 10 percent;
--- SQL operation complete.

=======================================================

Here is the showstats output for each table after update stats statements were first run.  The row counts are completely off.

>>set schema g_tpch2x;

--- SQL operation complete.
>>showstats for table CUSTOMER on every column;

Histogram data for Table TRAFODION.G_TPCH2X.CUSTOMER
Table ID: 314741800727389741

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1392898339     48     1100767     1100767 C_CUSTKEY
1392898334     62     1100767     1100767 C_NAME
1392898329     62     1100767     1100767 C_ADDRESS
1392898324     25     1100767           0 C_NATIONKEY
1392898319     62     1100767     1100767 C_PHONE
1392898314     36     1100767           6 C_ACCTBAL
1392898309      5     1100767           0 C_MKTSEGMENT
1392898304     62     1100767           0 C_COMMENT
1392898299      8     1100767           0 "_SALT_"
1392898344      1     1100767     1100767 "_SALT_", C_CUSTKEY


--- SQL operation complete.
>>showstats for table LINEITEM on every column;

Histogram data for Table TRAFODION.G_TPCH2X.LINEITEM
Table ID: 314741800727390211

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1537555901     28    12720858     2555961 L_ORDERKEY
1537555896     52    12720858      399694 L_PARTKEY
1537555891     48    12720858       20000 L_SUPPKEY
1537555886      7    12720858           7 L_LINENUMBER
1537555881     50    12720858          50 L_QUANTITY
1537555876     39    12720858      939869 L_EXTENDEDPRICE
1537555871     11    12720858          11 L_DISCOUNT
1537555866      9    12720858           9 L_TAX
1537555861      3    12720858           3 L_RETURNFLAG
1537555856      2    12720858           2 L_LINESTATUS
1537555851     50    12720858        2526 L_SHIPDATE
1537555846     50    12720858        2466 L_COMMITDATE
1537555841     50    12720858        2550 L_RECEIPTDATE
1537555836      4    12720858           4 L_SHIPINSTRUCT
1537555831      7    12720858           7 L_SHIPMODE
1537555826     62    12720858     7447670 L_COMMENT
1537555821      8    12720858           8 "_SALT_"
1537555906      1    12720858    12720858 "_SALT_", L_SHIPDATE, L_ORDERKEY, L_LINENUMBER


--- SQL operation complete.
>>showstats for table NATION on every column;

Histogram data for Table TRAFODION.G_TPCH2X.NATION
Table ID: 314741800727389515

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1374739566     25          25          25 N_NATIONKEY
1374739561     25          25          25 N_NAME
1374739556      5          25           5 N_REGIONKEY
1374739551     25          25          25 N_COMMENT
1374739546      8          25           8 "_SALT_"
1374739571      1          25          25 "_SALT_", N_NATIONKEY


--- SQL operation complete.
>>showstats for table ORDERS on every column;

Histogram data for Table TRAFODION.G_TPCH2X.ORDERS
Table ID: 314741800727390091

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1464778717     42     4585155     4585155 O_ORDERKEY
1464778712     56     4585155      193098 O_CUSTKEY
1464778707      3     4585155           3 O_ORDERSTATUS
1464778702     57     4585155     4045135 O_TOTALPRICE
1464778697     50     4585155        2406 O_ORDERDATE
1464778692      5     4585155           5 O_ORDERPRIORITY
1464778687     62     4585155        2000 O_CLERK
1464778682      1     4585155           1 O_SHIPPRIORITY
1464778677     62     4585155     4259632 O_COMMENT
1464778672      8     4585155           8 "_SALT_"
1464778722      1     4585155     4585155 "_SALT_", O_ORDERKEY


--- SQL operation complete.
>>showstats for table PART on every column;

Histogram data for Table TRAFODION.G_TPCH2X.PART
Table ID: 314741800727389857

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1403671365     48     1725243     1725243 P_PARTKEY
1403671360     62     1725243     1279978 P_NAME
1403671355      5     1725243           0 P_MFGR
1403671350     25     1725243           0 P_BRAND
1403671345    150     1725243           0 P_TYPE
1403671340     50     1725243           0 P_SIZE
1403671335     40     1725243           0 P_CONTAINER
1403671330     55     1725243           0 P_RETAILPRICE
1403671325     76     1725243           0 P_COMMENT
1403671320      8     1725243           0 "_SALT_"
1403671370      1     1725243     1725243 "_SALT_", P_PARTKEY


--- SQL operation complete.
>>showstats for table PARTSUPP on every column;

Histogram data for Table TRAFODION.G_TPCH2X.PARTSUPP
Table ID: 314741800727389978

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1433360836     48     2972264           0 PS_PARTKEY
1433360831     48     2972264           0 PS_SUPPKEY
1433360826     48     2972264           0 PS_AVAILQTY
1433360821     48     2972264           0 PS_SUPPLYCOST
1433360816     62     2972264     2205231 PS_COMMENT
1433360811      8     2972264           0 "_SALT_"
1433360841      1     2972264     2972264 "_SALT_", PS_PARTKEY, PS_SUPPKEY


--- SQL operation complete.
>>showstats for table REGION on every column;

Histogram data for Table TRAFODION.G_TPCH2X.REGION
Table ID: 314741800727389399

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1369726849      5           5           5 R_REGIONKEY
1369726844      5           5           5 R_NAME
1369726839      5           5           5 R_COMMENT
1369726834      3           5           3 "_SALT_"
1369726854      1           5           5 "_SALT_", R_REGIONKEY


--- SQL operation complete.
>>showstats for table SUPPLIER on every column;

Histogram data for Table TRAFODION.G_TPCH2X.SUPPLIER
Table ID: 314741800727389626

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1380401507     48       64035       64035 S_SUPPKEY
1380401502     62       64035       64035 S_NAME
1380401497     62       64035       64035 S_ADDRESS
1380401492     25       64035           0 S_NATIONKEY
1380401487     62       64035       64035 S_PHONE
1380401482     56       64035           0 S_ACCTBAL
1380401477     62       64035       47479 S_COMMENT
1380401472      8       64035           0 "_SALT_"
1380401512      1       64035       64035 "_SALT_", S_SUPPKEY


--- SQL operation complete.

=======================================================

Here is the showstats output after rerunning the same set of update stats statements.  The row counts are now correct.

>>set schema g_tpch2x;

--- SQL operation complete.
>>showstats for table CUSTOMER on every column;

Histogram data for Table TRAFODION.G_TPCH2X.CUSTOMER
Table ID: 314741800727389741

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1392898338     48      300000      300000 C_CUSTKEY
1392898335     62      300000      300000 C_NAME
1392898328     62      300000      300000 C_ADDRESS
1392898325     25      300000          25 C_NATIONKEY
1392898318     62      300000      300000 C_PHONE
1392898315     36      300000      262499 C_ACCTBAL
1392898308      5      300000           5 C_MKTSEGMENT
1392898305     62      300000      299263 C_COMMENT
1392898298      8      300000           8 "_SALT_"
1392898345      1      300000      300000 "_SALT_", C_CUSTKEY


--- SQL operation complete.
>>showstats for table LINEITEM on every column;

Histogram data for Table TRAFODION.G_TPCH2X.LINEITEM
Table ID: 314741800727390211

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1537555900     27    11997996     2536382 L_ORDERKEY
1537555897     60    11997996      399669 L_PARTKEY
1537555890     48    11997996       20000 L_SUPPKEY
1537555887      7    11997996           7 L_LINENUMBER
1537555880     50    11997996          50 L_QUANTITY
1537555877     40    11997996      940199 L_EXTENDEDPRICE
1537555870     11    11997996          11 L_DISCOUNT
1537555867      9    11997996           9 L_TAX
1537555860      3    11997996           3 L_RETURNFLAG
1537555857      2    11997996           2 L_LINESTATUS
1537555850     50    11997996        2526 L_SHIPDATE
1537555847     50    11997996        2466 L_COMMITDATE
1537555840     50    11997996        2549 L_RECEIPTDATE
1537555837      4    11997996           4 L_SHIPINSTRUCT
1537555830      7    11997996           7 L_SHIPMODE
1537555827     62    11997996     7101582 L_COMMENT
1537555820      8    11997996           8 "_SALT_"
1537555907      1    11997996    11997996 "_SALT_", L_SHIPDATE, L_ORDERKEY, L_LINENUMBER


--- SQL operation complete.
>>showstats for table NATION on every column;

Histogram data for Table TRAFODION.G_TPCH2X.NATION
Table ID: 314741800727389515

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1374739567     25          25          25 N_NATIONKEY
1374739560     25          25          25 N_NAME
1374739557      5          25           5 N_REGIONKEY
1374739550     25          25          25 N_COMMENT
1374739547      8          25           8 "_SALT_"
1374739570      1          25          25 "_SALT_", N_NATIONKEY


--- SQL operation complete.
>>showstats for table ORDERS on every column;

Histogram data for Table TRAFODION.G_TPCH2X.ORDERS
Table ID: 314741800727390091

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1464778716     30     3000000     3000000 O_ORDERKEY
1464778713     59     3000000      191492 O_CUSTKEY
1464778706      3     3000000           3 O_ORDERSTATUS
1464778703     53     3000000     2759999 O_TOTALPRICE
1464778696     50     3000000        2406 O_ORDERDATE
1464778693      5     3000000           5 O_ORDERPRIORITY
1464778686     62     3000000        2000 O_CLERK
1464778683      1     3000000           1 O_SHIPPRIORITY
1464778676     62     3000000     2898572 O_COMMENT
1464778673      8     3000000           8 "_SALT_"
1464778723      1     3000000     3000000 "_SALT_", O_ORDERKEY


--- SQL operation complete.
>>showstats for table PART on every column;

Histogram data for Table TRAFODION.G_TPCH2X.PART
Table ID: 314741800727389857

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1403671364     48      400000      400000 P_PARTKEY
1403671361     62      400000      399982 P_NAME
1403671354      5      400000           5 P_MFGR
1403671351     25      400000          25 P_BRAND
1403671344    150      400000         150 P_TYPE
1403671341     50      400000          50 P_SIZE
1403671334     40      400000          40 P_CONTAINER
1403671331     55      400000       22097 P_RETAILPRICE
1403671324     76      400000       69074 P_COMMENT
1403671321      8      400000           8 "_SALT_"
1403671371      1      400000      400000 "_SALT_", P_PARTKEY


--- SQL operation complete.
>>showstats for table PARTSUPP on every column;

Histogram data for Table TRAFODION.G_TPCH2X.PARTSUPP
Table ID: 314741800727389978

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1433360837     48     1600000      400000 PS_PARTKEY
1433360830     48     1600000       20000 PS_SUPPKEY
1433360827     48     1600000        9999 PS_AVAILQTY
1433360820     48     1600000       99901 PS_SUPPLYCOST
1433360817     62     1600000     1599985 PS_COMMENT
1433360810      8     1600000           8 "_SALT_"
1433360840      1     1600000     1600000 "_SALT_", PS_PARTKEY, PS_SUPPKEY


--- SQL operation complete.
>>showstats for table REGION on every column;

Histogram data for Table TRAFODION.G_TPCH2X.REGION
Table ID: 314741800727389399

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1369726848      5           5           5 R_REGIONKEY
1369726845      5           5           5 R_NAME
1369726838      5           5           5 R_COMMENT
1369726835      3           5           3 "_SALT_"
1369726855      1           5           5 "_SALT_", R_REGIONKEY


--- SQL operation complete.
>>showstats for table SUPPLIER on every column;

Histogram data for Table TRAFODION.G_TPCH2X.SUPPLIER
Table ID: 314741800727389626

   Hist ID # Ints    Rowcount         UEC Colname(s)
========== ====== =========== =========== ===========================
1380401506     48       20000       20000 S_SUPPKEY
1380401503     62       20000       20000 S_NAME
1380401496     62       20000       20000 S_ADDRESS
1380401493     25       20000          25 S_NATIONKEY
1380401486     62       20000       20000 S_PHONE
1380401483     56       20000       19803 S_ACCTBAL
1380401476     62       20000       19972 S_COMMENT
1380401473      8       20000           8 "_SALT_"
1380401513      1       20000       20000 "_SALT_", S_SUPPKEY


--- SQL operation complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)