You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "Suresh Subbiah (JIRA)" <ji...@apache.org> on 2015/10/05 17:43:26 UTC

[jira] [Updated] (TRAFODION-823) LP Bug: 1402031 - Update stats gets wrong row counts with the first run after loading data

     [ https://issues.apache.org/jira/browse/TRAFODION-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Subbiah updated TRAFODION-823:
-------------------------------------
    Assignee: David Wayne Birdsall  (was: Barry Fritchman)

> LP Bug: 1402031 - Update stats gets wrong row counts with the first run after loading data
> ------------------------------------------------------------------------------------------
>
>                 Key: TRAFODION-823
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-823
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>            Reporter: Weishiun Tsai
>            Assignee: David Wayne Birdsall
>            Priority: Critical
>             Fix For: 2.0-incubating
>
>
> This problem has been seen several times in the past few builds, the latest one being the v1209_0830 build.  When QA populates the g_tpch2x tables on QA clusters, the scripts load the data, do select count(*) on each table, and then run update stats on each table.  The select count(*) prior to the update stats show the correct row counts.  But showstats after update stats show that the row count in the stats for each table is way off.  Some tables get more row counts than the actual row counts and some less.
> Rerunning the same set of update stats statements often correct this situation.  But this is causing a huge problem for testing.  When the stats are this bad, some of the larger queries using these tables would lapse back to nested join and would then hang for 20 hours without finishing.  
> This problem can’t be reliably reproduced on all clusters, but it does show up frequently enough to cause problems.  This case is created to document this problem.  Investigation needs to be done in the implementation of update stats to see why select count(*) gets the correct counts while update stats afterwards does not.
> =======================================================
> Here is the execution output of the select count(*) statements and the update stats statements, in that order, after the data loading.
> SQL>select count(*) from region;
> (EXPR)
> --------------------
>                    5
> --- 1 row(s) selected.
> SQL>select count(*) from nation;
> (EXPR)
> --------------------
>                   25
> --- 1 row(s) selected.
> SQL>select count(*) from supplier;
> (EXPR)
> --------------------
>                20000
> --- 1 row(s) selected.
> SQL>select count(*) from customer;
> (EXPR)
> --------------------
>               300000
> --- 1 row(s) selected.
> SQL>select count(*) from part;
> (EXPR)
> --------------------
>               400000
> --- 1 row(s) selected.
> SQL>select count(*) from partsupp;
> (EXPR)
> --------------------
>              1600000
> --- 1 row(s) selected.
> SQL>select count(*) from orders;
> (EXPR)
> --------------------
>              3000000
> --- 1 row(s) selected.
> SQL>select count(*) from lineitem;
> (EXPR)
> --------------------
>             11997996
> --- 1 row(s) selected.
> -------------------------------------------------------------------------------
> == TEST: tcase.test003
> -------------------------------------------------------------------------------
> SQL>update statistics for table region on every column;
> --- SQL operation complete.
> SQL>update statistics for table nation on every column;
> --- SQL operation complete.
> SQL>update statistics for table supplier on every column;
> --- SQL operation complete.
> SQL>update statistics for table customer on every column;
> --- SQL operation complete.
> SQL>update statistics for table part on every column;
> --- SQL operation complete.
> SQL>update statistics for table partsupp on every column;
> --- SQL operation complete.
> SQL>update statistics for table orders on every column sample random 10 percent;
> --- SQL operation complete.
> SQL>update statistics for table lineitem on every column sample random 10 percent;
> --- SQL operation complete.
> =======================================================
> Here is the showstats output for each table after update stats statements were first run.  The row counts are completely off.
> >>set schema g_tpch2x;
> --- SQL operation complete.
> >>showstats for table CUSTOMER on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.CUSTOMER
> Table ID: 314741800727389741
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1392898339     48     1100767     1100767 C_CUSTKEY
> 1392898334     62     1100767     1100767 C_NAME
> 1392898329     62     1100767     1100767 C_ADDRESS
> 1392898324     25     1100767           0 C_NATIONKEY
> 1392898319     62     1100767     1100767 C_PHONE
> 1392898314     36     1100767           6 C_ACCTBAL
> 1392898309      5     1100767           0 C_MKTSEGMENT
> 1392898304     62     1100767           0 C_COMMENT
> 1392898299      8     1100767           0 "_SALT_"
> 1392898344      1     1100767     1100767 "_SALT_", C_CUSTKEY
> --- SQL operation complete.
> >>showstats for table LINEITEM on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.LINEITEM
> Table ID: 314741800727390211
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1537555901     28    12720858     2555961 L_ORDERKEY
> 1537555896     52    12720858      399694 L_PARTKEY
> 1537555891     48    12720858       20000 L_SUPPKEY
> 1537555886      7    12720858           7 L_LINENUMBER
> 1537555881     50    12720858          50 L_QUANTITY
> 1537555876     39    12720858      939869 L_EXTENDEDPRICE
> 1537555871     11    12720858          11 L_DISCOUNT
> 1537555866      9    12720858           9 L_TAX
> 1537555861      3    12720858           3 L_RETURNFLAG
> 1537555856      2    12720858           2 L_LINESTATUS
> 1537555851     50    12720858        2526 L_SHIPDATE
> 1537555846     50    12720858        2466 L_COMMITDATE
> 1537555841     50    12720858        2550 L_RECEIPTDATE
> 1537555836      4    12720858           4 L_SHIPINSTRUCT
> 1537555831      7    12720858           7 L_SHIPMODE
> 1537555826     62    12720858     7447670 L_COMMENT
> 1537555821      8    12720858           8 "_SALT_"
> 1537555906      1    12720858    12720858 "_SALT_", L_SHIPDATE, L_ORDERKEY, L_LINENUMBER
> --- SQL operation complete.
> >>showstats for table NATION on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.NATION
> Table ID: 314741800727389515
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1374739566     25          25          25 N_NATIONKEY
> 1374739561     25          25          25 N_NAME
> 1374739556      5          25           5 N_REGIONKEY
> 1374739551     25          25          25 N_COMMENT
> 1374739546      8          25           8 "_SALT_"
> 1374739571      1          25          25 "_SALT_", N_NATIONKEY
> --- SQL operation complete.
> >>showstats for table ORDERS on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.ORDERS
> Table ID: 314741800727390091
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1464778717     42     4585155     4585155 O_ORDERKEY
> 1464778712     56     4585155      193098 O_CUSTKEY
> 1464778707      3     4585155           3 O_ORDERSTATUS
> 1464778702     57     4585155     4045135 O_TOTALPRICE
> 1464778697     50     4585155        2406 O_ORDERDATE
> 1464778692      5     4585155           5 O_ORDERPRIORITY
> 1464778687     62     4585155        2000 O_CLERK
> 1464778682      1     4585155           1 O_SHIPPRIORITY
> 1464778677     62     4585155     4259632 O_COMMENT
> 1464778672      8     4585155           8 "_SALT_"
> 1464778722      1     4585155     4585155 "_SALT_", O_ORDERKEY
> --- SQL operation complete.
> >>showstats for table PART on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.PART
> Table ID: 314741800727389857
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1403671365     48     1725243     1725243 P_PARTKEY
> 1403671360     62     1725243     1279978 P_NAME
> 1403671355      5     1725243           0 P_MFGR
> 1403671350     25     1725243           0 P_BRAND
> 1403671345    150     1725243           0 P_TYPE
> 1403671340     50     1725243           0 P_SIZE
> 1403671335     40     1725243           0 P_CONTAINER
> 1403671330     55     1725243           0 P_RETAILPRICE
> 1403671325     76     1725243           0 P_COMMENT
> 1403671320      8     1725243           0 "_SALT_"
> 1403671370      1     1725243     1725243 "_SALT_", P_PARTKEY
> --- SQL operation complete.
> >>showstats for table PARTSUPP on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.PARTSUPP
> Table ID: 314741800727389978
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1433360836     48     2972264           0 PS_PARTKEY
> 1433360831     48     2972264           0 PS_SUPPKEY
> 1433360826     48     2972264           0 PS_AVAILQTY
> 1433360821     48     2972264           0 PS_SUPPLYCOST
> 1433360816     62     2972264     2205231 PS_COMMENT
> 1433360811      8     2972264           0 "_SALT_"
> 1433360841      1     2972264     2972264 "_SALT_", PS_PARTKEY, PS_SUPPKEY
> --- SQL operation complete.
> >>showstats for table REGION on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.REGION
> Table ID: 314741800727389399
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1369726849      5           5           5 R_REGIONKEY
> 1369726844      5           5           5 R_NAME
> 1369726839      5           5           5 R_COMMENT
> 1369726834      3           5           3 "_SALT_"
> 1369726854      1           5           5 "_SALT_", R_REGIONKEY
> --- SQL operation complete.
> >>showstats for table SUPPLIER on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.SUPPLIER
> Table ID: 314741800727389626
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1380401507     48       64035       64035 S_SUPPKEY
> 1380401502     62       64035       64035 S_NAME
> 1380401497     62       64035       64035 S_ADDRESS
> 1380401492     25       64035           0 S_NATIONKEY
> 1380401487     62       64035       64035 S_PHONE
> 1380401482     56       64035           0 S_ACCTBAL
> 1380401477     62       64035       47479 S_COMMENT
> 1380401472      8       64035           0 "_SALT_"
> 1380401512      1       64035       64035 "_SALT_", S_SUPPKEY
> --- SQL operation complete.
> =======================================================
> Here is the showstats output after rerunning the same set of update stats statements.  The row counts are now correct.
> >>set schema g_tpch2x;
> --- SQL operation complete.
> >>showstats for table CUSTOMER on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.CUSTOMER
> Table ID: 314741800727389741
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1392898338     48      300000      300000 C_CUSTKEY
> 1392898335     62      300000      300000 C_NAME
> 1392898328     62      300000      300000 C_ADDRESS
> 1392898325     25      300000          25 C_NATIONKEY
> 1392898318     62      300000      300000 C_PHONE
> 1392898315     36      300000      262499 C_ACCTBAL
> 1392898308      5      300000           5 C_MKTSEGMENT
> 1392898305     62      300000      299263 C_COMMENT
> 1392898298      8      300000           8 "_SALT_"
> 1392898345      1      300000      300000 "_SALT_", C_CUSTKEY
> --- SQL operation complete.
> >>showstats for table LINEITEM on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.LINEITEM
> Table ID: 314741800727390211
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1537555900     27    11997996     2536382 L_ORDERKEY
> 1537555897     60    11997996      399669 L_PARTKEY
> 1537555890     48    11997996       20000 L_SUPPKEY
> 1537555887      7    11997996           7 L_LINENUMBER
> 1537555880     50    11997996          50 L_QUANTITY
> 1537555877     40    11997996      940199 L_EXTENDEDPRICE
> 1537555870     11    11997996          11 L_DISCOUNT
> 1537555867      9    11997996           9 L_TAX
> 1537555860      3    11997996           3 L_RETURNFLAG
> 1537555857      2    11997996           2 L_LINESTATUS
> 1537555850     50    11997996        2526 L_SHIPDATE
> 1537555847     50    11997996        2466 L_COMMITDATE
> 1537555840     50    11997996        2549 L_RECEIPTDATE
> 1537555837      4    11997996           4 L_SHIPINSTRUCT
> 1537555830      7    11997996           7 L_SHIPMODE
> 1537555827     62    11997996     7101582 L_COMMENT
> 1537555820      8    11997996           8 "_SALT_"
> 1537555907      1    11997996    11997996 "_SALT_", L_SHIPDATE, L_ORDERKEY, L_LINENUMBER
> --- SQL operation complete.
> >>showstats for table NATION on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.NATION
> Table ID: 314741800727389515
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1374739567     25          25          25 N_NATIONKEY
> 1374739560     25          25          25 N_NAME
> 1374739557      5          25           5 N_REGIONKEY
> 1374739550     25          25          25 N_COMMENT
> 1374739547      8          25           8 "_SALT_"
> 1374739570      1          25          25 "_SALT_", N_NATIONKEY
> --- SQL operation complete.
> >>showstats for table ORDERS on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.ORDERS
> Table ID: 314741800727390091
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1464778716     30     3000000     3000000 O_ORDERKEY
> 1464778713     59     3000000      191492 O_CUSTKEY
> 1464778706      3     3000000           3 O_ORDERSTATUS
> 1464778703     53     3000000     2759999 O_TOTALPRICE
> 1464778696     50     3000000        2406 O_ORDERDATE
> 1464778693      5     3000000           5 O_ORDERPRIORITY
> 1464778686     62     3000000        2000 O_CLERK
> 1464778683      1     3000000           1 O_SHIPPRIORITY
> 1464778676     62     3000000     2898572 O_COMMENT
> 1464778673      8     3000000           8 "_SALT_"
> 1464778723      1     3000000     3000000 "_SALT_", O_ORDERKEY
> --- SQL operation complete.
> >>showstats for table PART on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.PART
> Table ID: 314741800727389857
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1403671364     48      400000      400000 P_PARTKEY
> 1403671361     62      400000      399982 P_NAME
> 1403671354      5      400000           5 P_MFGR
> 1403671351     25      400000          25 P_BRAND
> 1403671344    150      400000         150 P_TYPE
> 1403671341     50      400000          50 P_SIZE
> 1403671334     40      400000          40 P_CONTAINER
> 1403671331     55      400000       22097 P_RETAILPRICE
> 1403671324     76      400000       69074 P_COMMENT
> 1403671321      8      400000           8 "_SALT_"
> 1403671371      1      400000      400000 "_SALT_", P_PARTKEY
> --- SQL operation complete.
> >>showstats for table PARTSUPP on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.PARTSUPP
> Table ID: 314741800727389978
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1433360837     48     1600000      400000 PS_PARTKEY
> 1433360830     48     1600000       20000 PS_SUPPKEY
> 1433360827     48     1600000        9999 PS_AVAILQTY
> 1433360820     48     1600000       99901 PS_SUPPLYCOST
> 1433360817     62     1600000     1599985 PS_COMMENT
> 1433360810      8     1600000           8 "_SALT_"
> 1433360840      1     1600000     1600000 "_SALT_", PS_PARTKEY, PS_SUPPKEY
> --- SQL operation complete.
> >>showstats for table REGION on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.REGION
> Table ID: 314741800727389399
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1369726848      5           5           5 R_REGIONKEY
> 1369726845      5           5           5 R_NAME
> 1369726838      5           5           5 R_COMMENT
> 1369726835      3           5           3 "_SALT_"
> 1369726855      1           5           5 "_SALT_", R_REGIONKEY
> --- SQL operation complete.
> >>showstats for table SUPPLIER on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.SUPPLIER
> Table ID: 314741800727389626
>    Hist ID # Ints    Rowcount         UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1380401506     48       20000       20000 S_SUPPKEY
> 1380401503     62       20000       20000 S_NAME
> 1380401496     62       20000       20000 S_ADDRESS
> 1380401493     25       20000          25 S_NATIONKEY
> 1380401486     62       20000       20000 S_PHONE
> 1380401483     56       20000       19803 S_ACCTBAL
> 1380401476     62       20000       19972 S_COMMENT
> 1380401473      8       20000           8 "_SALT_"
> 1380401513      1       20000       20000 "_SALT_", S_SUPPKEY
> --- SQL operation complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)