You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "David Wayne Birdsall (JIRA)" <ji...@apache.org> on 2016/06/07 15:49:21 UTC
[jira] [Resolved] (TRAFODION-823) LP Bug: 1402031 - Update stats
gets wrong row counts with the first run after loading data
[ https://issues.apache.org/jira/browse/TRAFODION-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Wayne Birdsall resolved TRAFODION-823.
--------------------------------------------
Resolution: Fixed
Fix Version/s: 2.1-incubating
> LP Bug: 1402031 - Update stats gets wrong row counts with the first run after loading data
> ------------------------------------------------------------------------------------------
>
> Key: TRAFODION-823
> URL: https://issues.apache.org/jira/browse/TRAFODION-823
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-cmp
> Reporter: Weishiun Tsai
> Assignee: David Wayne Birdsall
> Priority: Critical
> Fix For: 2.1-incubating
>
>
> This problem has been seen several times in the past few builds, the latest one being the v1209_0830 build. When QA populates the g_tpch2x tables on QA clusters, the scripts load the data, do select count(*) on each table, and then run update stats on each table. The select count(*) prior to the update stats show the correct row counts. But showstats after update stats show that the row count in the stats for each table is way off. Some tables get more row counts than the actual row counts and some less.
> Rerunning the same set of update stats statements often correct this situation. But this is causing a huge problem for testing. When the stats are this bad, some of the larger queries using these tables would lapse back to nested join and would then hang for 20 hours without finishing.
> This problem can’t be reliably reproduced on all clusters, but it does show up frequently enough to cause problems. This case is created to document this problem. Investigation needs to be done in the implementation of update stats to see why select count(*) gets the correct counts while update stats afterwards does not.
> =======================================================
> Here is the execution output of the select count(*) statements and the update stats statements, in that order, after the data loading.
> SQL>select count(*) from region;
> (EXPR)
> --------------------
> 5
> --- 1 row(s) selected.
> SQL>select count(*) from nation;
> (EXPR)
> --------------------
> 25
> --- 1 row(s) selected.
> SQL>select count(*) from supplier;
> (EXPR)
> --------------------
> 20000
> --- 1 row(s) selected.
> SQL>select count(*) from customer;
> (EXPR)
> --------------------
> 300000
> --- 1 row(s) selected.
> SQL>select count(*) from part;
> (EXPR)
> --------------------
> 400000
> --- 1 row(s) selected.
> SQL>select count(*) from partsupp;
> (EXPR)
> --------------------
> 1600000
> --- 1 row(s) selected.
> SQL>select count(*) from orders;
> (EXPR)
> --------------------
> 3000000
> --- 1 row(s) selected.
> SQL>select count(*) from lineitem;
> (EXPR)
> --------------------
> 11997996
> --- 1 row(s) selected.
> -------------------------------------------------------------------------------
> == TEST: tcase.test003
> -------------------------------------------------------------------------------
> SQL>update statistics for table region on every column;
> --- SQL operation complete.
> SQL>update statistics for table nation on every column;
> --- SQL operation complete.
> SQL>update statistics for table supplier on every column;
> --- SQL operation complete.
> SQL>update statistics for table customer on every column;
> --- SQL operation complete.
> SQL>update statistics for table part on every column;
> --- SQL operation complete.
> SQL>update statistics for table partsupp on every column;
> --- SQL operation complete.
> SQL>update statistics for table orders on every column sample random 10 percent;
> --- SQL operation complete.
> SQL>update statistics for table lineitem on every column sample random 10 percent;
> --- SQL operation complete.
> =======================================================
> Here is the showstats output for each table after update stats statements were first run. The row counts are completely off.
> >>set schema g_tpch2x;
> --- SQL operation complete.
> >>showstats for table CUSTOMER on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.CUSTOMER
> Table ID: 314741800727389741
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1392898339 48 1100767 1100767 C_CUSTKEY
> 1392898334 62 1100767 1100767 C_NAME
> 1392898329 62 1100767 1100767 C_ADDRESS
> 1392898324 25 1100767 0 C_NATIONKEY
> 1392898319 62 1100767 1100767 C_PHONE
> 1392898314 36 1100767 6 C_ACCTBAL
> 1392898309 5 1100767 0 C_MKTSEGMENT
> 1392898304 62 1100767 0 C_COMMENT
> 1392898299 8 1100767 0 "_SALT_"
> 1392898344 1 1100767 1100767 "_SALT_", C_CUSTKEY
> --- SQL operation complete.
> >>showstats for table LINEITEM on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.LINEITEM
> Table ID: 314741800727390211
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1537555901 28 12720858 2555961 L_ORDERKEY
> 1537555896 52 12720858 399694 L_PARTKEY
> 1537555891 48 12720858 20000 L_SUPPKEY
> 1537555886 7 12720858 7 L_LINENUMBER
> 1537555881 50 12720858 50 L_QUANTITY
> 1537555876 39 12720858 939869 L_EXTENDEDPRICE
> 1537555871 11 12720858 11 L_DISCOUNT
> 1537555866 9 12720858 9 L_TAX
> 1537555861 3 12720858 3 L_RETURNFLAG
> 1537555856 2 12720858 2 L_LINESTATUS
> 1537555851 50 12720858 2526 L_SHIPDATE
> 1537555846 50 12720858 2466 L_COMMITDATE
> 1537555841 50 12720858 2550 L_RECEIPTDATE
> 1537555836 4 12720858 4 L_SHIPINSTRUCT
> 1537555831 7 12720858 7 L_SHIPMODE
> 1537555826 62 12720858 7447670 L_COMMENT
> 1537555821 8 12720858 8 "_SALT_"
> 1537555906 1 12720858 12720858 "_SALT_", L_SHIPDATE, L_ORDERKEY, L_LINENUMBER
> --- SQL operation complete.
> >>showstats for table NATION on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.NATION
> Table ID: 314741800727389515
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1374739566 25 25 25 N_NATIONKEY
> 1374739561 25 25 25 N_NAME
> 1374739556 5 25 5 N_REGIONKEY
> 1374739551 25 25 25 N_COMMENT
> 1374739546 8 25 8 "_SALT_"
> 1374739571 1 25 25 "_SALT_", N_NATIONKEY
> --- SQL operation complete.
> >>showstats for table ORDERS on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.ORDERS
> Table ID: 314741800727390091
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1464778717 42 4585155 4585155 O_ORDERKEY
> 1464778712 56 4585155 193098 O_CUSTKEY
> 1464778707 3 4585155 3 O_ORDERSTATUS
> 1464778702 57 4585155 4045135 O_TOTALPRICE
> 1464778697 50 4585155 2406 O_ORDERDATE
> 1464778692 5 4585155 5 O_ORDERPRIORITY
> 1464778687 62 4585155 2000 O_CLERK
> 1464778682 1 4585155 1 O_SHIPPRIORITY
> 1464778677 62 4585155 4259632 O_COMMENT
> 1464778672 8 4585155 8 "_SALT_"
> 1464778722 1 4585155 4585155 "_SALT_", O_ORDERKEY
> --- SQL operation complete.
> >>showstats for table PART on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.PART
> Table ID: 314741800727389857
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1403671365 48 1725243 1725243 P_PARTKEY
> 1403671360 62 1725243 1279978 P_NAME
> 1403671355 5 1725243 0 P_MFGR
> 1403671350 25 1725243 0 P_BRAND
> 1403671345 150 1725243 0 P_TYPE
> 1403671340 50 1725243 0 P_SIZE
> 1403671335 40 1725243 0 P_CONTAINER
> 1403671330 55 1725243 0 P_RETAILPRICE
> 1403671325 76 1725243 0 P_COMMENT
> 1403671320 8 1725243 0 "_SALT_"
> 1403671370 1 1725243 1725243 "_SALT_", P_PARTKEY
> --- SQL operation complete.
> >>showstats for table PARTSUPP on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.PARTSUPP
> Table ID: 314741800727389978
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1433360836 48 2972264 0 PS_PARTKEY
> 1433360831 48 2972264 0 PS_SUPPKEY
> 1433360826 48 2972264 0 PS_AVAILQTY
> 1433360821 48 2972264 0 PS_SUPPLYCOST
> 1433360816 62 2972264 2205231 PS_COMMENT
> 1433360811 8 2972264 0 "_SALT_"
> 1433360841 1 2972264 2972264 "_SALT_", PS_PARTKEY, PS_SUPPKEY
> --- SQL operation complete.
> >>showstats for table REGION on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.REGION
> Table ID: 314741800727389399
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1369726849 5 5 5 R_REGIONKEY
> 1369726844 5 5 5 R_NAME
> 1369726839 5 5 5 R_COMMENT
> 1369726834 3 5 3 "_SALT_"
> 1369726854 1 5 5 "_SALT_", R_REGIONKEY
> --- SQL operation complete.
> >>showstats for table SUPPLIER on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.SUPPLIER
> Table ID: 314741800727389626
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1380401507 48 64035 64035 S_SUPPKEY
> 1380401502 62 64035 64035 S_NAME
> 1380401497 62 64035 64035 S_ADDRESS
> 1380401492 25 64035 0 S_NATIONKEY
> 1380401487 62 64035 64035 S_PHONE
> 1380401482 56 64035 0 S_ACCTBAL
> 1380401477 62 64035 47479 S_COMMENT
> 1380401472 8 64035 0 "_SALT_"
> 1380401512 1 64035 64035 "_SALT_", S_SUPPKEY
> --- SQL operation complete.
> =======================================================
> Here is the showstats output after rerunning the same set of update stats statements. The row counts are now correct.
> >>set schema g_tpch2x;
> --- SQL operation complete.
> >>showstats for table CUSTOMER on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.CUSTOMER
> Table ID: 314741800727389741
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1392898338 48 300000 300000 C_CUSTKEY
> 1392898335 62 300000 300000 C_NAME
> 1392898328 62 300000 300000 C_ADDRESS
> 1392898325 25 300000 25 C_NATIONKEY
> 1392898318 62 300000 300000 C_PHONE
> 1392898315 36 300000 262499 C_ACCTBAL
> 1392898308 5 300000 5 C_MKTSEGMENT
> 1392898305 62 300000 299263 C_COMMENT
> 1392898298 8 300000 8 "_SALT_"
> 1392898345 1 300000 300000 "_SALT_", C_CUSTKEY
> --- SQL operation complete.
> >>showstats for table LINEITEM on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.LINEITEM
> Table ID: 314741800727390211
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1537555900 27 11997996 2536382 L_ORDERKEY
> 1537555897 60 11997996 399669 L_PARTKEY
> 1537555890 48 11997996 20000 L_SUPPKEY
> 1537555887 7 11997996 7 L_LINENUMBER
> 1537555880 50 11997996 50 L_QUANTITY
> 1537555877 40 11997996 940199 L_EXTENDEDPRICE
> 1537555870 11 11997996 11 L_DISCOUNT
> 1537555867 9 11997996 9 L_TAX
> 1537555860 3 11997996 3 L_RETURNFLAG
> 1537555857 2 11997996 2 L_LINESTATUS
> 1537555850 50 11997996 2526 L_SHIPDATE
> 1537555847 50 11997996 2466 L_COMMITDATE
> 1537555840 50 11997996 2549 L_RECEIPTDATE
> 1537555837 4 11997996 4 L_SHIPINSTRUCT
> 1537555830 7 11997996 7 L_SHIPMODE
> 1537555827 62 11997996 7101582 L_COMMENT
> 1537555820 8 11997996 8 "_SALT_"
> 1537555907 1 11997996 11997996 "_SALT_", L_SHIPDATE, L_ORDERKEY, L_LINENUMBER
> --- SQL operation complete.
> >>showstats for table NATION on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.NATION
> Table ID: 314741800727389515
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1374739567 25 25 25 N_NATIONKEY
> 1374739560 25 25 25 N_NAME
> 1374739557 5 25 5 N_REGIONKEY
> 1374739550 25 25 25 N_COMMENT
> 1374739547 8 25 8 "_SALT_"
> 1374739570 1 25 25 "_SALT_", N_NATIONKEY
> --- SQL operation complete.
> >>showstats for table ORDERS on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.ORDERS
> Table ID: 314741800727390091
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1464778716 30 3000000 3000000 O_ORDERKEY
> 1464778713 59 3000000 191492 O_CUSTKEY
> 1464778706 3 3000000 3 O_ORDERSTATUS
> 1464778703 53 3000000 2759999 O_TOTALPRICE
> 1464778696 50 3000000 2406 O_ORDERDATE
> 1464778693 5 3000000 5 O_ORDERPRIORITY
> 1464778686 62 3000000 2000 O_CLERK
> 1464778683 1 3000000 1 O_SHIPPRIORITY
> 1464778676 62 3000000 2898572 O_COMMENT
> 1464778673 8 3000000 8 "_SALT_"
> 1464778723 1 3000000 3000000 "_SALT_", O_ORDERKEY
> --- SQL operation complete.
> >>showstats for table PART on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.PART
> Table ID: 314741800727389857
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1403671364 48 400000 400000 P_PARTKEY
> 1403671361 62 400000 399982 P_NAME
> 1403671354 5 400000 5 P_MFGR
> 1403671351 25 400000 25 P_BRAND
> 1403671344 150 400000 150 P_TYPE
> 1403671341 50 400000 50 P_SIZE
> 1403671334 40 400000 40 P_CONTAINER
> 1403671331 55 400000 22097 P_RETAILPRICE
> 1403671324 76 400000 69074 P_COMMENT
> 1403671321 8 400000 8 "_SALT_"
> 1403671371 1 400000 400000 "_SALT_", P_PARTKEY
> --- SQL operation complete.
> >>showstats for table PARTSUPP on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.PARTSUPP
> Table ID: 314741800727389978
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1433360837 48 1600000 400000 PS_PARTKEY
> 1433360830 48 1600000 20000 PS_SUPPKEY
> 1433360827 48 1600000 9999 PS_AVAILQTY
> 1433360820 48 1600000 99901 PS_SUPPLYCOST
> 1433360817 62 1600000 1599985 PS_COMMENT
> 1433360810 8 1600000 8 "_SALT_"
> 1433360840 1 1600000 1600000 "_SALT_", PS_PARTKEY, PS_SUPPKEY
> --- SQL operation complete.
> >>showstats for table REGION on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.REGION
> Table ID: 314741800727389399
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1369726848 5 5 5 R_REGIONKEY
> 1369726845 5 5 5 R_NAME
> 1369726838 5 5 5 R_COMMENT
> 1369726835 3 5 3 "_SALT_"
> 1369726855 1 5 5 "_SALT_", R_REGIONKEY
> --- SQL operation complete.
> >>showstats for table SUPPLIER on every column;
> Histogram data for Table TRAFODION.G_TPCH2X.SUPPLIER
> Table ID: 314741800727389626
> Hist ID # Ints Rowcount UEC Colname(s)
> ========== ====== =========== =========== ===========================
> 1380401506 48 20000 20000 S_SUPPKEY
> 1380401503 62 20000 20000 S_NAME
> 1380401496 62 20000 20000 S_ADDRESS
> 1380401493 25 20000 25 S_NATIONKEY
> 1380401486 62 20000 20000 S_PHONE
> 1380401483 56 20000 19803 S_ACCTBAL
> 1380401476 62 20000 19972 S_COMMENT
> 1380401473 8 20000 8 "_SALT_"
> 1380401513 1 20000 20000 "_SALT_", S_SUPPKEY
> --- SQL operation complete.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)