You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/07/09 17:52:00 UTC
[jira] [Work logged] (TRAFODION-3316) Fix some issues with incremental UPDATE STATISTICS

     [ https://issues.apache.org/jira/browse/TRAFODION-3316?focusedWorklogId=274273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-274273 ]

ASF GitHub Bot logged work on TRAFODION-3316:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Jul/19 17:51
            Start Date: 09/Jul/19 17:51
    Worklog Time Spent: 10m 
      Work Description: DaveBirdsall commented on pull request #1848: [TRAFODION-3316] Three fixes to UPDATE STATISTICS
URL: https://github.com/apache/trafodion/pull/1848
 
 
   Three fixes to UPDATE STATISTICS:
   
   1. Add support for Hive TIMESTAMP data type. The only change needed is to tolerate a precision of 9 on a TIMESTAMP.
   
   2. When creating a sample table on a small Hive table, the SAMPLING_RATIO recorded in SB_PERSISTENT_SAMPLES is incorrect. This happened because the estimated size for the table was off by two orders of magnitude. We calculated the sampling ratio from the number of rows returned in the sample and this estimated size. There was logic to adjust the estimated size of the table as well from the statistics of the sample SELECT, however that logic was stubbed out because it didn't work. That logic has been removed -- the design premise was flawed anyway because it does not take into account the possibility that a key predicate will be used in incremental UPDATE STATISTICS. Instead, we now re-estimate the rows in the original table using the user-specified sampling ratio and the number of rows in the sample.
   
   Note: As part of this change, a parameter to HSFuncExecQuery is nearly obsolete. I did not replace it however because this function is called in over a hundred places in the code. Will leave that for a later clean-up.
   
   3. There was one place (extractTblName and its caller) where we were using a const char * pointer into an NAString object built on the stack that goes out of scope. This causes occasional failures. This has been fixed.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 274273)
            Time Spent: 10m
    Remaining Estimate: 0h

> Fix some issues with incremental UPDATE STATISTICS
> --------------------------------------------------
>
>                 Key: TRAFODION-3316
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3316
>             Project: Apache Trafodion
>          Issue Type: Improvement
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fix some issues with incremental UPDATE STATISTICS.
>  # The Hive TIMESTAMP datatype has precision 9. When processing a Hive TIMESTAMP column, UPDATE STATISTICS abends.
>  # After fixing that, when testing a small example, the sampling ratio for the sample table is made incorrectly small, resulting in 9207 errors (sample is empty).
>  # Sometimes, when populating the sample table, the UPSERT fails because garbage is used for the table name. This is because the table name is taken from an NAString that has gone out of scope.
> The following test script can be used to test these conditions. In Hive, do the following:
> {quote}create database if not exists h_increUpdatestats;
> use h_increUpdatestats;
> drop table if exists hive_sequencefile ;
> create table hive_sequencefile(C_CUSTKEY int,C_NAME char(20),C_ADDRESS varchar(20), C_NATIONKEY int, C_TIMESTAMP timestamp,C_PHONE char(20), C_ACCTBAL string, 
> C_MKTSEGMENT char(20), C_DATE date) stored as sequencefile;
> insert into hive_sequencefile values 
> (1,'x','xx',1,'2018-08-22 12:00:00.123','xxx','xxxx','xxxxx','2018-08-22'),
> (2,'x','xx',2,'2018-08-23 12:00:00.123','xxx','xxxx','xxxxx','2018-08-23'),
> (3,'x','xx',3,'2018-08-24 12:00:00.123','xxx','xxxx','xxxxx','2018-08-24'),
> (4,'x','xx',4,'2018-08-25 12:00:00.123','xxx','xxxx','xxxxx','2018-08-25'),
> (5,'x','xx',5,'2018-08-26 12:00:00.123','xxx','xxxx','xxxxx','2018-08-26');
> {quote}
> Then in Trafodion do the following:
> {quote}update statistics for table hive.h_increUpdatestats.hive_sequencefile remove sample;
> update statistics for table hive.h_increUpdatestats.hive_sequencefile create sample random 100 percent;
> update statistics for table hive.h_increUpdatestats.hive_sequencefile on every column;
> update statistics for table hive.h_increUpdatestats.hive_sequencefile on existing columns incremental where C_CUSTKEY>=0;
> {quote}
> Without any fixes, the third UPDATE STATISTICS command abends.
> With a fix for the first problem, the last UPDATE STATISTICS command sometimes fails with error 2109, and always fails with error 9207.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)