You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/03/28 17:29:41 UTC

[jira] [Commented] (TRAFODION-2376) Improve UPDATE STATISTICS performance for varchar columns

    [ https://issues.apache.org/jira/browse/TRAFODION-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945592#comment-15945592 ] 

ASF GitHub Bot commented on TRAFODION-2376:
-------------------------------------------

GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/1029

    [TRAFODION-2376] Improve UPDATE STATS performance on varchar columns

    This pull request submits a performance enhancement to the UPDATE STATISTICS utility. This work is the completion of a prototype originally done by Barry Fritchman (@blfritch).
    
    For the moment, the feature is turned off by default. Use CQD  USTAT_COMPARE_VARCHARS 'ON' to turn on this enhancement.
    
    What this feature does is compact varchars in memory for the internal sort code path in UPDATE STATISTICS. In the old code, varchars are expanded out to their full length. (Actually, we already truncate them at 256 characters -- the setting of CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES -- giving up some accuracy in UEC computation perhaps but improving performance dramatically for very long varchar columns.) In the new code, we estimate the average length of the column, and allocate space assuming the column still adheres to that average. For columns that already have statistics, we use the average varchar length stored in SB_HISTOGRAMS column V2. For columns that don't, we take a guess that the average is one-half the declared length of the column.
    
    The performance gain from using this feature comes from reducing the number of scans of the table or sample table because more columns can fit in memory in each scan. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2376

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/1029.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1029
    
----
commit 3366fdba1b9d52e7d04d21ee33f92698089cdb36
Author: Dave Birdsall <db...@apache.org>
Date:   2017-03-28T17:16:00Z

    [TRAFODION-2376] Improve UPDATE STATS performance on varchar columns

----


> Improve UPDATE STATISTICS performance for varchar columns
> ---------------------------------------------------------
>
>                 Key: TRAFODION-2376
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2376
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating
>         Environment: All
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> Today when UPDATE STATISTICS uses internal sort, varchar columns are expanded out to their maximum length with blank padding. This can be quite wasteful both of memory and CPU cycles, as often the average length of a varchar is much less (even orders of magnitude less) than the maximum length. We can do much better performance wise by not doing this expansion, at the cost of some complexity in comparison.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)