Posted to issues@trafodion.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/02/01 01:51:00 UTC

[jira] [Work logged] (TRAFODION-3265) INSERT values to VARCHAR column of UTF8 charset can be extremely slow

     [ https://issues.apache.org/jira/browse/TRAFODION-3265?focusedWorklogId=193145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-193145 ]

ASF GitHub Bot logged work on TRAFODION-3265:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Feb/19 01:50
            Start Date: 01/Feb/19 01:50
    Worklog Time Spent: 10m 
      Work Description: selvaganesang commented on pull request #1785: [TRAFODION-3265] INSERT VARCHAR of UTF8 charset can be extremely slow
URL: https://github.com/apache/trafodion/pull/1785
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 193145)
    Time Spent: 2.5h  (was: 2h 20m)

> INSERT values to VARCHAR column of UTF8 charset can be extremely slow
> ---------------------------------------------------------------------
>
>                 Key: TRAFODION-3265
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3265
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: any
>            Reporter: He Zhenxing
>            Priority: Major
>             Fix For: 2.4
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When inserting values into a VARCHAR column with UTF8 character set, where the values mix alphabetic (ASCII) characters with Chinese (or other non-ISO8859-1) characters, performance degrades drastically, roughly quadratically, with the number of alphabetic characters.
> {noformat}
> >> CREATE TABLE t1 (a VARCHAR(65535) CHARACTER SET UTF8);
> >> INSERT INTO t1 VALUES ('aaa...中文');
> {noformat}
> Here are simple test results with different numbers of 'a's in the inserted value:
> {noformat}
> 100  'a's                 10 ms
> 1k   'a's                 100 ms
> 10k  'a's                 10 seconds
> 60k  'a's                 160 seconds
> {noformat}
> The reason is that we internally convert UTF8 to UCS2, so every 'a' becomes "a\0"; the ConstValue::getText() function then replaces each embedded '\0' with the printable string "\\0", and because result.first('\0') rescans the string from the beginning on every iteration, the cost grows quadratically with the number of 'a's in the string.
> {code:java}
> const NAString ConstValue::getText() const
> {
>   if (getType()->getTypeQualifier() == NA_CHARACTER_TYPE)
>   {
>     ...
>     // Change imbedded NULL and \377 chars to \0 and \377
>     // This comes up in key values quite often.
>     size_t index;
>     while ((index = result.first('\0')) != NA_NPOS
>            && index != result.length())
>       result(index, 1) = "\\0";
>     while ((index = result.first('\377')) != NA_NPOS
>            && index != result.length())
>       result(index, 1) = "\\377";
>     ...
>   }
> {code}
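> A minimal standalone sketch of the pattern, using std::string in place of Trafodion's NAString (the helper names and timing harness below are illustrative assumptions, not the actual change in this pull request). It contrasts the find-from-the-start loop with a single forward pass that escapes the same characters in linear time:
> {code:cpp}
> #include <chrono>
> #include <iostream>
> #include <string>
> 
> // Mirrors the original pattern: search from position 0, replace, repeat.
> // With k NUL bytes in an n-byte buffer this does O(k * n) work.
> std::string escapeQuadratic(std::string s) {
>     std::string::size_type idx;
>     while ((idx = s.find('\0')) != std::string::npos)
>         s.replace(idx, 1, "\\0");   // the next find() rescans from the start
>     return s;
> }
> 
> // Single pass: every byte is inspected exactly once, O(n).
> std::string escapeLinear(const std::string &s) {
>     std::string out;
>     out.reserve(s.size() * 2);
>     for (char c : s) {
>         if (c == '\0')
>             out += "\\0";
>         else
>             out += c;
>     }
>     return out;
> }
> 
> int main() {
>     // 60k ASCII chars widened to UCS2: every other byte is '\0'.
>     std::string ucs2;
>     for (int i = 0; i < 60000; ++i) { ucs2 += 'a'; ucs2 += '\0'; }
> 
>     auto t0 = std::chrono::steady_clock::now();
>     escapeQuadratic(ucs2);
>     auto t1 = std::chrono::steady_clock::now();
>     escapeLinear(ucs2);
>     auto t2 = std::chrono::steady_clock::now();
> 
>     using ms = std::chrono::milliseconds;
>     std::cout << "quadratic: " << std::chrono::duration_cast<ms>(t1 - t0).count()
>               << " ms, linear: " << std::chrono::duration_cast<ms>(t2 - t1).count()
>               << " ms\n";
>     return 0;
> }
> {code}
> The single-pass version stays in the millisecond range for the 60k case, while the rescanning loop shows the same O(n^2) growth pattern as the timings above.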



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)