Posted to derby-dev@db.apache.org by "Mike Matrigali (JIRA)" <ji...@apache.org> on 2007/05/26 02:22:17 UTC

[jira] Created: (DERBY-2699) performance of like in territory based collation databases may be improved by changing way collation elements are calculated.

performance of like in territory based collation databases may be improved by changing way collation elements are calculated.
-----------------------------------------------------------------------------------------------------------------------------

                 Key: DERBY-2699
                 URL: https://issues.apache.org/jira/browse/DERBY-2699
             Project: Derby
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 10.3.0.0
            Reporter: Mike Matrigali


WorkHorseForCollatorDatatypes.java has a method getCollationElementsForString() which is currently
called when processing LIKE clauses in databases that were created with territory based collation. This is
not an issue in pre-10.3 databases or in post-10.3 databases that use the default collation.
getCollationElementsForString() gets the collation elements for the entire value of the String held by
the datatype using the class.

Take the case where the pattern is 'A%' and the value of the datatype is 'BXXXXXXXXXXXXXXXXXXXXXXX'.
It would be better to get collation elements one character of the String value at a time,
to avoid getting collation elements for the entire string when we don't really need them: the first
character already rules out a match. One could imagine this having a huge performance impact on running
LIKE against a long CLOB where the LIKE pattern has a leading fixed-length pattern to match.
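
A minimal sketch of the "one element at a time" idea, using only the standard java.text classes. This is
not Derby's code: the class name LazyLikePrefix, the method startsWithCollated() and the elementsFetched
counter are hypothetical names made up for illustration, and the sketch assumes that Collator.getInstance()
returns a RuleBasedCollator for the chosen locale. It handles only a fixed leading prefix (the 'A' in 'A%')
and ignores details a real LIKE implementation would need, such as ignorable collation elements and escape
characters.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical illustration class, not part of Derby.
final class LazyLikePrefix {

    // Counts calls to next() on the value iterator, so the early exit is observable.
    static int elementsFetched = 0;

    // Returns true if the collation elements of 'value' begin with those of 'prefix'.
    // Elements of 'value' are pulled one at a time and the loop stops at the first mismatch,
    // instead of building the full element array for 'value' up front.
    static boolean startsWithCollated(RuleBasedCollator collator, String value, String prefix) {
        CollationElementIterator prefixIt = collator.getCollationElementIterator(prefix);
        CollationElementIterator valueIt = collator.getCollationElementIterator(value);
        int p;
        while ((p = prefixIt.next()) != CollationElementIterator.NULLORDER) {
            int v = valueIt.next();
            elementsFetched++;
            if (v != p) {
                // Also covers v == NULLORDER, i.e. the value is shorter than the prefix.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the JDK returns a RuleBasedCollator for this locale.
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        boolean matched = startsWithCollated(collator, "BXXXXXXXXXXXXXXXXXXXXXXX", "A");
        System.out.println("matched=" + matched + ", value elements fetched=" + elementsFetched);
    }
}
```

How much work CollationElementIterator does internally before the first call to next() depends on the JDK,
so any actual saving would need to be measured rather than assumed.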

Comments on this from Dan and Dag can be found in DERBY-2416.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-2699) performance of like in territory based collation databases may be improved by changing way collation elements are calculated.

Posted by "Daniel John Debrunner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525494 ] 

Daniel John Debrunner commented on DERBY-2699:
----------------------------------------------

I think the approach of getting collation elements as needed would have a large effect on all string comparisons.

I created a scale 4 order entry database both with and without territory based collation. Looking just at the load, collation only affects 'index.sql', which creates an index that includes the customer's last name. With UCS_BASIC collation the index was created in about 2.5 seconds; with TERRITORY_BASED collation the time was over 11 seconds.

I don't think that the collation overhead should be that high. I would expect maybe a 10-20% overhead, not around 450%.

> performance of like in territory based collation databases may be improved by changing way collation elements are calculated.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2699
>                 URL: https://issues.apache.org/jira/browse/DERBY-2699
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.3.1.4
>            Reporter: Mike Matrigali