Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2013/01/02 17:22:13 UTC
[jira] [Updated] (DERBY-2699) performance of like in territory based collation databases may be improved by changing way collation elements are calculated.

     [ https://issues.apache.org/jira/browse/DERBY-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-2699:
--------------------------------------

    Attachment: d2699-1a.diff

DERBY-3136 improved the LIKE implementation along the lines suggested in this issue, so now WorkHorseForCollatorDatatypes.getCollationElementsForString() is only used for checking that the ESCAPE clause of a LIKE ... ESCAPE ... expression does not contain more than a single collation element (for example to disallow 'ß' in an ESCAPE clause, as it has two collation elements).

Since it's only used for ESCAPE clauses now, and they are typically just a single character, the performance benefits are probably not that big anymore. But we can simplify how the collation elements are calculated now that we only need to check if it's a single element. For example, there is no need to have an intermediate int[] representation of the collation elements.

Attached is a patch that removes the getCollationElementsForString() and getCountOfCollationElements() methods from WorkHorseForCollatorDatatypes, CollationElementsInterface, and all classes that implement CollationElementsInterface. Those methods are replaced by a simpler hasSingleCollationElement() method.
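
To illustrate the idea (this is not the code in d2699-1a.diff), a single-element check can be sketched directly on top of java.text.CollationElementIterator, with no intermediate int[]. The class name SingleElementCheck and the method signature below are assumptions made for this example only; the real method lives in WorkHorseForCollatorDatatypes and may look different.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical stand-alone class, used only for this sketch.
public class SingleElementCheck {

    /**
     * Returns true if the string maps to exactly one collation element
     * under the given collator. No array of elements is built.
     */
    public static boolean hasSingleCollationElement(
            RuleBasedCollator collator, String s) {
        if (s == null) {
            return false;
        }
        CollationElementIterator it = collator.getCollationElementIterator(s);
        // The first call must yield an element, the second must signal the end.
        return it.next() != CollationElementIterator.NULLORDER
                && it.next() == CollationElementIterator.NULLORDER;
    }

    public static void main(String[] args) {
        // Assumption: the JDK hands back a RuleBasedCollator for this locale.
        RuleBasedCollator c =
                (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
        System.out.println(hasSingleCollationElement(c, "a"));
        System.out.println(hasSingleCollationElement(c, "ab"));
        System.out.println(hasSingleCollationElement(c, "\u00DF")); // 'ß'
    }
}
```

At most two calls to next() are made regardless of the length of the string, which is all the ESCAPE check needs.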

This shrinks the source files by approximately 100 lines. It also reduces the number of objects allocated when evaluating LIKE ... ESCAPE with territory based collation, which might slightly improve performance.

I'm running the full regression test suite on the patch now.
                
> performance of like in territory based collation databases may be improved by changing way collation elements are calculated.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2699
>                 URL: https://issues.apache.org/jira/browse/DERBY-2699
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.3.1.4
>            Reporter: Mike Matrigali
>              Labels: derby_triage10_10
>         Attachments: d2699-1a.diff
>
>
> WorkHorseForCollatorDatatypes.java has a method getCollationElementsForString() which currently gets
> called when processing LIKE clauses in databases that have been created with territory based collation. This is
> not an issue in pre-10.3 databases or post-10.3 default databases.
> getCollationElementsForString() gets the collation elements for the entire value of the String held by
> the datatype using the class.
> If you take the case of pattern 'A%' and the value of the datatype is 'BXXXXXXXXXXXXXXXXXXXXXXX',
> then it would have been better to get collation elements one character of the String value at a time,
> to avoid getting collation elements for the entire string when we don't really need them.
> One could imagine this might have a huge performance impact on running LIKE against a long CLOB where
> the LIKE pattern has a leading fixed-length pattern to match.
> Comments on this from Dan and Dag can be found in DERBY-2416.
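
The element-at-a-time approach suggested in the description above can be sketched as follows. This is only an illustration of the early-exit idea for a fixed leading pattern such as 'A%'; it is not Derby's LIKE implementation. The class LazyPrefixMatch and the method startsWith are hypothetical names, and the sketch compares complete collation elements without any special handling of ignorable elements or escape characters.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical stand-alone class, used only for this sketch.
public class LazyPrefixMatch {

    /**
     * Returns true if the collation elements of prefix form the leading
     * collation elements of value. Elements are pulled from both iterators
     * one at a time, and the loop stops at the first mismatch.
     */
    public static boolean startsWith(
            RuleBasedCollator collator, String value, String prefix) {
        CollationElementIterator v = collator.getCollationElementIterator(value);
        CollationElementIterator p = collator.getCollationElementIterator(prefix);
        int expected;
        while ((expected = p.next()) != CollationElementIterator.NULLORDER) {
            // Also covers a value that ends too early: NULLORDER != expected.
            if (v.next() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the JDK hands back a RuleBasedCollator for this locale.
        RuleBasedCollator c =
                (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
        // Only the first element of the long value is needed to reject it.
        System.out.println(startsWith(c, "BXXXXXXXXXXXXXXXXXXXXXXX", "A"));
    }
}
```

For the 'A%' pattern against 'BXXXXXXXXXXXXXXXXXXXXXXX', the loop returns after comparing a single pair of elements, rather than first computing the elements of the whole value.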
