You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cayenne.apache.org by "Musall, Maik" <ma...@selbstdenker.ag> on 2017/12/15 08:44:26 UTC

ILIKE vs. upper/lower with PostgreSQL

Hi all,

after migrating an EOF application to Cayenne, I noticed many queries running much slower than before and with more load on the database. Turns out that Cayenne generates queries using ILIKE where EOF used to generate UPPER() comparisons. Example:

EOF:     SELECT * FROM mytable t0 WHERE UPPER(mycolumn) LIKE UPPER('foo%')
Cayenne: SELECT * FROM mytable t0 WHERE mycolumn ILIKE 'foo%'

The database is PostgreSQL 9.5, and I used to cover the UPPER() queries with function-based indexes on UPPER(column), which used to work very well.

ILIKE is not as easy to index with PostgreSQL, because it's semantically different, especially with languages that don't have a simple alphabet. There are GiST and GIN index types in PostgreSQL, but those have other drawbacks (too many hits for short columns, needing additional table accesses, no sorting by index, expensive updates in the case of GiST, and so on).

So, my question is: can I change what Cayenne generates here and generate UPPER() or LOWER() comparisons so that I can continue using the existing indexes, and what would be the recommended way to do that?

Thanks
Maik

Re: [Legacy Email] Re: ILIKE vs. upper/lower with PostgreSQL

Posted by Reid Thompson <Re...@omnicell.com>.

On Fri, 2017-12-15 at 18:49 +0000, Hugi Thordarson wrote:
> [EXTERNAL SOURCE]
> 
> 
> 
> > On 15 Dec 2017, at 16:08, Andrus Adamchik <an...@objectstyle.org> wrote:
> > 
> > > On Dec 15, 2017, at 11:00 AM, Musall, Maik <ma...@selbstdenker.ag> wrote:
> > > 
> > > ILIKE is certainly useful for full text searches with CLOB/TEXT values and in combination with GiST/GIN indexes, but IMHO not for regular case-insensitive
> > > queries using b-tree indexes. So I'm not sure that PostgresQualifierTranslator does the right thing here.
> > > 
> > > How is everyone else indexing varchar columns for case-insensitive queries on PostgreSQL?
> > 
> > I'd be interested to hear from others as well. PostgreSQL docs simply say "The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale." There
> > is not a slightest hint at performance degradation and not using indexes.
> > 
> > Andrus
> 
> I am not a Postgres specialist, but some research on the topic seems reveals (a lot of) anecdotal evidence that matching on UPPER is more performant both with and without indexes and is generally
> preferred to ILIKE for the generic case. So I believe a modification of the standard behaviour would be sensible.
> 
> - hugi

https://www.postgresql.org/docs/9.6/static/indexes-types.html

The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string — for example, col
LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your database does not use the C locale you will need to create the index with a special operator class to support indexing of
pattern-matching queries; see Section 11.9 below. It is also possible to use B-tree indexes for ILIKE and ~*, but only if the pattern starts with non-alphabetic characters, i.e., characters that are
not affected by upper/lower case conversion.

https://dba.stackexchange.com/questions/4521/how-to-create-an-index-to-speed-up-an-aggregate-like-query-on-an-expression

There is no index support for LIKE / ILIKE in PostgreSQL 8.4 - except for left anchored search terms.

PostgreSQL 9.1 or later provides new features in the extension pg_trgm that enable index support for LIKE / ILIKE expressions (or simple regular expressions, operator ~ & friends) of any form with a
GIN or GiST index using the provided operator classes.

Install the extension once per database:

CREATE EXTENSION pg_trgm;

Example GIN index:

CREATE INDEX tbl_col_gin_trgm_idx ON tbl USING gin (col gin_trgm_ops);

More info and links in this related answer:

    How is LIKE implemented?

Overview on pattern matching and appropriate indices:

    Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Re: ILIKE vs. upper/lower with PostgreSQL

Posted by "Musall, Maik" <ma...@selbstdenker.ag>.

> Am 15.12.2017 um 19:49 schrieb Hugi Thordarson <hu...@karlmenn.is>:
> 
>> On 15 Dec 2017, at 16:08, Andrus Adamchik <an...@objectstyle.org> wrote:
>> 
>>> On Dec 15, 2017, at 11:00 AM, Musall, Maik <ma...@selbstdenker.ag> wrote:
>>> 
>>> ILIKE is certainly useful for full text searches with CLOB/TEXT values and in combination with GiST/GIN indexes, but IMHO not for regular case-insensitive
>>> queries using b-tree indexes. So I'm not sure that PostgresQualifierTranslator does the right thing here.
>>> 
>>> How is everyone else indexing varchar columns for case-insensitive queries on PostgreSQL?
>> 
>> I'd be interested to hear from others as well. PostgreSQL docs simply say "The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale." There is not a slightest hint at performance degradation and not using indexes.
>> 
>> Andrus
> 
> I am not a Postgres specialist, but some research on the topic seems reveals (a lot of) anecdotal evidence that matching on UPPER is more performant both with and without indexes and is generally preferred to ILIKE for the generic case. So I believe a modification of the standard behaviour would be sensible.

I think so, too. From PostgreSQL's own documentation, I'm not sure what ILIKE would be good for anyway. It's certainly a nicer syntax than an UPPER() comparison, but internally the implementation is different, and it's not just syntactic sugar for UPPER(). It may have coverage for some use cases around non-alphabetic languages or something, but there isn't even a hint to that in the documentation that I can see.

And even the pg manual explicitly recommends <https://www.postgresql.org/docs/10/static/indexes-expressional.html> using upper() or lower() to index case-insensitive queries:
> An index column need not be just a column of the underlying table, but can be a function or scalar expression computed from one or more columns of the table. This feature is useful to obtain fast access to tables based on the results of computations.
> 
> For example, a common way to do case-insensitive comparisons is to use the lower function:
> 
> SELECT * FROM test1 WHERE lower(col1) = 'value';
> This query can use an index if one has been defined on the result of the lower(col1) function:
> 
> CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));

So, I also think Cayenne should ditch ILIKE. And it should be configurable whether UPPER() or LOWER() is used, because while UPPER() is the de-facto standard for this that everyone in the industry used for decades, LOWER() works just as well, and LOWER() is what's mentioned in the documentation. And people might already have indexes in place using either.

Maik

Re: ILIKE vs. upper/lower with PostgreSQL

Posted by Hugi Thordarson <hu...@karlmenn.is>.

> On 15 Dec 2017, at 16:08, Andrus Adamchik <an...@objectstyle.org> wrote:
> 
>> On Dec 15, 2017, at 11:00 AM, Musall, Maik <ma...@selbstdenker.ag> wrote:
>> 
>> ILIKE is certainly useful for full text searches with CLOB/TEXT values and in combination with GiST/GIN indexes, but IMHO not for regular case-insensitive
>> queries using b-tree indexes. So I'm not sure that PostgresQualifierTranslator does the right thing here.
>> 
>> How is everyone else indexing varchar columns for case-insensitive queries on PostgreSQL?
> 
> I'd be interested to hear from others as well. PostgreSQL docs simply say "The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale." There is not a slightest hint at performance degradation and not using indexes.
> 
> Andrus

I am not a Postgres specialist, but some research on the topic seems reveals (a lot of) anecdotal evidence that matching on UPPER is more performant both with and without indexes and is generally preferred to ILIKE for the generic case. So I believe a modification of the standard behaviour would be sensible.

- hugi

Re: ILIKE vs. upper/lower with PostgreSQL

Posted by Andrus Adamchik <an...@objectstyle.org>.

> On Dec 15, 2017, at 11:00 AM, Musall, Maik <ma...@selbstdenker.ag> wrote:
> 
> ILIKE is certainly useful for full text searches with CLOB/TEXT values and in combination with GiST/GIN indexes, but IMHO not for regular case-insensitive
> queries using b-tree indexes. So I'm not sure that PostgresQualifierTranslator does the right thing here.
> 
> How is everyone else indexing varchar columns for case-insensitive queries on PostgreSQL?

I'd be interested to hear from others as well. PostgreSQL docs simply say "The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale." There is not a slightest hint at performance degradation and not using indexes.

Andrus

Re: ILIKE vs. upper/lower with PostgreSQL

Posted by "Musall, Maik" <ma...@selbstdenker.ag>.

Hi Nikita,

thanks for the reponse. However, changing hundreds of expressions to something less elegant isn't really a compelling solution to me.

So Hugi (who works on this project with me) solved it by creating a custom alternative to PostgresQualifierTranslator which just omits the ILIKE-related code. That way I get the regular SQL using UPPER(), which is easily indexable.

ILIKE is certainly useful for full text searches with CLOB/TEXT values and in combination with GiST/GIN indexes, but IMHO not for regular case-insensitive
queries using b-tree indexes. So I'm not sure that PostgresQualifierTranslator does the right thing here.

How is everyone else indexing varchar columns for case-insensitive queries on PostgreSQL?

Maik


> Am 15.12.2017 um 12:21 schrieb Nikita Timofeev <nt...@objectstyle.com>:
> 
> Hi,
> 
> I don't think it will be easy to change Cayenne translator behavior
> (but still should be possible if necessary).
> It may be easier to use upper().like() functions instead of
> likeIgnoreCase() in your case.
> 
> I.e. you can do something like this:
>    ObjectSelect.query(MyTable.class).where(MyTable.MYCOLUMN.upper().like("FOO%"))
> 
> 
> On Fri, Dec 15, 2017 at 11:44 AM, Musall, Maik <ma...@selbstdenker.ag> wrote:
>> Hi all,
>> 
>> after migrating an EOF application to Cayenne, I noticed many queries running much slower than before and with more load on the database. Turns out that Cayenne generates queries using ILIKE where EOF used to generate UPPER() comparisons. Example:
>> 
>> EOF:     SELECT * FROM mytable t0 WHERE UPPER(mycolumn) LIKE UPPER('foo%')
>> Cayenne: SELECT * FROM mytable t0 WHERE mycolumn ILIKE 'foo%'
>> 
>> The database is PostgreSQL 9.5, and I used to cover the UPPER() queries with function-based indexes on UPPER(column), which used to work very well.
>> 
>> ILIKE is not as easy to index with PostgreSQL, because it's semantically different, especially with languages that don't have a simple alphabet. There are GiST and GIN index types in PostgreSQL, but those have other drawbacks (too many hits for short columns, needing additional table accesses, no sorting by index, expensive updates in the case of GiST, and so on).
>> 
>> So, my question is: can I change what Cayenne generates here and generate UPPER() or LOWER() comparisons so that I can continue using the existing indexes, and what would be the recommended way to do that?
>> 
>> Thanks
>> Maik
>> 
> 
> 
> 
> -- 
> Best regards,
> Nikita Timofeev

Re: ILIKE vs. upper/lower with PostgreSQL

Posted by Nikita Timofeev <nt...@objectstyle.com>.

Hi,

I don't think it will be easy to change Cayenne translator behavior
(but still should be possible if necessary).
It may be easier to use upper().like() functions instead of
likeIgnoreCase() in your case.

I.e. you can do something like this:
    ObjectSelect.query(MyTable.class).where(MyTable.MYCOLUMN.upper().like("FOO%"))


On Fri, Dec 15, 2017 at 11:44 AM, Musall, Maik <ma...@selbstdenker.ag> wrote:
> Hi all,
>
> after migrating an EOF application to Cayenne, I noticed many queries running much slower than before and with more load on the database. Turns out that Cayenne generates queries using ILIKE where EOF used to generate UPPER() comparisons. Example:
>
> EOF:     SELECT * FROM mytable t0 WHERE UPPER(mycolumn) LIKE UPPER('foo%')
> Cayenne: SELECT * FROM mytable t0 WHERE mycolumn ILIKE 'foo%'
>
> The database is PostgreSQL 9.5, and I used to cover the UPPER() queries with function-based indexes on UPPER(column), which used to work very well.
>
> ILIKE is not as easy to index with PostgreSQL, because it's semantically different, especially with languages that don't have a simple alphabet. There are GiST and GIN index types in PostgreSQL, but those have other drawbacks (too many hits for short columns, needing additional table accesses, no sorting by index, expensive updates in the case of GiST, and so on).
>
> So, my question is: can I change what Cayenne generates here and generate UPPER() or LOWER() comparisons so that I can continue using the existing indexes, and what would be the recommended way to do that?
>
> Thanks
> Maik
>



-- 
Best regards,
Nikita Timofeev