Posted to issues@impala.apache.org by "Jim Apple (JIRA)" <ji...@apache.org> on 2017/05/09 16:40:04 UTC

[jira] [Resolved] (IMPALA-5273) StringCompare is very slow

     [ https://issues.apache.org/jira/browse/IMPALA-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Apple resolved IMPALA-5273.
-------------------------------
       Resolution: Fixed
         Assignee: Jim Apple
    Fix Version/s: Impala 2.9.0

{noformat}
IMPALA-5273: Replace StringCompare with glibc memcmp

glibc's memcmp, which dispatches dynamically based on the instructions
the processor supports, uses sse4.1's ptest, which is faster than our
implementation.

I ran the benchmark below. The final query sped up by about 5x with
this patch.

    create table long_strings (s string) stored as parquet;
    insert into long_strings values (repeat("a", 2048));
    insert into long_strings select a.s from long_strings a,
      long_strings b;
    insert into long_strings select a.s from long_strings a,
      long_strings b;
    insert into long_strings select a.s from long_strings a,
      long_strings b;
    insert into long_strings select a.s from long_strings a,
      long_strings b;
    insert into long_strings select a.s from long_strings a,
      long_strings b;
    insert into long_strings select a.s from long_strings a,
      (select * from long_strings limit 10) b;
    select count(*) from long_strings where s <= repeat("a", 2048);

Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Reviewed-on: http://gerrit.cloudera.org:8080/6768
Reviewed-by: Jim Apple <jb...@apache.org>
Tested-by: Impala Public Jenkins
{noformat}
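
For readers who want to see the shape of the change, here is a minimal sketch of a memcmp-based string comparison. It is not the code from the patch: the name CompareBytes and its signature are hypothetical, and the tie-break on length is an assumption about how a shorter string that is a prefix of a longer one should order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical helper (not Impala's actual function): lexicographically
// compares two byte ranges. Returns <0, 0, or >0, like memcmp.
inline int CompareBytes(const char* s1, int len1, const char* s2, int len2) {
  const int n = std::min(len1, len2);
  // Skip the call for empty ranges so a null pointer is never passed.
  // memcmp compares bytes as unsigned char over the common prefix; with
  // glibc, the implementation actually used is selected at run time
  // based on the CPU's supported instructions.
  const int result = (n == 0) ? 0 : std::memcmp(s1, s2, static_cast<size_t>(n));
  if (result != 0) return result;
  // Assumption: when the common prefix is equal, the shorter range sorts first.
  return len1 - len2;
}
```

Under these assumptions, a predicate such as s <= repeat("a", 2048) would reduce to checking CompareBytes(...) <= 0 for each row.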

> StringCompare is very slow
> --------------------------
>
>                 Key: IMPALA-5273
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5273
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Jim Apple
>            Assignee: Jim Apple
>              Labels: performance
>             Fix For: Impala 2.9.0
>
>
> Replacing StringCompare (which uses SSE4.2 instructions) with a call to glibc's dynamically-dispatched memcmp results in a >5x improvement for large strings.
> memcmp on my machine mainly uses sse4.1's ptest, after detecting at run-time that I have sse4.1 instructions available. The StringCompare benchmark is 5 years old and likely out-of-date by now.
> To replicate:
> {noformat}
> create table long_strings (s string) stored as parquet;
> insert into long_strings values (repeat("a", 2048));
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, (select * from long_strings limit 10) b;
> select count(*) from long_strings where s <= repeat("a", 2048);
> {noformat}
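
As a rough guide to the size of the table the replication steps build, the row count can be worked out as below. This sketch assumes that each unfiltered self-join yields rows * rows new rows and that each INSERT reads the table as it stood before that statement. BenchmarkRowCount is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of rows in long_strings after all of the
// INSERT statements in the replication steps have run.
inline std::int64_t BenchmarkRowCount() {
  std::int64_t rows = 1;  // the single initial 2048-character row
  // Five inserts of the unfiltered self-join: each adds rows * rows rows.
  for (int i = 0; i < 5; ++i) rows += rows * rows;
  // The final insert joins against at most 10 rows (LIMIT 10).
  rows += rows * std::min<std::int64_t>(rows, 10);
  return rows;
}
```

Under those assumptions the table grows 1, 2, 6, 42, 1806, 3263442, and finally 35897862 rows, so the last query runs one 2048-byte comparison per row over roughly 36 million rows.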


