Posted to issues@impala.apache.org by "Jim Apple (JIRA)" <ji...@apache.org> on 2017/05/02 21:47:04 UTC

[jira] [Created] (IMPALA-5273) StringCompare is very slow

Jim Apple created IMPALA-5273:
---------------------------------

             Summary: StringCompare is very slow
                 Key: IMPALA-5273
                 URL: https://issues.apache.org/jira/browse/IMPALA-5273
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Jim Apple


Replacing StringCompare (which uses SSE4.2 instructions) with a call to glibc's dynamically dispatched memcmp makes comparisons of large strings more than 5x faster.

On my machine, memcmp mainly uses the SSE4.1 ptest instruction, which glibc selects after detecting at run time that SSE4.1 is available. The StringCompare benchmark is 5 years old and likely out of date by now.

To replicate:

{noformat}
create table long_strings (s string) stored as parquet;
insert into long_strings values (repeat("a", 2048));
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, (select * from long_strings limit 10) b;
select count(*) from long_strings where s <= repeat("a", 2048);
{noformat}
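
To give a sense of how large the table above gets, the sketch below works out the expected row count. It assumes each INSERT ... SELECT reads the table as it was before that statement ran, so a self cross join over n rows adds n * n rows; the function names are hypothetical and only serve the arithmetic.

```cpp
#include <cstdint>

// Rows after one "insert ... select a.s from long_strings a, long_strings b":
// the n existing rows plus the n * n rows produced by the cross join.
// Assumes the SELECT sees the table as it was before the INSERT started.
inline std::uint64_t AfterSelfCrossJoin(std::uint64_t n) { return n + n * n; }

// Expected final row count for the reproduction script.
inline std::uint64_t ExpectedRowCount() {
  std::uint64_t n = 1;  // the single initial 2048-character row
  for (int i = 0; i < 5; ++i) n = AfterSelfCrossJoin(n);  // five self cross joins
  // The last insert joins against at most 10 rows (limit 10).
  const std::uint64_t limited = n < 10 ? n : 10;
  return n + n * limited;
}
```

Under that assumption the table grows as 1, 2, 6, 42, 1806, 3263442 and ends at 35897862 rows, each holding a 2048-character string, so the final count(*) query performs tens of millions of long string comparisons.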



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)