Posted to issues@impala.apache.org by "Jim Apple (JIRA)" <ji...@apache.org> on 2017/05/02 21:47:04 UTC
[jira] [Created] (IMPALA-5273) StringCompare is very slow
Jim Apple created IMPALA-5273:
---------------------------------
Summary: StringCompare is very slow
Key: IMPALA-5273
URL: https://issues.apache.org/jira/browse/IMPALA-5273
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 2.9.0
Reporter: Jim Apple
Replacing StringCompare (which uses SSE4.2 instructions) with a call to glibc's dynamically-dispatched memcmp results in a >5x improvement for large strings.
On my machine, memcmp mainly uses SSE4.1's ptest, after detecting at run time that SSE4.1 instructions are available. The StringCompare benchmark is 5 years old and is likely out of date by now.
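As a rough illustration of the replacement described above, here is a minimal sketch of a memcmp-based comparison, assuming strings are passed as a pointer plus a length. The name MemcmpStringCompare and its signature are hypothetical and are not taken from the Impala source; the tie-breaking rule (shorter string sorts first when the common prefix is equal) is also an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Impala's actual StringCompare signature).
// Returns <0, 0, or >0 if s1 sorts before, equal to, or after s2.
inline int MemcmpStringCompare(const char* s1, int64_t len1,
                               const char* s2, int64_t len2) {
  // Compare only the bytes both strings have in common.
  const int64_t n = std::min(len1, len2);
  // glibc selects a memcmp implementation at run time based on CPU features.
  const int result =
      (n > 0) ? std::memcmp(s1, s2, static_cast<std::size_t>(n)) : 0;
  if (result != 0) return result;
  // Assumption: when the common prefix is equal, the shorter string sorts first.
  return (len1 > len2) - (len1 < len2);
}
```

Only the sign of the return value is meaningful, since memcmp does not guarantee a particular magnitude.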
To replicate:
{noformat}
create table long_strings (s string) stored as parquet;
insert into long_strings values (repeat("a", 2048));
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, (select * from long_strings limit 10) b;
select count(*) from long_strings where s <= repeat("a", 2048);
{noformat}
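For a sense of scale, the row count produced by the statements above can be worked out with a short sketch. It assumes that each INSERT ... SELECT reads the table as it was before that statement appended anything, and that the LIMIT 10 subquery returns exactly 10 rows. The function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: number of rows in long_strings after the script,
// assuming each INSERT ... SELECT sees only the rows present before it ran.
inline int64_t RowsAfterReplicationScript() {
  int64_t rows = 1;  // the single repeat("a", 2048) row
  for (int i = 0; i < 5; ++i) {
    rows += rows * rows;  // each self cross join appends rows * rows new rows
  }
  rows += rows * 10;  // final join against a 10-row subquery
  return rows;
}
```

Under those assumptions the table grows 1, 2, 6, 42, 1806, 3263442, and ends at 35897862 rows of 2048 bytes each, so the final count(*) query performs roughly 36 million comparisons of 2 KB strings.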
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)