Posted to issues@impala.apache.org by "Jim Apple (JIRA)" <ji...@apache.org> on 2017/05/09 16:40:04 UTC
[jira] [Resolved] (IMPALA-5273) StringCompare is very slow
[ https://issues.apache.org/jira/browse/IMPALA-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Apple resolved IMPALA-5273.
-------------------------------
Resolution: Fixed
Assignee: Jim Apple
Fix Version/s: Impala 2.9.0
{noformat}
IMPALA-5273: Replace StringCompare with glibc memcmp
glibc's memcmp, which dispatches dynamically based on the instructions
the processor supports, uses sse4.1's ptest, which is faster than our
implementation.
I ran the benchmark below. The final query sped up by about 5x with
this patch.
create table long_strings (s string) stored as parquet;
insert into long_strings values (repeat("a", 2048));
insert into long_strings select a.s from long_strings a,
long_strings b;
insert into long_strings select a.s from long_strings a,
long_strings b;
insert into long_strings select a.s from long_strings a,
long_strings b;
insert into long_strings select a.s from long_strings a,
long_strings b;
insert into long_strings select a.s from long_strings a,
long_strings b;
insert into long_strings select a.s from long_strings a,
(select * from long_strings limit 10) b;
select count(*) from long_strings where s <= repeat("a", 2048);
Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Reviewed-on: http://gerrit.cloudera.org:8080/6768
Reviewed-by: Jim Apple <jb...@apache.org>
Tested-by: Impala Public Jenkins
{noformat}
> StringCompare is very slow
> --------------------------
>
> Key: IMPALA-5273
> URL: https://issues.apache.org/jira/browse/IMPALA-5273
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.9.0
> Reporter: Jim Apple
> Assignee: Jim Apple
> Labels: performance
> Fix For: Impala 2.9.0
>
>
> Replacing StringCompare (which uses SSE4.2 instructions) with a call to glibc's dynamically-dispatched memcmp results in a >5x improvement for large strings.
> memcmp on my machine mainly uses SSE4.1's ptest, after detecting at run time that SSE4.1 instructions are available. The StringCompare benchmark is 5 years old and likely out of date by now.
> To replicate:
> {noformat}
> create table long_strings (s string) stored as parquet;
> insert into long_strings values (repeat("a", 2048));
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, long_strings b;
> insert into long_strings select a.s from long_strings a, (select * from long_strings limit 10) b;
> select count(*) from long_strings where s <= repeat("a", 2048);
> {noformat}
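To give a sense of the benchmark's size, the sketch below works out how many rows the replication script produces. It assumes each "insert ... select" sees the table as it was before that statement ran, so a self cross join of n rows adds n * n rows and the final statement adds n * 10 rows. The function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: number of rows in long_strings after the script above.
inline std::uint64_t RowsAfterBenchmarkInserts() {
  std::uint64_t rows = 1;  // insert ... values (repeat("a", 2048))
  // Five self cross joins: each adds rows * rows new rows.
  for (int i = 0; i < 5; ++i) {
    rows += rows * rows;  // 2, 6, 42, 1806, 3263442
  }
  // Final insert joins against a 10-row subquery: adds rows * 10.
  rows += rows * 10;
  return rows;  // 35897862
}
```

Under that assumption the final query compares roughly 36 million 2048-byte strings, which is why the comparison routine dominates its run time.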
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)