You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Amogh Margoor (Code Review)" <ge...@cloudera.org> on 2021/06/03 12:31:41 UTC

[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library
......................................................................


Patch Set 6:

> I think it would be nice to see a comparision between
 > fast_double_parser and strtod() for short strings.
 > 
 > If we can measure a significant benefit for fast_double_parser we
 > can use it for short strings with a VLA, and use strtod() for long
 > strings with heap allocation (in this case we don't need to
 > preprocess the string).
 > 
 > If we can't measure a difference between the two then maybe we
 > should just use strtod() since it's more robust and doesn't need
 > any preprocessing of the input.

There was no observable difference found when running through Impala as that function would be small fraction of entire runtime. However I benchmarked the individual functions and also added malloc and VAC operations to check the difference in the benchmark. This is the result:

=== trial 1 ===
fast_double_parser_malloc  397.00 MB/s
fast_double_parser_VAC  738.19 MB/s
fast_double_parser  926.76 MB/s
strtod     	110.63 MB/s


=== trial 2 ===
fast_double_parser_malloc  395.90 MB/s
fast_double_parser_VAC  694.98 MB/s
fast_double_parser  926.33 MB/s
strtod     	110.42 MB/s

So we see that malloc and strtod are significantly slower and in bigger data sets it may start showing difference. tcmalloc however might be faster than above but still will incur some penalty. So I will use VAC for shorter strings to avoid malloc and for larger strings we can use malloc + strtod.


-- 
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Jun 2021 12:31:41 +0000
Gerrit-HasComments: No