Posted to reviews@impala.apache.org by "Amogh Margoor (Code Review)" <ge...@cloudera.org> on 2021/07/05 14:34:48 UTC

[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library
......................................................................


Patch Set 6:

> (4 comments)
 > 
 > It is great to know that Impala can achieve a 926 MB/s conversion
 > rate, and very tempting to get the best out of fast_double_parser() :-)
 > 
 > The key is not to populate a new std::string when the original
 > input already conforms to the requirements of the library (a well-formed
 > null-terminated string, obtained via string::c_str() in constant time),
 > which should be true in most cases.
 > 
 > Throughout the Impala code base, I was able to find only the
 > following call that needs to convert a string to a double,
 > which makes the above idea feasible.
 > 
 > static bool ParseProbability(const string& prob_str, bool* should_execute) {
 >   StringParser::ParseResult parse_result;
 >   double probability = StringParser::StringToFloat<double>(
 >       prob_str.c_str(), prob_str.size(), &parse_result);
 >   if (parse_result != StringParser::PARSE_SUCCESS ||
 >       probability < 0.0 || probability > 1.0) {
 >     return false;
 >   }
 >   // +1L ensures probability of 0.0 and 1.0 work as expected.
 >   *should_execute = rand() < probability * (RAND_MAX + 1L);
 >   return true;
 > }

Hi Qifan, sorry for the late reply. The other important code path that can pass non-null-terminated strings to the parser is casting, e.g. 'select cast("0.454" as double)' or 'select cast(x as double) from foo'. That path goes through CastFunctions::CastToDoubleVal, which is generated by this macro:
#define CAST_FROM_STRING(num_type, native_type, string_parser_fn) \
  num_type CastFunctions::CastTo##num_type(FunctionContext* ctx, const StringVal& val) { \
    if (val.is_null) return num_type::null(); \
    StringParser::ParseResult result; \
    num_type ret; \
    ret.val = StringParser::string_parser_fn<native_type>( \
        reinterpret_cast<char*>(val.ptr), val.len, &result); \
    if (UNLIKELY(result != StringParser::PARSE_SUCCESS)) return num_type::null(); \
    return ret; \
  }
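To make the cast-path concern concrete, here is a minimal sketch. It is not Impala code. It assumes a `{ptr, len}` view similar in spirit to `StringVal` that points into a larger buffer with no terminating '\0' after the value. It also uses `std::strtod` as a stand-in for a parser that, like fast_double_parser as discussed above, expects null-terminated input. The names `SliceVal`, `NaiveParse` and `CopyThenParse` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical {ptr, len} view, similar in spirit to StringVal: it points
// into a larger buffer and the byte at ptr[len] is not necessarily '\0'.
struct SliceVal {
  const char* ptr;
  int len;
};

// Naive: hands the raw pointer to a parser that only stops at the first
// character that cannot continue a number. len is ignored, so the parser
// can consume bytes that belong to whatever follows the value.
double NaiveParse(const SliceVal& v) {
  return std::strtod(v.ptr, nullptr);
}

// Safe: copy exactly len bytes into a std::string, whose c_str() is
// guaranteed to be null-terminated, and parse that instead.
double CopyThenParse(const SliceVal& v) {
  assert(v.ptr != nullptr && v.len >= 0);
  std::string tmp(v.ptr, static_cast<std::size_t>(v.len));
  return std::strtod(tmp.c_str(), nullptr);
}
```

Take a buffer "0.4541.25" and a slice {buf, 5} that is meant to hold "0.454". NaiveParse returns 0.4541 because it runs past the end of the slice. CopyThenParse returns 0.454, but only by paying for a copy.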

Depending on how often clients use casts, this path may well be hot. Still, your point is valid: a well-formed null-terminated string needs no extra processing and should be passed directly to the library function.
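One possible shape for that, sketched under assumptions rather than taken from the patch, is shown below. The caller states whether the byte at `s[len]` is known to be '\0', for example because `s` came from `std::string::c_str()` as in ParseProbability. This cannot be checked safely inside the helper without possibly reading past the caller's buffer.

`std::strtod` stands in for `fast_double_parser::parse_number`. It differs from Impala's StringParser in that it skips leading whitespace, accepts hex floats and is locale-dependent. `ParseDouble`, `ParseNullTerminated` and the 64-byte stack buffer are hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Stand-in for a parser that needs null-terminated input. Returns a pointer
// just past the parsed number, or nullptr if nothing could be parsed.
static const char* ParseNullTerminated(const char* p, double* out) {
  char* end = nullptr;
  *out = std::strtod(p, &end);
  return end == p ? nullptr : end;
}

// Hypothetical helper: parses exactly the bytes [s, s + len) as a double.
// 'null_terminated' must only be true when the caller knows s[len] == '\0'.
bool ParseDouble(const char* s, std::size_t len, bool null_terminated,
                 double* out) {
  assert(s != nullptr && out != nullptr);
  if (null_terminated) {
    // Fast path: no copy, parse the caller's buffer directly.
    const char* end = ParseNullTerminated(s, out);
    return end == s + len;  // reject trailing garbage
  }
  // Slow path: copy into a buffer we can terminate ourselves.
  char stack_buf[64];
  std::string heap_buf;
  const char* buf;
  if (len < sizeof(stack_buf)) {
    std::memcpy(stack_buf, s, len);
    stack_buf[len] = '\0';
    buf = stack_buf;
  } else {
    heap_buf.assign(s, len);  // c_str() is null-terminated
    buf = heap_buf.c_str();
  }
  const char* end = ParseNullTerminated(buf, out);
  return end == buf + len;
}
```

The stack buffer keeps the common short-input case (typical numeric literals) allocation-free. Only unusually long inputs fall back to a heap copy.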


-- 
To view, visit http://gerrit.cloudera.org:8080/17389

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Jul 2021 14:34:48 +0000
Gerrit-HasComments: No