You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Lars Volker (Code Review)" <ge...@cloudera.org> on 2016/09/01 13:43:18 UTC

[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

Lars Volker has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2063:   // Hive returns 0 if substr was not found in str (or on other error coditions).
What happens if you pass other types as the needle or haystack, e.g. int or double or bool? Do we cast them? Or throw an error?


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 22: #include <stdint.h>
are the includes still needed?


PS5, Line 289:   if (str.is_null || substr.is_null) return IntVal::null();
             :   StringValue str_sv = StringValue::FromStringVal(str);
             :   StringValue substr_sv = StringValue::FromStringVal(substr);
             :   StringSearch search(&substr_sv);
             :   // Hive returns positions starting from 1.
             :   return IntVal(search.Search(&str_sv) + 1);
Will replacing this with a call to the more general version of Instr() come with a performance impact? If not I think we should forward here, similar to the other Instr() overload below.


PS5, Line 299: occurance
spelling


PS5, Line 316: no
nit: use 'num' to abbreviate number


PS5, Line 329: allowed match
what does "allowed match" mean?


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:

Line 28:   if (len == -1) {
single line


PS5, Line 69: Haystack is not full string
This comment could be clearer, Haystack is a full string. Maybe rephrase "Test searching through part of haystack" or similar.


PS5, Line 70: 5
What is the difference between this and just specifying "abcab" as the haystack? It looks like the tests with varying len parameters mostly test that the c'tor of StringValue works correctly. Is there a downside to it, if we rely on that (or test it elsewhere) and then just specify the strings we want to test here without len parameters.


Line 74:   // Haystack and needle not full len
Same here


Line 81:   // Test last bit, this is the interesting edge case
What does "last bit" mean? Can you add a comment why this is an interesting case?


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 204:         }
Can we advance i by skip_ here, too? Is this what the comment in line 139 refers to?


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi...@cloudera.com>
Gerrit-HasComments: Yes