Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2016/08/06 01:07:20 UTC
[jira] [Created] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
Gopal V created HIVE-14450:
------------------------------
Summary: Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
Key: HIVE-14450
URL: https://issues.apache.org/jira/browse/HIVE-14450
Project: Hive
Issue Type: Improvement
Reporter: Gopal V
{code}
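// Returns the number of bytes in bytes[start, start + length) that hold
// at most maxLength UTF-8 characters.
public static int truncate(byte[] bytes, int start, int length, int maxLength) {
  int end = start + length;
  // count characters forward
  int j = start;
  int charCount = 0;
  while (j < end) {
    // UTF-8 continuation bytes have their two high bits set to 10,
    // i.e. (b & 0xc0) == 0x80; any other byte starts a new character.
    if ((bytes[j] & 0xc0) != 0x80) {
      if (charCount == maxLength) {
        break;
      }
      ++charCount;
    }
    j++;
  }
  return (j - start);
}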
{code}
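Every UTF-8 character occupies at least one byte, so a span of length bytes can hold at most length characters. Whenever length <= maxLength, truncate() can therefore return length immediately without reading any input bytes. For example, with maxLength = 4096 and a 256-byte input string, the current loop still scans all 256 bytes (dirtying the L1 cache) even though no truncation is possible.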
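A minimal sketch of how the suggested shortcut could look, assuming the method keeps its current signature and semantics. This is not the committed Hive patch; the class name TruncateSketch is hypothetical. The early return is behavior-preserving: the loop can count at most length character-start bytes, so it never breaks early when length <= maxLength and always returns length in that case anyway.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class name; only truncate() mirrors StringExpr::truncate().
public final class TruncateSketch {

  /**
   * Returns the number of bytes in bytes[start, start + length) that hold
   * at most maxLength UTF-8 characters.
   */
  public static int truncate(byte[] bytes, int start, int length, int maxLength) {
    // Fast path (assumed improvement): each UTF-8 character is at least one
    // byte, so length bytes contain at most length characters. If that fits,
    // return without touching the byte array at all.
    if (length <= maxLength) {
      return length;
    }
    int end = start + length;
    int j = start;
    int charCount = 0;
    while (j < end) {
      // A byte whose two high bits are not 10 starts a new character.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return j - start;
  }

  public static void main(String[] args) {
    // 256 ASCII bytes can never exceed 4096 characters: fast path, no scan.
    byte[] ascii = new byte[256];
    Arrays.fill(ascii, (byte) 'a');
    System.out.println(truncate(ascii, 0, ascii.length, 4096)); // 256

    // "h\u00e9llo" is 5 characters in 6 bytes; keeping 2 characters ("h\u00e9")
    // needs 3 bytes, so the slow path still runs here.
    byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
    System.out.println(truncate(utf8, 0, utf8.length, 2)); // 3
  }
}
```

The check costs one comparison, and for the common case of short strings under a large VARCHAR/CHAR limit it removes the per-byte loop entirely.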