Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2016/08/06 01:07:20 UTC
[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
[ https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-14450:
---------------------------
Affects Version/s: 2.2.0
> Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
> ------------------------------------------------------------------------
>
> Key: HIVE-14450
> URL: https://issues.apache.org/jira/browse/HIVE-14450
> Project: Hive
> Issue Type: Improvement
> Components: Vectorization
> Affects Versions: 2.2.0
> Reporter: Gopal V
>
> {code}
>   public static int truncate(byte[] bytes, int start, int length, int maxLength) {
>     int end = start + length;
>     // count characters forward
>     int j = start;
>     int charCount = 0;
>     while (j < end) {
>       // UTF-8 continuation bytes have 2 high bits equal to 0x80.
>       if ((bytes[j] & 0xc0) != 0x80) {
>         if (charCount == maxLength) {
>           break;
>         }
>         ++charCount;
>       }
>       j++;
>     }
>     return (j - start);
>   }
> {code}
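> truncate() should not need to read the bytes at all when maxLength is 4096 and the input string has only 256 bytes. Every UTF-8 character occupies at least 1 byte, so an input of length bytes contains at most length characters and cannot exceed maxLength whenever length <= maxLength.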
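
A minimal sketch of the fast path described in the issue, assuming length is non-negative. The class name TruncateSketch is hypothetical, and this is not necessarily the change that was committed to Hive. It returns early when the byte length already fits within maxLength and otherwise falls back to the original scan.

```java
final class TruncateSketch {
  // Hypothetical variant of truncate() for illustration only.
  // Assumes length >= 0 and that bytes[start .. start+length) is valid UTF-8.
  static int truncate(byte[] bytes, int start, int length, int maxLength) {
    // Each UTF-8 character takes at least one byte, so `length` bytes
    // hold at most `length` characters: nothing to cut, no bytes read.
    if (length <= maxLength) {
      return length;
    }
    int end = start + length;
    int j = start;
    int charCount = 0;
    // Slow path: count characters forward, as in the original code.
    while (j < end) {
      // Continuation bytes look like 10xxxxxx; anything else starts a character.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return j - start;
  }
}
```

With this check, a 256-byte input and maxLength of 4096 returns 256 immediately. The byte scan only runs when the input could actually hold more than maxLength characters.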