Posted to common-dev@hadoop.apache.org by "BELUGA BEHR (JIRA)" <ji...@apache.org> on 2017/06/13 16:53:02 UTC
[jira] [Created] (HADOOP-14525) org.apache.hadoop.io.Text Truncate
BELUGA BEHR created HADOOP-14525:
------------------------------------
Summary: org.apache.hadoop.io.Text Truncate
Key: HADOOP-14525
URL: https://issues.apache.org/jira/browse/HADOOP-14525
Project: Hadoop Common
Issue Type: Improvement
Components: io
Affects Versions: 2.8.1
Reporter: BELUGA BEHR
For Apache Hive, VARCHAR fields are much slower than STRING fields when a precision (a cap on string length) is specified. Keep in mind that this precision is the number of UTF-8 characters in the string, not the number of bytes.
The general procedure is:
# Load an entire byte buffer into a {{Text}} object
# Convert it to a {{String}}
# Count off the first N character code points
# Substring the {{String}} at the correct place
# Convert the String back into a byte array and populate the {{Text}} object
It would be great if the {{Text}} object offered a truncate/substring method, based on character count, that did not require copying data around.
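A minimal sketch of what such an in-place truncation could look like: because UTF-8 lead bytes encode each sequence's length, the byte offset of the first N code points can be found by scanning lead bytes only, with no decode to {{String}} and no intermediate copies. The class and method names below ({{Utf8Truncate}}, {{utf8PrefixLength}}) are hypothetical, not part of Hadoop; the example operates on a plain byte array rather than a real {{Text}} instance, which exposes its backing bytes via {{getBytes()}}/{{getLength()}} and could be trimmed with {{set(bytes, 0, newLength)}}.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {

    // Hypothetical helper: returns the byte length of the prefix of
    // utf8[0..len) that contains at most maxChars code points. Assumes
    // well-formed UTF-8, so we only inspect lead bytes and skip each
    // sequence by its encoded length (1-4 bytes).
    static int utf8PrefixLength(byte[] utf8, int len, int maxChars) {
        int pos = 0, chars = 0;
        while (pos < len && chars < maxChars) {
            int b = utf8[pos] & 0xFF;
            if (b < 0x80)      pos += 1;  // ASCII, 1-byte sequence
            else if (b < 0xE0) pos += 2;  // lead byte 0xC2..0xDF
            else if (b < 0xF0) pos += 3;  // lead byte 0xE0..0xEF
            else               pos += 4;  // lead byte 0xF0..0xF4
            chars++;
        }
        return Math.min(pos, len);
    }

    public static void main(String[] args) {
        // 7 code points: h e' l l o <space> <emoji> = 1+2+1+1+1+1+4 = 11 bytes
        byte[] bytes = "héllo \uD83D\uDE00 world".getBytes(StandardCharsets.UTF_8);
        int cut = utf8PrefixLength(bytes, bytes.length, 7);
        // A Text object could then be shortened in place, e.g. text.set(bytes, 0, cut)
        System.out.println(cut + " bytes: " + new String(bytes, 0, cut, StandardCharsets.UTF_8));
    }
}
```

Since the scan touches only one byte per character and the result is a byte length, a {{Text#truncate(int)}} along these lines would avoid the String round-trip in steps 2-5 above entirely.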
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)