Posted to dev@hive.apache.org by "BELUGA BEHR (JIRA)" <ji...@apache.org> on 2017/06/13 15:12:00 UTC
[jira] [Created] (HIVE-16889) Improve Performance Of VARCHAR
BELUGA BEHR created HIVE-16889:
----------------------------------
Summary: Improve Performance Of VARCHAR
Key: HIVE-16889
URL: https://issues.apache.org/jira/browse/HIVE-16889
Project: Hive
Issue Type: Improvement
Components: Types
Affects Versions: 2.1.1, 3.0.0
Reporter: BELUGA BEHR
Oftentimes, organizations use tools that create table schemas on the fly, specifying VARCHAR columns with a precision. In these scenarios, performance suffers even though one might expect it to be better: the size of the data is known in advance, so buffers could be set up more efficiently than in the case where no such knowledge exists.
Most of the performance cost seems to come from reading a STRING from a file into a byte buffer, checking the length of the STRING, truncating the STRING if needed, and then serializing the STRING back into bytes again.
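To illustrate the overhead, here is a rough sketch (not Hive's actual implementation; class and method names are hypothetical) contrasting the decode/truncate/re-encode round trip with a byte-level truncation that walks UTF-8 lead bytes and cuts the buffer without ever materializing a String:

```java
import java.nio.charset.StandardCharsets;

public class VarcharTruncateSketch {

    // The round-trip pattern described above: bytes -> String -> truncate -> bytes.
    static byte[] truncateViaString(byte[] utf8, int maxChars) {
        String s = new String(utf8, StandardCharsets.UTF_8);      // full decode
        if (s.codePointCount(0, s.length()) <= maxChars) {
            return utf8;                                          // already fits
        }
        int end = s.offsetByCodePoints(0, maxChars);
        return s.substring(0, end).getBytes(StandardCharsets.UTF_8); // re-encode
    }

    // Hypothetical byte-level alternative: count code points by inspecting
    // UTF-8 lead bytes and truncate the buffer directly, no decode/encode.
    static byte[] truncateInPlace(byte[] utf8, int maxChars) {
        int chars = 0, i = 0;
        while (i < utf8.length && chars < maxChars) {
            int b = utf8[i] & 0xFF;
            if (b < 0x80)      i += 1;   // 1-byte (ASCII) sequence
            else if (b < 0xE0) i += 2;   // 2-byte sequence
            else if (b < 0xF0) i += 3;   // 3-byte sequence
            else               i += 4;   // 4-byte sequence
            chars++;
        }
        if (i >= utf8.length) return utf8; // within limit, nothing to cut
        byte[] out = new byte[i];
        System.arraycopy(utf8, 0, out, 0, i);
        return out;
    }
}
```

Both paths should produce the same bytes for well-formed UTF-8 input, but the second avoids allocating an intermediate String per value, which is where the savings would come from.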
From the code, I have identified several areas where developers left notes about possible future improvements:
# org.apache.hadoop.hive.serde2.io.HiveVarcharWritable.enforceMaxLength(int)
# org.apache.hadoop.hive.serde2.lazy.LazyHiveVarchar.init(ByteArrayRef, int, int)
# org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveVarchar(Object, PrimitiveObjectInspector)
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)