Posted to dev@hive.apache.org by "Syed Shameerur Rahman (Jira)" <ji...@apache.org> on 2021/08/11 09:29:00 UTC
[jira] [Created] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
Syed Shameerur Rahman created HIVE-25443:
--------------------------------------------
Summary: Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
Key: HIVE-25443
URL: https://issues.apache.org/jira/browse/HIVE-25443
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 3.1.2, 3.1.1, 3.0.0, 3.1.0
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman
Fix For: 4.0.0
Complex data types such as MAP and STRUCT cannot be serialized or deserialized using the Arrow SerDe when there are more than 1024 values. This happens because the ColumnVector is always initialized with a size of 1024.
Issue #1 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
Issue #2 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
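To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use Hive's classes: SimpleVector, DEFAULT_SIZE, and both write methods are hypothetical stand-ins. The resizing step only shows one possible direction for a fix and is not necessarily the change made for this issue.

```java
import java.util.Arrays;

// Hypothetical stand-in for a column vector; NOT Hive's actual ColumnVector.
class FixedSizeVectorSketch {

    // Assumption: mirrors the fixed size of 1024 named in the report.
    static final int DEFAULT_SIZE = 1024;

    // Simplified vector that stores boolean values as longs in a plain array.
    static final class SimpleVector {
        long[] vector;

        SimpleVector(int size) {
            vector = new long[size];
        }

        // Grows the backing array when more slots are needed, keeping existing data.
        void ensureSize(int size) {
            if (size > vector.length) {
                vector = Arrays.copyOf(vector, size);
            }
        }
    }

    // Writes valueCount values into a vector that always has DEFAULT_SIZE slots.
    // Returns false when the write runs past the end of the backing array.
    static boolean writeWithFixedSize(int valueCount) {
        SimpleVector v = new SimpleVector(DEFAULT_SIZE);
        try {
            for (int i = 0; i < valueCount; i++) {
                v.vector[i] = 1L;
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    // Same write, but the vector is first resized to the actual number of values.
    static boolean writeWithEnsureSize(int valueCount) {
        SimpleVector v = new SimpleVector(DEFAULT_SIZE);
        v.ensureSize(valueCount);
        for (int i = 0; i < valueCount; i++) {
            v.vector[i] = 1L;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(writeWithFixedSize(1024));  // true: fits exactly
        System.out.println(writeWithFixedSize(1025));  // false: one value too many
        System.out.println(writeWithEnsureSize(1025)); // true: resized before writing
    }
}
```

In this sketch the 1025th value is the first one that no longer fits, which matches the threshold in the summary.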
Sample unit test that reproduces the issue in TestArrowColumnarBatchSerDe:
{code:java}
@Test
public void testListBooleanWithMoreThan1024Values() throws SerDeException {
  String[][] schema = {
      {"boolean_list", "array<boolean>"},
  };

  Object[][] rows = new Object[1025][1];
  for (int i = 0; i < 1025; i++) {
    rows[i][0] = new BooleanWritable(true);
  }

  initAndSerializeAndDeserialize(schema, toList(rows));
}
{code}