Posted to issues@hive.apache.org by "Syed Shameerur Rahman (Jira)" <ji...@apache.org> on 2021/08/11 10:16:00 UTC

[jira] [Commented] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values

    [ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397249#comment-17397249 ] 

Syed Shameerur Rahman commented on HIVE-25443:
----------------------------------------------

[~kgyrtkirk] [~pgaref] Could you please review the pull request?
Thanks

> Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25443
>                 URL: https://issues.apache.org/jira/browse/HIVE-25443
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Complex data types such as MAP and STRUCT cannot be serialized or deserialized using the Arrow SerDe when there are more than 1024 values. This happens because ColumnVector is always initialized with a size of 1024.
> Issue #1 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe :
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
>
>   // 1025 values: one more than the ColumnVector size of 1024
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
>
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}
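
The failure mode described in the issue can be illustrated with a minimal, self-contained sketch. It does not use Hive classes: `ToyLongVector`, `fillFixed`, and `fillResized` are hypothetical names that only mimic a vector backed by a fixed 1024-element array, and the resizing shown here is an assumed remedy, not necessarily what the pull request does.

```java
import java.util.Arrays;

public class FixedBatchSketch {
    // Assumed default capacity, matching the size of 1024 mentioned in the issue.
    static final int DEFAULT_SIZE = 1024;

    // Hypothetical stand-in for a column vector backed by a plain array.
    static final class ToyLongVector {
        long[] vector;

        ToyLongVector(int len) {
            vector = new long[len];
        }

        // Grow the backing array if it cannot hold `size` values.
        void ensureSize(int size) {
            if (size > vector.length) {
                vector = Arrays.copyOf(vector, size);
            }
        }
    }

    // Writes childCount values into a vector of the default size.
    // Returns false if the write runs past the end of the backing array.
    static boolean fillFixed(int childCount) {
        ToyLongVector v = new ToyLongVector(DEFAULT_SIZE);
        try {
            for (int i = 0; i < childCount; i++) {
                v.vector[i] = 1L;
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    // Same write, but the vector is resized to the actual value count first.
    static boolean fillResized(int childCount) {
        ToyLongVector v = new ToyLongVector(DEFAULT_SIZE);
        v.ensureSize(childCount);
        for (int i = 0; i < childCount; i++) {
            v.vector[i] = 1L;
        }
        return v.vector.length >= childCount;
    }

    public static void main(String[] args) {
        System.out.println(fillFixed(1024));   // true
        System.out.println(fillFixed(1025));   // false
        System.out.println(fillResized(1025)); // true
    }
}
```

The point of the sketch is only that a list with 1025 child values needs a child vector sized by the actual value count rather than by a fixed default.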


