Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/08/11 10:13:00 UTC

[jira] [Work logged] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values

     [ https://issues.apache.org/jira/browse/HIVE-25443?focusedWorklogId=636846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636846 ]

ASF GitHub Bot logged work on HIVE-25443:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Aug/21 10:12
            Start Date: 11/Aug/21 10:12
    Worklog Time Spent: 10m 
      Work Description: shameersss1 opened a new pull request #2581:
URL: https://github.com/apache/hive/pull/2581


   …pes When there are more than 1024 values
   
   
   ### What changes were proposed in this pull request?
   Instead of initializing the ColumnVector with the default size of 1024, initialize it with the size that the record actually requires.
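
As a rough illustration of the sizing idea (not the code in this pull request), the sketch below uses a plain array as a stand-in for a ColumnVector. The names `VectorSizingSketch`, `allocateDefault`, `allocateFor`, and `canHold` are hypothetical and exist only for this example.

```java
// Hypothetical stand-in for illustration only; this is not Hive's ColumnVector API.
final class VectorSizingSketch {
    // Default vector size described in the pull request text.
    static final int DEFAULT_SIZE = 1024;

    // Old behaviour: always allocate the default number of slots.
    static boolean[] allocateDefault() {
        return new boolean[DEFAULT_SIZE];
    }

    // Proposed behaviour: allocate as many slots as there are values to hold.
    static boolean[] allocateFor(int valueCount) {
        return new boolean[valueCount];
    }

    // True when every one of valueCount values fits in the vector.
    static boolean canHold(boolean[] vector, int valueCount) {
        return valueCount <= vector.length;
    }

    public static void main(String[] args) {
        int valueCount = 1025; // same count as the reproduction test in the issue
        System.out.println("default-sized: " + canHold(allocateDefault(), valueCount));
        System.out.println("sized to fit:  " + canHold(allocateFor(valueCount), valueCount));
    }
}
```

In this simplified model the default-sized array is one slot short for 1025 values, while the array sized from the value count fits them all.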
   
   ### Why are the changes needed?
   
   The changes allow Arrow SerDe to serialize/deserialize complex data types when there are more than 1024 values.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Unit tests were added to confirm the behaviour.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 636846)
    Remaining Estimate: 0h
            Time Spent: 10m

> Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25443
>                 URL: https://issues.apache.org/jira/browse/HIVE-25443
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Complex data types like MAP and STRUCT cannot be serialized/deserialized using Arrow SerDe when there are more than 1024 values. This happens because ColumnVector is always initialized with a size of 1024.
> Issue #1 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe :
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
>
>   // 1025 rows, one more than the default ColumnVector size of 1024
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
>
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)