Posted to dev@hive.apache.org by "Navis (JIRA)" <ji...@apache.org> on 2013/03/27 09:19:16 UTC

[jira] [Commented] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table

    [ https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615020#comment-13615020 ] 

Navis commented on HIVE-4223:
-----------------------------

May I ask whether the query that produced the above exception uses a UDTF?
                
> LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-4223
>                 URL: https://issues.apache.org/jira/browse/HIVE-4223
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.9.0
>         Environment: Hive 0.9.0
>            Reporter: Yong Zhang
>
> The LazySimpleSerDe throws an IndexOutOfBoundsException if the column is a struct that contains an array of structs.
> I have a table with one column defined like this:
> columnA
> array <
>     struct<
>        col1:primiType,
>        col2:primiType,
>        col3:primiType,
>        col4:primiType,
>        col5:primiType,
>        col6:primiType,
>        col7:primiType,
>        col8:array<
>             struct<
>               col1:primiType,
>               col2:primiType,
>               col3:primiType,
>               col4:primiType,
>               col5:primiType,
>               col6:primiType,
>               col7:primiType,
>               col8:primiType,
>               col9:primiType
>             >
>        >
>     >
> >
> In this example, the outer struct has 8 columns (including the array), and the inner struct has 9 columns. As long as the outer struct has FEWER columns than the inner struct, I think we get the following exception, shown in this stack trace from LazySimpleSerDe when it tries to serialize a row:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381)
>         at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>         at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
>         ... 9 more
> I am not sure of the exact cause of this problem. I believe that public static void serialize(ByteStream.Output out, Object obj, ObjectInspector objInspector, byte[] separators, int level, Text nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) invokes itself recursively when it encounters a nested structure. But for nested structs, the list reference gets mixed up, and size() returns the wrong value.
> In the example above, for these 2 lines:
>       List<? extends StructField> fields = soi.getAllStructFieldRefs();
>       list = soi.getStructFieldsDataAsList(obj);
> my StructObjectInspector (soi) returns the CORRECT data from both getAllStructFieldRefs() and getStructFieldsDataAsList(). For example, in one row the outer 8-column struct holds 2 elements in its inner array of structs, and each element has 9 columns (as there are 9 columns in the inner struct). At runtime, after I added more logging to LazySimpleSerDe, I see the following behavior in the log:
> for 8 outside column, loop
>     for 9 inside columns, loop for serialize
>     for 9 inside columns, loop for serialize
> The code breaks here: the outer loop tries to access a 9th element, which does not exist in the outer struct, as the stack trace shows (it tried to access index 8 of a list of size 8).
> I changed the following line of code, and it appears to fix the problem. I don't know whether it is the right fix, but it did resolve the exception. I made the change against the Hive 0.9.0 code:
> 481c481,482
> <         for (int i = 0; i < list.size(); i++) {
> ---
> >         int listSize = list.size();
> >         for (int i = 0; i < listSize; i++) {
> I believe the cause of this bug is that when the loop is written the current way, as in
>         for (int i = 0; i < list.size(); i++)
> list.size() is invoked on every iteration. But with a nested structure, list.size() returns a different result after the recursive call, and that causes the problem I am facing.
> Thanks
> Yong Zhang
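
The reporter's hypothesis can be reproduced in isolation with a small sketch. It is not Hive code and does not confirm the root cause: Struct, SHARED, fieldsDataAsList, and both serialize methods are hypothetical names, and the sketch simply assumes that one list instance is reused for every struct, so a nested call can change the size the outer loop sees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoopBoundSketch {

    /** Hypothetical struct: fixed field names plus values; a value may itself be a Struct. */
    public static final class Struct {
        final List<String> fieldRefs;
        final Object[] values;

        Struct(List<String> fieldRefs, Object[] values) {
            this.fieldRefs = fieldRefs;
            this.values = values;
        }
    }

    // ASSUMPTION (not confirmed for Hive): one list instance is reused for every
    // struct, so a nested call overwrites the list the caller is still looping over.
    private static final List<Object> SHARED = new ArrayList<>();

    static List<Object> fieldsDataAsList(Struct s) {
        SHARED.clear();
        SHARED.addAll(Arrays.asList(s.values));
        return SHARED;
    }

    /** Re-reads list.size() on every iteration, like the loop before the patch. */
    public static void serializeRereadingSize(StringBuilder out, Struct s) {
        List<Object> list = fieldsDataAsList(s);
        for (int i = 0; i < list.size(); i++) {
            // Throws once the nested call has grown the shared list past fieldRefs.
            String ref = s.fieldRefs.get(i);
            writeField(out, ref, list.get(i), false);
        }
    }

    /** Reads list.size() once before the loop, like the patched loop. */
    public static void serializeSnapshotSize(StringBuilder out, Struct s) {
        List<Object> list = fieldsDataAsList(s);
        int listSize = list.size();
        for (int i = 0; i < listSize; i++) {
            writeField(out, s.fieldRefs.get(i), list.get(i), true);
        }
    }

    private static void writeField(StringBuilder out, String ref, Object value, boolean snapshot) {
        if (value instanceof Struct) {
            if (snapshot) {
                serializeSnapshotSize(out, (Struct) value);
            } else {
                serializeRereadingSize(out, (Struct) value);
            }
        } else {
            out.append(ref).append('=').append(value).append(';');
        }
    }

    /** Builds a struct with fieldCount fields named col1..colN; the last field holds lastValue. */
    public static Struct struct(int fieldCount, Object lastValue) {
        List<String> refs = new ArrayList<>();
        Object[] values = new Object[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            refs.add("col" + (i + 1));
            values[i] = "v" + (i + 1);
        }
        values[fieldCount - 1] = lastValue;
        return new Struct(refs, values);
    }

    public static void main(String[] args) {
        Struct inner = struct(9, "v9");
        Struct outer = struct(8, inner); // 8 outer fields, the last one nested

        try {
            serializeRereadingSize(new StringBuilder(), outer);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("re-reading size failed: " + e.getMessage());
        }

        StringBuilder ok = new StringBuilder();
        serializeSnapshotSize(ok, outer);
        System.out.println(ok);
    }
}
```

In this sketch the snapshot only avoids the exception because the nested field is the last one in the outer struct. If the nested field sat earlier, the outer loop would go on to read the inner struct's values from the shared list, so under this assumption the one-line change would hide the symptom rather than remove the sharing.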

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira