Posted to dev@hive.apache.org by Thejas Nair <th...@hortonworks.com> on 2013/06/13 00:07:52 UTC

Review Request: HIVE-3253- ArrayIndexOutOfBounds exception for deeply nested structs

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11854/
-----------------------------------------------------------

Review request for hive.


Description
-------

(Patch description, copied from the JIRA comment.)
This patch increases the number of control characters that LazySimpleSerDe uses as delimiters, skipping characters that are likely to appear in data. Using the new control characters is not a backward-compatible change, so you must set the serde property hive.serialization.extend.nesting.levels to enable it for a table that uses LazySimpleSerDe. If the table's data might contain these delimiter control characters, escape them in the data and set the escape character with a serde property.
Example:
create table nestedcomplex (
simple_int int,
max_nested_array  array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<int>>>>>>>>>>>>>>>>>>>>>>>)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (  'hive.serialization.extend.nesting.levels'='true'
)
;
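To make the idea concrete, here is a minimal sketch of a per-level separator table, written for this discussion and not taken from the patch. The class and method names are hypothetical. The starting byte \001, the default of eight levels, the upper bound of 31, and the set of skipped characters (tab, LF, VT, FF, CR, ESC) are all assumptions; the characters the patch actually skips may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the code from the HIVE-3253 patch.
class SeparatorTableSketch {

    // ASSUMPTION: control characters treated as "likely to be present in data"
    // (tab, LF, VT, FF, CR, ESC). The real patch may skip a different set.
    static boolean likelyInData(int c) {
        return (c >= 9 && c <= 13) || c == 27;
    }

    // Builds one separator byte per nesting level, starting at \001.
    static byte[] buildSeparators(boolean extendNesting) {
        List<Byte> seps = new ArrayList<>();
        // ASSUMPTION: 8 levels by default, control chars up to 31 when extended.
        int limit = extendNesting ? 31 : 8;
        for (int c = 1; c <= limit; c++) {
            if (extendNesting && likelyInData(c)) {
                continue; // do not use this byte as a delimiter
            }
            seps.add((byte) c);
        }
        byte[] out = new byte[seps.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = seps.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("default levels:  " + buildSeparators(false).length);
        System.out.println("extended levels: " + buildSeparators(true).length);
    }
}
```

The level counts this prints are properties of the sketch only. They show why a deeper table lets a type nest further, and why skipped bytes matter when the data is not escaped.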
LazySimpleSerDe is used by FileSinkOperator, which is why that operator was limited to the number of nesting levels the serde supports. We should look at using LazyBinarySerDe here, as it would be more efficient and is not subject to this nesting-level restriction.
The LazySimpleSerDe used in FileSinkOperator has escaping enabled, so it is safe to extend the nesting levels with the new serde property for that use case.
The patch also gives a clearer error message when the nesting depth exceeds the maximum supported level, instead of failing with an ArrayIndexOutOfBounds exception.
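The error-handling change can be pictured as an explicit bounds check in front of the separator lookup. This is a sketch under stated assumptions: the class name, the method name, the use of IllegalArgumentException, and the message wording are all hypothetical, since the review request does not show the exception the patch actually throws.

```java
// Hypothetical illustration; not the code from the HIVE-3253 patch.
class NestingLevelCheckSketch {

    // Returns the separator for a nesting level, failing with a readable
    // message when the type is nested deeper than the separator table allows.
    static byte separatorFor(byte[] separators, int level) {
        if (level < 0 || level >= separators.length) {
            // ASSUMPTION: IllegalArgumentException is a stand-in for whatever
            // exception type and wording the real patch uses.
            throw new IllegalArgumentException(
                "Nesting level " + level + " exceeds the " + separators.length
                + " levels supported; consider setting"
                + " hive.serialization.extend.nesting.levels for this table");
        }
        return separators[level];
    }

    public static void main(String[] args) {
        byte[] separators = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(separatorFor(separators, 2));
        try {
            separatorFor(separators, 8); // one level too deep
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the check, the same lookup would surface as a bare ArrayIndexOutOfBoundsException that says nothing about nesting depth or the serde property.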


This addresses bug HIVE-3253.
    https://issues.apache.org/jira/browse/HIVE-3253


Diffs
-----

  data/files/nested_complex.txt PRE-CREATION 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 3bd0919 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 04921d5 
  ql/src/test/queries/clientnegative/nested_complex_neg.q PRE-CREATION 
  ql/src/test/queries/clientpositive/nested_complex.q PRE-CREATION 
  ql/src/test/results/clientnegative/nested_complex_neg.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/alter_partition_coltype.q.out d9c48aa 
  ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 492be3a 
  ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 7ed2448 
  ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 5b49c35 
  ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out 1b585bf 
  ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out c5315fb 
  ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out a9ab616 
  ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out 7c4558f 
  ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out fc2ffc5 
  ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 3df0ca8 
  ql/src/test/results/clientpositive/bucket_map_join_1.q.out 56131b0 
  ql/src/test/results/clientpositive/bucket_map_join_2.q.out 1e7bea5 
  ql/src/test/results/clientpositive/bucketcontext_1.q.out 43e34ce 
  ql/src/test/results/clientpositive/bucketcontext_2.q.out ab44de5 
  ql/src/test/results/clientpositive/bucketcontext_3.q.out 592765a 
  ql/src/test/results/clientpositive/bucketcontext_4.q.out 6fc94a7 
  ql/src/test/results/clientpositive/bucketcontext_5.q.out 8eb9a71 
  ql/src/test/results/clientpositive/bucketcontext_6.q.out 8271292 
  ql/src/test/results/clientpositive/bucketcontext_7.q.out db9bb1d 
  ql/src/test/results/clientpositive/bucketcontext_8.q.out 21b5dc5 
  ql/src/test/results/clientpositive/bucketmapjoin1.q.out 4bbd35f 
  ql/src/test/results/clientpositive/bucketmapjoin10.q.out 3466e6d 
  ql/src/test/results/clientpositive/bucketmapjoin11.q.out 1c12c09 
  ql/src/test/results/clientpositive/bucketmapjoin12.q.out abf9783 
  ql/src/test/results/clientpositive/bucketmapjoin13.q.out 870cb35 
  ql/src/test/results/clientpositive/bucketmapjoin7.q.out b8ba7c0 
  ql/src/test/results/clientpositive/bucketmapjoin8.q.out 2a5a5d5 
  ql/src/test/results/clientpositive/bucketmapjoin9.q.out c2db270 
  ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 2230fd1 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 2c32730 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 007bc31 
  ql/src/test/results/clientpositive/combine2_hadoop20.q.out 1ef67f4 
  ql/src/test/results/clientpositive/filter_join_breaktask.q.out 52bac6a 
  ql/src/test/results/clientpositive/groupby_sort_1.q.out e6f3a7a 
  ql/src/test/results/clientpositive/groupby_sort_skew_1.q.out b7ca0ee 
  ql/src/test/results/clientpositive/input23.q.out f71a43f 
  ql/src/test/results/clientpositive/input42.q.out 67679af 
  ql/src/test/results/clientpositive/input_part7.q.out 538a742 
  ql/src/test/results/clientpositive/input_part9.q.out 91d1794 
  ql/src/test/results/clientpositive/join_filters_overlap.q.out 4f79d38 
  ql/src/test/results/clientpositive/louter_join_ppr.q.out 32827a3 
  ql/src/test/results/clientpositive/metadataonly1.q.out aa6402e 
  ql/src/test/results/clientpositive/nested_complex.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/outer_join_ppr.q.out f311cce 
  ql/src/test/results/clientpositive/pcr.q.out cd3caff 
  ql/src/test/results/clientpositive/ppd_join_filter.q.out d76d5bd 
  ql/src/test/results/clientpositive/ppd_union_view.q.out e3e404e 
  ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out 2136a33 
  ql/src/test/results/clientpositive/rand_partitionpruner1.q.out 2f006c6 
  ql/src/test/results/clientpositive/rand_partitionpruner3.q.out 600a834 
  ql/src/test/results/clientpositive/regexp_extract.q.out 361d8ed 
  ql/src/test/results/clientpositive/router_join_ppr.q.out 52d7888 
  ql/src/test/results/clientpositive/sample10.q.out e4fecbe 
  ql/src/test/results/clientpositive/sample6.q.out cd78d8b 
  ql/src/test/results/clientpositive/sample8.q.out 8f26dc8 
  ql/src/test/results/clientpositive/sample9.q.out 7694961 
  ql/src/test/results/clientpositive/smb_mapjoin9.q.out 9a7a793 
  ql/src/test/results/clientpositive/smb_mapjoin_13.q.out 1204f88 
  ql/src/test/results/clientpositive/smb_mapjoin_15.q.out 8990856 
  ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out c390b5e 
  ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out 7dabb55 
  ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out c321351 
  ql/src/test/results/clientpositive/transform_ppr1.q.out 740a931 
  ql/src/test/results/clientpositive/transform_ppr2.q.out fb8e039 
  ql/src/test/results/clientpositive/udf_explode.q.out dc6a513 
  ql/src/test/results/clientpositive/udf_java_method.q.out 15e71e6 
  ql/src/test/results/clientpositive/udf_reflect.q.out 91aeab5 
  ql/src/test/results/clientpositive/udf_reflect2.q.out f2c64cd 
  ql/src/test/results/clientpositive/udtf_explode.q.out 2905d44 
  ql/src/test/results/clientpositive/union24.q.out 50ae7e3 
  ql/src/test/results/clientpositive/union_ppr.q.out 756a9cd 
  ql/src/test/results/compiler/plan/cast1.q.xml bd40304 
  ql/src/test/results/compiler/plan/groupby2.q.xml 13cca32 
  ql/src/test/results/compiler/plan/groupby3.q.xml 06f0864 
  ql/src/test/results/compiler/plan/groupby4.q.xml 21deeb9 
  ql/src/test/results/compiler/plan/groupby5.q.xml 521ee86 
  ql/src/test/results/compiler/plan/groupby6.q.xml b50d796 
  ql/src/test/results/compiler/plan/input20.q.xml 3174490 
  ql/src/test/results/compiler/plan/input8.q.xml cc567d4 
  ql/src/test/results/compiler/plan/input_part1.q.xml ed9d218 
  ql/src/test/results/compiler/plan/input_testxpath.q.xml 58edf34 
  ql/src/test/results/compiler/plan/input_testxpath2.q.xml 031b955 
  ql/src/test/results/compiler/plan/join4.q.xml 391b58d 
  ql/src/test/results/compiler/plan/join5.q.xml 2669097 
  ql/src/test/results/compiler/plan/join6.q.xml b92d70b 
  ql/src/test/results/compiler/plan/join7.q.xml d6253de 
  ql/src/test/results/compiler/plan/join8.q.xml e1e71a7 
  ql/src/test/results/compiler/plan/sample1.q.xml b2c40a3 
  ql/src/test/results/compiler/plan/udf1.q.xml ddc36ec 
  ql/src/test/results/compiler/plan/udf4.q.xml 8ea82eb 
  ql/src/test/results/compiler/plan/udf6.q.xml 334fe0c 
  ql/src/test/results/compiler/plan/udf_case.q.xml 67d55b8 
  ql/src/test/results/compiler/plan/udf_when.q.xml 8334326 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 59b1406 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java d6b31a6 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 27ed4ef 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 99628dc 

Diff: https://reviews.apache.org/r/11854/diff/


Testing
-------

unit tests and beeline 


Thanks,

Thejas Nair


Re: Review Request 11854: HIVE-3253- ArrayIndexOutOfBounds exception for deeply nested structs

Posted by Thejas Nair <th...@hortonworks.com>.

(Updated July 2, 2013, 11:15 p.m.)


Review request for hive.


Changes
-------

Updates q.out files for 0.23


Bugs: HIVE-3253
    https://issues.apache.org/jira/browse/HIVE-3253


Repository: hive-git


Diffs (updated)
-----

  data/files/nested_complex.txt PRE-CREATION 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 3bd0919 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 04921d5 
  ql/src/test/queries/clientnegative/nested_complex_neg.q PRE-CREATION 
  ql/src/test/queries/clientpositive/nested_complex.q PRE-CREATION 
  ql/src/test/results/clientnegative/nested_complex_neg.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/alter_partition_coltype.q.out d9c48aa 
  ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 492be3a 
  ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 7ed2448 
  ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 5b49c35 
  ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out 1b585bf 
  ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out c5315fb 
  ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out a9ab616 
  ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out 7c4558f 
  ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out fc2ffc5 
  ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 3df0ca8 
  ql/src/test/results/clientpositive/bucket_map_join_1.q.out 56131b0 
  ql/src/test/results/clientpositive/bucket_map_join_2.q.out 1e7bea5 
  ql/src/test/results/clientpositive/bucketcontext_1.q.out 43e34ce 
  ql/src/test/results/clientpositive/bucketcontext_2.q.out ab44de5 
  ql/src/test/results/clientpositive/bucketcontext_3.q.out 592765a 
  ql/src/test/results/clientpositive/bucketcontext_4.q.out 6fc94a7 
  ql/src/test/results/clientpositive/bucketcontext_5.q.out 8eb9a71 
  ql/src/test/results/clientpositive/bucketcontext_6.q.out 8271292 
  ql/src/test/results/clientpositive/bucketcontext_7.q.out db9bb1d 
  ql/src/test/results/clientpositive/bucketcontext_8.q.out 21b5dc5 
  ql/src/test/results/clientpositive/bucketmapjoin1.q.out 4bbd35f 
  ql/src/test/results/clientpositive/bucketmapjoin10.q.out 3466e6d 
  ql/src/test/results/clientpositive/bucketmapjoin11.q.out 1c12c09 
  ql/src/test/results/clientpositive/bucketmapjoin12.q.out abf9783 
  ql/src/test/results/clientpositive/bucketmapjoin13.q.out 870cb35 
  ql/src/test/results/clientpositive/bucketmapjoin7.q.out b8ba7c0 
  ql/src/test/results/clientpositive/bucketmapjoin8.q.out 2a5a5d5 
  ql/src/test/results/clientpositive/bucketmapjoin9.q.out c2db270 
  ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 2230fd1 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 2c32730 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 007bc31 
  ql/src/test/results/clientpositive/combine2.q.out 1d51def 
  ql/src/test/results/clientpositive/combine2_hadoop20.q.out 1ef67f4 
  ql/src/test/results/clientpositive/filter_join_breaktask.q.out 52bac6a 
  ql/src/test/results/clientpositive/groupby_sort_1.q.out e6f3a7a 
  ql/src/test/results/clientpositive/groupby_sort_skew_1.q.out b7ca0ee 
  ql/src/test/results/clientpositive/input23.q.out f71a43f 
  ql/src/test/results/clientpositive/input42.q.out 67679af 
  ql/src/test/results/clientpositive/input_part7.q.out 538a742 
  ql/src/test/results/clientpositive/input_part9.q.out 91d1794 
  ql/src/test/results/clientpositive/join_filters_overlap.q.out 4f79d38 
  ql/src/test/results/clientpositive/list_bucket_dml_1.q.out 7d15a6c 
  ql/src/test/results/clientpositive/list_bucket_dml_11.q.out d631b14 
  ql/src/test/results/clientpositive/list_bucket_dml_12.q.out 343798d 
  ql/src/test/results/clientpositive/list_bucket_dml_13.q.out 3a896fd 
  ql/src/test/results/clientpositive/list_bucket_dml_2.q.out e95e05f 
  ql/src/test/results/clientpositive/list_bucket_dml_3.q.out a197c8f 
  ql/src/test/results/clientpositive/list_bucket_dml_4.q.out 795e2fc 
  ql/src/test/results/clientpositive/list_bucket_dml_5.q.out acf0b69 
  ql/src/test/results/clientpositive/list_bucket_dml_6.q.out 3d547dd 
  ql/src/test/results/clientpositive/list_bucket_dml_7.q.out 8f39c7e 
  ql/src/test/results/clientpositive/list_bucket_dml_8.q.out 8f9c0b2 
  ql/src/test/results/clientpositive/list_bucket_dml_9.q.out ea14fcf 
  ql/src/test/results/clientpositive/list_bucket_query_multiskew_1.q.out a3a8276 
  ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out 26eb5ca 
  ql/src/test/results/clientpositive/list_bucket_query_multiskew_3.q.out 492d31f 
  ql/src/test/results/clientpositive/list_bucket_query_oneskew_1.q.out ced0500 
  ql/src/test/results/clientpositive/list_bucket_query_oneskew_2.q.out f8d8b3f 
  ql/src/test/results/clientpositive/list_bucket_query_oneskew_3.q.out d55fd84 
  ql/src/test/results/clientpositive/louter_join_ppr.q.out 32827a3 
  ql/src/test/results/clientpositive/macro.q.out 3d74674 
  ql/src/test/results/clientpositive/metadataonly1.q.out aa6402e 
  ql/src/test/results/clientpositive/nested_complex.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/outer_join_ppr.q.out f311cce 
  ql/src/test/results/clientpositive/pcr.q.out cd3caff 
  ql/src/test/results/clientpositive/ppd_join_filter.q.out d76d5bd 
  ql/src/test/results/clientpositive/ppd_union_view.q.out e3e404e 
  ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out 2136a33 
  ql/src/test/results/clientpositive/rand_partitionpruner1.q.out 2f006c6 
  ql/src/test/results/clientpositive/rand_partitionpruner3.q.out 600a834 
  ql/src/test/results/clientpositive/regexp_extract.q.out 361d8ed 
  ql/src/test/results/clientpositive/router_join_ppr.q.out 52d7888 
  ql/src/test/results/clientpositive/sample10.q.out e4fecbe 
  ql/src/test/results/clientpositive/sample6.q.out cd78d8b 
  ql/src/test/results/clientpositive/sample8.q.out 8f26dc8 
  ql/src/test/results/clientpositive/sample9.q.out 7694961 
  ql/src/test/results/clientpositive/smb_mapjoin9.q.out 9a7a793 
  ql/src/test/results/clientpositive/smb_mapjoin_13.q.out 1204f88 
  ql/src/test/results/clientpositive/smb_mapjoin_15.q.out 8990856 
  ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out c390b5e 
  ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out 7dabb55 
  ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out c321351 
  ql/src/test/results/clientpositive/transform_ppr1.q.out 740a931 
  ql/src/test/results/clientpositive/transform_ppr2.q.out fb8e039 
  ql/src/test/results/clientpositive/truncate_column_list_bucket.q.out c7e14fb 
  ql/src/test/results/clientpositive/udf_explode.q.out dc6a513 
  ql/src/test/results/clientpositive/udf_java_method.q.out 15e71e6 
  ql/src/test/results/clientpositive/udf_reflect.q.out 91aeab5 
  ql/src/test/results/clientpositive/udf_reflect2.q.out f2c64cd 
  ql/src/test/results/clientpositive/udtf_explode.q.out 2905d44 
  ql/src/test/results/clientpositive/union24.q.out 50ae7e3 
  ql/src/test/results/clientpositive/union_ppr.q.out 756a9cd 
  ql/src/test/results/compiler/plan/cast1.q.xml bd40304 
  ql/src/test/results/compiler/plan/groupby2.q.xml 13cca32 
  ql/src/test/results/compiler/plan/groupby3.q.xml 06f0864 
  ql/src/test/results/compiler/plan/groupby4.q.xml 21deeb9 
  ql/src/test/results/compiler/plan/groupby5.q.xml 521ee86 
  ql/src/test/results/compiler/plan/groupby6.q.xml b50d796 
  ql/src/test/results/compiler/plan/input20.q.xml 3174490 
  ql/src/test/results/compiler/plan/input8.q.xml cc567d4 
  ql/src/test/results/compiler/plan/input_part1.q.xml ed9d218 
  ql/src/test/results/compiler/plan/input_testxpath.q.xml 58edf34 
  ql/src/test/results/compiler/plan/input_testxpath2.q.xml 031b955 
  ql/src/test/results/compiler/plan/join4.q.xml 391b58d 
  ql/src/test/results/compiler/plan/join5.q.xml 2669097 
  ql/src/test/results/compiler/plan/join6.q.xml b92d70b 
  ql/src/test/results/compiler/plan/join7.q.xml d6253de 
  ql/src/test/results/compiler/plan/join8.q.xml e1e71a7 
  ql/src/test/results/compiler/plan/sample1.q.xml b2c40a3 
  ql/src/test/results/compiler/plan/udf1.q.xml ddc36ec 
  ql/src/test/results/compiler/plan/udf4.q.xml 8ea82eb 
  ql/src/test/results/compiler/plan/udf6.q.xml 334fe0c 
  ql/src/test/results/compiler/plan/udf_case.q.xml 67d55b8 
  ql/src/test/results/compiler/plan/udf_when.q.xml 8334326 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java d891249 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java d6b31a6 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 27ed4ef 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 99628dc 

Diff: https://reviews.apache.org/r/11854/diff/


Testing
-------

unit tests and beeline 


Thanks,

Thejas Nair