Posted to dev@hive.apache.org by Thejas Nair <th...@hortonworks.com> on 2013/06/13 00:07:52 UTC
Review Request: HIVE-3253- ArrayIndexOutOfBounds exception for deeply nested structs
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11854/
-----------------------------------------------------------
Review request for hive.
Description
-------
(Description of the patch, from the JIRA comment.)
This patch increases the number of control characters that LazySimpleSerDe uses as delimiters for nested levels, avoiding characters that are likely to be present in data. Using the new control characters is not a backward-compatible change, so to enable it for a table that uses LazySimpleSerDe you need to set the serde property hive.serialization.extend.nesting.levels. If your input data might contain these delimiter control characters, you should escape them and set the escape character using a serde property.
Example:
create table nestedcomplex (
simple_int int,
max_nested_array array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<int>>>>>>>>>>>>>>>>>>>>>>>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ( 'hive.serialization.extend.nesting.levels'='true'
)
;
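To make the idea of one delimiter per nesting level concrete, here is a minimal Java sketch of how a text SerDe might build its delimiter table. It assumes the legacy table has eight levels using bytes 1 through 8, and that the extended table continues up to byte 31 while skipping TAB, LF, FF, CR and ESC. The class name SeparatorTable, the upper bound and the skip set are illustrative assumptions, not the byte values chosen by the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one delimiter byte per nesting level, outermost first.
// The legacy range, the upper bound and the skip set are assumptions.
class SeparatorTable {
    // Assumed legacy layout: levels 0..7 use bytes 1..8.
    static final int LEGACY_HIGHEST_BYTE = 8;
    // Assumed highest control byte available when extended nesting is enabled.
    static final int EXTENDED_HIGHEST_BYTE = 31;
    // Assumed control characters that commonly appear in text: TAB, LF, FF, CR, ESC.
    static final int[] LIKELY_IN_DATA = {9, 10, 12, 13, 27};

    static boolean likelyInData(int b) {
        for (int x : LIKELY_IN_DATA) {
            if (x == b) {
                return true;
            }
        }
        return false;
    }

    /** Returns the delimiter byte for each nesting level; index 0 separates row fields. */
    static byte[] build(boolean extendNestingLevels) {
        int highest = extendNestingLevels ? EXTENDED_HIGHEST_BYTE : LEGACY_HIGHEST_BYTE;
        List<Byte> seps = new ArrayList<>();
        for (int b = 1; b <= highest; b++) {
            if (!likelyInData(b)) {
                seps.add((byte) b);
            }
        }
        byte[] out = new byte[seps.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = seps.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("legacy levels:   " + build(false).length);
        System.out.println("extended levels: " + build(true).length);
    }
}
```

Under these assumptions the legacy table has 8 levels and the extended one 26, and the first eight entries of both tables are identical.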
LazySimpleSerDe is used by FileSinkOperator, which is why the nesting depth there was limited by the number of levels supported by the SerDe. We should look at using LazyBinarySerDe there instead, as it would be more efficient and is not subject to this nesting-level restriction.
The LazySimpleSerDe used in FileSinkOperator has escaping enabled, so it is safe to extend the levels of nesting using the new serde property for that use case.
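Escaping is what makes the extra delimiters safe when data may contain them. The following Java sketch assumes a single escape byte that is written before every delimiter byte and before the escape byte itself, and shows the round trip for one level of fields. EscapeSketch and its methods are hypothetical names; LazySimpleSerDe's own escaping code is not reproduced here.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of delimiter escaping for a single level of fields.
class EscapeSketch {
    static boolean contains(byte[] set, byte b) {
        for (byte x : set) {
            if (x == b) {
                return true;
            }
        }
        return false;
    }

    /** Writes the escape byte before every delimiter byte and before the escape byte itself. */
    static byte[] escape(byte[] data, byte[] delimiters, byte escapeChar) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : data) {
            if (b == escapeChar || contains(delimiters, b)) {
                out.write(escapeChar);
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    /** Splits on unescaped occurrences of the delimiter and drops the escape bytes. */
    static List<byte[]> split(byte[] encoded, byte delimiter, byte escapeChar) {
        List<byte[]> fields = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length; i++) {
            byte b = encoded[i];
            if (b == escapeChar && i + 1 < encoded.length) {
                current.write(encoded[++i]); // the next byte is literal data
            } else if (b == delimiter) {
                fields.add(current.toByteArray());
                current.reset();
            } else {
                current.write(b);
            }
        }
        fields.add(current.toByteArray());
        return fields;
    }
}
```

Because split drops the escape bytes, it is only suitable for the innermost level; a multi-level parser would have to leave escapes in place until the last split.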
The patch also gives a better error message when the level of nesting exceeds the maximum supported level (instead of an ArrayIndexOutOfBoundsException).
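A clearer error of this kind amounts to checking the required depth before indexing into the delimiter table. A minimal Java sketch of such a check follows. It assumes the row's field separator takes one level, an array or struct one more, and a map two (one between entries, one between key and value). It only understands simple lower-case type strings built from array, map, struct and plain primitive names, and NestingCheck is a hypothetical class, not code from the patch.

```java
// Hypothetical sketch: computes how many delimiter levels a column type needs
// and fails with a descriptive message instead of an array index overflow.
class NestingCheck {
    private final String s;
    private int pos;

    private NestingCheck(String typeString) {
        this.s = typeString.replace(" ", "");
    }

    /** Delimiter levels needed below the row level for a column of this type. */
    static int depth(String typeString) {
        NestingCheck p = new NestingCheck(typeString);
        int d = p.parseType();
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("Unexpected trailing input in type: " + typeString);
        }
        return d;
    }

    /** Throws IllegalArgumentException if the column needs more levels than are available. */
    static void checkColumn(String column, String typeString, int availableLevels) {
        int needed = 1 + depth(typeString); // +1 for the row-level field separator
        if (needed > availableLevels) {
            throw new IllegalArgumentException("Column '" + column + "' needs " + needed
                + " levels of nesting, but at most " + availableLevels + " are supported");
        }
    }

    private int parseType() {
        if (consume("array<")) {
            int d = 1 + parseType(); // one level separates array elements
            expect('>');
            return d;
        }
        if (consume("map<")) {
            int k = parseType();
            expect(',');
            int v = parseType();
            expect('>');
            return 2 + Math.max(k, v); // one level for entries, one for key/value
        }
        if (consume("struct<")) {
            int max = 0;
            do {
                while (pos < s.length() && s.charAt(pos) != ':') {
                    pos++; // skip the field name
                }
                expect(':');
                max = Math.max(max, parseType());
            } while (consume(","));
            expect('>');
            return 1 + max; // one level separates struct fields
        }
        int start = pos; // primitive name such as int or string
        while (pos < s.length() && Character.isLetterOrDigit(s.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Unexpected input at position " + pos + " in: " + s);
        }
        return 0;
    }

    private boolean consume(String token) {
        if (s.startsWith(token, pos)) {
            pos += token.length();
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos + " in: " + s);
        }
        pos++;
    }
}
```

For instance, with eight available levels this check rejects a column of eight nested arrays with a message naming the column, where indexing a fixed eight-entry table directly would fail with an ArrayIndexOutOfBoundsException.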
This addresses bug HIVE-3253.
https://issues.apache.org/jira/browse/HIVE-3253
Diffs
-----
data/files/nested_complex.txt PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 3bd0919
ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 04921d5
ql/src/test/queries/clientnegative/nested_complex_neg.q PRE-CREATION
ql/src/test/queries/clientpositive/nested_complex.q PRE-CREATION
ql/src/test/results/clientnegative/nested_complex_neg.q.out PRE-CREATION
ql/src/test/results/clientpositive/alter_partition_coltype.q.out d9c48aa
ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 492be3a
ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 7ed2448
ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 5b49c35
ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out 1b585bf
ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out c5315fb
ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out a9ab616
ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out 7c4558f
ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out fc2ffc5
ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 3df0ca8
ql/src/test/results/clientpositive/bucket_map_join_1.q.out 56131b0
ql/src/test/results/clientpositive/bucket_map_join_2.q.out 1e7bea5
ql/src/test/results/clientpositive/bucketcontext_1.q.out 43e34ce
ql/src/test/results/clientpositive/bucketcontext_2.q.out ab44de5
ql/src/test/results/clientpositive/bucketcontext_3.q.out 592765a
ql/src/test/results/clientpositive/bucketcontext_4.q.out 6fc94a7
ql/src/test/results/clientpositive/bucketcontext_5.q.out 8eb9a71
ql/src/test/results/clientpositive/bucketcontext_6.q.out 8271292
ql/src/test/results/clientpositive/bucketcontext_7.q.out db9bb1d
ql/src/test/results/clientpositive/bucketcontext_8.q.out 21b5dc5
ql/src/test/results/clientpositive/bucketmapjoin1.q.out 4bbd35f
ql/src/test/results/clientpositive/bucketmapjoin10.q.out 3466e6d
ql/src/test/results/clientpositive/bucketmapjoin11.q.out 1c12c09
ql/src/test/results/clientpositive/bucketmapjoin12.q.out abf9783
ql/src/test/results/clientpositive/bucketmapjoin13.q.out 870cb35
ql/src/test/results/clientpositive/bucketmapjoin7.q.out b8ba7c0
ql/src/test/results/clientpositive/bucketmapjoin8.q.out 2a5a5d5
ql/src/test/results/clientpositive/bucketmapjoin9.q.out c2db270
ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 2230fd1
ql/src/test/results/clientpositive/columnstats_partlvl.q.out 2c32730
ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 007bc31
ql/src/test/results/clientpositive/combine2_hadoop20.q.out 1ef67f4
ql/src/test/results/clientpositive/filter_join_breaktask.q.out 52bac6a
ql/src/test/results/clientpositive/groupby_sort_1.q.out e6f3a7a
ql/src/test/results/clientpositive/groupby_sort_skew_1.q.out b7ca0ee
ql/src/test/results/clientpositive/input23.q.out f71a43f
ql/src/test/results/clientpositive/input42.q.out 67679af
ql/src/test/results/clientpositive/input_part7.q.out 538a742
ql/src/test/results/clientpositive/input_part9.q.out 91d1794
ql/src/test/results/clientpositive/join_filters_overlap.q.out 4f79d38
ql/src/test/results/clientpositive/louter_join_ppr.q.out 32827a3
ql/src/test/results/clientpositive/metadataonly1.q.out aa6402e
ql/src/test/results/clientpositive/nested_complex.q.out PRE-CREATION
ql/src/test/results/clientpositive/outer_join_ppr.q.out f311cce
ql/src/test/results/clientpositive/pcr.q.out cd3caff
ql/src/test/results/clientpositive/ppd_join_filter.q.out d76d5bd
ql/src/test/results/clientpositive/ppd_union_view.q.out e3e404e
ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out 2136a33
ql/src/test/results/clientpositive/rand_partitionpruner1.q.out 2f006c6
ql/src/test/results/clientpositive/rand_partitionpruner3.q.out 600a834
ql/src/test/results/clientpositive/regexp_extract.q.out 361d8ed
ql/src/test/results/clientpositive/router_join_ppr.q.out 52d7888
ql/src/test/results/clientpositive/sample10.q.out e4fecbe
ql/src/test/results/clientpositive/sample6.q.out cd78d8b
ql/src/test/results/clientpositive/sample8.q.out 8f26dc8
ql/src/test/results/clientpositive/sample9.q.out 7694961
ql/src/test/results/clientpositive/smb_mapjoin9.q.out 9a7a793
ql/src/test/results/clientpositive/smb_mapjoin_13.q.out 1204f88
ql/src/test/results/clientpositive/smb_mapjoin_15.q.out 8990856
ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out c390b5e
ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out 7dabb55
ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out c321351
ql/src/test/results/clientpositive/transform_ppr1.q.out 740a931
ql/src/test/results/clientpositive/transform_ppr2.q.out fb8e039
ql/src/test/results/clientpositive/udf_explode.q.out dc6a513
ql/src/test/results/clientpositive/udf_java_method.q.out 15e71e6
ql/src/test/results/clientpositive/udf_reflect.q.out 91aeab5
ql/src/test/results/clientpositive/udf_reflect2.q.out f2c64cd
ql/src/test/results/clientpositive/udtf_explode.q.out 2905d44
ql/src/test/results/clientpositive/union24.q.out 50ae7e3
ql/src/test/results/clientpositive/union_ppr.q.out 756a9cd
ql/src/test/results/compiler/plan/cast1.q.xml bd40304
ql/src/test/results/compiler/plan/groupby2.q.xml 13cca32
ql/src/test/results/compiler/plan/groupby3.q.xml 06f0864
ql/src/test/results/compiler/plan/groupby4.q.xml 21deeb9
ql/src/test/results/compiler/plan/groupby5.q.xml 521ee86
ql/src/test/results/compiler/plan/groupby6.q.xml b50d796
ql/src/test/results/compiler/plan/input20.q.xml 3174490
ql/src/test/results/compiler/plan/input8.q.xml cc567d4
ql/src/test/results/compiler/plan/input_part1.q.xml ed9d218
ql/src/test/results/compiler/plan/input_testxpath.q.xml 58edf34
ql/src/test/results/compiler/plan/input_testxpath2.q.xml 031b955
ql/src/test/results/compiler/plan/join4.q.xml 391b58d
ql/src/test/results/compiler/plan/join5.q.xml 2669097
ql/src/test/results/compiler/plan/join6.q.xml b92d70b
ql/src/test/results/compiler/plan/join7.q.xml d6253de
ql/src/test/results/compiler/plan/join8.q.xml e1e71a7
ql/src/test/results/compiler/plan/sample1.q.xml b2c40a3
ql/src/test/results/compiler/plan/udf1.q.xml ddc36ec
ql/src/test/results/compiler/plan/udf4.q.xml 8ea82eb
ql/src/test/results/compiler/plan/udf6.q.xml 334fe0c
ql/src/test/results/compiler/plan/udf_case.q.xml 67d55b8
ql/src/test/results/compiler/plan/udf_when.q.xml 8334326
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 59b1406
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java d6b31a6
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 27ed4ef
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 99628dc
Diff: https://reviews.apache.org/r/11854/diff/
Testing
-------
Unit tests and Beeline.
Thanks,
Thejas Nair
Re: Review Request 11854: HIVE-3253- ArrayIndexOutOfBounds exception for deeply nested structs
Posted by Thejas Nair <th...@hortonworks.com>.
(Updated July 2, 2013, 11:15 p.m.)
Review request for hive.
Changes
-------
Updated the q.out files for Hadoop 0.23.
Bugs: HIVE-3253
https://issues.apache.org/jira/browse/HIVE-3253
Repository: hive-git
Diffs (updated)
-----
data/files/nested_complex.txt PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 3bd0919
ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 04921d5
ql/src/test/queries/clientnegative/nested_complex_neg.q PRE-CREATION
ql/src/test/queries/clientpositive/nested_complex.q PRE-CREATION
ql/src/test/results/clientnegative/nested_complex_neg.q.out PRE-CREATION
ql/src/test/results/clientpositive/alter_partition_coltype.q.out d9c48aa
ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 492be3a
ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 7ed2448
ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 5b49c35
ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out 1b585bf
ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out c5315fb
ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out a9ab616
ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out 7c4558f
ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out fc2ffc5
ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 3df0ca8
ql/src/test/results/clientpositive/bucket_map_join_1.q.out 56131b0
ql/src/test/results/clientpositive/bucket_map_join_2.q.out 1e7bea5
ql/src/test/results/clientpositive/bucketcontext_1.q.out 43e34ce
ql/src/test/results/clientpositive/bucketcontext_2.q.out ab44de5
ql/src/test/results/clientpositive/bucketcontext_3.q.out 592765a
ql/src/test/results/clientpositive/bucketcontext_4.q.out 6fc94a7
ql/src/test/results/clientpositive/bucketcontext_5.q.out 8eb9a71
ql/src/test/results/clientpositive/bucketcontext_6.q.out 8271292
ql/src/test/results/clientpositive/bucketcontext_7.q.out db9bb1d
ql/src/test/results/clientpositive/bucketcontext_8.q.out 21b5dc5
ql/src/test/results/clientpositive/bucketmapjoin1.q.out 4bbd35f
ql/src/test/results/clientpositive/bucketmapjoin10.q.out 3466e6d
ql/src/test/results/clientpositive/bucketmapjoin11.q.out 1c12c09
ql/src/test/results/clientpositive/bucketmapjoin12.q.out abf9783
ql/src/test/results/clientpositive/bucketmapjoin13.q.out 870cb35
ql/src/test/results/clientpositive/bucketmapjoin7.q.out b8ba7c0
ql/src/test/results/clientpositive/bucketmapjoin8.q.out 2a5a5d5
ql/src/test/results/clientpositive/bucketmapjoin9.q.out c2db270
ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 2230fd1
ql/src/test/results/clientpositive/columnstats_partlvl.q.out 2c32730
ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 007bc31
ql/src/test/results/clientpositive/combine2.q.out 1d51def
ql/src/test/results/clientpositive/combine2_hadoop20.q.out 1ef67f4
ql/src/test/results/clientpositive/filter_join_breaktask.q.out 52bac6a
ql/src/test/results/clientpositive/groupby_sort_1.q.out e6f3a7a
ql/src/test/results/clientpositive/groupby_sort_skew_1.q.out b7ca0ee
ql/src/test/results/clientpositive/input23.q.out f71a43f
ql/src/test/results/clientpositive/input42.q.out 67679af
ql/src/test/results/clientpositive/input_part7.q.out 538a742
ql/src/test/results/clientpositive/input_part9.q.out 91d1794
ql/src/test/results/clientpositive/join_filters_overlap.q.out 4f79d38
ql/src/test/results/clientpositive/list_bucket_dml_1.q.out 7d15a6c
ql/src/test/results/clientpositive/list_bucket_dml_11.q.out d631b14
ql/src/test/results/clientpositive/list_bucket_dml_12.q.out 343798d
ql/src/test/results/clientpositive/list_bucket_dml_13.q.out 3a896fd
ql/src/test/results/clientpositive/list_bucket_dml_2.q.out e95e05f
ql/src/test/results/clientpositive/list_bucket_dml_3.q.out a197c8f
ql/src/test/results/clientpositive/list_bucket_dml_4.q.out 795e2fc
ql/src/test/results/clientpositive/list_bucket_dml_5.q.out acf0b69
ql/src/test/results/clientpositive/list_bucket_dml_6.q.out 3d547dd
ql/src/test/results/clientpositive/list_bucket_dml_7.q.out 8f39c7e
ql/src/test/results/clientpositive/list_bucket_dml_8.q.out 8f9c0b2
ql/src/test/results/clientpositive/list_bucket_dml_9.q.out ea14fcf
ql/src/test/results/clientpositive/list_bucket_query_multiskew_1.q.out a3a8276
ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out 26eb5ca
ql/src/test/results/clientpositive/list_bucket_query_multiskew_3.q.out 492d31f
ql/src/test/results/clientpositive/list_bucket_query_oneskew_1.q.out ced0500
ql/src/test/results/clientpositive/list_bucket_query_oneskew_2.q.out f8d8b3f
ql/src/test/results/clientpositive/list_bucket_query_oneskew_3.q.out d55fd84
ql/src/test/results/clientpositive/louter_join_ppr.q.out 32827a3
ql/src/test/results/clientpositive/macro.q.out 3d74674
ql/src/test/results/clientpositive/metadataonly1.q.out aa6402e
ql/src/test/results/clientpositive/nested_complex.q.out PRE-CREATION
ql/src/test/results/clientpositive/outer_join_ppr.q.out f311cce
ql/src/test/results/clientpositive/pcr.q.out cd3caff
ql/src/test/results/clientpositive/ppd_join_filter.q.out d76d5bd
ql/src/test/results/clientpositive/ppd_union_view.q.out e3e404e
ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out 2136a33
ql/src/test/results/clientpositive/rand_partitionpruner1.q.out 2f006c6
ql/src/test/results/clientpositive/rand_partitionpruner3.q.out 600a834
ql/src/test/results/clientpositive/regexp_extract.q.out 361d8ed
ql/src/test/results/clientpositive/router_join_ppr.q.out 52d7888
ql/src/test/results/clientpositive/sample10.q.out e4fecbe
ql/src/test/results/clientpositive/sample6.q.out cd78d8b
ql/src/test/results/clientpositive/sample8.q.out 8f26dc8
ql/src/test/results/clientpositive/sample9.q.out 7694961
ql/src/test/results/clientpositive/smb_mapjoin9.q.out 9a7a793
ql/src/test/results/clientpositive/smb_mapjoin_13.q.out 1204f88
ql/src/test/results/clientpositive/smb_mapjoin_15.q.out 8990856
ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out c390b5e
ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out 7dabb55
ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out c321351
ql/src/test/results/clientpositive/transform_ppr1.q.out 740a931
ql/src/test/results/clientpositive/transform_ppr2.q.out fb8e039
ql/src/test/results/clientpositive/truncate_column_list_bucket.q.out c7e14fb
ql/src/test/results/clientpositive/udf_explode.q.out dc6a513
ql/src/test/results/clientpositive/udf_java_method.q.out 15e71e6
ql/src/test/results/clientpositive/udf_reflect.q.out 91aeab5
ql/src/test/results/clientpositive/udf_reflect2.q.out f2c64cd
ql/src/test/results/clientpositive/udtf_explode.q.out 2905d44
ql/src/test/results/clientpositive/union24.q.out 50ae7e3
ql/src/test/results/clientpositive/union_ppr.q.out 756a9cd
ql/src/test/results/compiler/plan/cast1.q.xml bd40304
ql/src/test/results/compiler/plan/groupby2.q.xml 13cca32
ql/src/test/results/compiler/plan/groupby3.q.xml 06f0864
ql/src/test/results/compiler/plan/groupby4.q.xml 21deeb9
ql/src/test/results/compiler/plan/groupby5.q.xml 521ee86
ql/src/test/results/compiler/plan/groupby6.q.xml b50d796
ql/src/test/results/compiler/plan/input20.q.xml 3174490
ql/src/test/results/compiler/plan/input8.q.xml cc567d4
ql/src/test/results/compiler/plan/input_part1.q.xml ed9d218
ql/src/test/results/compiler/plan/input_testxpath.q.xml 58edf34
ql/src/test/results/compiler/plan/input_testxpath2.q.xml 031b955
ql/src/test/results/compiler/plan/join4.q.xml 391b58d
ql/src/test/results/compiler/plan/join5.q.xml 2669097
ql/src/test/results/compiler/plan/join6.q.xml b92d70b
ql/src/test/results/compiler/plan/join7.q.xml d6253de
ql/src/test/results/compiler/plan/join8.q.xml e1e71a7
ql/src/test/results/compiler/plan/sample1.q.xml b2c40a3
ql/src/test/results/compiler/plan/udf1.q.xml ddc36ec
ql/src/test/results/compiler/plan/udf4.q.xml 8ea82eb
ql/src/test/results/compiler/plan/udf6.q.xml 334fe0c
ql/src/test/results/compiler/plan/udf_case.q.xml 67d55b8
ql/src/test/results/compiler/plan/udf_when.q.xml 8334326
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java d891249
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java d6b31a6
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 27ed4ef
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 99628dc
Diff: https://reviews.apache.org/r/11854/diff/
Testing
-------
Unit tests and Beeline.
Thanks,
Thejas Nair