Posted to dev@hive.apache.org by Anthony Hsu <ah...@linkedin.com> on 2014/04/07 21:18:15 UTC
Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if
partition schema does not match table schema
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------
Review request for hive.
Repository: hive-git
Description
-------
The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.
I fixed this problem by passing the table properties to the partition with a "table." prefix, and changing the Avro SerDe to always use the table properties when available.
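A minimal sketch of the prefixing idea described above, not the actual patch: the class and method names (TablePrefixSketch, withTableProperties, resolve) are hypothetical, and only the "table." prefix and the "avro.schema.literal" key come from the description.

```java
import java.util.Properties;

// Hypothetical helper for illustration; this is not the PartitionDesc code.
final class TablePrefixSketch {
    // Prefix taken from the description above.
    static final String TABLE_PREFIX = "table.";

    /** Returns a new Properties holding the partition properties plus every
     *  table property copied under a "table."-prefixed key. */
    static Properties withTableProperties(Properties partProps, Properties tableProps) {
        Properties merged = new Properties();
        merged.putAll(partProps);
        for (String name : tableProps.stringPropertyNames()) {
            merged.setProperty(TABLE_PREFIX + name, tableProps.getProperty(name));
        }
        return merged;
    }

    /** Prefers the table-level value when present, otherwise the partition value. */
    static String resolve(Properties merged, String key) {
        String tableValue = merged.getProperty(TABLE_PREFIX + key);
        return tableValue != null ? tableValue : merged.getProperty(key);
    }
}
```

With this shape, an old partition keeps its own "avro.schema.literal" entry, while a reader that calls resolve() sees the current table-level schema.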
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c
ql/src/test/queries/clientpositive/avro_partitioned.q 068a13c
ql/src/test/results/clientpositive/avro_partitioned.q.out 352ec0d
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570
Diff: https://reviews.apache.org/r/20096/diff/
Testing
-------
Added test cases
Thanks,
Anthony Hsu
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Carl Steinbach <cw...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review40245
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
<https://reviews.apache.org/r/20096/#comment73193>
private static final variable names should be ALL_CAPS_WITH_UNDERSCORES (see variables on preceding lines).
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
<https://reviews.apache.org/r/20096/#comment73194>
Formatting and whitespace cleanup should generally be reserved for patches specifically devoted to that task. While I sympathize with the urge to clean things up, it makes backporting and merging patches a lot harder. If your IDE is doing this automatically, you need to disable that behavior.
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
<https://reviews.apache.org/r/20096/#comment73198>
I think it would be good to explain the motivation for this change in the comment.
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
<https://reviews.apache.org/r/20096/#comment73199>
I think this would be a bit cleaner if lines 173 and 174 were left unchanged and line 181 was updated to iterate over tableDesc.getProperties().
ql/src/test/queries/clientpositive/avro_partitioned.q
<https://reviews.apache.org/r/20096/#comment73195>
Good attention to detail!
ql/src/test/queries/clientpositive/avro_partitioned.q
<https://reviews.apache.org/r/20096/#comment73196>
May want to add "... even if it has an old schema relative to the current table level schema".
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/20096/#comment73200>
We should avoid defining this string token in two locations. I think it makes sense to refer to the one in PartitionDesc.
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/20096/#comment73201>
I think it's a little confusing that useTablePropertiesIfAvailable mutates the contents of the properties object, which is then read on the next line. I think this code will be easier to understand if useTablePropertiesIfAvailable is eliminated and the code is moved into an if/else if/else block in determineSchemaOrThrowException().
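One way the suggested if/else if/else structure could look, as a rough sketch only: SchemaSourceSketch and pickSchemaSource are hypothetical names, and the lookup order (table-level literal, table-level URL, then the unprefixed literal and URL) is an assumption rather than something stated in the review. The point is that the Properties object is only read, never mutated.

```java
import java.util.Properties;

// Hypothetical stand-in for the branch inside determineSchemaOrThrowException().
final class SchemaSourceSketch {
    static final String TABLE_PREFIX = "table.";
    static final String SCHEMA_LITERAL = "avro.schema.literal";
    static final String SCHEMA_URL = "avro.schema.url";

    /** Returns the schema literal or URL to use, without modifying props.
     *  The precedence order is an assumption made for this sketch. */
    static String pickSchemaSource(Properties props) {
        String tableLiteral = props.getProperty(TABLE_PREFIX + SCHEMA_LITERAL);
        String tableUrl = props.getProperty(TABLE_PREFIX + SCHEMA_URL);
        String literal = props.getProperty(SCHEMA_LITERAL);
        String url = props.getProperty(SCHEMA_URL);

        if (tableLiteral != null) {
            return tableLiteral;
        } else if (tableUrl != null) {
            return tableUrl;
        } else if (literal != null) {
            return literal;
        } else if (url != null) {
            return url;
        } else {
            throw new IllegalArgumentException("No Avro schema literal or URL was found");
        }
    }
}
```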
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/20096/#comment73202>
A comment explaining what you're testing would be nice.
- Carl Steinbach
On April 7, 2014, 7:18 p.m., Anthony Hsu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20096/
> -----------------------------------------------------------
>
> (Updated April 7, 2014, 7:18 p.m.)
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review41266
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
<https://reviews.apache.org/r/20096/#comment74708>
This needs some caution. getOverlayedProperties() returns a mix of table and partition properties. We want to give pure table properties and pure partition properties to the SerDe and let the SerDe decide what to do with them. If the SerDe here is Avro, passing the mixed properties might defeat the purpose. (You can try a select * from an Avro table to verify.) In addition, what if a new SerDe needs pure partition properties to initialize? The code here would prevent that.
I suggest we pass pure table properties and pure partition properties to SerDeUtils.initializeSerDe().
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
<https://reviews.apache.org/r/20096/#comment74709>
Same as above.
ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/20096/#comment74710>
Same as above. partProperties are not pure partition properties, but overlayed table and partition properties.
serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java
<https://reviews.apache.org/r/20096/#comment74711>
The previous overlay logic should go here. If both table and partition properties are given, the default implementation should initialize with the overlaid properties.
serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java
<https://reviews.apache.org/r/20096/#comment74712>
Again, we should overlay the properties and pass them to serde.initialize() to keep the original behavior.
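The suggestions above could be sketched roughly as follows. Everything here is a hypothetical stand-in written for illustration (OverlaySketch, BaseSerDe, TableOnlySerDe, overlay); it does not use the Hive classes and assumes "overlay" means table properties with partition properties layered on top.

```java
import java.util.Properties;

// Illustrative stand-ins only; these are not the Hive SerDe classes.
final class OverlaySketch {
    /** Table properties first, partition properties on top; inputs are not modified. */
    static Properties overlay(Properties tableProps, Properties partProps) {
        Properties merged = new Properties();
        merged.putAll(tableProps);
        if (partProps != null) {
            merged.putAll(partProps);
        }
        return merged;
    }

    /** Default behavior: receive both pure property sets, initialize from the overlay. */
    static class BaseSerDe {
        Properties effective;

        void initialize(Properties tableProps, Properties partProps) {
            effective = overlay(tableProps, partProps);
        }
    }

    /** A SerDe that decides for itself and uses only the table-level properties. */
    static class TableOnlySerDe extends BaseSerDe {
        @Override
        void initialize(Properties tableProps, Properties partProps) {
            effective = tableProps;
        }
    }
}
```

Because callers always hand over the two pure property sets, a SerDe that needs only partition properties could override initialize() in the same way.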
- Xuefu Zhang
On April 24, 2014, 1:25 a.m., Anthony Hsu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20096/
> -----------------------------------------------------------
>
> (Updated April 24, 2014, 1:25 a.m.)
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review41375
-----------------------------------------------------------
Ship it!
- Xuefu Zhang
On April 24, 2014, 5:42 p.m., Anthony Hsu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20096/
> -----------------------------------------------------------
>
> (Updated April 24, 2014, 5:42 p.m.)
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Anthony Hsu <ah...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------
(Updated April 24, 2014, 5:42 p.m.)
Review request for hive.
Changes
-------
Removed PartitionDesc.getOverlayedProperties(). Created a new method SerDeUtils.createOverlayedProperties(). Changed behavior of SerDeUtils.initializeSerDe() and AbstractSerDe.initialize() to use the new SerDeUtils.createOverlayedProperties() method.
Repository: hive-git
Description
-------
The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.
I fixed this problem by adding a new initialize() method to AbstractSerDe that takes both table properties and partition properties. The default implementation of this new method uses the partition properties if they are not null and the table properties otherwise. I then overrode the new initialize() method in the AvroSerDe, and had the AvroSerDe always use the table properties. I also added a helper method that takes a Deserializer and calls the new initialize() method whenever the Deserializer is an instanceof AbstractSerDe. I then had to change all calls to Deserializer.initialize() to use my helper method instead.
Diffs (updated)
-----
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java 69b618b
contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java 394ce3f
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 089a31a
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InternalUtil.java fb650dd
hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestHCatRecordSerDe.java e84b789
hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestJsonSerDe.java c1d170a
hcatalog/core/src/test/java/org/apache/hive/hcatalog/rcfile/TestRCFileMapReduceInputFormat.java 9dde771
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java 7ba6bb8
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StrictJsonWriter.java 9b26550
jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 3215178
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1bbe02e
ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 25385ba
ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java b0b0925
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 6daf199
ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableDummyOperator.java e00b7d3
ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java c8003f5
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 80ccf5a
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 055d13e
ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java 2416948
ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 1354b36
ql/src/java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java 3bf58f6
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java c52a093
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java 2ef79d4
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 0e4bdff
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java 49b8da1
ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java f339651
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 77305ff
ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java 3a258e4
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c
ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 6144303
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java 755d783
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java cea3529
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 4fc613e
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatchCtx.java 7f3cb15
ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 5edd265
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 5664f3f
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java be518b9
ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117
ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d
serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java 1ab15a8
serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java d226d21
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 55bfa2e
serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java 9aa3c45
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java a5d494f
serde/src/test/org/apache/hadoop/hive/serde2/binarysortable/TestBinarySortableSerDe.java e512f42
serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java e8639ff
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 714045b
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 28eb868
serde/src/test/org/apache/hadoop/hive/serde2/lazybinary/TestLazyBinarySerDe.java 69c891d
serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java a69fcb7
serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java dd9610e
service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 2a113d5
Diff: https://reviews.apache.org/r/20096/diff/
Testing
-------
Added test cases
Thanks,
Anthony Hsu
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Anthony Hsu <ah...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------
(Updated April 24, 2014, 1:25 a.m.)
Review request for hive.
Changes
-------
Implemented a new approach based on Xuefu and Ashutosh's comments on HIVE-6835. See the updated Description for details.
Repository: hive-git
Description (updated)
-------
The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.
I fixed this problem by adding a new initialize() method to AbstractSerDe that takes both table properties and partition properties. The default implementation of this new method uses the partition properties if they are not null and the table properties otherwise. I then overrode the new initialize() method in the AvroSerDe, and had the AvroSerDe always use the table properties. I also added a helper method that takes a Deserializer and calls the new initialize() method whenever the Deserializer is an instanceof AbstractSerDe. I then had to change all calls to Deserializer.initialize() to use my helper method instead.
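The approach in this description can be sketched with plain Java stand-ins. All types below (SimpleDeserializer, BaseSerDe, PlainSerDe, AvroLikeSerDe) and the helper name initializeSerDe are hypothetical and simplified, with no Configuration argument. The fallback for a deserializer that is not a BaseSerDe is an assumption, since the description does not say what happens in that case.

```java
import java.util.Properties;

// Simplified stand-ins for illustration; these are not the Hive interfaces.
final class InitializeSketch {
    interface SimpleDeserializer {
        void initialize(Properties props);
    }

    /** Stand-in for the abstract base class with the new two-argument method. */
    static abstract class BaseSerDe implements SimpleDeserializer {
        Properties used;

        @Override
        public void initialize(Properties props) {
            used = props;
        }

        /** Default: partition properties if not null, table properties otherwise. */
        public void initialize(Properties tableProps, Properties partProps) {
            initialize(partProps != null ? partProps : tableProps);
        }
    }

    /** Keeps the default behavior. */
    static class PlainSerDe extends BaseSerDe {
    }

    /** Always initializes from the table properties, as described for the AvroSerDe. */
    static class AvroLikeSerDe extends BaseSerDe {
        @Override
        public void initialize(Properties tableProps, Properties partProps) {
            initialize(tableProps);
        }
    }

    /** Helper that call sites use instead of calling initialize() directly. */
    static void initializeSerDe(SimpleDeserializer d, Properties tableProps, Properties partProps) {
        if (d instanceof BaseSerDe) {
            ((BaseSerDe) d).initialize(tableProps, partProps);
        } else {
            // Assumed fallback for deserializers that lack the two-argument method.
            d.initialize(partProps != null ? partProps : tableProps);
        }
    }
}
```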
Diffs (updated)
-----
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java 69b618b
contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java 394ce3f
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 089a31a
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InternalUtil.java fb650dd
hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestHCatRecordSerDe.java e84b789
hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestJsonSerDe.java c1d170a
hcatalog/core/src/test/java/org/apache/hive/hcatalog/rcfile/TestRCFileMapReduceInputFormat.java 9dde771
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java 7ba6bb8
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StrictJsonWriter.java 9b26550
jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 3215178
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1bbe02e
ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 25385ba
ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java b0b0925
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 6daf199
ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableDummyOperator.java e00b7d3
ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java c8003f5
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 80ccf5a
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 055d13e
ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java 2416948
ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 1354b36
ql/src/java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java 3bf58f6
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java 2ef79d4
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 0e4bdff
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java 49b8da1
ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java f339651
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 77305ff
ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java 3a258e4
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c
ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 6144303
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java 755d783
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java cea3529
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 4fc613e
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatchCtx.java 7f3cb15
ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 5edd265
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 5664f3f
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java be518b9
ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117
ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d
serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java 1ab15a8
serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java d226d21
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 55bfa2e
serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java 9aa3c45
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java a5d494f
serde/src/test/org/apache/hadoop/hive/serde2/binarysortable/TestBinarySortableSerDe.java e512f42
serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java e8639ff
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 714045b
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 28eb868
serde/src/test/org/apache/hadoop/hive/serde2/lazybinary/TestLazyBinarySerDe.java 69c891d
serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java a69fcb7
serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java dd9610e
service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 2a113d5
Diff: https://reviews.apache.org/r/20096/diff/
Testing
-------
Added test cases
Thanks,
Anthony Hsu
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Carl Steinbach <cw...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review40626
-----------------------------------------------------------
Ship it!
- Carl Steinbach
On April 17, 2014, 1:14 a.m., Anthony Hsu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20096/
> -----------------------------------------------------------
>
> (Updated April 17, 2014, 1:14 a.m.)
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Anthony Hsu <ah...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------
(Updated April 17, 2014, 1:14 a.m.)
Review request for hive.
Changes
-------
Addressed Ashutosh's comments in HIVE-6835. Added the constant to serde.thrift and used the Thrift compiler to generate all the language-specific bindings.
Repository: hive-git
Description
-------
The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.
I fixed this problem by passing the table properties to the partition with a "table." prefix, and changing the Avro SerDe to always use the table properties when available.
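For readers without the diff at hand, the approach described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class name TablePropsSketch and the methods withTableProps and resolve are hypothetical, and only the "table." prefix and the "avro.schema.literal" key come from the description.

```java
import java.util.Properties;

// Hypothetical sketch; not the actual PartitionDesc or AvroSerdeUtils code.
public class TablePropsSketch {
    // Prefix named in the description above.
    static final String TABLE_PROP_PREFIX = "table.";

    /**
     * Returns a new Properties holding the partition properties plus every
     * table property re-keyed under the "table." prefix.
     */
    static Properties withTableProps(Properties partition, Properties table) {
        Properties merged = new Properties();
        for (String name : partition.stringPropertyNames()) {
            merged.setProperty(name, partition.getProperty(name));
        }
        for (String name : table.stringPropertyNames()) {
            merged.setProperty(TABLE_PROP_PREFIX + name, table.getProperty(name));
        }
        return merged;
    }

    /**
     * Prefers the table-level value when one is present and falls back to
     * the partition-level value otherwise.
     */
    static String resolve(Properties props, String key) {
        String tableValue = props.getProperty(TABLE_PROP_PREFIX + key);
        return tableValue != null ? tableValue : props.getProperty(key);
    }

    public static void main(String[] args) {
        Properties partition = new Properties();
        partition.setProperty("avro.schema.literal", "old-schema");
        Properties table = new Properties();
        table.setProperty("avro.schema.literal", "new-schema");

        Properties merged = withTableProps(partition, table);
        System.out.println(resolve(merged, "avro.schema.literal"));
    }
}
```

With both values present, the lookup returns the table-level schema, which is the behavior the old partition needs after the table's schema has changed.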
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c
ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117
ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d
serde/if/serde.thrift 31c87ee
serde/src/gen/thrift/gen-cpp/serde_constants.h d56c917
serde/src/gen/thrift/gen-cpp/serde_constants.cpp 54503e3
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25
serde/src/gen/thrift/gen-php/org/apache/hadoop/hive/serde/Types.php 837dd11
serde/src/gen/thrift/gen-py/org_apache_hadoop_hive_serde/constants.py 8eac87d
serde/src/gen/thrift/gen-rb/serde_constants.rb ed86522
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570
Diff: https://reviews.apache.org/r/20096/diff/
Testing
-------
Added test cases
Thanks,
Anthony Hsu
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Carl Steinbach <cw...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review40353
-----------------------------------------------------------
Ship it!
- Carl Steinbach
On April 14, 2014, 6:49 p.m., Anthony Hsu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20096/
> -----------------------------------------------------------
>
> (Updated April 14, 2014, 6:49 p.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.
>
> I fixed this problem by passing the table properties to the partition with a "table." prefix, and changing the Avro SerDe to always use the table properties when available.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c
> ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117
> ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d
> serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25
> serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13
> serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570
>
> Diff: https://reviews.apache.org/r/20096/diff/
>
>
> Testing
> -------
>
> Added test cases
>
>
> Thanks,
>
> Anthony Hsu
>
>
Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails
if partition schema does not match table schema
Posted by Anthony Hsu <ah...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------
(Updated April 14, 2014, 6:49 p.m.)
Review request for hive.
Changes
-------
Addressed Carl's comments. Changes:
- Reverted whitespace changes.
- Moved the TABLE_PROP_PREFIX ("table.") to serdeConstants.
- Removed code that mutated the Properties passed to the AvroSerDe.
- Added/improved comments.
- Synced with latest.
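As an illustration of the third item, one way to read the "table."-prefixed entries without mutating the caller's Properties is to copy them into a fresh object. This is a sketch under assumptions: the class name PrefixedPropsView and the method tableView are hypothetical and are not taken from the patch, and TABLE_PROP_PREFIX is redeclared locally rather than read from serdeConstants.

```java
import java.util.Properties;

// Hypothetical sketch; not the actual AvroSerDe code.
public class PrefixedPropsView {
    // Same value as the constant named in the list above, redeclared here
    // so the example is self-contained.
    static final String TABLE_PROP_PREFIX = "table.";

    /**
     * Builds a new Properties containing only the "table."-prefixed entries
     * of the input, with the prefix stripped. The input is left untouched.
     */
    static Properties tableView(Properties props) {
        Properties view = new Properties();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(TABLE_PROP_PREFIX)) {
                String stripped = name.substring(TABLE_PROP_PREFIX.length());
                view.setProperty(stripped, props.getProperty(name));
            }
        }
        return view;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("avro.schema.literal", "partition-schema");
        props.setProperty("table.avro.schema.literal", "table-schema");

        Properties view = tableView(props);
        System.out.println(view.getProperty("avro.schema.literal"));
        System.out.println(props.size());
    }
}
```

Returning a copy keeps the SerDe free of side effects on an object that other code may still hold a reference to.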
Repository: hive-git
Description
-------
The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.
I fixed this problem by passing the table properties to the partition with a "table." prefix, and changing the Avro SerDe to always use the table properties when available.
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c
ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117
ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570
Diff: https://reviews.apache.org/r/20096/diff/
Testing
-------
Added test cases
Thanks,
Anthony Hsu