
Posted to dev@hive.apache.org by Anthony Hsu <ah...@linkedin.com> on 2014/04/07 21:18:15 UTC

Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------

Review request for hive.


Repository: hive-git


Description
-------

The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.

I fixed this problem by passing the table properties to the partition with a "table." prefix, and changing the Avro SerDe to always use the table properties when available.
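A minimal Java sketch of the prefixing idea described above, assuming both the table and partition properties are plain java.util.Properties. The constant TABLE_PROP_PREFIX and the method withPrefixedTableProps are hypothetical names for illustration, not the patch's actual code, and the schema strings are placeholders.

```java
import java.util.Properties;

// Sketch (not the patch's code): hand the table properties to the partition
// under a "table." prefix so that both sets survive side by side.
public class TablePrefixSketch {
  // Hypothetical name; per the review, the token should be defined in one place only.
  static final String TABLE_PROP_PREFIX = "table.";

  static Properties withPrefixedTableProps(Properties partProps, Properties tableProps) {
    Properties result = new Properties();
    result.putAll(partProps); // partition values are kept unchanged
    for (String key : tableProps.stringPropertyNames()) {
      // Each table property is added again under the prefixed key.
      result.setProperty(TABLE_PROP_PREFIX + key, tableProps.getProperty(key));
    }
    return result;
  }

  public static void main(String[] args) {
    Properties table = new Properties();
    table.setProperty("avro.schema.literal", "NEW_TABLE_SCHEMA");
    Properties part = new Properties();
    part.setProperty("avro.schema.literal", "OLD_PARTITION_SCHEMA");

    Properties combined = withPrefixedTableProps(part, table);
    System.out.println(combined.getProperty("avro.schema.literal"));       // OLD_PARTITION_SCHEMA
    System.out.println(combined.getProperty("table.avro.schema.literal")); // NEW_TABLE_SCHEMA
  }
}
```

Because the table values live under distinct keys, a SerDe that knows about the prefix can prefer them while every other SerDe keeps seeing the partition values it saw before.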


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c 
  ql/src/test/queries/clientpositive/avro_partitioned.q 068a13c 
  ql/src/test/results/clientpositive/avro_partitioned.q.out 352ec0d 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 

Diff: https://reviews.apache.org/r/20096/diff/


Testing
-------

Added test cases


Thanks,

Anthony Hsu

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Carl Steinbach <cw...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review40245
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
<https://reviews.apache.org/r/20096/#comment73193>

    private static final variable names should be ALL_CAPS_WITH_UNDERSCORES (see variables on preceding lines).
    



ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
<https://reviews.apache.org/r/20096/#comment73194>

    Formatting and whitespace cleanup should generally be reserved for patches specifically devoted to that task. While I sympathize with the urge to clean things up, it makes backporting and merging patches a lot harder. If your IDE is doing this automatically, you need to disable this behavior.



ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
<https://reviews.apache.org/r/20096/#comment73198>

    I think it would be good to explain the motivation for this change in the comment.



ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
<https://reviews.apache.org/r/20096/#comment73199>

    I think this would be a bit cleaner if lines 173 and 174 were left unchanged and line 181 was updated to iterate over tableDesc.getProperties().



ql/src/test/queries/clientpositive/avro_partitioned.q
<https://reviews.apache.org/r/20096/#comment73195>

    Good attention to detail!



ql/src/test/queries/clientpositive/avro_partitioned.q
<https://reviews.apache.org/r/20096/#comment73196>

    May want to add "... even if it has an old schema relative to the current table level schema".



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/20096/#comment73200>

    We should avoid defining this string token in two locations. I think it makes sense to refer to the one in PartitionDesc.



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/20096/#comment73201>

    I think it's a little confusing that useTablePropertiesIfAvailable mutates the contents of the properties object, which is then read on the next line. I think this code will be easier to understand if useTablePropertiesIfAvailable is eliminated and the code is moved into an if/else if/else block in determineSchemaOrThrowException().
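A sketch of the suggested restructuring, assuming the table values arrive under a "table." prefix as in this revision. The property keys avro.schema.literal and avro.schema.url come from the request; the prefix constant, the method name determineSchemaSource, the precedence order (table literal, table URL, then the partition's own values), the example URL, and the use of IllegalArgumentException are all assumptions. It returns a tagged string rather than a parsed Avro Schema so it runs without dependencies.

```java
import java.util.Properties;

public class SchemaLookupSketch {
  static final String SCHEMA_LITERAL = "avro.schema.literal";
  static final String SCHEMA_URL = "avro.schema.url";
  // Hypothetical prefix carrying table-level properties into partition properties.
  static final String TABLE_PREFIX = "table.";

  // One if/else-if/else chain that only reads props and never mutates them.
  // Returns "literal:..." or "url:..." instead of a parsed Avro Schema.
  static String determineSchemaSource(Properties props) {
    String tableLiteral = props.getProperty(TABLE_PREFIX + SCHEMA_LITERAL);
    String tableUrl = props.getProperty(TABLE_PREFIX + SCHEMA_URL);
    String literal = props.getProperty(SCHEMA_LITERAL);
    String url = props.getProperty(SCHEMA_URL);
    if (tableLiteral != null) {
      return "literal:" + tableLiteral;
    } else if (tableUrl != null) {
      return "url:" + tableUrl;
    } else if (literal != null) {
      return "literal:" + literal;
    } else if (url != null) {
      return "url:" + url;
    } else {
      // Stand-in for the exception type the real method throws.
      throw new IllegalArgumentException(
          "Neither " + SCHEMA_LITERAL + " nor " + SCHEMA_URL + " is set");
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(SCHEMA_LITERAL, "OLD_PARTITION_SCHEMA");
    // Hypothetical schema location, for illustration only.
    props.setProperty(TABLE_PREFIX + SCHEMA_URL, "hdfs:///schemas/new.avsc");
    System.out.println(determineSchemaSource(props)); // url:hdfs:///schemas/new.avsc
  }
}
```

Keeping the decision in a single read-only chain makes the precedence visible in one place, which is the readability point raised in the comment above.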



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/20096/#comment73202>

    A comment explaining what you're testing would be nice.


- Carl Steinbach


On April 7, 2014, 7:18 p.m., Anthony Hsu wrote:

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Xuefu Zhang <xz...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review41266
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
<https://reviews.apache.org/r/20096/#comment74708>

    This needs some caution. getOverlayedProperties() returns a mix of table and partition properties. We want to give the SerDe pure table properties and pure partition properties and let the SerDe decide what to do with them. If the SerDe here is Avro, the overlay might defeat the purpose of this fix. (You can verify this by running a select * on an Avro table.) In addition, what if a new SerDe needs pure partition properties to initialize? The code here would prevent that.
    
    I suggest we pass pure table properties and pure partition properties to SerDeUtils.initializeSerDe().



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
<https://reviews.apache.org/r/20096/#comment74709>

    Same as above.



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
<https://reviews.apache.org/r/20096/#comment74710>

    Same as above. partProperties are not pure partition properties, but overlayed table and partition properties.



serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java
<https://reviews.apache.org/r/20096/#comment74711>

    This is where the previous overlay logic should go. If both table and partition properties are given, the default implementation should overlay them and initialize with the result.



serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java
<https://reviews.apache.org/r/20096/#comment74712>

    Again, we should overlay the properties and pass the result to serde.initialize() to keep the original behavior.


- Xuefu Zhang


On April 24, 2014, 1:25 a.m., Anthony Hsu wrote:

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Xuefu Zhang <xz...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review41375
-----------------------------------------------------------

Ship it!



- Xuefu Zhang


On April 24, 2014, 5:42 p.m., Anthony Hsu wrote:

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Anthony Hsu <ah...@linkedin.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------

(Updated April 24, 2014, 5:42 p.m.)


Review request for hive.


Changes
-------

Removed PartitionDesc.getOverlayedProperties(). Created a new method SerDeUtils.createOverlayedProperties(). Changed behavior of SerDeUtils.initializeSerDe() and AbstractSerDe.initialize() to use the new SerDeUtils.createOverlayedProperties() method.
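A minimal sketch of the overlay semantics discussed in the review, assuming "overlay" means table properties as the base with partition properties overriding them on key collisions. This is an illustration under that assumption, not the committed body of SerDeUtils.createOverlayedProperties(); it also assumes tblProps is non-null.

```java
import java.util.Properties;

public class OverlaySketch {
  // Start from the table properties, then let partition properties override.
  // Note: putAll copies only entries stored directly in each object, not chained defaults.
  static Properties createOverlayedProperties(Properties tblProps, Properties partProps) {
    Properties props = new Properties();
    props.putAll(tblProps);
    if (partProps != null) {
      props.putAll(partProps);
    }
    return props;
  }

  public static void main(String[] args) {
    Properties tbl = new Properties();
    tbl.setProperty("columns", "a,b");
    tbl.setProperty("avro.schema.literal", "NEW_TABLE_SCHEMA");
    Properties part = new Properties();
    part.setProperty("avro.schema.literal", "OLD_PARTITION_SCHEMA");

    Properties merged = createOverlayedProperties(tbl, part);
    System.out.println(merged.getProperty("columns"));             // a,b
    System.out.println(merged.getProperty("avro.schema.literal")); // OLD_PARTITION_SCHEMA
  }
}
```

The second line of output shows why the overlay alone cannot fix the Avro case: once the partition's schema wins the collision, the table-level schema is no longer visible, so the Avro SerDe needs the pure table properties as well.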


Repository: hive-git


Description
-------

The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.

I fixed this problem by adding a new initialize() method to AbstractSerDe that takes both table properties and partition properties. The default implementation of this new method uses the partition properties if they are not null and the table properties otherwise. I then overrode the new initialize() method in AvroSerDe so that it always uses the table properties. I also added a helper method that takes a Deserializer and calls the new initialize() method whenever the Deserializer is an instance of AbstractSerDe. Finally, I changed all calls to Deserializer.initialize() to use the helper method instead.


Diffs (updated)
-----

  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java 69b618b 
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java 394ce3f 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 089a31a 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InternalUtil.java fb650dd 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestHCatRecordSerDe.java e84b789 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestJsonSerDe.java c1d170a 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/rcfile/TestRCFileMapReduceInputFormat.java 9dde771 
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java 7ba6bb8 
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StrictJsonWriter.java 9b26550 
  jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 3215178 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1bbe02e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 25385ba 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java b0b0925 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 6daf199 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableDummyOperator.java e00b7d3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java c8003f5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 80ccf5a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 055d13e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java 2416948 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 1354b36 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java 3bf58f6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java c52a093 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java 2ef79d4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 0e4bdff 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java 49b8da1 
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java f339651 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 77305ff 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java 3a258e4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 6144303 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java 755d783 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java cea3529 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 4fc613e 
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatchCtx.java 7f3cb15 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 5edd265 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 5664f3f 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java be518b9 
  ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117 
  ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d 
  serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java 1ab15a8 
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java d226d21 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 55bfa2e 
  serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java 9aa3c45 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java a5d494f 
  serde/src/test/org/apache/hadoop/hive/serde2/binarysortable/TestBinarySortableSerDe.java e512f42 
  serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java e8639ff 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 714045b 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 28eb868 
  serde/src/test/org/apache/hadoop/hive/serde2/lazybinary/TestLazyBinarySerDe.java 69c891d 
  serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java a69fcb7 
  serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java dd9610e 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 2a113d5 

Diff: https://reviews.apache.org/r/20096/diff/


Testing
-------

Added test cases


Thanks,

Anthony Hsu

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Anthony Hsu <ah...@linkedin.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------

(Updated April 24, 2014, 1:25 a.m.)


Review request for hive.


Changes
-------

Implemented a new approach based on Xuefu and Ashutosh's comments on HIVE-6835. See the updated Description for details.


Repository: hive-git


Description (updated)
-------

The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.

I fixed this problem by adding a new initialize() method to AbstractSerDe that takes both table properties and partition properties. The default implementation of this new method uses the partition properties if they are not null and the table properties otherwise. I then overrode the new initialize() method in AvroSerDe so that it always uses the table properties. I also added a helper method that takes a Deserializer and calls the new initialize() method whenever the Deserializer is an instance of AbstractSerDe. Finally, I changed all calls to Deserializer.initialize() to use the helper method instead.
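A self-contained sketch of the dispatch described above. The nested types are simplified, hypothetical stand-ins for Hive's Deserializer, AbstractSerDe, and AvroSerDe (the real initialize() also takes a Hadoop Configuration), and initializeSerDe here is an illustrative helper, not the patch's exact code. What a non-AbstractSerDe deserializer receives in the fallback branch is an assumption.

```java
import java.util.Properties;

public class SerDeInitSketch {
  // Simplified stand-in for Hive's Deserializer interface.
  interface Deserializer {
    void initialize(Properties props);
  }

  // Stand-in for AbstractSerDe with the added two-argument initialize().
  abstract static class AbstractSerDe implements Deserializer {
    // Default per the description: partition properties if non-null, else table properties.
    void initialize(Properties tableProps, Properties partProps) {
      initialize(partProps != null ? partProps : tableProps);
    }
  }

  // Stand-in for AvroSerDe: overrides the new method to always use the table properties.
  static class AvroLikeSerDe extends AbstractSerDe {
    String schema;

    @Override
    public void initialize(Properties props) {
      schema = props.getProperty("avro.schema.literal");
    }

    @Override
    void initialize(Properties tableProps, Properties partProps) {
      initialize(tableProps);
    }
  }

  // A deserializer that does not extend AbstractSerDe and only has the old method.
  static class LegacyDeserializer implements Deserializer {
    Properties seen;

    @Override
    public void initialize(Properties props) {
      seen = props;
    }
  }

  // Hypothetical helper: use the two-argument form when available (fallback is assumed).
  static void initializeSerDe(Deserializer d, Properties tableProps, Properties partProps) {
    if (d instanceof AbstractSerDe) {
      ((AbstractSerDe) d).initialize(tableProps, partProps);
    } else {
      d.initialize(partProps != null ? partProps : tableProps);
    }
  }

  public static void main(String[] args) {
    Properties table = new Properties();
    table.setProperty("avro.schema.literal", "NEW_TABLE_SCHEMA");
    Properties part = new Properties();
    part.setProperty("avro.schema.literal", "OLD_PARTITION_SCHEMA");

    AvroLikeSerDe avro = new AvroLikeSerDe();
    initializeSerDe(avro, table, part);
    System.out.println(avro.schema); // NEW_TABLE_SCHEMA
  }
}
```

Routing every call site through one helper is what lets existing Deserializer implementations keep working unchanged while AbstractSerDe subclasses opt into seeing both property sets.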


Diffs (updated)
-----

  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java 69b618b 
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java 394ce3f 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 089a31a 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InternalUtil.java fb650dd 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestHCatRecordSerDe.java e84b789 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestJsonSerDe.java c1d170a 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/rcfile/TestRCFileMapReduceInputFormat.java 9dde771 
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java 7ba6bb8 
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StrictJsonWriter.java 9b26550 
  jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 3215178 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1bbe02e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 25385ba 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java b0b0925 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 6daf199 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableDummyOperator.java e00b7d3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java c8003f5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 80ccf5a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 055d13e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java 2416948 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 1354b36 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java 3bf58f6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java 2ef79d4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 0e4bdff 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java 49b8da1 
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java f339651 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 77305ff 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java 3a258e4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 6144303 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java 755d783 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java cea3529 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 4fc613e 
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatchCtx.java 7f3cb15 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 5edd265 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 5664f3f 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java be518b9 
  ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117 
  ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d 
  serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java 1ab15a8 
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java d226d21 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 55bfa2e 
  serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java 9aa3c45 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java a5d494f 
  serde/src/test/org/apache/hadoop/hive/serde2/binarysortable/TestBinarySortableSerDe.java e512f42 
  serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java e8639ff 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 714045b 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 28eb868 
  serde/src/test/org/apache/hadoop/hive/serde2/lazybinary/TestLazyBinarySerDe.java 69c891d 
  serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java a69fcb7 
  serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java dd9610e 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 2a113d5 

Diff: https://reviews.apache.org/r/20096/diff/


Testing
-------

Added test cases


Thanks,

Anthony Hsu

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Carl Steinbach <cw...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review40626
-----------------------------------------------------------

Ship it!



- Carl Steinbach


On April 17, 2014, 1:14 a.m., Anthony Hsu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20096/
> -----------------------------------------------------------
> 
> (Updated April 17, 2014, 1:14 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.
> 
> I fixed this problem by passing the table properties to the partition with a "table." prefix, and changing the Avro SerDe to always use the table properties when available.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c 
>   ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117 
>   ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d 
>   serde/if/serde.thrift 31c87ee 
>   serde/src/gen/thrift/gen-cpp/serde_constants.h d56c917 
>   serde/src/gen/thrift/gen-cpp/serde_constants.cpp 54503e3 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25 
>   serde/src/gen/thrift/gen-php/org/apache/hadoop/hive/serde/Types.php 837dd11 
>   serde/src/gen/thrift/gen-py/org_apache_hadoop_hive_serde/constants.py 8eac87d 
>   serde/src/gen/thrift/gen-rb/serde_constants.rb ed86522 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
> 
> Diff: https://reviews.apache.org/r/20096/diff/
> 
> 
> Testing
> -------
> 
> Added test cases
> 
> 
> Thanks,
> 
> Anthony Hsu
> 
>

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Anthony Hsu <ah...@linkedin.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------

(Updated April 17, 2014, 1:14 a.m.)


Review request for hive.


Changes
-------

Addressed Ashutosh's comments on HIVE-6835: added the "table." prefix constant to serde.thrift and used the Thrift compiler to regenerate all the language-specific bindings.


Repository: hive-git


Description
-------

The problem occurs when you store "avro.schema.literal" or "avro.schema.url" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try to read from the old partition.

I fixed this by passing the table properties to the partition under a "table." prefix and changing the Avro SerDe to always use the table properties when they are available.
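The partition-side half of the fix can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in PartitionDesc.java: the class name, the `withTableProps` helper, and the schema URLs are hypothetical; only the "table." prefix and the avro.schema.* property names come from the description above. It copies each table property into a fresh Properties object under "table.<key>" while keeping the partition's own SERDEPROPERTIES intact.

```java
import java.util.Properties;

// Hypothetical sketch, not Hive's actual PartitionDesc code: class and method
// names are illustrative assumptions; only the "table." prefix and the
// avro.schema.* property names come from the review description.
public class TablePropPrefixSketch {

    // Prefix under which table-level properties are exposed to a partition.
    static final String TABLE_PROP_PREFIX = "table.";

    // Returns a new Properties object holding the partition's own properties
    // plus every table property copied under "table.<key>".
    // Neither input is modified.
    static Properties withTableProps(Properties partitionProps, Properties tableProps) {
        Properties merged = new Properties();
        for (String key : partitionProps.stringPropertyNames()) {
            merged.setProperty(key, partitionProps.getProperty(key));
        }
        for (String key : tableProps.stringPropertyNames()) {
            merged.setProperty(TABLE_PROP_PREFIX + key, tableProps.getProperty(key));
        }
        return merged;
    }

    public static void main(String[] args) {
        // The old partition still carries the schema it was created with...
        Properties partition = new Properties();
        partition.setProperty("avro.schema.url", "hdfs:///schemas/v1.avsc");
        // ...while the table's schema has since been changed.
        Properties table = new Properties();
        table.setProperty("avro.schema.url", "hdfs:///schemas/v2.avsc");

        Properties merged = withTableProps(partition, table);
        System.out.println(merged.getProperty("avro.schema.url"));       // hdfs:///schemas/v1.avsc
        System.out.println(merged.getProperty("table.avro.schema.url")); // hdfs:///schemas/v2.avsc
    }
}
```

Returning a new object instead of writing into the partition's Properties keeps the partition metadata unchanged, so a reader can still see both schemas side by side.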


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c 
  ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117 
  ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d 
  serde/if/serde.thrift 31c87ee 
  serde/src/gen/thrift/gen-cpp/serde_constants.h d56c917 
  serde/src/gen/thrift/gen-cpp/serde_constants.cpp 54503e3 
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25 
  serde/src/gen/thrift/gen-php/org/apache/hadoop/hive/serde/Types.php 837dd11 
  serde/src/gen/thrift/gen-py/org_apache_hadoop_hive_serde/constants.py 8eac87d 
  serde/src/gen/thrift/gen-rb/serde_constants.rb ed86522 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 

Diff: https://reviews.apache.org/r/20096/diff/


Testing
-------

Added test cases


Thanks,

Anthony Hsu

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Carl Steinbach <cw...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/#review40353
-----------------------------------------------------------

Ship it!

- Carl Steinbach


On April 14, 2014, 6:49 p.m., Anthony Hsu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20096/
> -----------------------------------------------------------
> 
> (Updated April 14, 2014, 6:49 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The problem occurs when you store the "avro.schema.(literal|url)" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try reading from the old partition.
> 
> I fixed this problem by passing the table properties to the partition with a "table." prefix, and changing the Avro SerDe to always use the table properties when available.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c 
>   ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117 
>   ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
> 
> Diff: https://reviews.apache.org/r/20096/diff/
> 
> 
> Testing
> -------
> 
> Added test cases
> 
> 
> Thanks,
> 
> Anthony Hsu
> 
>

Re: Review Request 20096: HIVE-6835: Reading of partitioned Avro data fails if partition schema does not match table schema

Posted by Anthony Hsu <ah...@linkedin.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20096/
-----------------------------------------------------------

(Updated April 14, 2014, 6:49 p.m.)


Review request for hive.


Changes
-------

Addressed Carl's comments. Changes:
- Reverted whitespace changes.
- Moved the TABLE_PROP_PREFIX ("table.") to serdeConstants.
- Removed code that mutated the Properties passed to the AvroSerDe.
- Added/improved comments.
- Synced with the latest changes.
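The change that removes mutation of the Properties passed to the AvroSerDe implies a read-only lookup on the SerDe side. A minimal sketch, assuming a hypothetical `tableFirst` helper (not the actual AvroSerdeUtils method) and made-up schema URLs: prefer the "table."-prefixed value and fall back to the partition's own key, without ever writing to the Properties object.

```java
import java.util.Properties;

// Hypothetical sketch, not the actual AvroSerdeUtils code: the class and
// method names are illustrative assumptions; the "table." prefix and the
// avro.schema.* keys come from the review description.
public class TableFirstLookupSketch {

    static final String TABLE_PROP_PREFIX = "table.";

    // Prefer the table-level value stored under "table.<key>"; otherwise fall
    // back to the partition-level "<key>". Returns null if neither is set.
    // The Properties object is only read, never modified.
    static String tableFirst(Properties props, String key) {
        String tableValue = props.getProperty(TABLE_PROP_PREFIX + key);
        return tableValue != null ? tableValue : props.getProperty(key);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("avro.schema.url", "hdfs:///schemas/v1.avsc");
        props.setProperty("table.avro.schema.url", "hdfs:///schemas/v2.avsc");

        int sizeBefore = props.size();
        System.out.println(tableFirst(props, "avro.schema.url"));     // hdfs:///schemas/v2.avsc
        System.out.println(tableFirst(props, "avro.schema.literal")); // null
        System.out.println(props.size() == sizeBefore);                // true
    }
}
```

Reading through a helper like this, rather than copying "table." values back over the unprefixed keys, leaves the caller's Properties exactly as it was handed in.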


Repository: hive-git


Description
-------

The problem occurs when you store "avro.schema.literal" or "avro.schema.url" in the SERDEPROPERTIES instead of the TBLPROPERTIES, add a partition, change the table's schema, and then try to read from the old partition.

I fixed this by passing the table properties to the partition under a "table." prefix and changing the Avro SerDe to always use the table properties when they are available.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 43cef5c 
  ql/src/test/queries/clientpositive/avro_partitioned.q 6fe5117 
  ql/src/test/results/clientpositive/avro_partitioned.q.out 644716d 
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 

Diff: https://reviews.apache.org/r/20096/diff/


Testing
-------

Added test cases


Thanks,

Anthony Hsu