Posted to dev@hive.apache.org by Anthony Hsu <ah...@linkedin.com> on 2016/11/26 23:03:35 UTC

Review Request 54094: HIVE-15190: Field names are not preserved in ORC files written with ACID

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54094/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-15190
    https://issues.apache.org/jira/browse/HIVE-15190


Repository: hive-git


Description
-------

Previously, when writing to an ACID ORC table, the file written to disk had a schema of `struct<...(acid columns)...,row:struct<_col0:int,_col1:string,...>>`. The fields of the nested `row` struct used the virtual column names `_col0`, `_col1`, etc., instead of the actual table column names. This patch makes ORC files written with ACID preserve the table's column names.
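
To make the difference concrete, here is a minimal, self-contained Java sketch. It is not the patch's code. It renders only the `row` portion of the schema and leaves the ACID columns out. The helper `rowStruct` and the example columns `a int, b string` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowSchemaSketch {

    // Hypothetical helper: builds the ORC-style type string for the nested "row" struct.
    // useTableNames == false mimics the old behaviour (_col0, _col1, ...);
    // useTableNames == true uses the table's own column names.
    static String rowStruct(List<String> columnNames, List<String> columnTypes,
                            boolean useTableNames) {
        if (columnNames.size() != columnTypes.size()) {
            throw new IllegalArgumentException("names and types must have the same length");
        }
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < columnTypes.size(); i++) {
            String name = useTableNames ? columnNames.get(i) : "_col" + i;
            fields.add(name + ":" + columnTypes.get(i));
        }
        return "row:struct<" + String.join(",", fields) + ">";
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b");
        List<String> types = Arrays.asList("int", "string");
        System.out.println(rowStruct(names, types, false)); // row:struct<_col0:int,_col1:string>
        System.out.println(rowStruct(names, types, true));  // row:struct<a:int,b:string>
    }
}
```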

The actual table column names must be stored in the ORC file itself to support schema evolution based on field names (ORC-54): https://issues.apache.org/jira/browse/ORC-54
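
The following hypothetical Java sketch shows why name-based schema evolution depends on real field names. It is a simplified stand-in, not ORC's `SchemaEvolution` API. Each reader (table) field is matched to the file field with the same name, with -1 meaning "not found". When the file only contains `_col0`, `_col1`, nothing can be matched. The column names and the `matchByName` helper are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NameMatchSketch {

    // Hypothetical helper: for each reader field, return the index of the file field
    // with the same name, or -1 if the file has no such field.
    static int[] matchByName(List<String> fileFields, List<String> readerFields) {
        Map<String, Integer> fileIndex = new HashMap<>();
        for (int i = 0; i < fileFields.size(); i++) {
            fileIndex.put(fileFields.get(i), i);
        }
        int[] mapping = new int[readerFields.size()];
        for (int i = 0; i < readerFields.size(); i++) {
            mapping[i] = fileIndex.getOrDefault(readerFields.get(i), -1);
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Hypothetical reader schema: different column order and an extra column "c".
        List<String> reader = Arrays.asList("b", "a", "c");
        // File written with table names: "b" and "a" are found, "c" is not.
        System.out.println(Arrays.toString(matchByName(Arrays.asList("a", "b"), reader)));         // [1, 0, -1]
        // File written with virtual names: no field can be matched by name.
        System.out.println(Arrays.toString(matchByName(Arrays.asList("_col0", "_col1"), reader))); // [-1, -1, -1]
    }
}
```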


Diffs
-----

  orc/src/java/org/apache/orc/impl/SchemaEvolution.java 7379de93a7f39d734ef7695c197bd9f24bc84321 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 53660206e3f59c37be261b1a9796f04721a244f3 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java efde2db482367f1037c486df9c5cabd67b1368ed 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 492c64c29e8d4f38d857381bc375074e06868f7c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 75c7680e267ab44e426d0b21c6fd6dce6a352bbd 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 49ba6675bae5b3e6d8bf1fa2e9ed8d2a27b7f83a 

Diff: https://reviews.apache.org/r/54094/diff/


Testing
-------

Added a unit test (in TestTxnCommands2). Also ran some of the existing ACID tests, and they still pass.


Thanks,

Anthony Hsu


Re: Review Request 54094: HIVE-15190: Field names are not preserved in ORC files written with ACID

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54094/#review204793
-----------------------------------------------------------




ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
Lines 1759 (patched)
<https://reviews.apache.org/r/54094/#comment287518>

    Can you also add a test for nested complex types to check whether their field names are retained as well?
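
One way such a check could compare names, sketched in Java under stated assumptions, is to collect the dotted path of every field, recursing into nested structs. The expected table schema and the schema read back from the file then have to yield the same list. The tiny `Type` model and the helpers `fieldPaths` and `sameFieldNames` are hypothetical. They are not Hive's or ORC's classes, and they are not how TestTxnCommands2 is written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedNamesSketch {

    // Hypothetical type model: primitives carry a name, structs carry ordered named fields.
    static final class Type {
        final String primitive;          // non-null for primitives
        final Map<String, Type> fields;  // non-null for structs

        private Type(String primitive, Map<String, Type> fields) {
            this.primitive = primitive;
            this.fields = fields;
        }

        static Type prim(String name) { return new Type(name, null); }

        static Type struct(Map<String, Type> fields) { return new Type(null, fields); }
    }

    // Collects dotted paths of all struct fields, e.g. "b.x", recursing into nested structs.
    static List<String> fieldPaths(Type t, String prefix) {
        List<String> out = new ArrayList<>();
        if (t.fields == null) {
            return out;
        }
        for (Map.Entry<String, Type> e : t.fields.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            out.add(path);
            out.addAll(fieldPaths(e.getValue(), path));
        }
        return out;
    }

    // True if both schemas have the same field names at every nesting level, in order.
    static boolean sameFieldNames(Type expected, Type actual) {
        return fieldPaths(expected, "").equals(fieldPaths(actual, ""));
    }

    public static void main(String[] args) {
        Map<String, Type> inner = new LinkedHashMap<>();
        inner.put("x", Type.prim("int"));
        inner.put("y", Type.prim("string"));
        Map<String, Type> top = new LinkedHashMap<>();
        top.put("a", Type.prim("int"));
        top.put("b", Type.struct(inner));
        System.out.println(fieldPaths(Type.struct(top), "")); // [a, b, b.x, b.y]
    }
}
```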


- Prasanth_J

