You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by cheng xu <ch...@intel.com> on 2015/05/20 16:54:06 UTC

Review Request 34473: HIVE-10749 Implement Insert statement for parquet

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/
-----------------------------------------------------------

Review request for hive, Alan Gates and Sergio Pena.


Bugs: HIVE-10749
    https://issues.apache.org/jira/browse/HIVE-10749


Repository: hive-git


Description
-------

Implement the insert statement for parquet format.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 000eb38 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 8380117 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java 4e1820c 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRawRecordMerger.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 43c772f 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 0a5edbb 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java 0d32e49 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 5f7f597 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
  ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34473/diff/


Testing
-------

Newly added qtest and UT passed locally


Thanks,

cheng xu


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84521
-----------------------------------------------------------


I merged master to parquet, so that you don't have to the NullWritable in this patch.
Could you update the patch, and upload it again?

- Sergio Pena


On May 20, 2015, 2:54 p.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34473/
> -----------------------------------------------------------
> 
> (Updated May 20, 2015, 2:54 p.m.)
> 
> 
> Review request for hive, Alan Gates and Sergio Pena.
> 
> 
> Bugs: HIVE-10749
>     https://issues.apache.org/jira/browse/HIVE-10749
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement the insert statement for parquet format.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 000eb38 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 8380117 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java 4e1820c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRawRecordMerger.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 43c772f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 0a5edbb 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java 0d32e49 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 5f7f597 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34473/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest and UT passed locally
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by cheng xu <ch...@intel.com>.

> On May 20, 2015, 8:45 p.m., Alexander Pivovarov wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java, line 207
> > <https://reviews.apache.org/r/34473/diff/1/?file=965270#file965270line207>
> >
> >     you can use
> >     final ArrayList<Object> list = new ArrayList<Object>(Collections.nCopies(fields.size(), null));
> >     instead

I don't think so because only in the insert statement, we can't understand how to inspect the row object until creating parquet writer. This is why I create the new constructor in ParquetStructObjectInspector. Thank yoU!


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84574
-----------------------------------------------------------


On May 20, 2015, 2:54 p.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34473/
> -----------------------------------------------------------
> 
> (Updated May 20, 2015, 2:54 p.m.)
> 
> 
> Review request for hive, Alan Gates and Sergio Pena.
> 
> 
> Bugs: HIVE-10749
>     https://issues.apache.org/jira/browse/HIVE-10749
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement the insert statement for parquet format.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 000eb38 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 8380117 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java 4e1820c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRawRecordMerger.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 43c772f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 0a5edbb 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java 0d32e49 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 5f7f597 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34473/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest and UT passed locally
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by Alexander Pivovarov <ap...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84574
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment135898>

    missing space after comma and before tableProperties



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java
<https://reviews.apache.org/r/34473/#comment135901>

    you can use
    final ArrayList<Object> list = new ArrayList<Object>(Collections.nCopies(fields.size(), null));
    instead


- Alexander Pivovarov


On May 20, 2015, 2:54 p.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34473/
> -----------------------------------------------------------
> 
> (Updated May 20, 2015, 2:54 p.m.)
> 
> 
> Review request for hive, Alan Gates and Sergio Pena.
> 
> 
> Bugs: HIVE-10749
>     https://issues.apache.org/jira/browse/HIVE-10749
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement the insert statement for parquet format.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 000eb38 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 8380117 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java 4e1820c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRawRecordMerger.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 43c772f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 0a5edbb 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java 0d32e49 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 5f7f597 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34473/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest and UT passed locally
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/
-----------------------------------------------------------

(Updated May 28, 2015, 8:29 a.m.)


Review request for hive, Alan Gates, Owen O'Malley, and Sergio Pena.


Bugs: HIVE-10749
    https://issues.apache.org/jira/browse/HIVE-10749


Repository: hive-git


Description
-------

Implement the insert statement for parquet format.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java 571f993 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
  ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34473/diff/


Testing
-------

Newly added qtest and UT passed locally


Thanks,

cheng xu


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review85265
-----------------------------------------------------------


I run a INSERT command, but delta directories were not created.

-rwxr-xr-x   1 sergio supergroup        257 2015-05-26 16:24 /user/hive/warehouse/numbers_bucketed/000000_0
-rwxr-xr-x   1 sergio supergroup        265 2015-05-26 16:24 /user/hive/warehouse/numbers_bucketed/000001_0

- Sergio Pena


On May 22, 2015, 6:26 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34473/
> -----------------------------------------------------------
> 
> (Updated May 22, 2015, 6:26 a.m.)
> 
> 
> Review request for hive, Alan Gates, Owen O'Malley, and Sergio Pena.
> 
> 
> Bugs: HIVE-10749
>     https://issues.apache.org/jira/browse/HIVE-10749
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement the insert statement for parquet format.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java 571f993 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34473/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest and UT passed locally
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/
-----------------------------------------------------------

(Updated May 22, 2015, 6:26 a.m.)


Review request for hive, Alan Gates, Owen O'Malley, and Sergio Pena.


Changes
-------

Summary:
1. use some utility to reduce LOC
2. remove *ParquetRecordReaderWrapper.java* and use *ObjectArrayWritableObjectInspector* instead


Bugs: HIVE-10749
    https://issues.apache.org/jira/browse/HIVE-10749


Repository: hive-git


Description
-------

Implement the insert statement for parquet format.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java 571f993 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
  ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34473/diff/


Testing
-------

Newly added qtest and UT passed locally


Thanks,

cheng xu


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by Alan Gates <ga...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84729
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java
<https://reviews.apache.org/r/34473/#comment136066>

    Do you intend to use this in conjunction with hive.hcatalog.streaming?  If so, closing the file on a flush is not what you'll want.


- Alan Gates


On May 21, 2015, 7:45 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34473/
> -----------------------------------------------------------
> 
> (Updated May 21, 2015, 7:45 a.m.)
> 
> 
> Review request for hive, Alan Gates and Sergio Pena.
> 
> 
> Bugs: HIVE-10749
>     https://issues.apache.org/jira/browse/HIVE-10749
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement the insert statement for parquet format.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34473/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest and UT passed locally
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by cheng xu <ch...@intel.com>.

> On May 21, 2015, 7:18 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java, line 1
> > <https://reviews.apache.org/r/34473/diff/2/?file=966163#file966163line1>
> >
> >     I am waiting to commit the patch from HIVE-10749 that uses a similar class named ObjectArrayWritableObjectInspector. 
> >     
> >     Also, I think this is already part o the parquet branch.

It's https://issues.apache.org/jira/browse/HIVE-9658


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84758
-----------------------------------------------------------


On May 22, 2015, 6:26 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34473/
> -----------------------------------------------------------
> 
> (Updated May 22, 2015, 6:26 a.m.)
> 
> 
> Review request for hive, Alan Gates, Owen O'Malley, and Sergio Pena.
> 
> 
> Bugs: HIVE-10749
>     https://issues.apache.org/jira/browse/HIVE-10749
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement the insert statement for parquet format.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java 571f993 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34473/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest and UT passed locally
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by cheng xu <ch...@intel.com>.

> On May 21, 2015, 7:18 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java, line 59
> > <https://reviews.apache.org/r/34473/diff/2/?file=966160#file966160line59>
> >
> >     Could you separate words with _? Like ENABLE_ACID_SCHEMA_INFO. It helps to read the constant more easily.
> >     
> >     Do we have to enable transactions exclusively for parquet? Isn't there another variable that enables trasnactions on Hive that we can use?

This variable is used for setting the schema for parquet. It's only related to whether you need to write data to base file or not. So we have to use this way to append the original data with ACID info.


> On May 21, 2015, 7:18 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java, lines 98-103
> > <https://reviews.apache.org/r/34473/diff/2/?file=966160#file966160line98>
> >
> >     You can use this one line to return the column list:
> >     
> >     return (List<String>) StringUtils.getStringCollection(tableProperties.getProperty(IOConstants.COLUMNS));
> >     
> >     It will return an empty list array if COLUMN is empty.

Great suggestion!


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84758
-----------------------------------------------------------


On May 22, 2015, 6:26 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34473/
> -----------------------------------------------------------
> 
> (Updated May 22, 2015, 6:26 a.m.)
> 
> 
> Review request for hive, Alan Gates, Owen O'Malley, and Sergio Pena.
> 
> 
> Bugs: HIVE-10749
>     https://issues.apache.org/jira/browse/HIVE-10749
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement the insert statement for parquet format.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java 571f993 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34473/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest and UT passed locally
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/#review84758
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment136104>

    Could you separate words with _? Like ENABLE_ACID_SCHEMA_INFO. It helps to read the constant more easily.
    
    Do we have to enable transactions exclusively for parquet? Isn't there another variable that enables trasnactions on Hive that we can use?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment136111>

    Could you separate the workds? Like ENABLE_ACID_SCHEMA_INFO. It makes the code more readable.
    
    Also, isn't there another variable that we can use to detect if transactions are enabled? I am not sure if we should add more variables to Hive.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment136107>

    You can use this one line to return the column list:
    
    return (List<String>) StringUtils.getStringCollection(tableProperties.getProperty(IOConstants.COLUMNS));
    
    It will return an empty list array if COLUMN is empty.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment136112>

    You can save code by using this line:
    
    return (List<String>) StringUtils.getStringCollection(tableProperties.getProperty(IOConstants.COLUMNS));
    
    It will return an empty list if the parameter is empty.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment136108>

    You can call TypeInfoUtils.getTypeINfosFromTypeString() with an empty string here. It will return an empty list. Let's save code by using:
    
    ArrayList<TypeInfo> columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment136113>

    You can save code by using this line:
    
    ArrayList<TypeInfo> columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
    
    It will return an empty list if the parameter is empty.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment136109>

    Same here, you can save code with this:
    
    ArrayList<String> columnNames = (ArrayList<String>) StringUtils.getStringCollection(columnNameProperty);



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
<https://reviews.apache.org/r/34473/#comment136114>

    Same thing here:
    
    ArrayList<String> columnNames = (ArrayList<String>) StringUtils.getStringCollection(columnNameProperty);



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
<https://reviews.apache.org/r/34473/#comment136117>

    Why do you need a Writable? HIVE-9658 tries to avoid wrapping java types into writable if they are being used by Hive to save memory usage.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java
<https://reviews.apache.org/r/34473/#comment136116>

    I am waiting to commit the patch from HIVE-10749 that uses a similar class named ObjectArrayWritableObjectInspector. 
    
    Also, I think this is already part o the parquet branch.


- Sergio Pena


On May 21, 2015, 7:45 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34473/
> -----------------------------------------------------------
> 
> (Updated May 21, 2015, 7:45 a.m.)
> 
> 
> Review request for hive, Alan Gates and Sergio Pena.
> 
> 
> Bugs: HIVE-10749
>     https://issues.apache.org/jira/browse/HIVE-10749
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement the insert statement for parquet format.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
>   ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34473/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest and UT passed locally
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 34473: HIVE-10749 Implement Insert statement for parquet

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34473/
-----------------------------------------------------------

(Updated May 21, 2015, 7:45 a.m.)


Review request for hive, Alan Gates and Sergio Pena.


Changes
-------

Summary:
1. fix code style issues
2. remove codes irrelevant to the insert statement
3. fix one issue about SetParquetSchema from previous patch


Bugs: HIVE-10749
    https://issues.apache.org/jira/browse/HIVE-10749


Repository: hive-git


Description
-------

Implement the insert statement for parquet format.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java c6fb26c 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/acid/ParquetRecordUpdater.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java f513572 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetStructObjectInspector.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/acid/TestParquetRecordUpdater.java PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_parquet_insert.q PRE-CREATION 
  ql/src/test/results/clientpositive/acid_parquet_insert.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/34473/diff/


Testing
-------

Newly added qtest and UT passed locally


Thanks,

cheng xu