Posted to dev@hive.apache.org by Anthony Hsu via Review Board <no...@reviews.apache.org> on 2017/09/12 15:04:57 UTC

Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/
-----------------------------------------------------------

Review request for hive, Carl Steinbach and Ratandeep Ratti.


Bugs: HIVE-17394
    https://issues.apache.org/jira/browse/HIVE-17394


Repository: hive-git


Description
-------

Previously, when the AvroSerde found a nullable union in the reader schema, it regenerated the TypeInfo for that field for every record. This patch computes the TypeInfo once and reuses it.

In our testing, we found this sped up count() queries by 2x.
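
For readers without the diff open, here is a minimal sketch of the general idea (derive once, reuse per record). It is not the actual patch and does not use Hive's classes: `TypeInfoReuseSketch`, `deriveTypeInfo`, and the String-based "schema" and "TypeInfo" are hypothetical stand-ins, and the map-based cache is just one assumed way to get the reuse.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: hypothetical stand-ins, not Hive's AvroDeserializer. */
public class TypeInfoReuseSketch {
    /** Counts how often the (pretend) expensive derivation runs. */
    public static int derivations = 0;

    /** Cache of derived type info, keyed by the schema text (an assumption). */
    private static final Map<String, String> CACHE = new HashMap<>();

    /** Hypothetical stand-in for deriving type info from a nullable union. */
    public static String deriveTypeInfo(String nullableUnionSchema) {
        derivations++;
        // Strip the "null" branch and punctuation: ["null","string"] -> string
        return nullableUnionSchema
                .replace("[\"null\",", "")
                .replace("]", "")
                .replace("\"", "");
    }

    /** Old behaviour as described: derive the type info again for every record. */
    public static int deserializePerRecord(String schema, int records) {
        derivations = 0;
        for (int i = 0; i < records; i++) {
            String typeInfo = deriveTypeInfo(schema);
        }
        return derivations;
    }

    /** New behaviour as described: derive once, then reuse the same object. */
    public static int deserializeWithReuse(String schema, int records) {
        derivations = 0;
        CACHE.clear();
        for (int i = 0; i < records; i++) {
            String typeInfo =
                    CACHE.computeIfAbsent(schema, TypeInfoReuseSketch::deriveTypeInfo);
        }
        return derivations;
    }

    public static void main(String[] args) {
        String schema = "[\"null\",\"string\"]";
        // prints "per record: 1000 derivations"
        System.out.println("per record: " + deserializePerRecord(schema, 1000) + " derivations");
        // prints "with reuse: 1 derivations"
        System.out.println("with reuse: " + deserializeWithReuse(schema, 1000) + " derivations");
    }
}
```

The sketch only counts derivations. The 2x figure above comes from our testing of the real patch, not from this example.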


Diffs
-----

  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java ecfe15f59dac04bda3f8f1275babebf736608a6b 


Diff: https://reviews.apache.org/r/62247/diff/1/


Testing
-------

`mvn clean package -DskipTests -Dmaven.javadoc.skip=true` succeeded.


Thanks,

Anthony Hsu


Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

Posted by Anthony Hsu via Review Board <no...@reviews.apache.org>.

> On Sept. 12, 2017, 5:02 p.m., Ratandeep Ratti wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
> > Line 305 (original), 305 (patched)
> > <https://reviews.apache.org/r/62247/diff/1/?file=1820197#file1820197line305>
> >
> >     This comment is misleading now and can be removed.

Carl fixed this before committing. Thanks, Carl!


- Anthony


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/#review185212
-----------------------------------------------------------




Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

Posted by Ratandeep Ratti <rd...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/#review185212
-----------------------------------------------------------




serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
Line 305 (original), 305 (patched)
<https://reviews.apache.org/r/62247/#comment261498>

    This comment is misleading now and can be removed.


- Ratandeep Ratti




Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

Posted by Ratandeep Ratti <rd...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/#review185210
-----------------------------------------------------------


Ship it!




LGTM

- Ratandeep Ratti




Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

Posted by Carl Steinbach <cw...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/#review185195
-----------------------------------------------------------


Ship it!




+1

- Carl Steinbach




Re: Review Request 62247: HIVE-17394: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

Posted by Anthony Hsu via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62247/
-----------------------------------------------------------

(Updated Sept. 12, 2017, 10:43 p.m.)


Review request for hive, Carl Steinbach and Ratandeep Ratti.


Changes
-------

Addressed Ratandeep's comment.


Bugs: HIVE-17394
    https://issues.apache.org/jira/browse/HIVE-17394


Repository: hive-git


Description
-------

Previously, when the AvroSerde found a nullable union in the reader schema, it regenerated the TypeInfo for that field for every record. This patch computes the TypeInfo once and reuses it.

In our testing, we found this sped up count() queries by 2x.


Diffs (updated)
-----

  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java ecfe15f59dac04bda3f8f1275babebf736608a6b 


Diff: https://reviews.apache.org/r/62247/diff/2/

Changes: https://reviews.apache.org/r/62247/diff/1-2/


Testing
-------

`mvn clean package -DskipTests -Dmaven.javadoc.skip=true` succeeded.


Thanks,

Anthony Hsu