Posted to dev@hive.apache.org by Gunther Hagleitner <gh...@hortonworks.com> on 2014/03/01 00:26:56 UTC

Re: Review Request 18230: HIVE-6429 MapJoinKey has large memory overhead in typical cases

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18230/#review35866
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
<https://reviews.apache.org/r/18230/#comment66657>

    doesn't look like keyobject is used?



ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
<https://reviews.apache.org/r/18230/#comment66658>

    looks like unused import.



ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
<https://reviews.apache.org/r/18230/#comment66659>

    this still violates the coding standard as far as i can tell.



ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
<https://reviews.apache.org/r/18230/#comment66660>

    same here.



serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java
<https://reviews.apache.org/r/18230/#comment66663>

    this doesn't seem to belong here. it's not a general purpose serde method... in the vectorizedreducesink we seem to just break the row group into rows and serialize with the unchanged serde. can we do this here too?



serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java
<https://reviews.apache.org/r/18230/#comment66662>

    this doesn't seem to belong in the serde. this is a helper for the map join key only. (e.g.: field < 8, etc) you should be able to just use the existing public interface, right?


- Gunther Hagleitner


On Feb. 28, 2014, 10:04 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18230/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2014, 10:04 p.m.)
> 
> 
> Review request for hive, Gunther Hagleitner and Jitendra Pandey.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> See JIRA
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6802b4d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java 3cfaacf 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java 988cc57 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 8b25300 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 46e37c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 9948583 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java 5cf347b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 2ac0928 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyBytes.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyObject.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 0279f7c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 295854d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 581046e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 2466a3b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java 997202f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java d17b656 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinKey.java a103a51 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java 40bf006 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 22eca50 
>   serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java fcded96 
>   serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java 7bfe473 
>   serde/src/java/org/apache/hadoop/hive/serde2/io/HiveDecimalWritable.java 67cb1e8 
>   serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java f9b4031 
>   serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java c583ae2 
> 
> Diff: https://reviews.apache.org/r/18230/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Re: Review Request 18230: HIVE-6429 MapJoinKey has large memory overhead in typical cases

Posted by Sergey Shelukhin <se...@hortonworks.com>.

> On Feb. 28, 2014, 11:26 p.m., Gunther Hagleitner wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java, line 309
> > <https://reviews.apache.org/r/18230/diff/10/?file=507272#file507272line309>
> >
> >     this doesn't seem to belong in the serde. this is a helper for the map join key only. (e.g.: field < 8, etc) you should be able to just use the existing public interface, right?

I will have to add at least one static method. But yeah, I made it a simple pass-through to an already existing private static method, and moved all the key-specific stuff to the keys.
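
For illustration only (this is not code from the patch), the split described above could look roughly like the sketch below. The class names SketchSerDe and SketchKeyBytes, the method names, and the big-endian int encoding are all hypothetical stand-ins, and the 8-field limit is only an assumption loosely modeled on the "field < 8" remark in the review comment.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a serde class; NOT the real LazyBinarySerDe.
final class SketchSerDe {
    private SketchSerDe() {}

    // Pre-existing private helper: writes an int as 4 big-endian bytes.
    private static void writeInt(ByteArrayOutputStream out, int v) {
        out.write((v >>> 24) & 0xFF);
        out.write((v >>> 16) & 0xFF);
        out.write((v >>> 8) & 0xFF);
        out.write(v & 0xFF);
    }

    // The one new public static method: a plain pass-through, no key logic.
    public static void serializeInt(ByteArrayOutputStream out, int v) {
        writeInt(out, v);
    }
}

// Hypothetical key helper; NOT the real MapJoinKeyBytes.
final class SketchKeyBytes {
    // Assumed limit, loosely modeled on the "field < 8" remark in the review.
    static final int MAX_FIELDS = 8;

    private SketchKeyBytes() {}

    // Key-specific checks live here, outside the serde.
    static byte[] fromInts(int... fields) {
        if (fields.length >= MAX_FIELDS) {
            throw new IllegalArgumentException("too many key fields: " + fields.length);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int f : fields) {
            SketchSerDe.serializeInt(out, f);
        }
        return out.toByteArray();
    }
}
```

The point of the shape is that the serde gains only a thin public entry point, while anything that knows about map join keys stays with the key classes.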


> On Feb. 28, 2014, 11:26 p.m., Gunther Hagleitner wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java, line 236
> > <https://reviews.apache.org/r/18230/diff/10/?file=507272#file507272line236>
> >
> >     this doesn't seem to belong here. it's not a general purpose serde method... in the vectorizedreducesink we seem to just break the row group into rows and serialize with the unchanged serde. can we do this here too?

I can see if it works... it looks convoluted from a perf perspective: a writable is created, then the writer does a bunch of stuff to get back the raw value. If it works I guess we can keep it and speed it up by getting the raw value later, if needed.
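
As a rough sketch of the "break the row group into rows and serialize with the unchanged serde" idea (again, not code from the patch or from the vectorized reduce sink), the loop might look like this. RowBatchSketch, the long[][] column layout, and the Function-based serializer are hypothetical simplifications; the real batch and serde types are different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; the batch layout and serializer type are assumptions.
final class RowBatchSketch {
    private RowBatchSketch() {}

    // columns[c][r] holds the value of column c for row r (column-major batch).
    static List<byte[]> serializeBatch(long[][] columns, int size,
                                       Function<Object[], byte[]> rowSerializer) {
        List<byte[]> out = new ArrayList<>(size);
        for (int r = 0; r < size; r++) {
            // Rebuild one row object per batch entry; each long is boxed here,
            // which is the kind of extra work the perf concern is about.
            Object[] row = new Object[columns.length];
            for (int c = 0; c < columns.length; c++) {
                row[c] = columns[c][r];
            }
            // The row-oriented serializer is called unchanged.
            out.add(rowSerializer.apply(row));
        }
        return out;
    }
}
```

The serializer itself needs no batch-specific method, at the cost of materializing an intermediate object per row.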


- Sergey



