Posted to dev@hive.apache.org by Antal Sinkovits via Review Board <no...@reviews.apache.org> on 2018/08/22 13:15:03 UTC

Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------

Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Repository: hive-git


Description
-------

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.
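
As a rough illustration of the idea in the description, here is a minimal sketch that uses only the JDK rather than Guava. The class name `SoftValueCacheSketch` and its `get` method are hypothetical and are not taken from the patch; `ConcurrentHashMap.compute` stands in for a value loader, giving per-key atomic loading without a `synchronized` block on an interned path string.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for a soft-valued cache with a per-key value loader. */
class SoftValueCacheSketch<V> {
    private final ConcurrentHashMap<String, SoftReference<V>> entries = new ConcurrentHashMap<>();

    /**
     * Returns the cached value for the key, running the loader only when the
     * entry is missing or its value was garbage-collected. compute() is atomic
     * per key, so no synchronized block on an interned string is needed.
     */
    V get(String key, Supplier<V> loader) {
        // Strong reference, so the value cannot be collected before we return it.
        Object[] holder = new Object[1];
        entries.compute(key, (k, ref) -> {
            V value = (ref == null) ? null : ref.get();
            if (value == null) {
                value = loader.get();
                ref = new SoftReference<>(value);
            }
            holder[0] = value;
            return ref;
        });
        @SuppressWarnings("unchecked")
        V result = (V) holder[0];
        return result;
    }
}
```

Guava's `Cache.get(key, Callable)` offers comparable load-once-per-key behaviour, which is presumably why the explicit lock could be dropped; the loader in this sketch must not touch the same map.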


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 


Diff: https://reviews.apache.org/r/68474/diff/1/


Testing
-------


Thanks,

Antal Sinkovits


Re: Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

Posted by Sahil Takiar <ta...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209130
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Line 60 (original), 69 (patched)
<https://reviews.apache.org/r/68474/#comment293420>

    Keep the explicit cache method and call it in `MapJoinOperator#closeOp`. That way, when a task finishes, we still keep the small table around for at least 30 seconds, which gives any tasks scheduled later a chance to reuse it.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 75 (patched)
<https://reviews.apache.org/r/68474/#comment293419>

    Can you add some Javadoc to this class explaining what it does?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 82 (patched)
<https://reviews.apache.org/r/68474/#comment293416>

    rename to something like `cleanupService`



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 90 (patched)
<https://reviews.apache.org/r/68474/#comment293417>

    nit: make `INTEGER_ONE` a static import



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 91 (patched)
<https://reviews.apache.org/r/68474/#comment293415>

    "SmallTableCache maintenance thread" -> "SmallTableCache Cleanup Thread"



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 117 (patched)
<https://reviews.apache.org/r/68474/#comment293418>

    replace with `cacheL1.get(key, valueLoader)` where `valueLoader` loads from `cacheL2`
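
A minimal JDK-only sketch of the lookup suggested in the last comment, assuming a two-level layout in which L1 holds strong references and L2 holds soft ones. The class `TwoLevelLookupSketch`, the helper `loadFromL2`, and the `sourceLoader` parameter are hypothetical names; the real patch uses Guava caches, for which `computeIfAbsent` is only a stand-in here.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical two-level cache: L1 holds strong references, L2 holds soft ones. */
class TwoLevelLookupSketch<V> {
    final ConcurrentHashMap<String, V> cacheL1 = new ConcurrentHashMap<>();
    final ConcurrentHashMap<String, SoftReference<V>> cacheL2 = new ConcurrentHashMap<>();

    /** Looks in L1 first; the L1 loader falls back to L2, and only then to the source. */
    V get(String key, Supplier<V> sourceLoader) {
        return cacheL1.computeIfAbsent(key, k -> loadFromL2(k, sourceLoader));
    }

    private V loadFromL2(String key, Supplier<V> sourceLoader) {
        SoftReference<V> ref = cacheL2.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Not in L2 either (or already collected): load it and remember it softly.
            value = sourceLoader.get();
            cacheL2.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

In this sketch an entry that has dropped out of L1 is repopulated from L2 without calling the source loader again, as long as the garbage collector has not cleared the soft reference.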


- Sahil Takiar


On Sept. 19, 2018, 11:14 p.m., Antal Sinkovits wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2018, 11:14 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
> 
> 
> Diffs
> -----
> 
>   ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>


Re: Review Request 68474: HIVE-20440

Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------

(Updated Nov. 7, 2018, 2:38 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Repository: hive-git


Description
-------

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.


Diffs (updated)
-----

  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCacheEviction.java PRE-CREATION 
  ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 


Diff: https://reviews.apache.org/r/68474/diff/5/

Changes: https://reviews.apache.org/r/68474/diff/4-5/


Testing
-------


Thanks,

Antal Sinkovits


Re: Review Request 68474: HIVE-20440

Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------

(Updated Nov. 6, 2018, 12:28 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Repository: hive-git


Description
-------

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.


Diffs (updated)
-----

  ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 


Diff: https://reviews.apache.org/r/68474/diff/4/

Changes: https://reviews.apache.org/r/68474/diff/3-4/


Testing
-------


Thanks,

Antal Sinkovits


Re: Review Request 68474: HIVE-20440

Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.

> On Oct. 16, 2018, 2:56 p.m., Sahil Takiar wrote:
> > Could we add some more E2E integration tests? I'm thinking they could be at the granularity of a `MapJoinOperator`. For example, confirm that starting a new query actually evicts everything from the cache. We want to make sure we aren't accidentally leaking small tables.

MapJoinOperator cannot be tested easily. There is a TestMapJoinOperator, but its test code is really complex. Also, the eviction happens at the HivePairFlatMapFunction level: the cache is reinitialized for every Map/Reduce, and if we are in a new query, the cache gets evicted.
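
The per-query eviction described above can be sketched roughly as follows. This is a hypothetical, JDK-only illustration: the class `PerQueryCacheSketch` and its methods are assumed names, and nothing here is copied from `SmallTableCache` or `HivePairFlatMapFunction`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: the cache is emptied whenever a different query ID shows up. */
class PerQueryCacheSketch {
    private final Map<String, Object> tables = new ConcurrentHashMap<>();
    private String currentQueryId;

    /** Meant to be called at the start of each map or reduce function. */
    synchronized void initialize(String queryId) {
        if (!queryId.equals(currentQueryId)) {
            // A new query started: drop everything cached for the previous one.
            tables.clear();
            currentQueryId = queryId;
        }
    }

    void put(String path, Object table) {
        tables.put(path, table);
    }

    int size() {
        return tables.size();
    }
}
```

A test along these lines only needs to call `initialize` with two different query IDs and check that the cache is empty after the second call, without constructing a `MapJoinOperator`.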


- Antal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
-----------------------------------------------------------


On Nov. 6, 2018, 12:28 p.m., Antal Sinkovits wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
> 
> (Updated Nov. 6, 2018, 12:28 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
> 
> 
> Diffs
> -----
> 
>   ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>


Re: Review Request 68474: HIVE-20440

Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.

> On Oct. 16, 2018, 2:56 p.m., Sahil Takiar wrote:
> > Could we add some more E2E integration tests? I'm thinking they could be at the granularity of a `MapJoinOperator`. For example, confirm that starting a new query actually evicts everything from the cache. We want to make sure we aren't accidentally leaking small tables.
> 
> Antal Sinkovits wrote:
>     MapJoinOperator cannot be tested easily. There is a TestMapJoinOperator, but its test code is really complex. Also, the eviction happens at the HivePairFlatMapFunction level: the cache is reinitialized for every Map/Reduce, and if we are in a new query, the cache gets evicted.

I've added a new test to check this.


- Antal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
-----------------------------------------------------------


On Nov. 7, 2018, 2:38 p.m., Antal Sinkovits wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2018, 2:38 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
> 
> 
> Diffs
> -----
> 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCacheEviction.java PRE-CREATION 
>   ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>


Re: Review Request 68474: HIVE-20440

Posted by Sahil Takiar <ta...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
-----------------------------------------------------------



Could we add some more E2E integration tests? I'm thinking they could be at the granularity of a `MapJoinOperator`. For example, confirm that starting a new query actually evicts everything from the cache. We want to make sure we aren't accidentally leaking small tables.

- Sahil Takiar


On Oct. 10, 2018, 1:20 p.m., Antal Sinkovits wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2018, 1:20 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
> 
> 
> Diffs
> -----
> 
>   ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>


Re: Review Request 68474: HIVE-20440

Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.

> On Oct. 16, 2018, 2:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
> > Lines 131 (patched)
> > <https://reviews.apache.org/r/68474/diff/3/?file=2095733#file2095733line135>
> >
> >     Why do we run the action just for the L2 cache?

L2 contains all the elements from L1, so running through L2 is enough.
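
To make that invariant concrete, here is a minimal JDK-only sketch. The class `SupersetCleanupSketch` and its method names are hypothetical and only illustrate why walking L2 covers every entry; they are not the patch's code.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical sketch: every entry written to L1 is also written to L2. */
class SupersetCleanupSketch<V> {
    final Map<String, V> cacheL1 = new ConcurrentHashMap<>();
    final Map<String, SoftReference<V>> cacheL2 = new ConcurrentHashMap<>();

    /** Inserts into both levels, so L2's keys always cover L1's keys. */
    void cache(String key, V value) {
        cacheL2.put(key, new SoftReference<>(value));
        cacheL1.put(key, value);
    }

    /** Runs the action once per live entry by walking L2 only, then empties both levels. */
    void clear(BiConsumer<String, V> action) {
        cacheL2.forEach((key, ref) -> {
            V value = ref.get();
            if (value != null) {
                action.accept(key, value);
            }
        });
        cacheL1.clear();
        cacheL2.clear();
    }
}
```

A value still held by L1 is strongly reachable, so its soft reference in L2 cannot have been cleared; iterating L2 therefore visits everything L1 holds, plus anything that has already left L1 but has not yet been collected.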


- Antal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209626
-----------------------------------------------------------


On Nov. 6, 2018, 12:28 p.m., Antal Sinkovits wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
> 
> (Updated Nov. 6, 2018, 12:28 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
> 
> 
> Diffs
> -----
> 
>   ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>


Re: Review Request 68474: HIVE-20440

Posted by Sahil Takiar <ta...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209626
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 117 (patched)
<https://reviews.apache.org/r/68474/#comment294162>

    nit: if you want to leave the `@return` section empty, then just remove it entirely



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 127 (patched)
<https://reviews.apache.org/r/68474/#comment294163>

    nit: same as above



ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java
Lines 178-190 (patched)
<https://reviews.apache.org/r/68474/#comment294161>

    What about changing this to something like `getKey()` that just returns a `String`? I don't think the interface needs to be tied to reading data from a folder on HDFS.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 131 (patched)
<https://reviews.apache.org/r/68474/#comment294165>

    Why do we run the action just for the L2 cache?


- Sahil Takiar


On Oct. 10, 2018, 1:20 p.m., Antal Sinkovits wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2018, 1:20 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
> 
> 
> Diffs
> -----
> 
>   ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>


Re: Review Request 68474: HIVE-20440

Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------

(Updated Oct. 10, 2018, 1:20 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Summary (updated)
-----------------

HIVE-20440


Repository: hive-git


Description
-------

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.


Diffs (updated)
-----

  ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 


Diff: https://reviews.apache.org/r/68474/diff/3/

Changes: https://reviews.apache.org/r/68474/diff/2-3/


Testing
-------


Thanks,

Antal Sinkovits


Re: Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

Posted by denys kuzmenko via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review208793
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 94 (patched)
<https://reviews.apache.org/r/68474/#comment292990>

    Just a remark: a note from the Guava documentation:
    "Because of the performance implications of using soft references, we generally recommend using the more predictable maximum cache size instead."
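
    For comparison, a size-bounded cache evicts deterministically instead of waiting for memory pressure. Below is a minimal JDK-only sketch of that alternative; the class `BoundedLruCacheSketch` is a hypothetical name, and `LinkedHashMap` is used only as a stand-in for a maximum-size setting on a real cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded LRU cache built on LinkedHashMap's access order. */
class BoundedLruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maximumSize;

    BoundedLruCacheSketch(int maximumSize) {
        super(16, 0.75f, true); // true = keep entries in access order, least recent first
        this.maximumSize = maximumSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > maximumSize;
    }
}
```

    This sketch is not thread-safe and counts entries rather than bytes, so it only illustrates the predictability argument, not a drop-in replacement.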


- denys kuzmenko


On Sept. 19, 2018, 11:14 p.m., Antal Sinkovits wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2018, 11:14 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
> 
> 
> Diffs
> -----
> 
>   ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>


Re: Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------

(Updated Sept. 19, 2018, 11:14 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Repository: hive-git


Description (updated)
-------

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.


Diffs (updated)
-----

  ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818 
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION 


Diff: https://reviews.apache.org/r/68474/diff/2/

Changes: https://reviews.apache.org/r/68474/diff/1-2/


Testing
-------


Thanks,

Antal Sinkovits