Posted to dev@hive.apache.org by Antal Sinkovits via Review Board <no...@reviews.apache.org> on 2018/08/22 13:15:03 UTC
Review Request 68474: HIVE-20440: Create better cache eviction policy
for SmallTableCache
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------
Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
Repository: hive-git
Description
-------
I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.
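As a rough illustration of the idea described above, the sketch below uses only JDK classes as a stand-in for the Guava cache: values are held through soft references, and a per-key loader runs inside `ConcurrentHashMap.compute`, so callers do not need to synchronize on an interned path string. The class name `SoftValueCacheSketch` and its `get(path, loader)` method are hypothetical and are not taken from the patch; with Guava, the analogous pieces would be `CacheBuilder.softValues()` and `Cache.get(key, Callable)`.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in, not the patched SmallTableCache.
final class SoftValueCacheSketch<V> {
  private final ConcurrentHashMap<String, SoftReference<V>> entries = new ConcurrentHashMap<>();

  // Returns the cached value for the path, loading it at most once per key at a time.
  V get(String path, Function<String, V> loader) {
    // Holds a strong reference so the value cannot be collected before we return it.
    Object[] strong = new Object[1];
    entries.compute(path, (key, ref) -> {
      V cached = (ref == null) ? null : ref.get();
      if (cached == null) {
        // Missing or already collected: load while compute() holds the per-key lock.
        cached = loader.apply(key);
        ref = new SoftReference<>(cached);
      }
      strong[0] = cached;
      return ref;
    });
    @SuppressWarnings("unchecked")
    V result = (V) strong[0];
    return result;
  }
}
```

The loader must not modify the same map from inside the callback, which is a documented restriction of `ConcurrentHashMap.compute`.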
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
Diff: https://reviews.apache.org/r/68474/diff/1/
Testing
-------
Thanks,
Antal Sinkovits
Re: Review Request 68474: HIVE-20440: Create better cache eviction
policy for SmallTableCache
Posted by Sahil Takiar <ta...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209130
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Line 60 (original), 69 (patched)
<https://reviews.apache.org/r/68474/#comment293420>
keep the explicit cache method and call it in `MapJoinOperator#closeOp`. This way when a task finishes, we still keep the small table around for at least 30 seconds, which gives any tasks scheduled in the future a chance to re-use the small table.
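A minimal sketch of the retention behaviour suggested in this comment, assuming an explicit `cache` call (for example from `MapJoinOperator#closeOp`) and a periodic cleanup pass. The class `TimedRetentionSketch`, its methods, and the injected clock are hypothetical; only the 30-second window comes from the comment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch: keeps a strong reference to each table for a fixed window.
final class TimedRetentionSketch<V> {
  static final long RETENTION_MILLIS = 30_000L; // "at least 30 seconds"

  private static final class Entry<T> {
    final T value;
    final long cachedAtMillis;

    Entry(T value, long cachedAtMillis) {
      this.value = value;
      this.cachedAtMillis = cachedAtMillis;
    }
  }

  private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
  private final LongSupplier clock; // injected so the window can be tested without sleeping

  TimedRetentionSketch(LongSupplier clock) {
    this.clock = clock;
  }

  // Explicit caching call; re-caching a key restarts its retention window.
  void cache(String key, V value) {
    entries.put(key, new Entry<>(value, clock.getAsLong()));
  }

  V getIfPresent(String key) {
    Entry<V> entry = entries.get(key);
    return entry == null ? null : entry.value;
  }

  // What a periodic cleanup task would run: drop entries older than the window.
  void cleanUp() {
    long now = clock.getAsLong();
    entries.values().removeIf(e -> now - e.cachedAtMillis > RETENTION_MILLIS);
  }
}
```

In production code the clock would typically be `System::currentTimeMillis`, and `cleanUp` would be scheduled on a background executor.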
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 75 (patched)
<https://reviews.apache.org/r/68474/#comment293419>
Can you add some Javadoc to this class explaining what it does?
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 82 (patched)
<https://reviews.apache.org/r/68474/#comment293416>
rename to something like `cleanupService`
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 90 (patched)
<https://reviews.apache.org/r/68474/#comment293417>
nit: make `INTEGER_ONE` a static import
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 91 (patched)
<https://reviews.apache.org/r/68474/#comment293415>
"SmallTableCache maintenance thread" -> "SmallTableCache Cleanup Thread"
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 117 (patched)
<https://reviews.apache.org/r/68474/#comment293418>
replace with `cacheL1.get(key, valueLoader)` where `valueLoader` loads from `cacheL2`
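The lookup suggested here could look roughly like the following. This is a sketch using JDK maps rather than Guava caches: `computeIfAbsent` plays the role of `cacheL1.get(key, valueLoader)`, and the loader falls back to a soft-referenced `cacheL2`. The class `TwoLevelLookupSketch` and the helper methods `put`, `evictFromL1`, and `inL1` are hypothetical.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an L1 lookup whose loader reads from L2.
final class TwoLevelLookupSketch<V> {
  // L1 holds strong references; L2 holds soft references the GC may clear.
  private final ConcurrentHashMap<String, V> cacheL1 = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, SoftReference<V>> cacheL2 = new ConcurrentHashMap<>();

  void put(String key, V value) {
    cacheL2.put(key, new SoftReference<>(value));
    cacheL1.put(key, value);
  }

  // Simulates L1 expiry; the entry stays in L2.
  void evictFromL1(String key) {
    cacheL1.remove(key);
  }

  // On an L1 miss, the loader promotes the value from L2 if it is still reachable.
  V get(String key) {
    return cacheL1.computeIfAbsent(key, k -> {
      SoftReference<V> ref = cacheL2.get(k);
      return ref == null ? null : ref.get(); // null means "absent"; nothing is stored
    });
  }

  boolean inL1(String key) {
    return cacheL1.containsKey(key);
  }
}
```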
- Sahil Takiar
On Sept. 19, 2018, 11:14 p.m., Antal Sinkovits wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
>
> (Updated Sept. 19, 2018, 11:14 p.m.)
>
>
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
>
>
> Diffs
> -----
>
> ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68474/diff/2/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Antal Sinkovits
>
>
Re: Review Request 68474: HIVE-20440
Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------
(Updated Nov. 7, 2018, 2:38 p.m.)
Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
Repository: hive-git
Description
-------
I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.
Diffs (updated)
-----
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCacheEviction.java PRE-CREATION
ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
Diff: https://reviews.apache.org/r/68474/diff/5/
Changes: https://reviews.apache.org/r/68474/diff/4-5/
Testing
-------
Thanks,
Antal Sinkovits
Re: Review Request 68474: HIVE-20440
Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------
(Updated Nov. 6, 2018, 12:28 p.m.)
Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
Repository: hive-git
Description
-------
I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.
Diffs (updated)
-----
ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
Diff: https://reviews.apache.org/r/68474/diff/4/
Changes: https://reviews.apache.org/r/68474/diff/3-4/
Testing
-------
Thanks,
Antal Sinkovits
Re: Review Request 68474: HIVE-20440
Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
> On Oct. 16, 2018, 2:56 p.m., Sahil Takiar wrote:
> > Could we add some more E2E integration tests? I'm thinking they could be at the granularity of a `MapJoinOperator`. For example, confirm that starting a new query actually evicts everything from the cache. We want to make sure we aren't accidentally leaking small tables.
`MapJoinOperator` cannot be tested easily. There is a `TestMapJoinOperator`, but its test code is really complex. Also, the eviction happens at the `HivePairFlatMapFunction` level: for every Map/Reduce the cache is reinitialized, and if we are in a new query, the cache gets evicted.
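The behaviour described in this reply can be sketched as follows, under the assumption that reinitialization compares a query identifier with the one seen previously. The class `QueryScopedCacheSketch`, its method names, and the use of a plain `String` query id are hypothetical and are not taken from the patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the cache is emptied whenever a different query starts.
final class QueryScopedCacheSketch {
  private final Map<String, Object> tables = new ConcurrentHashMap<>();
  private String currentQueryId;

  // Called for every map/reduce function; evicts only when the query changes.
  synchronized void initialize(String queryId) {
    if (!queryId.equals(currentQueryId)) {
      tables.clear();
      currentQueryId = queryId;
    }
  }

  void cache(String path, Object table) {
    tables.put(path, table);
  }

  Object get(String path) {
    return tables.get(path);
  }

  int size() {
    return tables.size();
  }
}
```

A test at this level only has to call `initialize` with two different ids and check that the cache is empty afterwards, which avoids driving a full `MapJoinOperator`.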
- Antal
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
-----------------------------------------------------------
On Nov. 6, 2018, 12:28 p.m., Antal Sinkovits wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
>
> (Updated Nov. 6, 2018, 12:28 p.m.)
>
>
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
>
>
> Diffs
> -----
>
> ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
> ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68474/diff/4/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Antal Sinkovits
>
>
Re: Review Request 68474: HIVE-20440
Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
> On Oct. 16, 2018, 2:56 p.m., Sahil Takiar wrote:
> > Could we add some more E2E integration tests? I'm thinking they could be at the granularity of a `MapJoinOperator`. For example, confirm that starting a new query actually evicts everything from the cache. We want to make sure we aren't accidentally leaking small tables.
>
> Antal Sinkovits wrote:
> `MapJoinOperator` cannot be tested easily. There is a `TestMapJoinOperator`, but its test code is really complex. Also, the eviction happens at the `HivePairFlatMapFunction` level: for every Map/Reduce the cache is reinitialized, and if we are in a new query, the cache gets evicted.
I've added a new test to check this.
- Antal
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
-----------------------------------------------------------
On Nov. 7, 2018, 2:38 p.m., Antal Sinkovits wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
>
> (Updated Nov. 7, 2018, 2:38 p.m.)
>
>
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
>
>
> Diffs
> -----
>
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCacheEviction.java PRE-CREATION
> ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
> ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68474/diff/5/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Antal Sinkovits
>
>
Re: Review Request 68474: HIVE-20440
Posted by Sahil Takiar <ta...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
-----------------------------------------------------------
Could we add some more E2E integration tests? I'm thinking they could be at the granularity of a `MapJoinOperator`. For example, confirm that starting a new query actually evicts everything from the cache. We want to make sure we aren't accidentally leaking small tables.
- Sahil Takiar
On Oct. 10, 2018, 1:20 p.m., Antal Sinkovits wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
>
> (Updated Oct. 10, 2018, 1:20 p.m.)
>
>
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
>
>
> Diffs
> -----
>
> ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025
> ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68474/diff/3/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Antal Sinkovits
>
>
Re: Review Request 68474: HIVE-20440
Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
> On Oct. 16, 2018, 2:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
> > Lines 131 (patched)
> > <https://reviews.apache.org/r/68474/diff/3/?file=2095733#file2095733line135>
> >
> > Why do we run the action only for the L2 cache?
L2 contains all the elements from L1, so running through L2 is enough.
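To make the superset argument concrete, here is a small sketch in which every insert goes to both levels, L1 entries may expire independently, and the clear action iterates only L2. The class `SupersetClearSketch` and its methods are hypothetical, and L2 uses strong references here purely to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical sketch: L2 is always a superset of L1, so one pass over L2 suffices.
final class SupersetClearSketch<V> {
  private final Map<String, V> cacheL1 = new ConcurrentHashMap<>();
  private final Map<String, V> cacheL2 = new ConcurrentHashMap<>();

  // Every entry is written to L2 first, then to L1.
  void put(String key, V value) {
    cacheL2.put(key, value);
    cacheL1.put(key, value);
  }

  // L1 may lose entries over time; L2 keeps them.
  void expireFromL1(String key) {
    cacheL1.remove(key);
  }

  // Runs the action once per cached entry by walking L2 only, then empties both levels.
  void clear(BiConsumer<String, V> action) {
    cacheL1.clear();
    cacheL2.forEach(action);
    cacheL2.clear();
  }
}
```

Running the action over both levels would invoke it twice for any entry still present in L1.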
- Antal
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209626
-----------------------------------------------------------
On Nov. 6, 2018, 12:28 p.m., Antal Sinkovits wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
>
> (Updated Nov. 6, 2018, 12:28 p.m.)
>
>
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
>
>
> Diffs
> -----
>
> ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
> ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68474/diff/4/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Antal Sinkovits
>
>
Re: Review Request 68474: HIVE-20440
Posted by Sahil Takiar <ta...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209626
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 117 (patched)
<https://reviews.apache.org/r/68474/#comment294162>
nit: if you want to leave the `@return` section empty, then just remove it entirely
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 127 (patched)
<https://reviews.apache.org/r/68474/#comment294163>
nit: same as above
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java
Lines 178-190 (patched)
<https://reviews.apache.org/r/68474/#comment294161>
What about changing this to something like `getKey()` and just returning a `String`? I don't think the interface needs to be tied to reading data to a folder on HDFS.
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 131 (patched)
<https://reviews.apache.org/r/68474/#comment294165>
Why do we run the action only for the L2 cache?
- Sahil Takiar
On Oct. 10, 2018, 1:20 p.m., Antal Sinkovits wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
>
> (Updated Oct. 10, 2018, 1:20 p.m.)
>
>
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
>
>
> Diffs
> -----
>
> ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025
> ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68474/diff/3/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Antal Sinkovits
>
>
Re: Review Request 68474: HIVE-20440
Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------
(Updated Oct. 10, 2018, 1:20 p.m.)
Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
Summary (updated)
-----------------
HIVE-20440
Repository: hive-git
Description
-------
I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.
Diffs (updated)
-----
ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
Diff: https://reviews.apache.org/r/68474/diff/3/
Changes: https://reviews.apache.org/r/68474/diff/2-3/
Testing
-------
Thanks,
Antal Sinkovits
Re: Review Request 68474: HIVE-20440: Create better cache eviction
policy for SmallTableCache
Posted by denys kuzmenko via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review208793
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 94 (patched)
<https://reviews.apache.org/r/68474/#comment292990>
Just a remark; note this from Google's Guava documentation:
"Because of the performance implications of using soft references, we generally recommend using the more predictable maximum cache size instead."
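For comparison, a size-bounded cache in the spirit of that recommendation can be sketched with the JDK's `LinkedHashMap` in access order. This is a stand-in for illustration, not Guava's implementation; in Guava the corresponding setting is `CacheBuilder.maximumSize(long)`. The class `BoundedLruSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a maximum-size (LRU) cache; not thread-safe on its own.
final class BoundedLruSketch<K, V> extends LinkedHashMap<K, V> {
  private final int maximumSize;

  BoundedLruSketch(int maximumSize) {
    super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
    this.maximumSize = maximumSize;
  }

  // Invoked by put(); returning true drops the least recently used entry.
  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maximumSize;
  }
}
```

Eviction here depends only on the entry count and access order, so it is predictable in a way that GC-driven clearing of soft references is not. The trade-off is that a fixed entry count ignores how large each small table actually is.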
- denys kuzmenko
On Sept. 19, 2018, 11:14 p.m., Antal Sinkovits wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> -----------------------------------------------------------
>
> (Updated Sept. 19, 2018, 11:14 p.m.)
>
>
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> I've modified the SmallTableCache to use a Guava cache with soft references.
> By using a value loader, I've also eliminated the synchronization on the interned string of the path.
>
>
> Diffs
> -----
>
> ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68474/diff/2/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Antal Sinkovits
>
>
Re: Review Request 68474: HIVE-20440: Create better cache eviction
policy for SmallTableCache
Posted by Antal Sinkovits via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
-----------------------------------------------------------
(Updated Sept. 19, 2018, 11:14 p.m.)
Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.
Repository: hive-git
Description (updated)
-------
I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the interned string of the path.
Diffs (updated)
-----
ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
Diff: https://reviews.apache.org/r/68474/diff/2/
Changes: https://reviews.apache.org/r/68474/diff/1-2/
Testing
-------
Thanks,
Antal Sinkovits