You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/02/24 14:46:15 UTC

[GitHub] [hive] kasakrisz opened a new pull request #2013: HIVE-24820: MaterializedViewCache enables adding multiple entries of the same Materialization instance

kasakrisz opened a new pull request #2013:
URL: https://github.com/apache/hive/pull/2013


   ### What changes were proposed in this pull request?
   Replace `ConcurrentMap.compute` to `computeIfAbsent` in `MaterializedViewsCache.putIfAbsent`
   
   ### Why are the changes needed?
   Using `ConcurrentMap.compute` enabled adding multiple entries of the same Materialization instance to the sqlText -> materialization map.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest=TestMaterializedViewsCache -pl ql
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz closed pull request #2013: HIVE-24820: MaterializedViewCache enables adding multiple entries of the same Materialization instance

Posted by GitBox <gi...@apache.org>.
kasakrisz closed pull request #2013:
URL: https://github.com/apache/hive/pull/2013


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #2013: HIVE-24820: MaterializedViewCache enables adding multiple entries of the same Materialization instance

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #2013:
URL: https://github.com/apache/hive/pull/2013#discussion_r583351336



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewsCache.java
##########
@@ -80,52 +80,11 @@ public void putIfAbsent(Table materializedViewTable, HiveRelOptMaterialization m
     return dbMap;
   }
 
-  public void refresh(
-          Table oldMaterializedViewTable, Table materializedViewTable, HiveRelOptMaterialization newMaterialization) {
-    ConcurrentMap<String, HiveRelOptMaterialization> dbMap = ensureDbMap(materializedViewTable);
-
-    dbMap.compute(materializedViewTable.getTableName(), (mvTableName, existingMaterialization) -> {
-      List<HiveRelOptMaterialization> optMaterializationList = sqlToMaterializedView.computeIfAbsent(
-              materializedViewTable.getViewExpandedText(), s -> new ArrayList<>());
+  public void refresh(Table materializedViewTable, HiveRelOptMaterialization newMaterialization) {
+    remove(materializedViewTable.getDbName(), materializedViewTable.getTableName());

Review comment:
       What happens with concurrency in this case, e.g., an MV is refreshed concurrently by another user? I think that was the main reason to have the logic within a single operation.
   We were also checking whether the old version was actually the same and only replacing in that case, but we do not do that anymore. Shouldn't we keep that check?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2013: HIVE-24820: MaterializedViewCache enables adding multiple entries of the same Materialization instance

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2013:
URL: https://github.com/apache/hive/pull/2013#discussion_r583517959



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewsCache.java
##########
@@ -80,52 +80,11 @@ public void putIfAbsent(Table materializedViewTable, HiveRelOptMaterialization m
     return dbMap;
   }
 
-  public void refresh(
-          Table oldMaterializedViewTable, Table materializedViewTable, HiveRelOptMaterialization newMaterialization) {
-    ConcurrentMap<String, HiveRelOptMaterialization> dbMap = ensureDbMap(materializedViewTable);
-
-    dbMap.compute(materializedViewTable.getTableName(), (mvTableName, existingMaterialization) -> {
-      List<HiveRelOptMaterialization> optMaterializationList = sqlToMaterializedView.computeIfAbsent(
-              materializedViewTable.getViewExpandedText(), s -> new ArrayList<>());
+  public void refresh(Table materializedViewTable, HiveRelOptMaterialization newMaterialization) {
+    remove(materializedViewTable.getDbName(), materializedViewTable.getTableName());

Review comment:
       I updated the implementation of `refresh` to have the logic within a single operation.
   
   What is the reason of checking the old and the cached version are the same?
   
   Please consider the following usecase: we have three threads and two HS2 instances
   0. T1 and T2 are runing on the same HS2 instance Both are executing the same query which can be rewritten to scan the same mat1 materialized view which is already in the cache and up to date.
   1. Both threads generates the new plan. During plan generation both threads got a reference to the V1 version of mat1 form the cache.
   2. Parallelly a T3 thread rebuilds the MV from an other HS2 instance. -> mat1 has a V2 version in Metastore.
   3. T1 thread runs 'Hive.validateMaterializedViewsFromRegistry' and refresh the cache. It can do it because the T1 holds the V1 version of mat1 and the cache also has V1 version. After refresh the cache has V2 of mat1.
   4. T3 thread rebuilds the MV again from an other HS2 instance. -> mat1 has a V3 version in Metastore.
   5. T2 thread runs 'Hive.validateMaterializedViewsFromRegistry' and tries to refresh the cache. It can not do it because T2 holds the V1 version of mat1 and the cache has V2 version. -> The cache still has V2.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on pull request #2013: HIVE-24820: MaterializedViewCache enables adding multiple entries of the same Materialization instance

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on pull request #2013:
URL: https://github.com/apache/hive/pull/2013#issuecomment-787769905


   Created a new patch with a different approach: #2030 
   Closing this


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org