You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/11/03 22:09:26 UTC

[GitHub] [hive] scarlin-cloudera opened a new pull request #2763: HIVE-25670: Avoid getTable() calls for foreign key tables not used in…

scarlin-cloudera opened a new pull request #2763:
URL: https://github.com/apache/hive/pull/2763


   … a query
   
   RelOptHiveTable currently fetches the Table information for all
   referential constraint tables. However, it only needs to fetch the tables
   that are used in the query.
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   Changing RelOptHiveTable to only fetch referential constraint tables that are used in the query
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   To improve compilation performance.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Regression testing (no new tests added)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #2763: HIVE-25670: Avoid getTable() calls for foreign key tables not used in…

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2763:
URL: https://github.com/apache/hive/pull/2763#discussion_r745425625



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java
##########
@@ -333,12 +360,20 @@ public boolean isKey(ImmutableBitSet columns) {
           parentTableQualifiedName.add(parentTableName);
           qualifiedName = parentTableName;
         }
+        if (!tablesCache.containsKey(qualifiedName)) {
+          // Table doesn't exist in the cache, so we don't need to track
+          // these referential constraints. But we do need to keep track
+          // of the table in case the tableCache gets populated later, though
+          // in theory, this should never happen based on how this is called.

Review comment:
       Trying to expect the unexpected is very hard to do :)
   Can't we roll some rules/conventions/etc and/or throw exception if that happens instead of trying to cover a case we don't even know will happen?
   It could probably make things more straight as well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on a change in pull request #2763: HIVE-25670: Avoid getTable() calls for foreign key tables not used in…

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on a change in pull request #2763:
URL: https://github.com/apache/hive/pull/2763#discussion_r745746867



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java
##########
@@ -333,12 +360,20 @@ public boolean isKey(ImmutableBitSet columns) {
           parentTableQualifiedName.add(parentTableName);
           qualifiedName = parentTableName;
         }
+        if (!tablesCache.containsKey(qualifiedName)) {
+          // Table doesn't exist in the cache, so we don't need to track
+          // these referential constraints. But we do need to keep track
+          // of the table in case the tableCache gets populated later, though
+          // in theory, this should never happen based on how this is called.

Review comment:
       I was thinking about this a little more.  I'll have to review the code to see if it's possible, but I think I can at least put in one check (that isn't foolproof) that willl prevent a user calling for an incomplete list of referential constraints.  I can do this by ensuring that a second call retrieving them doesn't have any missing.  
   
   Of course, if the list is incomplete and a second call isn't made, then this won't be caught.
   
   But I think will add this shortly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk merged pull request #2763: HIVE-25670: Avoid getTable() calls for foreign key tables not used in…

Posted by GitBox <gi...@apache.org>.
kgyrtkirk merged pull request #2763:
URL: https://github.com/apache/hive/pull/2763


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on a change in pull request #2763: HIVE-25670: Avoid getTable() calls for foreign key tables not used in…

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on a change in pull request #2763:
URL: https://github.com/apache/hive/pull/2763#discussion_r747601749



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java
##########
@@ -333,12 +360,20 @@ public boolean isKey(ImmutableBitSet columns) {
           parentTableQualifiedName.add(parentTableName);
           qualifiedName = parentTableName;
         }
+        if (!tablesCache.containsKey(qualifiedName)) {
+          // Table doesn't exist in the cache, so we don't need to track
+          // these referential constraints. But we do need to keep track
+          // of the table in case the tableCache gets populated later, though
+          // in theory, this should never happen based on how this is called.

Review comment:
       Ok, in my most recent push, I created a hash map wrapper.  This wrapper allows the caller to mark the fact that all tables have been parsed.  
   
   The call for getting the referential constraints is now only allowed when the table map is marked as parsed so we can be assured that the list of tables used in the query is complete when fetching the referential constraints.  I think I did this with minimal intrusion on the original code, though some of that still does need rewriting.
   
   Thanks for the review comments and making me think about this a bit more!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on a change in pull request #2763: HIVE-25670: Avoid getTable() calls for foreign key tables not used in…

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on a change in pull request #2763:
URL: https://github.com/apache/hive/pull/2763#discussion_r745659469



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java
##########
@@ -333,12 +360,20 @@ public boolean isKey(ImmutableBitSet columns) {
           parentTableQualifiedName.add(parentTableName);
           qualifiedName = parentTableName;
         }
+        if (!tablesCache.containsKey(qualifiedName)) {
+          // Table doesn't exist in the cache, so we don't need to track
+          // these referential constraints. But we do need to keep track
+          // of the table in case the tableCache gets populated later, though
+          // in theory, this should never happen based on how this is called.

Review comment:
       Here's the reason why I did it this way:
   
   RelOptHiveTable takes in a hash map containing the table name with the table object.  In my perfect world of coding, this hash map would be immutable.  It makes coding much easier when the objects are static at initialization time.
   
   This is the proper way to code it, in my opinion.  That would make this test unnecessary.
   
   However, this hash map can't be immutable because it is being populated while each RelOptHiveTable is being constructed.  
   
   Once this code is rewritten, we can then remove this extra code.  But I think defensive coding here is best.  By coding it this way, the RelOptHiveTable will stay internally consistent no matter how it is called, while that table hash map stays mutable. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org