Posted to dev@hive.apache.org by "Siying Dong (JIRA)" <ji...@apache.org> on 2010/11/11 21:15:13 UTC
[jira] Created: (HIVE-1783) CommonJoinOperator optimize the case
that 1:1 join
CommonJoinOperator optimize the case that 1:1 join
--------------------------------------------------
Key: HIVE-1783
URL: https://issues.apache.org/jira/browse/HIVE-1783
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
CommonJoinOperator.genObject() is expensive. It recurses and keeps a lot of state because it has to:
1. handle null cases for outer joins
2. handle the case of duplicated keys from one join party
We can do a minor optimization: detect a 1:1 join (which is quite common) before calling CommonJoinOperator.genObject(), and forward the columns in a simple for-loop if we are sure that neither 1 nor 2 will happen.
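The idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class OneToOneJoinSketch and all of its methods are hypothetical, rows are modelled as plain Object[] arrays, an empty row list stands in for a side that would need outer-join null handling, and the recursive crossProduct() is a simplified inner-join stand-in for the general path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT Hive's CommonJoinOperator.
final class OneToOneJoinSketch {

    // True when every join input holds exactly one row for the current key,
    // i.e. no side is missing (case 1) and no side has duplicates (case 2).
    static boolean isOneToOne(List<List<Object[]>> rowsPerInput) {
        for (List<Object[]> rows : rowsPerInput) {
            if (rows.size() != 1) {
                return false;
            }
        }
        return true;
    }

    // Fast path: concatenate the single row of each input in a simple loop.
    static Object[] forwardOneToOne(List<List<Object[]>> rowsPerInput) {
        List<Object> out = new ArrayList<>();
        for (List<Object[]> rows : rowsPerInput) {
            out.addAll(Arrays.asList(rows.get(0)));
        }
        return out.toArray();
    }

    // Stand-in for the general path: a recursive cross product over all inputs.
    // It only models inner-join behaviour; outer-join null rows are left out.
    static void crossProduct(List<List<Object[]>> rowsPerInput, int input,
                             List<Object> prefix, List<Object[]> results) {
        if (input == rowsPerInput.size()) {
            results.add(prefix.toArray());
            return;
        }
        for (Object[] row : rowsPerInput.get(input)) {
            int mark = prefix.size();
            prefix.addAll(Arrays.asList(row));
            crossProduct(rowsPerInput, input + 1, prefix, results);
            prefix.subList(mark, prefix.size()).clear(); // undo before the next row
        }
    }

    // Check for the 1:1 case first; fall back to the general path otherwise.
    static List<Object[]> join(List<List<Object[]>> rowsPerInput) {
        List<Object[]> results = new ArrayList<>();
        if (isOneToOne(rowsPerInput)) {
            results.add(forwardOneToOne(rowsPerInput));
        } else {
            crossProduct(rowsPerInput, 0, new ArrayList<>(), results);
        }
        return results;
    }
}
```

In this sketch the fast path and the general path produce the same single row whenever every input has exactly one row for the key, which is the property the optimization relies on.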
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933516#action_12933516 ]
Namit Jain commented on HIVE-1783:
----------------------------------
+1
running tests
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1783:
------------------------------
Attachment: HIVE-1783.4.patch
with hive.outerjoin.supports.filters=true and false;
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1783:
------------------------------
Attachment: HIVE-1783.1.patch
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933239#action_12933239 ]
Siying Dong commented on HIVE-1783:
-----------------------------------
The problem is that with hive.outerjoin.supports.filters=false, some code paths that generate different kinds of empty rows are not covered by the test.
With hive.outerjoin.supports.filters=true, although the results don't seem quite right, we are sure that the patch doesn't change any behavior.
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1783:
------------------------------
Summary: CommonJoinOperator optimize the case of 1:1 join (was: CommonJoinOperator optimize the case that 1:1 join)
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-1783:
-----------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed. Thanks Siying
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1783:
------------------------------
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932570#action_12932570 ]
Namit Jain commented on HIVE-1783:
----------------------------------
Let us get it after HIVE-1642
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1783:
------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1783:
------------------------------
Attachment: HIVE-1783.3.patch
after previous patches.
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-1783:
-----------------------------
Status: Open (was: Patch Available)
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931306#action_12931306 ]
He Yongqiang commented on HIVE-1783:
------------------------------------
namit, can you hold this for Liyin's patch?
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931328#action_12931328 ]
Namit Jain commented on HIVE-1783:
----------------------------------
Actually, on second thought, Siying, can you add more tests?
Set hive.join.emit.interval to a very small value (say 1), and add a couple
of small data files to check all variants of join: a single key, followed by a null, followed
by a single key. Try to get coverage on all those 'if' conditions.
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931311#action_12931311 ]
Namit Jain commented on HIVE-1783:
----------------------------------
sure - let me know when you are done
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1783:
------------------------------
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933240#action_12933240 ]
Namit Jain commented on HIVE-1783:
----------------------------------
can we have the test with both ?
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931303#action_12931303 ]
Namit Jain commented on HIVE-1783:
----------------------------------
+1
running tests
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1783:
------------------------------
Attachment: HIVE-1783.2.patch
Added a unit test.
I ran the test with and without the patch applied and the test results are identical.
(The results don't seem right, though. Even if they are wrong, that is a totally separate issue.)
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of
1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-1783:
-----------------------------
Status: Open (was: Patch Available)
Can you refresh the patch ? HIVE-1642 has been committed, so this is good to go
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case
of 1:1 join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933232#action_12933232 ]
Namit Jain commented on HIVE-1783:
----------------------------------
<property>
  <name>hive.outerjoin.supports.filters</name>
  <value>false</value>
  <description>Whether hive should correctly not push the filters for outer joins</description>
</property>
Can you set the above property to false, and run the same tests as above - I mean double the tests