Posted to dev@hive.apache.org by "Siying Dong (JIRA)" <ji...@apache.org> on 2010/11/11 21:15:13 UTC

[jira] Created: (HIVE-1783) CommonJoinOperator optimize the case that 1:1 join

CommonJoinOperator optimize the case that 1:1 join
--------------------------------------------------

                 Key: HIVE-1783
                 URL: https://issues.apache.org/jira/browse/HIVE-1783
             Project: Hive
          Issue Type: Improvement
            Reporter: Siying Dong
            Assignee: Siying Dong
            Priority: Minor


CommonJoinOperator.genObject() is expensive. It recurses and keeps a lot of state because it has to:
1. handle null cases for outer joins
2. handle duplicate keys from one side of the join
We can do a minor optimization: detect a 1:1 join (which is quite common) before calling CommonJoinOperator.genObject(), and forward the columns in a simple for-loop if we are sure that neither 1 nor 2 will happen.
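
To illustrate the idea, here is a minimal sketch under a simplified model. It is not the code from the attached patches and not Hive's actual CommonJoinOperator: the class OneToOneJoinSketch and all of its methods are hypothetical, it covers inner-join semantics only, and it assumes the rows for one join key are already buffered per input. If every input contributed exactly one row, the columns are copied in a plain loop; otherwise the code falls back to a recursive cross product.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a join operator; NOT Hive's actual CommonJoinOperator. */
final class OneToOneJoinSketch {

    /**
     * Joins the rows buffered for one join key. sides.get(i) holds the rows
     * that input i contributed for that key. Inner-join semantics only; the
     * null rows needed for outer joins are deliberately left out.
     */
    static List<Object[]> joinGroup(List<List<Object[]>> sides) {
        List<Object[]> out = new ArrayList<>();
        if (isOneToOne(sides)) {
            // Fast path: one row per input means exactly one output row.
            out.add(forwardSingleRow(sides));
        } else {
            // General path: recursive cross product over all inputs.
            crossProduct(sides, 0, new ArrayList<>(), out);
        }
        return out;
    }

    /** True when every input has exactly one row for the current key. */
    static boolean isOneToOne(List<List<Object[]>> sides) {
        for (List<Object[]> rows : sides) {
            if (rows.size() != 1) {
                return false;
            }
        }
        return true;
    }

    /** Concatenates the single row of each input with simple loops. */
    private static Object[] forwardSingleRow(List<List<Object[]>> sides) {
        int width = 0;
        for (List<Object[]> rows : sides) {
            width += rows.get(0).length;
        }
        Object[] joined = new Object[width];
        int pos = 0;
        for (List<Object[]> rows : sides) {
            Object[] row = rows.get(0);
            System.arraycopy(row, 0, joined, pos, row.length);
            pos += row.length;
        }
        return joined;
    }

    /** Emits every combination of one row per input, depth-first. */
    private static void crossProduct(List<List<Object[]>> sides, int depth,
                                     List<Object> prefix, List<Object[]> out) {
        if (depth == sides.size()) {
            out.add(prefix.toArray());
            return;
        }
        for (Object[] row : sides.get(depth)) {
            int mark = prefix.size();
            for (Object col : row) {
                prefix.add(col);
            }
            crossProduct(sides, depth + 1, prefix, out);
            // Drop this row's columns before trying the next row.
            prefix.subList(mark, prefix.size()).clear();
        }
    }
}
```

In this sketch both paths produce the same rows for a 1:1 group; the fast path only avoids the recursion and the intermediate state.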

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933516#action_12933516 ] 

Namit Jain commented on HIVE-1783:
----------------------------------

+1

running tests



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1783:
------------------------------

    Attachment: HIVE-1783.4.patch

Tests now run with hive.outerjoin.supports.filters set to both true and false.



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1783:
------------------------------

    Attachment: HIVE-1783.1.patch



[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933239#action_12933239 ] 

Siying Dong commented on HIVE-1783:
-----------------------------------

The problem is that with hive.outerjoin.supports.filters=false, some code paths that generate different kinds of empty rows are not covered by the test.
With hive.outerjoin.supports.filters=true, although the results don't seem quite right, we are sure that the patch doesn't change any behavior.





[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1783:
------------------------------

    Summary: CommonJoinOperator optimize the case of 1:1 join  (was: CommonJoinOperator optimize the case that 1:1 join)



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1783:
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed. Thanks Siying



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1783:
------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932570#action_12932570 ] 

Namit Jain commented on HIVE-1783:
----------------------------------

Let us get this in after HIVE-1642.



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1783:
------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1783:
------------------------------

    Attachment: HIVE-1783.3.patch

after previous patches.



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1783:
-----------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931306#action_12931306 ] 

He Yongqiang commented on HIVE-1783:
------------------------------------

namit, can you hold this for Liyin's patch?



[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931328#action_12931328 ] 

Namit Jain commented on HIVE-1783:
----------------------------------

Actually, on second thought, Siying, can you add more tests?

Set hive.join.emit.interval to a very small value (say 1), and add a couple
of small data files to check all variants of the join: a single key, followed by a null, followed
by a single key. Try to get coverage of all those 'if' conditions.



[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931311#action_12931311 ] 

Namit Jain commented on HIVE-1783:
----------------------------------

sure - let me know when you are done



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1783:
------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933240#action_12933240 ] 

Namit Jain commented on HIVE-1783:
----------------------------------

Can we have the test with both?




[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931303#action_12931303 ] 

Namit Jain commented on HIVE-1783:
----------------------------------

+1

running tests



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1783:
------------------------------

    Attachment: HIVE-1783.2.patch

Added a unit test.
I ran the test with and without the patch applied, and the test results are identical.
(The results don't seem to be right, though. Even if they are wrong, that is a totally separate issue.)



[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1783:
-----------------------------

    Status: Open  (was: Patch Available)

Can you refresh the patch? HIVE-1642 has been committed, so this is good to go.



[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933232#action_12933232 ] 

Namit Jain commented on HIVE-1783:
----------------------------------

<property>
  <name>hive.outerjoin.supports.filters</name>
  <value>false</value>
  <description> Whether hive should correctly not push the filters for outer joins </description>
</property>

Can you set the above property to false and run the same tests as above? I mean, double the tests.
