Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2010/02/24 19:45:30 UTC
[jira] Created: (HIVE-1194) sorted merge join
sorted merge join
-----------------
Key: HIVE-1194
URL: https://issues.apache.org/jira/browse/HIVE-1194
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
Fix For: 0.6.0
If the input tables are sorted on the join key and a mapjoin is being performed, it is useful to exploit the sort order of the tables.
This can lead to substantial CPU savings. It also needs to work across bucketed map joins.
Because the sorted properties of a table are not currently enforced, a new parameter can be added to request the sort-merge join.
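As a rough illustration of why sorted inputs save CPU, here is a minimal sketch of a merge join over two inputs that are already sorted on the join key. It is not Hive's implementation; the row shape and the function name sort_merge_join are assumptions made for this example only. Each input is walked once, and no hash table of the small table is built.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, value); hypothetical row shape


def sort_merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[Row, Row]]:
    """Inner-join two inputs that are both sorted ascending by key."""
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1  # left key is behind, advance left
        elif lkey > rkey:
            j += 1  # right key is behind, advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Emit the cross product of the two runs.
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    yield (lrow, rrow)
            i, j = i_end, j_end
```

If either input is not actually sorted, this sketch silently drops matches, which is one reason the description asks for an explicit parameter rather than assuming sortedness.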
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840877#action_12840877 ]
Namit Jain commented on HIVE-1194:
----------------------------------
Reviewed with Yongqiang online -
MapJoinProcessor.java:convertMapJoin: Also check if the tables are sorted.
(check it later in SMBJoinOptimizer)
Add a negative test for the same.
Also, can you add a simple test with 2 buckets?
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841550#action_12841550 ]
Namit Jain commented on HIVE-1194:
----------------------------------
+1
looks good - will commit if the tests pass
> Attachments: hive-1194-2010-02-28.patch, hive-1194-2010-3-2.2.patch, hive-1194-2010-3-2.patch, hive-1194-2010-3-3-2.patch, hive-1194-2010-3-3.patch, hive-1194-2010-3-4.patch
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840463#action_12840463 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
hive-1194-2010-3-2.2.patch fixed a bug in outer joins with more than 2 tables.
[jira] Updated: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1194:
-------------------------------
Attachment: hive-1194-2010-3-3-2.patch
A new patch that adds reportProgress.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841109#action_12841109 ]
Namit Jain commented on HIVE-1194:
----------------------------------
Verified problem 2. above again - in the first query
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838191#action_12838191 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
Thanks Zheng. Yes, we should do that.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839932#action_12839932 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
Yes, we can do that. There are two problems to resolve:
(1) Serializing and deserializing the mapping. We generate the mapping at compile time, and the operator instance is different from the one at runtime.
(2) The fetchOperators need to be accessed in SMBMapJoinOperator, so they need to be passed from exec-mapper to SMBMapJoinOperator.
I just made a small change:
I added a new method initializeLocalWork() in Operator. In exec-mapper, the mapoperator's initializeLocalWork() is called, and it triggers initializeLocalWork() on all its children.
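The propagation described above can be modelled roughly as follows. This is a Python sketch of the idea only, not the Java code in the patch; the class shape, the local_work_ready flag, and the returned visit order are assumptions added for illustration.

```python
from typing import List, Optional


class Operator:
    """Hypothetical stand-in for an operator node in the operator tree."""

    def __init__(self, name: str, children: Optional[List["Operator"]] = None):
        self.name = name
        self.children = children or []
        self.local_work_ready = False

    def initialize_local_work(self) -> List[str]:
        """Initialize this operator, then its children; return the visit order."""
        self.local_work_ready = True
        visited = [self.name]
        for child in self.children:
            visited.extend(child.initialize_local_work())
        return visited
```

With this shape, the caller only needs to invoke the method on the root map operator; every descendant, including a sort-merge join operator deeper in the tree, gets a chance to set up its own local work.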
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839922#action_12839922 ]
Namit Jain commented on HIVE-1194:
----------------------------------
Had a quick comment - don't you need an operator->fetchoperator mapping in mapredlocalwork?
Currently, you are implicitly assuming that mapjoins are the only operators doing so.
[jira] Updated: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1194:
-------------------------------
Attachment: hive-1194-2010-02-28.patch
for early review only.
I will test it more and add more testcases.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840812#action_12840812 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
will take a look now.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840806#action_12840806 ]
Namit Jain commented on HIVE-1194:
----------------------------------
PREHOOK: query: select /*+mapjoin(a,b)*/ * from smb_bucket_1 a right outer join smb_bucket_2 b on a.key = b.key join smb_bucket_3 c on b.key=c.key
PREHOOK: type: QUERY
PREHOOK: Input: default@smb_bucket_2
PREHOOK: Input: default@smb_bucket_3
PREHOOK: Input: default@smb_bucket_1
PREHOOK: Output: file:/Users/heyongqiang/Documents/workspace/Hive-Test/build/ql/scratchdir/hive_2010-03-02_16-29-05_320_5840475035790004401/10000
POSTHOOK: query: select /*+mapjoin(a,b)*/ * from smb_bucket_1 a right outer join smb_bucket_2 b on a.key = b.key join smb_bucket_3 c on b.key=c.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@smb_bucket_2
POSTHOOK: Input: default@smb_bucket_3
POSTHOOK: Input: default@smb_bucket_1
POSTHOOK: Output: file:/Users/heyongqiang/Documents/workspace/Hive-Test/build/ql/scratchdir/hive_2010-03-02_16-29-05_320_5840475035790004401/10000
Why is this giving an empty result?
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840807#action_12840807 ]
Namit Jain commented on HIVE-1194:
----------------------------------
PREHOOK: query: select /*+mapjoin(a,b)*/ * from smb_bucket_1 a right outer join smb_bucket_2 b on a.key = b.key right outer join smb_bucket_3 c on b.key=c.key
PREHOOK: type: QUERY
PREHOOK: Input: default@smb_bucket_2
PREHOOK: Input: default@smb_bucket_3
PREHOOK: Input: default@smb_bucket_1
PREHOOK: Output: file:/Users/heyongqiang/Documents/workspace/Hive-Test/build/ql/scratchdir/hive_2010-03-02_16-29-16_626_5515675647620051128/10000
POSTHOOK: query: select /*+mapjoin(a,b)*/ * from smb_bucket_1 a right outer join smb_bucket_2 b on a.key = b.key right outer join smb_bucket_3 c on b.key=c.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@smb_bucket_2
POSTHOOK: Input: default@smb_bucket_3
POSTHOOK: Input: default@smb_bucket_1
POSTHOOK: Output: file:/Users/heyongqiang/Documents/workspace/Hive-Test/build/ql/scratchdir/hive_2010-03-02_16-29-16_626_5515675647620051128/10000
NULL NULL NULL NULL 4 val_4
NULL NULL NULL NULL 10 val_10
NULL NULL NULL NULL 17 val_17
NULL NULL NULL NULL 19 val_19
NULL NULL NULL NULL 20 val_20
NULL NULL NULL NULL 23 val_23
Even this one looks wrong - can you take a look in detail?
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840805#action_12840805 ]
Namit Jain commented on HIVE-1194:
----------------------------------
POSTHOOK: query: select /*+mapjoin(a,b)*/ * from smb_bucket_1 a left outer join smb_bucket_2 b on a.key = b.key full outer join smb_bucket_3 c on b.key=c.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@smb_bucket_2
POSTHOOK: Input: default@smb_bucket_3
POSTHOOK: Input: default@smb_bucket_1
POSTHOOK: Output: file:/Users/heyongqiang/Documents/workspace/Hive-Test/build/ql/scratchdir/hive_2010-03-02_16-28-56_475_666795559542199348/10000
1 val_1 NULL NULL NULL NULL
3 val_3 NULL NULL NULL NULL
4 val_4 NULL NULL NULL NULL
NULL NULL NULL NULL 4 val_4
5 val_5 NULL NULL NULL NULL
10 val_10 NULL NULL NULL NULL
NULL NULL NULL NULL 10 val_10
NULL NULL NULL NULL 17 val_17
NULL NULL NULL NULL 19 val_19
NULL NULL NULL NULL 20 val_20
NULL NULL NULL NULL 23 val_23
same as above
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841414#action_12841414 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
@namit,
The join results for key 498 are in the output:
496 val_496 496 val_496
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
5 val_5 5 val_5
5 val_5 5 val_5
5 val_5 5 val_5
5 val_5 5 val_5
5 val_5 5 val_5
5 val_5 5 val_5
5 val_5 5 val_5
5 val_5 5 val_5
5 val_5 5 val_5
9 val_9 9 val_9
I will add an automatic check query to the test and upload a new patch.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839935#action_12839935 ]
Namit Jain commented on HIVE-1194:
----------------------------------
There is an operator id which is unique, so the problem of different operator instances can be solved.
Each operator will access its local work. Currently, only map join operators will need it.
MapJoinOperator will get the complete small table in the beginning, whereas SMBJoinOperator reads it row by row.
ExecMapper does nothing.
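To make the two access patterns concrete, here is a small Python sketch, not Hive code. The registry LOCAL_WORK, the operator ids, and both function names are hypothetical. It shows a lookup keyed by a unique operator id, plus the difference between loading the whole small table up front and reading it row by row.

```python
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, value); hypothetical row shape

# Hypothetical registry keyed by unique operator id, so the lookup still works
# when the runtime operator object differs from the compile-time one.
LOCAL_WORK: Dict[str, List[Row]] = {
    "MAPJOIN_1": [(4, "val_4"), (10, "val_10")],
    "SMBJOIN_2": [(4, "val_4"), (10, "val_10"), (17, "val_17")],
}


def load_whole_table(operator_id: str) -> Dict[int, List[str]]:
    """Map-join style: read the complete small table up front into a hash table."""
    table: Dict[int, List[str]] = {}
    for key, value in LOCAL_WORK[operator_id]:
        table.setdefault(key, []).append(value)
    return table


def stream_rows(operator_id: str) -> Iterator[Row]:
    """Sort-merge style: hand out one row at a time without buffering the table."""
    for row in LOCAL_WORK[operator_id]:
        yield row
```

The streaming variant only makes sense when the rows arrive in join-key order, which is the property the sort-merge join relies on.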
[jira] Updated: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1194:
-------------------------------
Attachment: hive-1194-2010-3-3.patch
A new patch integrates Namit and Siying's comments. Thanks Namit and Siying!
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840841#action_12840841 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
Checked with MySQL.
For the query:
select /*+mapjoin(a,b)*/ * from smb_bucket_1 a left outer join smb_bucket_2 b on a.key = b.key left outer join smb_bucket_3 c on b.key=c.key
the result is consistent.
I did not check the second query because MySQL does not support full outer join.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840802#action_12840802 ]
Namit Jain commented on HIVE-1194:
----------------------------------
smb_mapjoin4.q:
POSTHOOK: query: select /*+mapjoin(a,b)*/ * from smb_bucket_1 a left outer join smb_bucket_2 b on a.key = b.key left outer join smb_bucket_3 c on b.key=c.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@smb_bucket_2
POSTHOOK: Input: default@smb_bucket_3
POSTHOOK: Input: default@smb_bucket_1
POSTHOOK: Output: file:/Users/heyongqiang/Documents/workspace/Hive-Test/build/ql/scratchdir/hive_2010-03-02_16-28-42_346_3202067314016412424/10000
1 val_1 NULL NULL NULL NULL
3 val_3 NULL NULL NULL NULL
4 val_4 NULL NULL NULL NULL
5 val_5 NULL NULL NULL NULL
10 val_10 NULL NULL NULL NULL
I am not sure if the above semantics are correct. This may be an existing bug in the code; can you check the semantics in MySQL and Oracle?
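One way to check these semantics independently is to run the same join shape in another SQL engine. The sketch below uses Python's built-in sqlite3 module rather than MySQL or Oracle. The table contents are assumed sample data: the keys for a and c are chosen to be consistent with the outputs quoted in this thread, and the keys for b are purely hypothetical, chosen so that no b row matches any a row. Under standard SQL semantics, when b does not match, b.key is NULL, and NULL never equals c.key, so the c columns stay NULL even for keys such as 4 and 10 that exist in c.

```python
import sqlite3


def chained_left_outer_join():
    """Run a LEFT OUTER JOIN b LEFT OUTER JOIN c on assumed sample data."""
    conn = sqlite3.connect(":memory:")
    sample = {
        "a": [1, 3, 4, 5, 10],           # assumed, consistent with the thread's output
        "b": [20, 23, 25, 30],           # hypothetical: no key overlaps with a
        "c": [4, 10, 17, 19, 20, 23],    # assumed, consistent with the thread's output
    }
    for name, keys in sample.items():
        conn.execute(f"CREATE TABLE {name} (key INTEGER, value TEXT)")
        conn.executemany(
            f"INSERT INTO {name} VALUES (?, ?)",
            [(k, f"val_{k}") for k in keys],
        )
    rows = conn.execute(
        "SELECT * FROM a "
        "LEFT OUTER JOIN b ON a.key = b.key "
        "LEFT OUTER JOIN c ON b.key = c.key "
        "ORDER BY a.key"
    ).fetchall()
    conn.close()
    return rows
```

On this sample data every returned row carries the a columns followed by four NULLs, which has the same shape as the five rows listed above.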
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838113#action_12838113 ]
Namit Jain commented on HIVE-1194:
----------------------------------
Based on an offline discussion with Yongqiang, we were thinking of the following:
There will be a new mapping in MapredWork ->
Operator -> MapredLocalWork
This will be populated for SortMergeJoinOperator only.
SortMergeJoinOperator is a new operator which extends MapJoinOperator, and has the
same name as a MapJoinOperator.
MapJoinProcessor needs to create a SortMergeJoinOperator instead of a MapJoinOperator
when it sees the new configuration parameter.
MapJoinFactory methods need to change to create Operator->MapredLocalWork instead of
MapredLocalWork in MapredWork.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838122#action_12838122 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
Yes. It does not need that storage.
The main reason for letting it extend mapjoinop is that we can then reuse the mapjoinop code for optimization and task generation.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841442#action_12841442 ]
Namit Jain commented on HIVE-1194:
----------------------------------
I know - the log file is correct, but when I run the tests, I get a diff.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840999#action_12840999 ]
Namit Jain commented on HIVE-1194:
----------------------------------
Need to report progress for sort-merge join
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840825#action_12840825 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
@namit, are you looking at patch hive-1194-2010-3-2.2.patch?
For the last two queries you mentioned above,
select /*+mapjoin(a,b)*/ * from smb_bucket_1 a right outer join smb_bucket_2 b on a.key = b.key join smb_bucket_3 c on b.key=c.key
and
select /*+mapjoin(a,b)*/ * from smb_bucket_1 a right outer join smb_bucket_2 b on a.key = b.key right outer join smb_bucket_3 c on b.key=c.key
the results look good to me.
Results:
NULL NULL 20 val_20 20 val_20
NULL NULL 23 val_23 23 val_23
and
NULL NULL NULL NULL 4 val_4
NULL NULL NULL NULL 10 val_10
NULL NULL NULL NULL 17 val_17
NULL NULL NULL NULL 19 val_19
NULL NULL 20 val_20 20 val_20
NULL NULL 23 val_23 23 val_23
I will check Oracle and MySQL for the semantics of the first two queries you commented on.
[jira] Updated: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1194:
-------------------------------
Attachment: hive-1194-2010-3-2.2.patch
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840895#action_12840895 ]
Siying Dong commented on HIVE-1194:
-----------------------------------
Yongqiang, can you add a test case in which the "big table" is generated from "select * from XXX where XXX", and make sure the 3-way join query works well?
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838130#action_12838130 ]
Namit Jain commented on HIVE-1194:
----------------------------------
A new optimization step will be created which will convert the mapjoin to a sortmergejoin
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838120#action_12838120 ]
Zheng Shao commented on HIVE-1194:
----------------------------------
Why does SortMergeJoinOperator extend MapJoinOperator?
It seems to me that SortMergeJoinOperator does NOT need the in-memory/disk-backed HashMap that MapJoinOperator has, correct?
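To illustrate why a sort-merge join can do without a hash table, here is a minimal Java sketch of an inner join over two inputs that are already sorted on the join key. It is not taken from the patch: the class SortMergeJoinSketch, the Row type and the join method are hypothetical names used only for illustration, and the real operator works on Hive rows and buckets rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the SortMergeJoinOperator from the patch.
public class SortMergeJoinSketch {

    /** One input row: a join key plus a payload value. */
    public static final class Row {
        final int key;
        final String value;

        public Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Inner-joins two inputs that are both sorted ascending on the key.
     * Each input is scanned once with a cursor; no hash table is built.
     */
    public static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int lk = left.get(i).key;
            int rk = right.get(j).key;
            if (lk < rk) {
                i++; // left key has no partner yet, advance left
            } else if (lk > rk) {
                j++; // right key has no partner yet, advance right
            } else {
                // Find the end of the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd).key == lk) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key == lk) {
                    jEnd++;
                }
                // Emit the cross product of the two runs.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(lk + "\t" + left.get(a).value + "\t" + right.get(b).value);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> a = Arrays.asList(new Row(1, "a1"), new Row(20, "a20"), new Row(23, "a23"));
        List<Row> b = Arrays.asList(new Row(20, "b20"), new Row(20, "b20x"), new Row(25, "b25"));
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

The only state kept beyond the two cursors is the current run of equal keys, which is the source of the CPU and memory savings over a hash-based map join when the inputs really are sorted.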
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838132#action_12838132 ]
Zheng Shao commented on HIVE-1194:
----------------------------------
If it does not inherit any methods, shall we add an AbstractMapJoinOperator as the common parent?
That AbstractMapJoinOperator can be converted to MapJoinOperator (or HashBasedMapJoinOperator, to be accurate) or SortMergeJoinOperator depending on the configuration/table properties.
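The proposed hierarchy could look roughly like the Java sketch below. This is an assumption-laden illustration, not code from any patch: the strategy() method, the choose() factory and its two boolean parameters are hypothetical, and the classes are nested in a wrapper only to keep the example self-contained.

```java
// Hypothetical illustration of the proposed hierarchy; not Hive's actual classes.
public class JoinOperatorHierarchySketch {

    /** Common parent holding whatever both map-side joins share. */
    public abstract static class AbstractMapJoinOperator {
        /** Name of the join strategy; stands in for the real processing logic. */
        public abstract String strategy();
    }

    /** Hash-based map join: would own the in-memory/disk-backed HashMap. */
    public static final class HashBasedMapJoinOperator extends AbstractMapJoinOperator {
        @Override
        public String strategy() {
            return "hash";
        }
    }

    /** Sort-merge join: relies on sorted inputs and needs no HashMap. */
    public static final class SortMergeJoinOperator extends AbstractMapJoinOperator {
        @Override
        public String strategy() {
            return "sort-merge";
        }
    }

    /**
     * Picks the concrete operator from configuration and table properties.
     * Both flags are assumed inputs standing in for the real checks.
     */
    public static AbstractMapJoinOperator choose(boolean sortMergeEnabled,
                                                 boolean inputsSortedOnJoinKey) {
        if (sortMergeEnabled && inputsSortedOnJoinKey) {
            return new SortMergeJoinOperator();
        }
        return new HashBasedMapJoinOperator();
    }

    public static void main(String[] args) {
        System.out.println(choose(true, true).strategy());
        System.out.println(choose(true, false).strategy());
    }
}
```

With a shared parent, the optimizer and task-generation code can be written against AbstractMapJoinOperator, while the hash-specific storage stays out of the sort-merge operator.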
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841099#action_12841099 ]
Namit Jain commented on HIVE-1194:
----------------------------------
There is a problem in smb_mapjoin_6.q: the checked-in results seem OK, but I am getting a diff.
Can you investigate?
There are 2 problems:
1. The order is not deterministic.
2. Bigger problem: 498 is missing from the results of the first query.
[jira] Updated: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1194:
-------------------------------
Attachment: hive-1194-2010-3-2.patch
A new patch that adds more test cases and fixes some bugs.
@namit,
I agree, that will make the code clearer. Can we do that in a follow-up JIRA? It requires a code refactoring that may break the existing mapjoin, etc.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840829#action_12840829 ]
He Yongqiang commented on HIVE-1194:
------------------------------------
By the way, I just checked the results without map join hints. The results are consistent.
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840969#action_12840969 ]
Siying Dong commented on HIVE-1194:
-----------------------------------
It turns out that we also need to support a subquery for the "small table", like:
select /* mapjoin(t) */ from (select * from a where ...) t join ....
[jira] Resolved: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain resolved HIVE-1194.
------------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Committed. Thanks Yongqiang
[jira] Updated: (HIVE-1194) sorted merge join
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1194:
-------------------------------
Attachment: hive-1194-2010-3-4.patch
attached a new patch
[jira] Commented: (HIVE-1194) sorted merge join
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838121#action_12838121 ]
Namit Jain commented on HIVE-1194:
----------------------------------
Yes, but it happens on the mapper. It is a special type of mapjoin.
It will end up overriding all the methods of map-join, but keeping it this way keeps the hierarchy correct.