Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2012/07/23 10:02:34 UTC

[jira] [Created] (HIVE-3289) sort merge join may not work silently

Namit Jain created HIVE-3289:
--------------------------------

             Summary: sort merge join may not work silently
                 Key: HIVE-3289
                 URL: https://issues.apache.org/jira/browse/HIVE-3289
             Project: Hive
          Issue Type: Bug
            Reporter: Namit Jain


The user has no way of knowing whether the sort-merge join is actually being used.


create table table_asc(key int, value string) CLUSTERED BY (key) SORTED BY (key asc) 
INTO 1 BUCKETS STORED AS RCFILE; 
create table table_desc(key int, value string) CLUSTERED BY (key) SORTED BY (key desc) 
INTO 1 BUCKETS STORED AS RCFILE; 

set hive.enforce.sorting = true;

insert overwrite table table_asc select key, value from src;    
insert overwrite table table_desc select key, value from src;

set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

explain 
select /*+mapjoin(a)*/ * from table_asc a join table_desc b on a.key = b.key;
select /*+mapjoin(a)*/ * from table_asc a join table_desc b on a.key = b.key;

explain
select /*+mapjoin(b)*/ * from table_asc a join table_desc b on a.key = b.key;
select /*+mapjoin(b)*/ * from table_asc a join table_desc b on a.key = b.key;



In the above test, the request for a sort-merge join is silently not honored.
If the user explicitly asked for a sort-merge join and it cannot be
performed, the operation should fail.
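
One way to see which join was actually chosen is to compare EXPLAIN plans. The sketch below is illustrative only: table_asc2 is a hypothetical table that does not appear in this issue, and the session settings are copied from the test above. With both sides sorted ascending on the join key, a sort-merge join would be expected to be applicable, so its plan should differ from the plan for the table_asc / table_desc join.

```sql
-- Hypothetical comparison table: same layout as table_asc, also sorted ascending.
create table table_asc2(key int, value string) CLUSTERED BY (key) SORTED BY (key asc)
INTO 1 BUCKETS STORED AS RCFILE;

set hive.enforce.sorting = true;
insert overwrite table table_asc2 select key, value from src;

set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Compare this plan with the plan for the table_asc / table_desc join above.
explain
select /*+mapjoin(a)*/ * from table_asc a join table_asc2 b on a.key = b.key;
```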

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-3289) sort merge join may not work silently

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3289:
-----------------------------

    Status: Patch Available  (was: Open)
    
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>

[jira] [Updated] (HIVE-3289) sort merge join may not work silently

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-3289:
--------------------------------

    Affects Version/s: 0.10.0
        Fix Version/s: 0.10.0
    
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration, Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.10.0
>
>         Attachments: hive.3289.1.patch
>

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425330#comment-13425330 ] 

Kevin Wilfong commented on HIVE-3289:
-------------------------------------

Regarding the diff, I'm +1 on it. Carl?
                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>

[jira] [Updated] (HIVE-3289) sort merge join may not work silently

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-3289:
---------------------------------

    Component/s: Query Processor
                 Configuration
    
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration, Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422819#comment-13422819 ] 

Carl Steinbach commented on HIVE-3289:
--------------------------------------

Two more points which are tangentially related:

* The patch is not attached to this ticket, and it looks like Phabricator stopped automatically attaching patches some time ago. Is anyone at Facebook looking into fixing this?
* Part of the agreement when we started using Phabricator was that the tool would automatically copy review comments back to JIRA. This feature hasn't worked in months, and unless it starts working soon I think we should stop using Phabricator and switch back to ReviewBoard. Is anyone looking into fixing this? If not we should probably just switch back now.
                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422812#comment-13422812 ] 

Carl Steinbach commented on HIVE-3289:
--------------------------------------

-1

> I also am not a fan of hive.mapred.mode. If you turn it off, you may unintentionally turn off other checks, and it uses strict/nonstrict instead of true/false, which is easier to validate. That's, at best, a problem for another JIRA, though, as it's fairly well established.

I agree with Kevin, but I don't think this should be postponed for another JIRA. Please add a new configuration property now instead of further overloading what is an already ill-defined and poorly documented configuration property.

                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422992#comment-13422992 ] 

Namit Jain commented on HIVE-3289:
----------------------------------

I think the discussion about whether to use Phabricator, ReviewBoard, or plain patches should take place on the dev mailing list rather than on this JIRA.
                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>

[jira] [Updated] (HIVE-3289) sort merge join may not work silently

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-3289:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed, thanks Namit.
                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration, Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422808#comment-13422808 ] 

Kevin Wilfong commented on HIVE-3289:
-------------------------------------

+1, running tests.
                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427049#comment-13427049 ] 

Hudson commented on HIVE-3289:
------------------------------

Integrated in Hive-trunk-h0.21 #1584 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1584/])
    HIVE-3289. sort merge join may not work silently. (njain via kevinwilfong) (Revision 1368119)

     Result = FAILURE
kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368119
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeBucketMapJoinOptimizer.java
* /hive/trunk/ql/src/test/queries/clientnegative/sortmerge_mapjoin_mismatch_1.q
* /hive/trunk/ql/src/test/results/clientnegative/sortmerge_mapjoin_mismatch_1.q.out

                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration, Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.10.0
>
>         Attachments: hive.3289.1.patch
>

[jira] [Assigned] (HIVE-3289) sort merge join may not work silently

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-3289:
--------------------------------

    Assignee: Namit Jain
    
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>

[jira] [Updated] (HIVE-3289) sort merge join may not work silently

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-3289:
---------------------------------

    Release Note: This patch adds the configuration property 'hive.enforce.sortmergebucketmapjoin', which is set to false by default.
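
For illustration, a minimal sketch of how the new property might be used with the test case from this issue. It assumes that setting the property to true makes a requested sort-merge join fail with an error when it cannot be performed, which is the behavior asked for in the description; the exact error text is not shown in this thread.

```sql
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Property added by this patch; false by default per the release note.
set hive.enforce.sortmergebucketmapjoin = true;

-- table_asc and table_desc are sorted in opposite orders, so under the
-- assumption above this query would be rejected instead of silently
-- falling back to another join strategy.
select /*+mapjoin(a)*/ * from table_asc a join table_desc b on a.key = b.key;
```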
    
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration, Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>

[jira] [Updated] (HIVE-3289) sort merge join may not work silently

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3289:
-----------------------------

    Status: Patch Available  (was: Open)
    
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>


        

[jira] [Comment Edited] (HIVE-3289) sort merge join may not work silently

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422812#comment-13422812 ] 

Carl Steinbach edited comment on HIVE-3289 at 7/26/12 1:32 AM:
---------------------------------------------------------------

-1

> I also am not a fan of hive.mapred.mode. If you turn it off, you may unintentionally turn off other checks, and it uses strict/nonstrict instead of true/false, which is easier to validate. That's, at best, a problem for another JIRA, though, as it's fairly well established.

I agree with Kevin, but I don't think this should be postponed for another JIRA. Please add a new configuration property now instead of further overloading what is an already ill-defined and poorly documented configuration property.

                

                  
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>


        

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425413#comment-13425413 ] 

Carl Steinbach commented on HIVE-3289:
--------------------------------------

+1. Thanks for making these changes.

                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration, Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>
>


        

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422988#comment-13422988 ] 

Namit Jain commented on HIVE-3289:
----------------------------------

https://reviews.facebook.net/D4377

Added a new configuration parameter.
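
As a rough illustration of how such a parameter might be used with the test case from the issue description: the thread does not name the new parameter, so the property name below (hive.enforce.sortmergebucketmapjoin) is an assumption taken from the Hive configuration documentation, and the failure behavior described in the comments is the intent stated in this issue rather than verified output.

```sql
-- Settings from the issue description that request a sort-merge join.
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Assumed name of the new parameter (not stated in this thread):
-- ask Hive to fail the query if the sort-merge join cannot be performed.
set hive.enforce.sortmergebucketmapjoin = true;

-- table_asc is sorted ascending and table_desc descending on the join key,
-- so with the parameter enabled this query would be expected to fail
-- instead of silently falling back to another join strategy.
select /*+mapjoin(a)*/ * from table_asc a join table_desc b on a.key = b.key;
```

Leaving the parameter at its default would keep the earlier behavior, so existing queries are not affected unless the user opts in.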
                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>
>


        

[jira] [Updated] (HIVE-3289) sort merge join may not work silently

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-3289:
---------------------------------

    Status: Open  (was: Patch Available)
    
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>


        

[jira] [Updated] (HIVE-3289) sort merge join may not work silently

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3289:
-----------------------------

    Attachment: hive.3289.1.patch
    
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3289.1.patch
>
>


        

[jira] [Commented] (HIVE-3289) sort merge join may not work silently

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422026#comment-13422026 ] 

Namit Jain commented on HIVE-3289:
----------------------------------

https://reviews.facebook.net/D4179
                
> sort merge join may not work silently
> -------------------------------------
>
>                 Key: HIVE-3289
>                 URL: https://issues.apache.org/jira/browse/HIVE-3289
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>
