You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/08/19 00:06:17 UTC

[jira] Created: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

smb_mapjoin_8.q returns different results in miniMr mode
--------------------------------------------------------

                 Key: HIVE-1561
                 URL: https://issues.apache.org/jira/browse/HIVE-1561
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Joydeep Sen Sarma


follow on to HIVE-1523:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test

POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key

official results:
4 val_356 NULL  NULL
NULL  NULL  484 val_169
2000  val_169 NULL  NULL
NULL  NULL  3000  val_169
4000  val_125 NULL  NULL


in minimr mode:
2000  val_169 NULL  NULL
4 val_356 NULL  NULL
2000  val_169 NULL  NULL
4000  val_125 NULL  NULL
NULL  NULL  5000  val_125


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang reassigned HIVE-1561:
----------------------------------

    Assignee: He Yongqiang

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900178#action_12900178 ] 

Amareshwari Sriramadasu commented on HIVE-1561:
-----------------------------------------------

When I tried SMB join on local machine (pseudo distributed mode) I'm seeing wrong results for the join. I think if there are more than one mapper, the join logic does not work correctly.
Here is my run:
{noformat}
hive> describe extended smb_input;
OK
key     int
value   int

Detailed Table Information      Table(tableName:smb_input, dbName:default, owner:amarsri, createTime:1282026968, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:int, comment:null)], location:hdfs://localhost:19000/user/hive/warehouse/smb_input, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[key], sortCols:[Order(col:key, order:1)], parameters:{}), partitionKeys:[], parameters:{SORTBUCKETCOLSPREFIX=TRUE, transient_lastDdlTime=1282027032}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.05 seconds

hive> select * from smb_input;
OK
12      35
48      40
100     100
Time taken: 0.343 seconds

hive> set hive.optimize.bucketmapjoin = true;
hive> set hive.optimize.bucketmapjoin.sortedmerge = true;

hive> select /*+ MAPJOIN(a) */ * from smb_input a join smb_input b on a.key=b.key;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201008031340_0170, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201008031340_0170
Kill Command = /home/amarsri/workspace/Yahoo20/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:19101 -kill job_201008031340_0170
2010-08-19 11:04:00,040 Stage-1 map = 0%,  reduce = 0%
2010-08-19 11:04:10,253 Stage-1 map = 50%,  reduce = 0%
2010-08-19 11:04:13,271 Stage-1 map = 100%,  reduce = 0%
2010-08-19 11:05:13,636 Stage-1 map = 100%,  reduce = 0%
2010-08-19 11:05:19,664 Stage-1 map = 50%,  reduce = 0%
2010-08-19 11:05:25,733 Stage-1 map = 100%,  reduce = 0%
2010-08-19 11:05:28,762 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201008031340_0170
OK
12      35      12      35
48      40      48      40
Time taken: 100.056 seconds

Expected output:
12      35      12      35
48      40      48      40
100     100     100     100
{noformat} 

The MapReduce Job launched for the join has 2 maps. Second map's first attempt (attempt_201008031340_0170_m_000001_0)  fails with following expetion:
{noformat}
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: replace taskId from execContext 
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: new taskId: FS 000000_0
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/_tmp.000000_0
2010-08-19 11:04:07,196 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 finished. closing... 
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 forwarded 5 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 Close done
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... 
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 forwarded 1 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... 
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 1 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... 
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-19 11:05:08,656 ERROR ExecMapper: Hit error while closing operators - failing tree
2010-08-19 11:05:08,658 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:253)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:329)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:219)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1021)
	at org.apache.hadoop.mapred.Child.main(Child.java:213)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output to: hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:179)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$200(FileSinkOperator.java:98)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:636)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:540)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:230)
	... 8 more
{noformat}

And second attempt(attempt_201008031340_0170_m_000001_1) passes :
{noformat}
2010-08-19 11:05:21,384 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: new taskId: FS 000000_1
2010-08-19 11:05:21,384 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_1
2010-08-19 11:05:21,385 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/_tmp.000000_1
2010-08-19 11:05:21,385 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_1
{noformat}

attempt_201008031340_0170_m_000000_0 passes and output goes to :
{noformat}
2010-08-19 11:04:06,198 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: new taskId: FS 000000_0
2010-08-19 11:04:06,198 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
2010-08-19 11:04:06,199 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/_tmp.000000_0
2010-08-19 11:04:06,199 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
{noformat}

I think the problem is both the attempt's attempt_201008031340_0170_m_000000_0 and attempt_201008031340_0170_m_000001_0 are trying to write to the same location. Also, though attempt_201008031340_0170_m_000001_1 writes to file 000000_1, it is not read? 
Should attempt_201008031340_0170_m_000001_0 write to file 000001_0?

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900525#action_12900525 ] 

Namit Jain commented on HIVE-1561:
----------------------------------

+1

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>         Attachments: hive-1561.1.patch
>
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900138#action_12900138 ] 

Namit Jain commented on HIVE-1561:
----------------------------------

Looked at the data in detail:

The tables should be:

smb_bucket4_1 

4             v356
2000      v169
4000     v125

smb_bucket4_2

484      v169
3000    v169
5000    v125


So, the above query should result in 6 rows - both the results are wrong

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900384#action_12900384 ] 

He Yongqiang commented on HIVE-1561:
------------------------------------

Amareshwari, did you use BucketizedHiveInputFormat for your query? SMBJoin can only work with BucketizedHiveInputFormat.

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1561:
-----------------------------

          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed. Thanks Yongqiang

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>         Attachments: hive-1561.1.patch
>
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1561:
-------------------------------

    Status: Patch Available  (was: Open)

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>         Attachments: hive-1561.1.patch
>
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900161#action_12900161 ] 

He Yongqiang commented on HIVE-1561:
------------------------------------

This is the complete result from Hive's smb_mapjoin_8.q.out, it's correct:
{noformat}
POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@smb_bucket4_2
POSTHOOK: Input: default@smb_bucket4_1
POSTHOOK: Output: file:/tmp/jssarma/hive_2010-07-21_12-02-34_137_8141051139723931378/10000
POSTHOOK: Lineage: smb_bucket4_1.key SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_1.value SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.key SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.value SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, comment:from deserializer), ]
4	val_356	NULL	NULL
NULL	NULL	484	val_169
2000	val_169	NULL	NULL
NULL	NULL	3000	val_169
4000	val_125	NULL	NULL
NULL	NULL	5000	val_125
{noformat}




> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900176#action_12900176 ] 

Namit Jain commented on HIVE-1561:
----------------------------------

My bad, I did not see the entire results - so, based on what Joy is saying, it does not work in minimr mode

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1561:
-------------------------------

    Attachment: hive-1561.1.patch

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>         Attachments: hive-1561.1.patch
>
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900577#action_12900577 ] 

Amareshwari Sriramadasu commented on HIVE-1561:
-----------------------------------------------

bq. Amareshwari, did you use BucketizedHiveInputFormat for your query? 
No, I did not use BucketizedHiveInputFormat. After using it, I see correct results now. Thanks.

> smb_mapjoin_8.q returns different results in miniMr mode
> --------------------------------------------------------
>
>                 Key: HIVE-1561
>                 URL: https://issues.apache.org/jira/browse/HIVE-1561
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: He Yongqiang
>         Attachments: hive-1561.1.patch
>
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.