Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/08/19 00:06:17 UTC
[jira] Created: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
smb_mapjoin_8.q returns different results in miniMr mode
--------------------------------------------------------
Key: HIVE-1561
URL: https://issues.apache.org/jira/browse/HIVE-1561
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Joydeep Sen Sarma
Follow-on to HIVE-1523. To reproduce:
ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
Official results:
4 val_356 NULL NULL
NULL NULL 484 val_169
2000 val_169 NULL NULL
NULL NULL 3000 val_169
4000 val_125 NULL NULL
In miniMR mode:
2000 val_169 NULL NULL
4 val_356 NULL NULL
2000 val_169 NULL NULL
4000 val_125 NULL NULL
NULL NULL 5000 val_125
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang reassigned HIVE-1561:
----------------------------------
Assignee: He Yongqiang
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900178#action_12900178 ]
Amareshwari Sriramadasu commented on HIVE-1561:
-----------------------------------------------
When I tried an SMB join on my local machine (pseudo-distributed mode), I saw wrong results for the join. I think the join logic does not work correctly when there is more than one mapper.
Here is my run:
{noformat}
hive> describe extended smb_input;
OK
key int
value int
Detailed Table Information Table(tableName:smb_input, dbName:default, owner:amarsri, createTime:1282026968, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:int, comment:null)], location:hdfs://localhost:19000/user/hive/warehouse/smb_input, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[key], sortCols:[Order(col:key, order:1)], parameters:{}), partitionKeys:[], parameters:{SORTBUCKETCOLSPREFIX=TRUE, transient_lastDdlTime=1282027032}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.05 seconds
hive> select * from smb_input;
OK
12 35
48 40
100 100
Time taken: 0.343 seconds
hive> set hive.optimize.bucketmapjoin = true;
hive> set hive.optimize.bucketmapjoin.sortedmerge = true;
hive> select /*+ MAPJOIN(a) */ * from smb_input a join smb_input b on a.key=b.key;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201008031340_0170, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201008031340_0170
Kill Command = /home/amarsri/workspace/Yahoo20/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:19101 -kill job_201008031340_0170
2010-08-19 11:04:00,040 Stage-1 map = 0%, reduce = 0%
2010-08-19 11:04:10,253 Stage-1 map = 50%, reduce = 0%
2010-08-19 11:04:13,271 Stage-1 map = 100%, reduce = 0%
2010-08-19 11:05:13,636 Stage-1 map = 100%, reduce = 0%
2010-08-19 11:05:19,664 Stage-1 map = 50%, reduce = 0%
2010-08-19 11:05:25,733 Stage-1 map = 100%, reduce = 0%
2010-08-19 11:05:28,762 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201008031340_0170
OK
12 35 12 35
48 40 48 40
Time taken: 100.056 seconds
Expected output:
12 35 12 35
48 40 48 40
100 100 100 100
{noformat}
The MapReduce job launched for the join has 2 maps. The second map's first attempt (attempt_201008031340_0170_m_000001_0) fails with the following exception:
{noformat}
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: replace taskId from execContext
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: new taskId: FS 000000_0
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/_tmp.000000_0
2010-08-19 11:04:07,196 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 finished. closing...
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 forwarded 5 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 5 Close done
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing...
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 forwarded 1 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 1 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-19 11:05:08,656 ERROR ExecMapper: Hit error while closing operators - failing tree
2010-08-19 11:05:08,658 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:253)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:329)
at org.apache.hadoop.mapred.Child$4.run(Child.java:219)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1021)
at org.apache.hadoop.mapred.Child.main(Child.java:213)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output to: hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:179)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$200(FileSinkOperator.java:98)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:636)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:540)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:549)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:230)
... 8 more
{noformat}
The second attempt (attempt_201008031340_0170_m_000001_1) passes:
{noformat}
2010-08-19 11:05:21,384 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: new taskId: FS 000000_1
2010-08-19 11:05:21,384 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_1
2010-08-19 11:05:21,385 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/_tmp.000000_1
2010-08-19 11:05:21,385 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_1
{noformat}
attempt_201008031340_0170_m_000000_0 passes, and its output goes to:
{noformat}
2010-08-19 11:04:06,198 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: new taskId: FS 000000_0
2010-08-19 11:04:06,198 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
2010-08-19 11:04:06,199 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/_tmp.000000_0
2010-08-19 11:04:06,199 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/000000_0
{noformat}
I think the problem is that both attempt_201008031340_0170_m_000000_0 and attempt_201008031340_0170_m_000001_0 are trying to write to the same location. Also, although attempt_201008031340_0170_m_000001_1 writes to file 000000_1, is that file not read?
Should attempt_201008031340_0170_m_000001_0 write to file 000001_0 instead?
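To make the suspected collision concrete, here is a small Python sketch that reproduces only the file names seen in the logs above. The helper final_file_name and its replaced_index parameter are hypothetical; they model the "replace taskId from execContext" log line and are not Hive's actual code.

```python
def final_file_name(map_index, attempt, replaced_index=None):
    """Hypothetical naming helper: <task index, 6 digits>_<attempt number>.

    replaced_index models the "replace taskId from execContext" step seen
    in the logs; it is an assumption for illustration, not Hive's API.
    """
    index = map_index if replaced_index is None else replaced_index
    return f"{index:06d}_{attempt}"


# Attempts from the job above, as (map index, attempt number):
# m_000000_0, m_000001_0 and m_000001_1.
attempts = [(0, 0), (1, 0), (1, 1)]

# With the replacement seen in the logs, every map is renamed to index 0.
replaced = [final_file_name(m, a, replaced_index=0) for m, a in attempts]

# Without the replacement, each map keeps its own index.
plain = [final_file_name(m, a) for m, a in attempts]

# Names that more than one attempt would try to commit.
collisions = sorted({name for name in replaced if replaced.count(name) > 1})

print(replaced)
print(plain)
print(collisions)
```

Under these assumptions, the first attempts of both maps target 000000_0, which is consistent with the failed rename in the stack trace, while per-map names would not collide.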
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900525#action_12900525 ]
Namit Jain commented on HIVE-1561:
----------------------------------
+1
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900138#action_12900138 ]
Namit Jain commented on HIVE-1561:
----------------------------------
Looked at the data in detail:
The tables should be:
smb_bucket4_1
4 v356
2000 v169
4000 v125
smb_bucket4_2
484 v169
3000 v169
5000 v125
So the above query should return 6 rows; both results are wrong.
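As a cross-check of the row count, here is a minimal Python sketch of a sort-merge full outer join over the two tables as listed above. It is not Hive's implementation; the function name full_outer_join is made up for illustration, and it assumes each input is sorted by key with unique keys.

```python
# Rows as listed in the comment above: (key, value) pairs, sorted by key.
smb_bucket4_1 = [(4, "v356"), (2000, "v169"), (4000, "v125")]
smb_bucket4_2 = [(484, "v169"), (3000, "v169"), (5000, "v125")]


def full_outer_join(left, right):
    """Illustrative merge-based full outer join on the first column.

    Assumes both inputs are sorted by key and keys are unique per input.
    Unmatched rows are padded with None on the missing side.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key == right_key:
            out.append(left[i] + right[j])
            i += 1
            j += 1
        elif left_key < right_key:
            out.append(left[i] + (None, None))
            i += 1
        else:
            out.append((None, None) + right[j])
            j += 1
    # Flush whatever remains on either side.
    out.extend(row + (None, None) for row in left[i:])
    out.extend((None, None) + row for row in right[j:])
    return out


rows = full_outer_join(smb_bucket4_1, smb_bucket4_2)
for row in rows:
    print(row)
```

Since no key appears in both tables, every input row survives with NULLs on the other side, which gives 3 + 3 = 6 output rows.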
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900384#action_12900384 ]
He Yongqiang commented on HIVE-1561:
------------------------------------
Amareshwari, did you use BucketizedHiveInputFormat for your query? SMBJoin can only work with BucketizedHiveInputFormat.
[jira] Updated: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-1561:
-----------------------------
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed
Committed. Thanks Yongqiang
[jira] Updated: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1561:
-------------------------------
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900161#action_12900161 ]
He Yongqiang commented on HIVE-1561:
------------------------------------
This is the complete result from Hive's smb_mapjoin_8.q.out; it is correct:
{noformat}
POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@smb_bucket4_2
POSTHOOK: Input: default@smb_bucket4_1
POSTHOOK: Output: file:/tmp/jssarma/hive_2010-07-21_12-02-34_137_8141051139723931378/10000
POSTHOOK: Lineage: smb_bucket4_1.key SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_1.value SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.key SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.value SIMPLE [(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, comment:from deserializer), ]
4 val_356 NULL NULL
NULL NULL 484 val_169
2000 val_169 NULL NULL
NULL NULL 3000 val_169
4000 val_125 NULL NULL
NULL NULL 5000 val_125
{noformat}
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900176#action_12900176 ]
Namit Jain commented on HIVE-1561:
----------------------------------
My bad, I did not see the entire results. So, based on what Joy is saying, it does not work in miniMR mode.
[jira] Updated: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1561:
-------------------------------
Attachment: hive-1561.1.patch
[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different
results in miniMr mode
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900577#action_12900577 ]
Amareshwari Sriramadasu commented on HIVE-1561:
-----------------------------------------------
bq. Amareshwari, did you use BucketizedHiveInputFormat for your query?
No, I did not use BucketizedHiveInputFormat. After using it, I see correct results now. Thanks.