Posted to dev@hive.apache.org by "Zhang Xinyu (JIRA)" <ji...@apache.org> on 2012/08/01 08:52:34 UTC
[jira] [Created] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
Zhang Xinyu created HIVE-3326:
---------------------------------
Summary: plan for multiple mapjoin followed by a normal join is wrong
Key: HIVE-3326
URL: https://issues.apache.org/jira/browse/HIVE-3326
Project: Hive
Issue Type: Bug
Components: SQL
Affects Versions: 0.8.1
Environment: OS X 10.8; java 1.6.0_33
Reporter: Zhang Xinyu
Example queries:
create table yudi(c1 int, c2 int, c3 int, c4 int);
create table wangmu(c1 int, c2 int, c3 int, c4 int);
select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
In explain mode, I got this plan:
hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
OK
STAGE DEPENDENCIES:
Stage-8 is a root stage
Stage-2 depends on stages: Stage-8
Stage-7 depends on stages: Stage-2
Stage-3 depends on stages: Stage-7
Stage-1 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-8
Map Reduce Local Work
Alias -> Map Local Tables:
b
<Not Important>
Stage: Stage-2
Map Reduce
Alias -> Map Operator Tree:
a
<Not Important>
Local Work:
Map Reduce Local Work
Stage: Stage-7
Map Reduce Local Work
Alias -> Map Local Tables:
c
<Not Important>
Stage: Stage-3
Map Reduce
Alias -> Map Operator Tree:
file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
<Not Important>
Local Work:
Map Reduce Local Work
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
d
TableScan
file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
Select Operator
Reduce Operator Tree:
<Not Important>
Note that the mapper of Stage-1 should read the output of Stage-3 (perhaps '.../-mr-10003'), not the output of Stage-2 (which is in '.../-mr-10002').
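The mismatch in the plan above can be checked mechanically. The following is only a minimal sketch, not Hive code: the `Stage` class and `findMisreads` method are hypothetical, only the Map Reduce stages are modeled (the local-work stages and the scan of table d are left out), and the output directory of Stage-3 is the guess from the report ('.../-mr-10003'), not a value taken from the plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the Map Reduce stages in the plan; not a Hive class.
final class PlanInputCheck {

    static final class Stage {
        final String name;
        final String parent;    // nearest upstream Map Reduce stage, or null
        final String inputDir;  // intermediate directory read, or null for a base table
        final String outputDir; // intermediate directory written, or null

        Stage(String name, String parent, String inputDir, String outputDir) {
            this.name = name;
            this.parent = parent;
            this.inputDir = inputDir;
            this.outputDir = outputDir;
        }
    }

    // Returns the names of stages whose intermediate input was not written by their parent.
    static List<String> findMisreads(List<Stage> stages) {
        Map<String, Stage> byName = new HashMap<>();
        for (Stage s : stages) {
            byName.put(s.name, s);
        }
        List<String> misreads = new ArrayList<>();
        for (Stage s : stages) {
            if (s.parent == null || s.inputDir == null) {
                continue; // nothing to compare for root stages or base-table scans
            }
            Stage p = byName.get(s.parent);
            if (p == null || !s.inputDir.equals(p.outputDir)) {
                misreads.add(s.name);
            }
        }
        return misreads;
    }

    public static void main(String[] args) {
        List<Stage> plan = Arrays.asList(
            new Stage("Stage-2", null, null, ".../-mr-10002"),
            // Assumption: Stage-3 writes to '.../-mr-10003' (the reporter's guess).
            new Stage("Stage-3", "Stage-2", ".../-mr-10002", ".../-mr-10003"),
            new Stage("Stage-1", "Stage-3", ".../-mr-10002", null));
        System.out.println(findMisreads(plan)); // prints [Stage-1]
    }
}
```

Under these assumptions only Stage-1 is flagged, which matches the observation that it reads the directory written by Stage-2 instead of the one written by Stage-3.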
While trying to resolve this bug, I found the following code (GenMapRedUtils.java, around line 431):
if (oldMapJoin == null) {
  if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
      || local || (oldTask != null) && (parTasks != null)) {
    taskTmpDir = mjCtx.getTaskTmpDir();
    tt_desc = mjCtx.getTTDesc();
    rootOp = mjCtx.getRootMapJoinOp();
  }
} else {
  GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
  assert oldMjCtx != null;
  taskTmpDir = oldMjCtx.getTaskTmpDir();
  tt_desc = oldMjCtx.getTTDesc();
  rootOp = oldMjCtx.getRootMapJoinOp();
}
My query goes into the 'else' block and gets the wrong taskTmpDir. When I hacked the code so that the query goes into the 'if' block instead, it worked.
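A note for anyone reading the inner condition in the snippet above: in Java, && binds more tightly than ||, so the last two terms are grouped together before being OR-ed with the rest. The sketch below is not Hive code; its boolean and Object parameters are hypothetical stand-ins for the Hive expressions, and it only confirms that the condition as written and the explicitly parenthesized form agree on every input.

```java
// Hypothetical stand-in for the inner condition; parameter names are illustrative only.
final class ConditionPrecedence {

    // Same shape as the quoted condition, with no extra parentheses.
    static boolean asWritten(boolean inNoReducerList, boolean local,
                             Object oldTask, Object parTasks) {
        return inNoReducerList || local || (oldTask != null) && (parTasks != null);
    }

    // The grouping Java applies: && is evaluated before ||.
    static boolean explicit(boolean inNoReducerList, boolean local,
                            Object oldTask, Object parTasks) {
        return inNoReducerList || local || ((oldTask != null) && (parTasks != null));
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        Object[] refs = {null, new Object()};
        for (boolean a : flags) {
            for (boolean b : flags) {
                for (Object c : refs) {
                    for (Object d : refs) {
                        if (asWritten(a, b, c, d) != explicit(a, b, c, d)) {
                            throw new AssertionError("groupings differ");
                        }
                    }
                }
            }
        }
        System.out.println("asWritten and explicit agree on all 16 input combinations");
    }
}
```

So the inner 'if' is taken when the operator is in the no-reducer list, or when local is true, or when both oldTask and parTasks are non-null.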
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
Posted by "Zhang Xinyu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427270#comment-13427270 ]
Zhang Xinyu commented on HIVE-3326:
-----------------------------------
Cool! Thanks.
[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
Posted by "Zhang Xinyu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhang Xinyu updated HIVE-3326:
------------------------------
Description:
Example queries:
{code}
create table yudi(c1 int, c2 int, c3 int, c4 int);
create table wangmu(c1 int, c2 int, c3 int, c4 int);
select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
{code}
In explain mode, I got this plan:
{code}
hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
OK
STAGE DEPENDENCIES:
Stage-8 is a root stage
Stage-2 depends on stages: Stage-8
Stage-7 depends on stages: Stage-2
Stage-3 depends on stages: Stage-7
Stage-1 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-8
Map Reduce Local Work
Alias -> Map Local Tables:
b
<Not Important>
Stage: Stage-2
Map Reduce
Alias -> Map Operator Tree:
a
<Not Important>
Local Work:
Map Reduce Local Work
Stage: Stage-7
Map Reduce Local Work
Alias -> Map Local Tables:
c
<Not Important>
Stage: Stage-3
Map Reduce
Alias -> Map Operator Tree:
file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
<Not Important>
Local Work:
Map Reduce Local Work
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
d
TableScan
file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
Select Operator
Reduce Operator Tree:
<Not Important>
{code}
Note that the mapper of Stage-1 should read the output of Stage-3 (perhaps '.../-mr-10003'), not the output of Stage-2 (which is in '.../-mr-10002').
While trying to resolve this bug, I found the following code (GenMapRedUtils.java, around line 431):
{code:title=GenMapRedUtils.java}
if (oldMapJoin == null) {
  if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
      || local || (oldTask != null) && (parTasks != null)) {
    taskTmpDir = mjCtx.getTaskTmpDir();
    tt_desc = mjCtx.getTTDesc();
    rootOp = mjCtx.getRootMapJoinOp();
  }
} else {
  GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
  assert oldMjCtx != null;
  taskTmpDir = oldMjCtx.getTaskTmpDir();
  tt_desc = oldMjCtx.getTTDesc();
  rootOp = oldMjCtx.getRootMapJoinOp();
}
{code}
My query goes into the 'else' block and gets the wrong taskTmpDir. When I hacked the code so that the query goes into the 'if' block instead, it worked.
[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
Posted by "Zhang Xinyu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhang Xinyu updated HIVE-3326:
------------------------------
Description:
Example queries:
create table yudi(c1 int, c2 int, c3 int, c4 int);
create table wangmu(c1 int, c2 int, c3 int, c4 int);
select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
In explain mode, I got this plan:
hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
OK
STAGE DEPENDENCIES:
Stage-8 is a root stage
Stage-2 depends on stages: Stage-8
Stage-7 depends on stages: Stage-2
Stage-3 depends on stages: Stage-7
Stage-1 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-8
Map Reduce Local Work
Alias -> Map Local Tables:
b
<Not Important>
Stage: Stage-2
Map Reduce
Alias -> Map Operator Tree:
a
<Not Important>
Local Work:
Map Reduce Local Work
Stage: Stage-7
Map Reduce Local Work
Alias -> Map Local Tables:
c
<Not Important>
Stage: Stage-3
Map Reduce
Alias -> Map Operator Tree:
file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
<Not Important>
Local Work:
Map Reduce Local Work
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
d
TableScan
file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
Select Operator
Reduce Operator Tree:
<Not Important>
Note that the mapper of Stage-1 should read the output of Stage-3 (perhaps '.../-mr-10003'), not the output of Stage-2 (which is in '.../-mr-10002').
While trying to resolve this bug, I found the following code (GenMapRedUtils.java, around line 431):
{code:title=GenMapRedUtils.java}
if (oldMapJoin == null) {
  if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
      || local || (oldTask != null) && (parTasks != null)) {
    taskTmpDir = mjCtx.getTaskTmpDir();
    tt_desc = mjCtx.getTTDesc();
    rootOp = mjCtx.getRootMapJoinOp();
  }
} else {
  GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
  assert oldMjCtx != null;
  taskTmpDir = oldMjCtx.getTaskTmpDir();
  tt_desc = oldMjCtx.getTTDesc();
  rootOp = oldMjCtx.getRootMapJoinOp();
}
{code}
My query goes into the 'else' block and gets the wrong taskTmpDir. When I hacked the code so that the query goes into the 'if' block instead, it worked.
[jira] [Commented] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427154#comment-13427154 ]
Navis commented on HIVE-3326:
-----------------------------
I think the condition,
{code}
if (oldMapJoin == null) {
{code}
should be changed to
{code}
if (oldMapJoin == null || !opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(oldMapJoin)) {
{code}
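To see which inputs the two guards treat differently, here is a minimal sketch. It is not Hive code: operators are modeled as plain strings, and the names `BranchCondition`, `MAPJOIN_1` and `MAPJOIN_2` are made up for illustration. Which case the reported query actually falls into is not stated in this thread.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model: map-join operators are represented by their names only.
final class BranchCondition {

    // Original guard: take the 'if' branch only when there is no previous map join.
    static boolean original(String oldMapJoin) {
        return oldMapJoin == null;
    }

    // Proposed guard: also take the 'if' branch when the previous map join
    // is not in the no-reducer list.
    static boolean proposed(String oldMapJoin, Set<String> noReducerOps) {
        return oldMapJoin == null || !noReducerOps.contains(oldMapJoin);
    }

    public static void main(String[] args) {
        Set<String> noReducerOps = Collections.singleton("MAPJOIN_1");
        String[] cases = {null, "MAPJOIN_1", "MAPJOIN_2"};
        for (String oldMapJoin : cases) {
            System.out.println("oldMapJoin=" + oldMapJoin
                + " original=" + original(oldMapJoin)
                + " proposed=" + proposed(oldMapJoin, noReducerOps));
        }
    }
}
```

The two guards differ only in the third case: a non-null previous map join that is not in the no-reducer list now takes the 'if' branch and uses the current map join's context, where before it took the 'else' branch.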
[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
Posted by "Zhang Xinyu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhang Xinyu updated HIVE-3326:
------------------------------
Attachment: patch.diff
This is my patch for this bug. I use a previously unused argument to control whether the logic goes into the 'else' block.