Posted to dev@hive.apache.org by "Zhang Xinyu (JIRA)" <ji...@apache.org> on 2012/08/01 08:52:34 UTC

[jira] [Created] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong

Zhang Xinyu created HIVE-3326:
---------------------------------

             Summary: plan for multiple mapjoin followed by a normal join is wrong
                 Key: HIVE-3326
                 URL: https://issues.apache.org/jira/browse/HIVE-3326
             Project: Hive
          Issue Type: Bug
          Components: SQL
    Affects Versions: 0.8.1
         Environment: OS X 10.8; java 1.6.0_33
            Reporter: Zhang Xinyu


example queries:

create table yudi(c1 int, c2 int, c3 int, c4 int);
create table wangmu(c1 int, c2 int, c3 int, c4 int);
select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;

in explain mode, I got this:

hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
OK
STAGE DEPENDENCIES:
  Stage-8 is a root stage
  Stage-2 depends on stages: Stage-8
  Stage-7 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-7
  Stage-1 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-8
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b
        <Not Important>
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        a
        <Not Important>
      Local Work:
        Map Reduce Local Work

  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        c
        <Not Important>
  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
           file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
        <Not Important>
      Local Work:
        Map Reduce Local Work

  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        d
          TableScan

        file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
          Select Operator

      Reduce Operator Tree:
      <Not Important>

As you can see, the mapper of Stage-1 should read the output of Stage-3 (perhaps '.../-mr-10003'), not the output of Stage-2 (which is in '.../-mr-10002').
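
The expectation above can be written as a small check, as a minimal sketch: a stage that reads an intermediate directory should read the directory written by the stage it depends on. The class and method names here are hypothetical, the maps are filled in by hand from the plan in this report, and '.../-mr-10003' is only the guess made above for the output of Stage-3, not something the plan shows.

```java
import java.util.HashMap;
import java.util.Map;

public class StageInputCheck {
    // Returns true when 'stage' reads exactly the directory written by its parent stage.
    static boolean readsParentOutput(String stage,
                                     Map<String, String> parentOf,
                                     Map<String, String> outputOf,
                                     Map<String, String> inputOf) {
        String parent = parentOf.get(stage);
        String expected = parent == null ? null : outputOf.get(parent);
        return expected != null && expected.equals(inputOf.get(stage));
    }

    // Builds the three maps by hand from the reported plan and checks one stage.
    static boolean checkReportedPlan(String stage) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("Stage-3", "Stage-2"); // via the local-work Stage-7
        parentOf.put("Stage-1", "Stage-3");

        Map<String, String> outputOf = new HashMap<>();
        outputOf.put("Stage-2", ".../-mr-10002");
        outputOf.put("Stage-3", ".../-mr-10003"); // assumption: the report's guess

        Map<String, String> inputOf = new HashMap<>();
        inputOf.put("Stage-3", ".../-mr-10002"); // as printed in the plan
        inputOf.put("Stage-1", ".../-mr-10002"); // as printed in the plan

        return readsParentOutput(stage, parentOf, outputOf, inputOf);
    }

    public static void main(String[] args) {
        System.out.println("Stage-3 reads parent output: " + checkReportedPlan("Stage-3"));
        System.out.println("Stage-1 reads parent output: " + checkReportedPlan("Stage-1"));
    }
}
```

Under these assumptions the check passes for Stage-3 and fails for Stage-1, which is the mismatch described above.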

While trying to resolve this bug, I found this code (GenMapRedUtils.java, around line 431):
        if (oldMapJoin == null) {
          if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
              || local || (oldTask != null) && (parTasks != null)) {
            taskTmpDir = mjCtx.getTaskTmpDir();
            tt_desc = mjCtx.getTTDesc();
            rootOp = mjCtx.getRootMapJoinOp();
          }
        } else {
          GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
          assert oldMjCtx != null;
          taskTmpDir = oldMjCtx.getTaskTmpDir();
          tt_desc = oldMjCtx.getTTDesc();
          rootOp = oldMjCtx.getRootMapJoinOp();
        }
My query takes the 'else' branch and gets the wrong taskTmpDir. When I hacked the code so that the query takes the 'if' branch instead, it works.
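
To make the effect of the two branches concrete, here is a minimal sketch. It is not Hive's code: `JoinCtx`, `pickTmpDir`, and the string keys are hypothetical stand-ins, the inner condition of the 'if' branch is left out, and '.../-mr-10003' is again only the report's guess for the second map join's temporary directory.

```java
import java.util.HashMap;
import java.util.Map;

public class TmpDirBranchSketch {
    // Hypothetical stand-in for a per-map-join context holding its temp directory.
    static final class JoinCtx {
        final String taskTmpDir;
        JoinCtx(String taskTmpDir) { this.taskTmpDir = taskTmpDir; }
    }

    // Same branch shape as the snippet above: a non-null oldMapJoin selects the
    // old join's context; otherwise the current join's context is used.
    static String pickTmpDir(String mjOp, String oldMapJoin, Map<String, JoinCtx> ctxs) {
        if (oldMapJoin == null) {
            return ctxs.get(mjOp).taskTmpDir;
        } else {
            return ctxs.get(oldMapJoin).taskTmpDir;
        }
    }

    // Two chained map joins, each with its own temp directory.
    static Map<String, JoinCtx> exampleContexts() {
        Map<String, JoinCtx> ctxs = new HashMap<>();
        ctxs.put("firstMapJoin", new JoinCtx(".../-mr-10002"));
        ctxs.put("secondMapJoin", new JoinCtx(".../-mr-10003")); // assumed directory
        return ctxs;
    }

    public static void main(String[] args) {
        Map<String, JoinCtx> ctxs = exampleContexts();
        // 'else' branch: the older join's directory is returned.
        System.out.println(pickTmpDir("secondMapJoin", "firstMapJoin", ctxs));
        // 'if' branch: the current join's directory is returned.
        System.out.println(pickTmpDir("secondMapJoin", null, ctxs));
    }
}
```

In this simplified model, whenever an old map join is recorded, the join that follows reads the first map join's directory rather than the second's.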

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong

Posted by "Zhang Xinyu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427270#comment-13427270 ] 

Zhang Xinyu commented on HIVE-3326:
-----------------------------------

Cool! Thanks.
                

       

[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong

Posted by "Zhang Xinyu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Xinyu updated HIVE-3326:
------------------------------

    Description: 
example queries:
{code}
create table yudi(c1 int, c2 int, c3 int, c4 int);
create table wangmu(c1 int, c2 int, c3 int, c4 int);
select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
{code}
in explain mode, I got this:
{code}
hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
OK
STAGE DEPENDENCIES:
  Stage-8 is a root stage
  Stage-2 depends on stages: Stage-8
  Stage-7 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-7
  Stage-1 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-8
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b
        <Not Important>
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        a
        <Not Important>
      Local Work:
        Map Reduce Local Work

  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        c
        <Not Important>
  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
           file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
        <Not Important>
      Local Work:
        Map Reduce Local Work

  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        d
          TableScan

        file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
          Select Operator

      Reduce Operator Tree:
      <Not Important>
{code}
As you can see, the mapper of Stage-1 should read the output of Stage-3 (perhaps '.../-mr-10003'), not the output of Stage-2 (which is in '.../-mr-10002').

While trying to resolve this bug, I found this code (GenMapRedUtils.java, around line 431):
{code:title=GenMapRedUtils.java}
if (oldMapJoin == null) {
  if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
      || local || (oldTask != null) && (parTasks != null)) {
    taskTmpDir = mjCtx.getTaskTmpDir();
    tt_desc = mjCtx.getTTDesc();
    rootOp = mjCtx.getRootMapJoinOp();
  }
} else {
  GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
  assert oldMjCtx != null;
  taskTmpDir = oldMjCtx.getTaskTmpDir();
  tt_desc = oldMjCtx.getTTDesc();
  rootOp = oldMjCtx.getRootMapJoinOp();
}
{code}
My query takes the 'else' branch and gets the wrong taskTmpDir. When I hacked the code so that the query takes the 'if' branch instead, it works.


        

[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong

Posted by "Zhang Xinyu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Xinyu updated HIVE-3326:
------------------------------

    Description: 
example queries:

create table yudi(c1 int, c2 int, c3 int, c4 int);
create table wangmu(c1 int, c2 int, c3 int, c4 int);
select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;

in explain mode, I got this:

hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
OK
STAGE DEPENDENCIES:
  Stage-8 is a root stage
  Stage-2 depends on stages: Stage-8
  Stage-7 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-7
  Stage-1 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-8
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b
        <Not Important>
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        a
        <Not Important>
      Local Work:
        Map Reduce Local Work

  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        c
        <Not Important>
  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
           file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
        <Not Important>
      Local Work:
        Map Reduce Local Work

  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        d
          TableScan

        file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
          Select Operator

      Reduce Operator Tree:
      <Not Important>

As you can see, the mapper of Stage-1 should read the output of Stage-3 (perhaps '.../-mr-10003'), not the output of Stage-2 (which is in '.../-mr-10002').

While trying to resolve this bug, I found this code (GenMapRedUtils.java, around line 431):
{code:title=GenMapRedUtils.java}
if (oldMapJoin == null) {
  if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
      || local || (oldTask != null) && (parTasks != null)) {
    taskTmpDir = mjCtx.getTaskTmpDir();
    tt_desc = mjCtx.getTTDesc();
    rootOp = mjCtx.getRootMapJoinOp();
  }
} else {
  GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
  assert oldMjCtx != null;
  taskTmpDir = oldMjCtx.getTaskTmpDir();
  tt_desc = oldMjCtx.getTTDesc();
  rootOp = oldMjCtx.getRootMapJoinOp();
}
{code}
My query takes the 'else' branch and gets the wrong taskTmpDir. When I hacked the code so that the query takes the 'if' branch instead, it works.


        

[jira] [Commented] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong

Posted by "Navis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427154#comment-13427154 ] 

Navis commented on HIVE-3326:
-----------------------------

I think the condition,
{code}
if (oldMapJoin == null) {
{code}
should be changed to 
{code}
if (oldMapJoin == null || !opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(oldMapJoin)) {
{code}
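
A small sketch of how the suggested condition differs from the current one. The class, the method names, and the string operator names are made up for illustration, and a plain `Set<String>` stands in for the list returned by `getListMapJoinOpsNoReducer()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ConditionSketch {
    // Current condition: only a missing old map join takes the 'if' branch.
    static boolean current(String oldMapJoin) {
        return oldMapJoin == null;
    }

    // Suggested condition: an old map join that is not in the no-reducer
    // list also takes the 'if' branch.
    static boolean suggested(String oldMapJoin, Set<String> noReducer) {
        return oldMapJoin == null || !noReducer.contains(oldMapJoin);
    }

    // Hypothetical list contents: "mj1" is a map join with no reducer.
    static Set<String> exampleNoReducer() {
        return new HashSet<>(Arrays.asList("mj1"));
    }

    public static void main(String[] args) {
        Set<String> noReducer = exampleNoReducer();
        for (String old : new String[] {null, "mj1", "mj2"}) {
            System.out.println(old + ": current=" + current(old)
                    + ", suggested=" + suggested(old, noReducer));
        }
    }
}
```

The two conditions differ only in the third case, an old map join that is set but absent from the list; there the suggested condition takes the 'if' branch and the current one takes the 'else' branch.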
                

        

[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong

Posted by "Zhang Xinyu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Xinyu updated HIVE-3326:
------------------------------

    Attachment: patch.diff

This is my patch for this bug. I use a currently unused argument to control whether the logic goes into the 'else' block.
                