Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2011/08/31 02:33:09 UTC

[jira] [Created] (HIVE-2422) remove the intermediate dir of one hive query when it finish

remove the intermediate dir of one hive query when it finish 
-------------------------------------------------------------

                 Key: HIVE-2422
                 URL: https://issues.apache.org/jira/browse/HIVE-2422
             Project: Hive
          Issue Type: Bug
            Reporter: He Yongqiang


Right now, if a Hive query is compiled into 2 MR jobs, the first job's output feeds the second job. When the query finishes, the first job's output should be removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HIVE-2422) remove the intermediate dir of one hive query when it finish

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang reassigned HIVE-2422:
----------------------------------

    Assignee: He Yongqiang



        

[jira] [Updated] (HIVE-2422) remove the intermediate dir when the hive query finish

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2422:
-------------------------------

    Summary: remove the intermediate dir when the hive query finish   (was: remove the intermediate dir of one hive query when it finish )



        

[jira] [Commented] (HIVE-2422) remove the intermediate dir when the hive query finish

Posted by "Anurag Tangri (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400063#comment-13400063 ] 

Anurag Tangri commented on HIVE-2422:
-------------------------------------

Looks like the scratch dir is not being cleaned up in a lot of locations. Another such location:

1. ExecDriver.java's execute() function.

Here, if the scratch dir is created before launching a job and the job launch then fails, it is not cleaned up in the exception handler before returning:



    try {
      if (ctx == null) {
        ctx = new Context(job);
        ctxCreated = true;
      }

      emptyScratchDirStr = ctx.getMRTmpFileURI();
      emptyScratchDir = new Path(emptyScratchDirStr);
      FileSystem fs = emptyScratchDir.getFileSystem(job);
      fs.mkdirs(emptyScratchDir);
    } catch (IOException e) {
      e.printStackTrace();
      console.printError("Error launching map-reduce job", "\n"
          + org.apache.hadoop.util.StringUtils.stringifyException(e));
      return 5;
    }


Here, ctx.clear() needs to be called in the exception handler before returning.
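
As an illustration of that cleanup-on-failure pattern, here is a minimal standalone sketch. It does not use Hive or Hadoop classes: ScratchContext, JobSetup, and launch() are hypothetical stand-ins invented for this example, a local temp directory plays the role of the scratch dir, and the return code 5 simply mirrors the snippet above. It is not the actual ExecDriver fix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch: models only the scratch-dir lifecycle, not Hive's real Context.
final class ScratchDirCleanupSketch {

  /** Stand-in for a query context that owns one scratch directory. */
  static final class ScratchContext {
    private final Path scratchDir;

    ScratchContext() {
      try {
        scratchDir = Files.createTempDirectory("hive-scratch-sketch");
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    Path getScratchDir() {
      return scratchDir;
    }

    /** Recursively deletes the scratch dir; calling it again is a no-op. */
    void clear() {
      if (!Files.exists(scratchDir)) {
        return;
      }
      try (Stream<Path> paths = Files.walk(scratchDir)) {
        // Reverse order visits children before their parent directories.
        paths.sorted(Comparator.reverseOrder()).forEach(p -> {
          try {
            Files.deleteIfExists(p);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        });
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Stand-in for the work done before a job is launched. */
  interface JobSetup {
    void run(Path scratchDir) throws IOException;
  }

  /** Returns 0 on success; on failure clears the scratch dir and returns 5. */
  static int launch(ScratchContext ctx, JobSetup setup) {
    try {
      setup.run(ctx.getScratchDir());
      return 0; // On success the caller still owns the scratch dir.
    } catch (IOException e) {
      System.err.println("Error launching map-reduce job: " + e);
      try {
        ctx.clear(); // The step missing from the snippet above.
      } catch (UncheckedIOException cleanupError) {
        e.addSuppressed(cleanupError);
      }
      return 5;
    }
  }

  public static void main(String[] args) {
    ScratchContext ctx = new ScratchContext();
    int rc = launch(ctx, dir -> {
      Files.createFile(dir.resolve("part-00000"));
      throw new IOException("simulated launch failure");
    });
    System.out.println("rc=" + rc + ", scratch dir exists=" + Files.exists(ctx.getScratchDir()));
  }
}
```

In the real code, whether to clear only a context created locally (the ctxCreated flag in the snippet) or any context passed in is a design decision this sketch does not address.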

-Anurag Tangri
                


        

[jira] [Commented] (HIVE-2422) remove the intermediate dir when the hive query finish

Posted by "Priyadarshini (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160067#comment-13160067 ] 

Priyadarshini commented on HIVE-2422:
-------------------------------------

I have executed this query.

select a.rollNo,b.rollNo from student a join student b on a.rollNo=b.rollNo group by a.rollNo,b.rollNo;

The above query spawned 2 MR jobs.
After the query finished executing, the org.apache.hadoop.hive.ql.Context.clear() method deleted the query's ScratchDir.
                
