Posted to dev@hive.apache.org by "Kevin Wilfong (JIRA)" <ji...@apache.org> on 2011/08/04 19:48:27 UTC

[jira] [Created] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Make Hadoop Job ID available after task finishes executing
----------------------------------------------------------

                 Key: HIVE-2347
                 URL: https://issues.apache.org/jira/browse/HIVE-2347
             Project: Hive
          Issue Type: Improvement
            Reporter: Kevin Wilfong
            Assignee: Kevin Wilfong


After Map Reduce tasks finish the execute method (ExecDriver and BlockMergeTask), the Hadoop Job ID is inaccessible to the Driver, and hence to the hooks it runs.  Exposing this information could help improve logging, debugging, etc.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081088#comment-13081088 ] 

Kevin Wilfong commented on HIVE-2347:
-------------------------------------

It looks like the failures are just whitespace differences in the .q.out file.  The final diff in issue HIVE-1735 has the spacing that matches the output I see, but the commit, for some reason, does not.


[jira] [Commented] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081357#comment-13081357 ] 

Kevin Wilfong commented on HIVE-2347:
-------------------------------------

The .out files for those two unit tests were updated today, and they are passing for me now.


[jira] [Commented] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081079#comment-13081079 ] 

Kevin Wilfong commented on HIVE-2347:
-------------------------------------

I checked out a fresh copy of the source without my changes, and I'm still seeing those tests fail.


[jira] [Commented] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079549#comment-13079549 ] 

jiraposter@reviews.apache.org commented on HIVE-2347:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1296/
-----------------------------------------------------------

Review request for hive and Ning Zhang.


Summary
-------

I added a field for the Hadoop Job ID to the Task class.  This will make it accessible to the Driver and hence to the hooks for logging/debugging purposes.  By including it in the Task, we only need to check that the type of the task is MAPRED, before getting the job ID.

I also considered two other places for the job ID:

* As separate fields in ExecDriver and BlockMergeTask. This would duplicate code, and callers would need conditions on the task type and casts to either ExecDriver or BlockMergeTask in order to get the job ID.

* In MapRedWork. This would require modifying a field of MapRedWork in the execute function, and I could not find a precedent for that.
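
The chosen design can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code from the patch: the names TaskSketch, MapRedTaskSketch, MoveTaskSketch, TaskType, getJobID and collectJobIds are assumptions made for the example, and the job ID is made up. Only the idea of keeping the job ID on the common task base class and checking for the MAPRED type comes from the summary above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's task types; only MAPRED matters here.
enum TaskType { MAPRED, MOVE }

// Hypothetical stand-in for the common Task base class.
abstract class TaskSketch {
    // Set by map-reduce tasks during execute(); stays null for other tasks.
    protected String jobID;

    public String getJobID() { return jobID; }

    public abstract TaskType getType();

    // Returns 0 on success, like a conventional exit code.
    public abstract int execute();
}

// Stands in for a task that submits a Hadoop job (e.g. ExecDriver or BlockMergeTask).
class MapRedTaskSketch extends TaskSketch {
    private final String submittedJobId;

    MapRedTaskSketch(String submittedJobId) { this.submittedJobId = submittedJobId; }

    @Override public TaskType getType() { return TaskType.MAPRED; }

    @Override public int execute() {
        // In a real task the ID would come from the submitted Hadoop job.
        jobID = submittedJobId;
        return 0;
    }
}

// Stands in for any task that does not run a Hadoop job.
class MoveTaskSketch extends TaskSketch {
    @Override public TaskType getType() { return TaskType.MOVE; }
    @Override public int execute() { return 0; }
}

class JobIdSketch {
    // Driver-side helper: reads job IDs without casting to a concrete subclass.
    static List<String> collectJobIds(List<TaskSketch> tasks) {
        List<String> ids = new ArrayList<>();
        for (TaskSketch t : tasks) {
            if (t.getType() == TaskType.MAPRED && t.getJobID() != null) {
                ids.add(t.getJobID());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<TaskSketch> tasks = new ArrayList<>();
        tasks.add(new MapRedTaskSketch("job_000000000000_0001")); // made-up ID
        tasks.add(new MoveTaskSketch());
        for (TaskSketch t : tasks) {
            t.execute();
        }
        System.out.println(collectJobIds(tasks));
    }
}
```

Because the field lives on the base class, the caller needs only the type check; neither of the two alternatives above would allow that without casts or changes to the work object.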


This addresses bug HIVE-2347.
    https://issues.apache.org/jira/browse/HIVE-2347


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1153966 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1153966 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1153966 

Diff: https://reviews.apache.org/r/1296/diff


Testing
-------

Ran the TestCliDriver and TestNegativeCliDriver test suites and verified they passed.

I also created a sample post-execution hook that simply logged the job ID of every map reduce task, and verified its output.
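
A hook of the kind described might look roughly like this. The sketch is self-contained and deliberately does not use Hive's real hook interfaces: PostExecHookSketch, CompletedTask, JobIdLoggingHook and HookDemo are hypothetical names, and the job ID in main is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, minimal view of a finished task as seen by a hook.
class CompletedTask {
    final boolean mapRed;  // true if the task ran a Hadoop map-reduce job
    final String jobID;    // null when no job was submitted

    CompletedTask(boolean mapRed, String jobID) {
        this.mapRed = mapRed;
        this.jobID = jobID;
    }
}

// Hypothetical hook contract: invoked once after a query's tasks have run.
interface PostExecHookSketch {
    void run(List<CompletedTask> tasks);
}

// Logs the job ID of every map-reduce task and remembers what it logged.
class JobIdLoggingHook implements PostExecHookSketch {
    final List<String> logged = new ArrayList<>();

    @Override
    public void run(List<CompletedTask> tasks) {
        for (CompletedTask t : tasks) {
            if (t.mapRed && t.jobID != null) {
                String line = "Hadoop job ID: " + t.jobID;
                logged.add(line);
                System.err.println(line);
            }
        }
    }
}

class HookDemo {
    public static void main(String[] args) {
        JobIdLoggingHook hook = new JobIdLoggingHook();
        hook.run(Arrays.asList(
            new CompletedTask(true, "job_000000000000_0002"),  // made-up ID
            new CompletedTask(false, null)));
    }
}
```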


Thanks,

Kevin




[jira] [Commented] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082010#comment-13082010 ] 

Hudson commented on HIVE-2347:
------------------------------

Integrated in Hive-trunk-h0.21 #884 (See [https://builds.apache.org/job/Hive-trunk-h0.21/884/])
    HIVE-2347. Make Hadoop Job ID available after task finish executing (Kevin Wilfong via Ning Zhang)

nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155493
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java



[jira] [Commented] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080289#comment-13080289 ] 

Ning Zhang commented on HIVE-2347:
----------------------------------

+1. Will commit if tests pass. 


[jira] [Commented] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081041#comment-13081041 ] 

Ning Zhang commented on HIVE-2347:
----------------------------------

Kevin, there are 2 failed unit tests, udtf_explode.q and udf_explode.q, both in TestCliDriver. Can you take a look?


[jira] [Resolved] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang resolved HIVE-2347.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.0
     Hadoop Flags: [Reviewed]

Committed. Thanks Kevin!


[jira] [Commented] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079548#comment-13079548 ] 

Kevin Wilfong commented on HIVE-2347:
-------------------------------------

https://reviews.apache.org/r/1296/


[jira] [Updated] (HIVE-2347) Make Hadoop Job ID available after task finishes executing

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-2347:
--------------------------------

    Attachment: HIVE-2347.1.patch.txt
