Posted to dev@hive.apache.org by "Adam Kramer (JIRA)" <ji...@apache.org> on 2009/05/17 23:35:45 UTC

[jira] Created: (HIVE-492) Allow for output inspection in realtime; perhaps in log files, but somewhere?

Allow for output inspection in realtime; perhaps in log files, but somewhere?
-----------------------------------------------------------------------------

                 Key: HIVE-492
                 URL: https://issues.apache.org/jira/browse/HIVE-492
             Project: Hadoop Hive
          Issue Type: Wish
          Components: Logging
            Reporter: Adam Kramer


Many queries run for a long time and then fail, either because the job itself fails or because the output data is not what was desired.

This is almost always traceable to an error in a mapper or a reducer, which we can track down in several ways, most often by running the query piece by piece and seeing where the "wrong" output appears. This process is time-consuming and puts considerable load on the system (e.g., repeating big queries while trying to debug transformers/syntax). The problem is worse when a single query uses multiple transforms and several mapreduce steps.

One way to reduce the debugging overhead would be to expose actual output through some logging mechanism. Specifically, I mean to have EVERY mapper and/or reducer write the first five lines of its output to some user-readable file. This would allow a user to see what each part of the system is doing, and potentially to locate, from ONE failed query statement, where the user error is.
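
As an illustration of the idea only, here is a minimal Python sketch of what a single task might do: pass every output line through unchanged while copying just the first five to a separate, user-readable sink. The helper name `tee_head` and the in-memory sink are hypothetical; nothing here is an existing Hive or Hadoop API, and a real implementation would presumably live inside the operator or script wrapper.

```python
import io
from typing import Iterable, Iterator, TextIO


def tee_head(lines: Iterable[str], sample_sink: TextIO, limit: int = 5) -> Iterator[str]:
    """Yield every line unchanged; copy only the first `limit` lines to sample_sink."""
    for index, line in enumerate(lines):
        if index < limit:
            # Make sure each sampled record ends with a newline in the log.
            sample_sink.write(line if line.endswith("\n") else line + "\n")
        yield line


# Hypothetical task output: eight tab-separated rows.
rows = [f"key{i}\tvalue{i}\n" for i in range(8)]

# Stand-in for the user-readable file; a real task would open a per-task log.
sample = io.StringIO()

# The downstream consumer still sees all rows; the sink holds only the head.
emitted = list(tee_head(rows, sample))
```

Because the wrapper is a generator, it adds no buffering and stops writing to the sink after the limit is reached, so the per-task cost stays constant regardless of output size.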

Of course, 5 lines from each of 20000 mappers and 300 reducers is a lot of overhead; making this user-configurable and/or estimated beforehand (at least 5 lines from at least 5 mappers and at least 5 reducers) would be fine, as would making these output logs auto-delete after some timeframe (a day, perhaps).
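
For a rough sense of scale, the following Python sketch counts how many sample lines would be written, assuming each task writes its own sample independently so that the counts add across mappers and reducers. The function name, the `max_tasks_per_stage` parameter, and the cap of exactly 5 tasks per stage are assumptions made for illustration, not an existing setting.

```python
from typing import Optional


def sampled_line_count(
    mappers: int,
    reducers: int,
    lines_per_task: int = 5,
    max_tasks_per_stage: Optional[int] = None,
) -> int:
    """Total sample lines written when each sampled task logs `lines_per_task` lines."""
    if max_tasks_per_stage is not None:
        # Only a capped number of mappers and reducers write samples.
        mappers = min(mappers, max_tasks_per_stage)
        reducers = min(reducers, max_tasks_per_stage)
    return lines_per_task * (mappers + reducers)


# Every task logs: 5 * (20000 + 300) lines.
every_task = sampled_line_count(20000, 300)

# Hypothetical cap of 5 mappers and 5 reducers: 5 * (5 + 5) lines.
capped = sampled_line_count(20000, 300, max_tasks_per_stage=5)
```

Under these assumptions the uncapped case writes 101500 lines and the capped case writes 50, which is why a configurable sample seems preferable to logging from every task.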
