Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2008/11/25 07:48:48 UTC

[Pig Wiki] Update of "PigErrorHandlingFunctionalSpecification" by SanthoshSrinivasan

The following page has been changed by SanthoshSrinivasan:
http://wiki.apache.org/pig/PigErrorHandlingFunctionalSpecification

New page:
#format wiki
#language en

[[Navigation(children)]]
[[TableOfContents]]

This document describes the functional specification for the Error Handling feature in Pig.


== Error types and mechanism to handle errors ==

The [#cookbook cookbook] discusses the classification of errors within Pig and proposes guidelines on which exceptions developers should use. A reclassification of the errors is presented below.

=== Frontend errors ===
	The front end consists of multiple components: the parser, type checker, optimizer, translators, etc. All errors from these components are categorized as front-end errors. They usually occur on the client side, before execution begins in Hadoop. Front-end components throw specific exceptions that capture the context. For example, the parser throws a `ParseException`, the type checker throws a `TypeCheckerException`, and the optimizer throws a `LogicalOptimizerException`.

=== Backend errors ===
	The execution pipeline, the operators that form the pipeline, and the map-reduce classes make up the back end. Back-end errors generally occur at run time. Exceptions such as `ExecException` and `RuntimeException` fall into this category. These errors will be propagated to the user-facing system, which will display an error message indicating the source of the error.

=== Internal errors ===
	Any error that is not reported via an explicit exception is indicative of a bug in the system. Such errors are flagged as internal errors and will be reported as possible bugs.

The categories above reflect a developer's view of errors; the user, by contrast, is interested in the source of an error. A classification by source is given below.

   1. User Input - Errors in the user's input, such as syntax errors and semantic errors
   2. Bug - An internal error in the Pig code that is not related to the user's input
   3. User Environment - The client-side environment
   4. Remote Environment - The Hadoop execution environment

== Error codes ==

Error codes are categorized into ranges depending on the nature of the error. The following table indicates the ranges for the error types in Pig.

|| '''Error type'''      	|| '''Range''' ||
|| User Input          	|| 1 - 149 ||
|| Bug             	|| 1 - 149 ||
|| User Environment     || 150 - 200 ||
|| Remote Environment   || 201 - 255 ||
|| Reserved for future use || 200 - 255 ||

Callers using the Java APIs can query whether an exception is retriable. For external processes that rely on the return code of the process, the error code will be negative if the error is retriable and positive if it is not.
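
The sign convention for return codes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Pig's actual API: `ReturnCodeSketch`, `toReturnCode` and `isRetriable` are hypothetical names, and the sketch assumes that error codes are positive integers.

```java
// Hypothetical helper for illustration only; not part of Pig's API.
final class ReturnCodeSketch {
    private ReturnCodeSketch() {}

    /**
     * Maps an error code to a process return code:
     * negative if the error is retriable, positive otherwise.
     * Assumes error codes are strictly positive.
     */
    static int toReturnCode(int errorCode, boolean retriable) {
        if (errorCode <= 0) {
            throw new IllegalArgumentException("error code must be positive: " + errorCode);
        }
        return retriable ? -errorCode : errorCode;
    }

    /** An external process only needs the sign to decide whether to retry. */
    static boolean isRetriable(int returnCode) {
        return returnCode < 0;
    }
}
```

A caller that only sees the return code could then decide to retry with `ReturnCodeSketch.isRetriable(code)`, without knowing which error type the code belongs to.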


== Additional command line switches ==

To let users turn warning message aggregation on or off, print error messages on the screen in addition to the client-side log, and specify the location of the client-side log, the following switches will be added to and/or extended in Pig.

   1. -wagg to turn on warning aggregation; by default, warning aggregation is turned off.
   2. -v to also print error messages on the screen; by default, error messages are written only to the client-side log.
   3. -l to specify the directory where the client-side log is stored; by default, the log is stored in the current working directory and named pig_<pid>.log
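
The switches and their defaults can be summarized in a short sketch. It is illustrative only and does not reflect how Pig actually parses its command line: the class `ErrorOptionsSketch`, its fields and the `logFile` helper are hypothetical names, and all other Pig arguments are simply ignored here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical option holder; names are illustrative, not Pig's.
final class ErrorOptionsSketch {
    boolean aggregateWarnings = false;  // -wagg: aggregation is off by default
    boolean verbose = false;            // -v: messages go only to the log by default
    Path logDir = Paths.get(".");       // -l: defaults to the current working directory

    static ErrorOptionsSketch parse(String[] args) {
        ErrorOptionsSketch opts = new ErrorOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-wagg":
                    opts.aggregateWarnings = true;
                    break;
                case "-v":
                    opts.verbose = true;
                    break;
                case "-l":
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-l requires a directory");
                    }
                    opts.logDir = Paths.get(args[++i]);
                    break;
                default:
                    // Other arguments are out of scope for this sketch.
                    break;
            }
        }
        return opts;
    }

    /** Builds the client-side log path, named pig_<pid>.log as in the list above. */
    Path logFile(long pid) {
        return logDir.resolve("pig_" + pid + ".log");
    }
}
```

For example, parsing `-wagg -l logs` in this sketch enables aggregation, leaves on-screen printing off, and places the log for process 7 at `logs/pig_7.log`.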

== Requirement on UDF authors ==

To enable warning message aggregation, UDF authors should report warnings through Hadoop counters, using Pig's warning enumeration constant UDF_WARNING so that UDF warning messages are aggregated.

'''Note'''

Java does not allow enum types to be extended. Due to this limitation, UDF_WARNING is used as a placeholder for aggregating UDF warning messages. As a result, all warning messages reported by UDFs are treated uniformly.
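
The effect of the single UDF_WARNING placeholder can be illustrated with a small sketch. It does not use the Hadoop or Pig APIs: `WarningSketch` stands in for Pig's warning enumeration (only UDF_WARNING is named in this specification), and `WarningCountersSketch` is a hypothetical in-memory stand-in for Hadoop counters.

```java
import java.util.EnumMap;
import java.util.Map;

// Stand-in for Pig's warning enumeration; only UDF_WARNING comes from the specification.
enum WarningSketch { UDF_WARNING }

// Hypothetical in-memory stand-in for Hadoop counters.
final class WarningCountersSketch {
    private final Map<WarningSketch, Long> counts = new EnumMap<>(WarningSketch.class);

    /** Records one warning of the given kind instead of logging its message. */
    void warn(WarningSketch kind) {
        counts.merge(kind, 1L, Long::sum);
    }

    /** Returns how many warnings of the given kind have been recorded. */
    long count(WarningSketch kind) {
        return counts.getOrDefault(kind, 0L);
    }
}
```

Because every UDF increments the same constant, warnings raised by two unrelated UDFs end up in one count, which is what treating UDF warnings uniformly means in practice.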

== References ==

 1. [[Anchor(cookbook)]] "Pig Developer Cookbook" October 21, 2008, http://wiki.apache.org/pig/PigDeveloperCookbook