Posted to user@pig.apache.org by Santhosh Srinivasan <sm...@yahoo-inc.com> on 2008/10/21 00:29:08 UTC
Requirements for Pig Error Handling
Dear Users,
The requirements document for error handling in Pig is now published at:
http://wiki.apache.org/pig/PigErrorHandling
Please take a look and feel free to provide feedback.
Thanks,
Santhosh
RE: Requirements for Pig Error Handling
Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.
After incorporating the feedback that I received, I have updated the
document. The final document is at:
http://wiki.apache.org/pig/PigErrorHandling
Thanks,
Santhosh
-----Original Message-----
From: Santhosh Srinivasan
Sent: Wednesday, October 22, 2008 6:17 PM
To: 'pig-user@incubator.apache.org'
Subject: RE: Requirements for Pig Error Handling
Hi Alan,
Thanks for the detailed comments.
1. I incorporated your comment. All error messages will have an error
code.
2. It's already mentioned in the section on Error Handling.
3. I have an example for the semantic error. Will add a runtime Hadoop
error.
4. Hadoop error will have different information indicating Hadoop as the
source.
5. I have added some examples to explain this point.
6. Only aggregation will be turned off. We may also want to add a
switch to turn off warnings completely.
Thanks,
Santhosh
-----Original Message-----
From: Alan Gates [mailto:gates@yahoo-inc.com]
Sent: Tuesday, October 21, 2008 11:48 AM
To: pig-user@incubator.apache.org
Subject: Re: Requirements for Pig Error Handling
Comments/questions:
1) "Error codes will be devised for common error messages". All
errors should have codes. We will probably need a catch-all category
(like "internal error" or something). Giving all error messages
codes makes it much easier to write user manuals.
2) I think you are assuming that the stack traces etc. that are
currently output will be written to a log, but I don't see that spelled
out. You mention that users are responsible for purging it. You
also need to specify where the log will be located.
3) A few explicit examples of how things will look would be helpful.
For example, you could show a semantic error and a runtime Hadoop
error, with what was printed to the screen in each case and what was
written to the log in each case.
4) How will errors from Hadoop be shown differently than errors from
Pig? Do you mean they'll have a different error code? Will they
contain different information? Will they be written to different
locations?
5) What does warning aggregation look like? Will the user get
something like: "This query had 500 warnings, see logs for details"
or will it be "The warning "divide by 0" was seen 498 times and the
warning "my udf flopped" was seen 2 times" (that is summary of all
warnings or summary by warning type)? Will the full warning info be
written to the logs, or only the summary?
6) When users turn off warning aggregation, does that mean that the
warnings are thrown away or that they are printed to the screen
individually? That is, does it turn off warnings or turn off
aggregation?
Alan.
Re: Requirements for Pig Error Handling
Posted by Alan Gates <ga...@yahoo-inc.com>.
Currently, in the types branch, we emit a null and continue. In
released code it is an error and the entire job comes to a stop.
Alan.
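To make the two behaviours concrete, here is a minimal Python sketch. It is not Pig code, and `divide_strict` and `divide_lenient` are hypothetical names used only for illustration. It contrasts stopping on a zero divisor with emitting a null and continuing:

```python
from typing import List, Optional, Tuple


def divide_strict(numerator: float, denominator: float) -> float:
    """Fail-fast behaviour: a zero divisor raises, so the caller's job stops."""
    return numerator / denominator  # raises ZeroDivisionError when denominator is 0


def divide_lenient(numerator: float, denominator: float) -> Optional[float]:
    """Lenient behaviour: a zero divisor yields a null (None) and processing goes on."""
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical input records: (numerator, denominator) pairs.
records: List[Tuple[float, float]] = [(10, 2), (7, 0), (9, 3)]

# The bad record becomes None instead of aborting the whole run.
lenient_results = [divide_lenient(n, d) for n, d in records]
```

With the lenient version the bad record shows up as a null in the output; with the strict version the first bad record ends the run.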
On Oct 23, 2008, at 3:21 PM, pi song wrote:
> How do we currently deal with runtime errors like "divide by 0"?
> Do we skip the records or just redirect to error output file?
> Pi
Re: Requirements for Pig Error Handling
Posted by pi song <pi...@gmail.com>.
How do we currently deal with runtime errors like "divide by 0"? Do we skip
the records or just redirect them to an error output file?
Pi
RE: Requirements for Pig Error Handling
Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.
Hi Alan,
Thanks for the detailed comments.
1. I incorporated your comment. All error messages will have an error
code.
2. It's already mentioned in the section on Error Handling.
3. I have an example for the semantic error. Will add a runtime Hadoop
error.
4. Hadoop error will have different information indicating Hadoop as the
source.
5. I have added some examples to explain this point.
6. Only aggregation will be turned off. We may also want to add a
switch to turn off warnings completely.
Thanks,
Santhosh
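As a rough illustration of points 1 and 4, the Python sketch below gives every error a code, falls back to a catch-all "internal error" code, and tags each error with its source. It is only a sketch under assumptions: the code values, the `KNOWN_CODES` table, and the names `CodedError` and `make_error` are all hypothetical and are not taken from the requirements document or from Pig.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where the error originated (assumed two-way split for illustration)."""
    PIG = "Pig"
    HADOOP = "Hadoop"


@dataclass(frozen=True)
class CodedError:
    code: int
    source: Source
    message: str

    def render(self) -> str:
        # One-line form a user would see; the layout is an assumption.
        return f"ERROR {self.code} [{self.source.value}]: {self.message}"


# Hypothetical code assignments, invented for this sketch.
INTERNAL_ERROR = 9999  # catch-all for anything without a specific code
KNOWN_CODES = {"undefined alias": 1000, "task failed": 2000}


def make_error(kind: str, source: Source, message: str) -> CodedError:
    """Every error gets a code; unknown kinds fall back to the catch-all."""
    return CodedError(KNOWN_CODES.get(kind, INTERNAL_ERROR), source, message)


semantic = make_error("undefined alias", Source.PIG, "alias B is not defined")
runtime = make_error("task failed", Source.HADOOP, "map task failed")
unknown = make_error("something unexpected", Source.PIG, "unexpected state")
```

Calling `semantic.render()` gives a single line carrying the code and the source, so a user manual could index errors by code.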
Re: Requirements for Pig Error Handling
Posted by Alan Gates <ga...@yahoo-inc.com>.
Comments/questions:
1) "Error codes will be devised for common error messages". All
errors should have codes. We will probably need a catch-all category
(like "internal error" or something). Giving all error messages
codes makes it much easier to write user manuals.
2) I think you are assuming that the stack traces etc. that are
currently output will be written to a log, but I don't see that spelled
out. You mention that users are responsible for purging it. You
also need to specify where the log will be located.
3) A few explicit examples of how things will look would be helpful.
For example, you could show a semantic error and a runtime Hadoop
error, with what was printed to the screen in each case and what was
written to the log in each case.
4) How will errors from Hadoop be shown differently than errors from
Pig? Do you mean they'll have a different error code? Will they
contain different information? Will they be written to different
locations?
5) What does warning aggregation look like? Will the user get
something like: "This query had 500 warnings, see logs for details"
or will it be "The warning "divide by 0" was seen 498 times and the
warning "my udf flopped" was seen 2 times" (that is summary of all
warnings or summary by warning type)? Will the full warning info be
written to the logs, or only the summary?
6) When users turn off warning aggregation, does that mean that the
warnings are thrown away or that they are printed to the screen
individually? That is, does it turn off warnings or turn off
aggregation?
Alan.
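The two aggregation styles asked about in point 5 can be sketched in a few lines of Python. This is not Pig code: `summarize_total` and `summarize_by_type` are hypothetical helper names, and the message wording is copied from the question above.

```python
from collections import Counter
from typing import List


def summarize_total(warnings: List[str]) -> str:
    """Style A: one line giving only the total number of warnings."""
    return f"This query had {len(warnings)} warnings, see logs for details"


def summarize_by_type(warnings: List[str]) -> List[str]:
    """Style B: one line per distinct warning, most frequent first."""
    counts = Counter(warnings)
    return [
        f'The warning "{message}" was seen {count} times'
        for message, count in counts.most_common()
    ]


# Hypothetical warning stream matching the numbers in the question.
warnings = ["divide by 0"] * 498 + ["my udf flopped"] * 2

total_line = summarize_total(warnings)
per_type_lines = summarize_by_type(warnings)
```

Style B keeps the per-type counts that style A throws away, at the cost of one output line per distinct warning.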