Posted to user@pig.apache.org by Jeremy Hanna <je...@gmail.com> on 2011/06/25 18:56:39 UTC

Debugging pig scripts

I was wondering whether the following is a common scenario for others, and whether things could be done in a more debug-friendly way under the covers.

We've found developing with Pig enormously helpful because it's a scripting language that does a lot of the heavy lifting for us with joins and other constructs. When tracking down errors in Pig scripts, we've noticed that the easiest way to find problems is to look at the data at various points, dumping output to see where something has gone awry.

We wondered whether the following could be done, or can already be done and we just haven't figured out how:
- When outputting an error, give as much information as possible, such as the row in the relation that caused the error, with complete details about that row including its values. This would be extremely helpful and would let us track down problems much faster.
- Output as much information as possible about the part of the script where the error occurs. I've seen places where it outputs a line number of the script. Would it be possible to output the complete line of the original script, or is that information lost by runtime?

Anyway, I'm just trying to see if there are ways to help with debugging. I've yet to try 0.9 and Penny and am excited to. In the meantime, I may just be missing great functionality that's already in there.

Thanks!

Jeremy



Re: Debugging pig scripts

Posted by Jeremy Hanna <je...@gmail.com>.
Answering my own question. Penny with 0.9 does this. Wahoo :)

Thanks for telling me, Ashutosh.
