Posted to dev@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2011/11/30 07:22:50 UTC

Pig9 will fail on bad schema specification, but in a difficult to debug way

In Pig9, if you have a UDF that declares its output schema and that
schema is wrong, you will very likely get an exception such as:

java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
	at java.lang.Integer.compareTo(Integer.java:37)

Errors like this are rare. They didn't seem to come up in Pig8 but do
in Pig9, and the opaque error messages can be hard to interpret.

In this case, there was a UDF that declared a Long output but was in
fact returning an Integer. At some point, Pig tried to cast the value
to the declared type and failed.
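
To make the failure mode concrete, here is a minimal reproduction in
plain Java, independent of Pig. The class and method names are made up
for illustration; the assumption is only that somewhere downstream two
values are compared through the raw Comparable interface, which is what
the Integer.compareTo frame in the stack trace suggests.

```java
// Hypothetical demo class, not part of Pig.
class SchemaMismatchDemo {
    // Compares two values without static type information. The raw cast
    // compiles, but Integer.compareTo(Object) casts its argument to
    // Integer at run time and fails if it is actually a Long.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareUntyped(Object left, Object right) {
        return ((Comparable) left).compareTo(right);
    }

    public static void main(String[] args) {
        Object actual = Integer.valueOf(42); // what the UDF really returned
        Object expected = Long.valueOf(42L); // what the declared schema implies
        try {
            compareUntyped(actual, expected);
        } catch (ClassCastException e) {
            // The exact message text depends on the JVM version.
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

The exception surfaces at the comparison, far from the UDF that caused
it, which is why the stack trace alone says so little.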

That said, I wonder if it might be possible to add a runtime check
that inspects, say, the first output of your EvalFunc and, if its type
does not match the declared output schema, gives you a warning (I
don't think it should fail, but it should at least warn you to aid in
debugging). I don't think this would be too hard, and it would add
minimal overhead (compared to the run time of a job). We could
optionally add a flag or something for a "strict" mode with respect to
the schema.
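
A rough sketch of what such a check could look like, written against
plain java.util.function.Function rather than Pig's EvalFunc so that it
runs on its own. Everything here (the FirstOutputTypeCheck class, the
strict flag, getWarning) is hypothetical and not an existing Pig API;
it also assumes the declared schema can be reduced to a single Java
class, and it skips null outputs until it sees a real value.

```java
import java.util.function.Function;

// Hypothetical wrapper, not part of Pig: validates only the first non-null
// output of a function against the type it claims to produce.
class FirstOutputTypeCheck<I, O> implements Function<I, O> {
    private final Function<I, O> udf;
    private final Class<?> declaredType;
    private final boolean strict;
    private boolean checked = false;
    private String warning = null;

    FirstOutputTypeCheck(Function<I, O> udf, Class<?> declaredType, boolean strict) {
        this.udf = udf;
        this.declaredType = declaredType;
        this.strict = strict;
    }

    @Override
    public O apply(I input) {
        O result = udf.apply(input);
        // Only one check is performed, so the per-record cost afterwards
        // is a single boolean test.
        if (!checked && result != null) {
            checked = true;
            if (!declaredType.isInstance(result)) {
                warning = "Declared output type " + declaredType.getName()
                        + " but first output was " + result.getClass().getName();
                if (strict) {
                    throw new IllegalStateException(warning);
                }
                System.err.println("WARN: " + warning);
            }
        }
        return result;
    }

    // Returns the mismatch message, or null if none was detected.
    String getWarning() {
        return warning;
    }
}
```

Wrapping a function that claims Long but returns Integer, for example
new FirstOutputTypeCheck<String, Object>(s -> s.length(), Long.class,
false), would log one warning on the first record and otherwise leave
the output untouched; with strict set to true it would throw instead.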

Related to this, when jobs die in opaque ways, I wonder if there might
be a way to give a clearer sense of where in the pipeline they die? You
can check pig.alias and try to work it out from where in the map or
reduce the failure happened, but that's tough. I know that pipelining
and optimizations could make this hard to do, but having a clearer
sense of what's going on would help debugging along.
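
One possible shape for this, again as a standalone sketch rather than
anything Pig does today: wrap each step so that a runtime failure is
rethrown with the alias it belongs to. AliasContext and withAlias are
hypothetical names, and the sketch assumes each step maps to exactly
one alias, which pipelining and optimization may not guarantee.

```java
import java.util.function.Function;

// Hypothetical helper, not part of Pig.
class AliasContext {
    // Wraps one pipeline step so that any runtime failure names the alias
    // being evaluated and keeps the original exception as the cause.
    static <I, O> Function<I, O> withAlias(String alias, Function<I, O> step) {
        return input -> {
            try {
                return step.apply(input);
            } catch (RuntimeException e) {
                throw new RuntimeException(
                        "Failure while evaluating alias '" + alias + "'", e);
            }
        };
    }
}
```

With this, the ClassCastException above would arrive wrapped in a
message naming the alias, while the original stack trace stays
available through getCause().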

Thoughts?

Re: Pig9 will fail on bad schema specification, but in a difficult to debug way

Posted by Jonathan Coveney <jc...@gmail.com>.
Hmm, I tested it and the problem does exist in Pig8. I must have been
running a fixed version.

I think the other point stands, though: we can make it easier to
understand these sorts of problems.

2011/12/1 Daniel Dai <da...@hortonworks.com>

> Why does the problem not exist in Pig 8?
>
> Daniel

Re: Pig9 will fail on bad schema specification, but in a difficult to debug way

Posted by Daniel Dai <da...@hortonworks.com>.
Why does the problem not exist in Pig 8?

Daniel
