Posted to dev@pig.apache.org by Alan Gates <ga...@yahoo-inc.com> on 2007/11/06 18:59:57 UTC
Re: Type spec -- what happens when there are cast errors during load?
As before, this will be determined by the loader. I agree that it should
not be an error; you should not break an entire job over one bad row.
I haven't specified the appropriate behavior for PigLoader in this case.
My thinking is that the best solution is to emit a null and issue a
warning. If the user wants to throw those rows out, he can add a filter
immediately after the load that removes rows with nulls. As a later
enhancement we could also let the user specify that rows with bad
conversions be thrown out, though this could be tricky: should that rule
apply per column, or to a bad conversion in any column?
When I clarify the loaders in the type spec I'll put this in too.
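A minimal Java sketch of the behavior proposed above, assuming tab-separated text in which every field is meant to be an int. The `OnCastError` policy and the `castOrNull`, `load`, and `filterNullRows` names are hypothetical and illustrative only; they are not part of PigLoader or any Pig API. `NULL_AND_WARN` emits a null plus a warning for each bad field; `filterNullRows` stands in for the filter a user would add right after the load; `DROP_ROW` shows one answer to the per-column question, rejecting a row when any column fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LenientLoadSketch {

    // Hypothetical policy names for illustration; not part of any Pig API.
    enum OnCastError { NULL_AND_WARN, DROP_ROW }

    // Number of fields that failed to convert (the proposed warnings).
    static int warnings = 0;

    // Try to cast one text field to an Integer; on failure, warn and return null.
    static Integer castOrNull(String field) {
        try {
            return Integer.valueOf(field.trim());
        } catch (NumberFormatException e) {
            warnings++;
            System.err.println("WARN: cannot cast '" + field + "' to int, using null");
            return null;
        }
    }

    // Convert one tab-separated line. Under DROP_ROW, a bad value in ANY column
    // rejects the whole row (returns null); under NULL_AND_WARN the row is kept.
    static Integer[] convertLine(String line, OnCastError policy) {
        String[] fields = line.split("\t", -1);
        Integer[] row = new Integer[fields.length];
        for (int i = 0; i < fields.length; i++) {
            row[i] = castOrNull(fields[i]);
            if (row[i] == null && policy == OnCastError.DROP_ROW) {
                return null;
            }
        }
        return row;
    }

    // "Load" every line under the given policy, skipping rejected rows.
    static List<Integer[]> load(List<String> lines, OnCastError policy) {
        List<Integer[]> rows = new ArrayList<>();
        for (String line : lines) {
            Integer[] row = convertLine(line, policy);
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    // User-side alternative: keep the nulls at load time, then drop any row containing one.
    static List<Integer[]> filterNullRows(List<Integer[]> rows) {
        List<Integer[]> kept = new ArrayList<>();
        for (Integer[] row : rows) {
            if (!Arrays.asList(row).contains(null)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1\t2", "3\tabc", "5\t6");
        List<Integer[]> withNulls = load(input, OnCastError.NULL_AND_WARN);
        List<Integer[]> filtered = filterNullRows(withNulls);
        List<Integer[]> dropped = load(input, OnCastError.DROP_ROW);
        System.out.println(withNulls.size() + " rows loaded, " + filtered.size()
                + " after filter, " + dropped.size() + " with DROP_ROW");
    }
}
```

Running `main` prints `3 rows loaded, 2 after filter, 2 with DROP_ROW` to stdout, with one warning per bad field on stderr. Under the any-column rule, loading with nulls and then filtering yields the same rows as dropping them at load time, so defaulting to nulls leaves the choice with the user. A per-column rule would instead need a policy attached to each field.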
Alan.
David (Ciemo) Ciemiewicz wrote:
>
> Alan,
>
> When reading files with load, what happens if a user tries to load a
> file that has string data in a field expected to be numeric?
>
> I couldn’t find it described in the spec.
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
>
> My concern is that this will throw an error. I don’t think this is an
> acceptable outcome.
>
> “Bad” data rows are inevitable.
>
> For some prior art - Oracle loader functions allow you to ignore these
> errant rows. They also permit logging the data row to an error file so
> that you can go back and diagnose whether there’s a bug or just a data
> error.
>
> I think it would be useful to be able to choose, in the case of a cast
> failure, whether the row is discarded or the value is made NULL.
>
> --- Ciemo
>