Posted to dev@pig.apache.org by Alan Gates <ga...@yahoo-inc.com> on 2007/11/06 18:59:57 UTC

Re: Type spec -- what happens where there are cast errors during load?

As before, this will be determined by the loader. I agree that it should 
not be an error. You should not break an entire job over one bad row.

I haven't specified the appropriate behavior for PigLoader in this case.
My thinking is that the best solution is to emit a null and issue a
warning. If the user wants to throw the rows out, he can add a filter
immediately after the load that removes rows with nulls. As a later
enhancement we could also allow the user to specify that rows with
bad conversions be thrown out, though this could be tricky: should that
apply per column or to any column?
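A minimal Python sketch of the behavior proposed above, assuming tab-delimited input and a schema given as one Python conversion function per column. `load_rows`, its `on_error` parameter, and `drop_null_rows` are hypothetical names for illustration only, not PigLoader's API. A failed cast yields a null (`None`) plus a logged warning; the filter the user would add after the load is `drop_null_rows`; `on_error="drop"` approximates the possible later enhancement of throwing out bad rows.

```python
import logging
from typing import Callable, Iterable, List, Optional

logger = logging.getLogger("load_sketch")

# Hypothetical schema: one conversion function per column, e.g. [str, int, float].
Schema = List[Callable[[str], object]]
Row = List[Optional[object]]


def load_rows(lines: Iterable[str], schema: Schema, on_error: str = "null") -> List[Row]:
    """Cast each tab-delimited field according to `schema`.

    on_error="null": a field that fails to convert becomes None and a warning
    is logged, so one bad row never fails the whole job.
    on_error="drop": any row with a failed conversion in any column is discarded.
    """
    if on_error not in ("null", "drop"):
        raise ValueError("on_error must be 'null' or 'drop'")
    rows: List[Row] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        row: Row = []
        bad = False
        # zip() stops at the shorter of schema/fields; short rows are not padded here.
        for col, (cast, raw) in enumerate(zip(schema, fields)):
            try:
                row.append(cast(raw))
            except ValueError:
                logger.warning("line %d, column %d: cannot convert %r", lineno, col, raw)
                row.append(None)
                bad = True
        if bad and on_error == "drop":
            continue  # discard the whole row instead of keeping a null
        rows.append(row)
    return rows


def drop_null_rows(rows: List[Row]) -> List[Row]:
    """User-side filter applied right after the load: keep rows with no nulls."""
    return [r for r in rows if all(v is not None for v in r)]


if __name__ == "__main__":
    data = ["alice\t30\t1.5", "bob\tthirty\t2.0"]
    rows = load_rows(data, [str, int, float])
    print(rows)                  # [['alice', 30, 1.5], ['bob', None, 2.0]]
    print(drop_null_rows(rows))  # [['alice', 30, 1.5]]
```

This sketch applies the "drop" policy to a failure in any column; whether it should instead be configurable per column is the open question above.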

When I clarify the loaders in the type spec I'll put this in too.

Alan.

David (Ciemo) Ciemiewicz wrote:
>
> Alan,
>
> When reading files with load, what happens if a user tries to load a 
> file that has string data in a field expected to be numeric?
>
> I couldn’t find it described in the spec. 
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
>
> My concern is that this will throw an error. I don’t think this is an 
> acceptable outcome.
>
> “Bad” data rows are inevitable.
>
> For some prior art: Oracle loader functions allow you to ignore these
> errant rows. They also permit logging the data row to an error file so
> that you can go back and diagnose whether there’s a bug or just a data
> error.
>
> I think it would be useful to be able to choose, on a cast failure,
> whether the row is discarded or the offending data is made NULL.
>
> --- Ciemo
>