Posted to user@pig.apache.org by Kris Coward <kr...@melon.org> on 2011/02/03 21:51:25 UTC

Casting unclean data.

Hey there,

I've just started butting heads against a problem where I'm trying to
cast bytearrays in customer-provided data to integers. The overwhelming
majority of the time, we seem to get actual integers, but I just had a
job choke when one of these should-be-integers wasn't. Is there some
sort of "is a number" test that I could use to filter the data before
trying to cast it, or do I have to write a UDF or a little program to
stream the data through to do this sort of data cleaning?

Thanks,
Kris

-- 
Kris Coward					http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
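The question also mentions a second option: streaming the data through a small program. A minimal sketch of that option is the Java filter below. It assumes the rows reach the program's stdin as tab-separated lines, which is how this sketch expects Pig's STREAM ... THROUGH operator to hand them over. It also assumes the should-be-integer value sits in column 0. The class name IntColumnFilter and the COLUMN constant are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical streaming filter: passes through only the tab-separated lines
// whose field at COLUMN parses as a Java int. The column index and the tab
// delimiter are assumptions; adjust them to match the actual data.
public class IntColumnFilter {
    static final int COLUMN = 0;

    // True if the field at `column` exists and, after trimming surrounding
    // whitespace, parses as a 32-bit int.
    public static boolean keep(String line, int column) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (column >= fields.length) {
            return false;
        }
        try {
            Integer.parseInt(fields[column].trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Convenience wrapper for filtering an in-memory list of lines.
    public static List<String> filter(List<String> lines, int column) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (keep(line, column)) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Reads lines from stdin and echoes the ones that pass the check.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            if (keep(line, COLUMN)) {
                System.out.println(line);
            }
        }
    }
}
```

This version drops bad rows outright rather than repairing them. The rows that survive still arrive as untyped fields, so the cast to int would happen afterwards in the script.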

Re: Casting unclean data.

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
You have to write small checking UDFs.
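In Pig, a checking UDF like this would typically be a Java class extending org.apache.pig.FilterFunc, whose exec(Tuple) method returns a Boolean. The sketch below leaves out the Pig wiring so that it compiles on its own. It shows only the test that such an exec could delegate to, assuming the field's toString() yields its text. The class name IsInteger and the method check are hypothetical.

```java
// Hypothetical helper holding the check a Pig FilterFunc's exec(Tuple) could
// delegate to. Pig classes are deliberately left out so this compiles alone.
public class IsInteger {

    // Returns true only when the field's text, after trimming surrounding
    // whitespace, parses as a 32-bit Java int. Nulls, empty strings,
    // decimals, and out-of-range values all return false.
    public static boolean check(Object field) {
        if (field == null) {
            return false;
        }
        String text = field.toString().trim();
        try {
            Integer.parseInt(text); // throws on "", "3.5", "12abc", overflow
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"42", " -7 ", "12abc", "", "3.5", "99999999999"};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + check(s));
        }
    }
}
```

Once wrapped in a FilterFunc, the check would sit in a FILTER ... BY clause ahead of the cast, so only rows that pass it reach the conversion to int. Rejecting out-of-range values here is deliberate, since they cannot be represented as an int anyway.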

In the future you'll just use the error handling feature (see wiki) to
deal with this :)

D
