Posted to user@pig.apache.org by Rauan Maemirov <ra...@maemirov.com> on 2011/11/08 11:18:53 UTC

Working with date converter

Hi, all. I've got a custom log (CSV, comma-delimited) with ISO dates.
Sometimes log writing lags, and I get exceptions caused by a malformed ISO
date.
Here's the exception: https://gist.github.com/1347406. (The date is the last
"parameter" in the row, and it's incorrectly overwritten at the end by
another string.)

The question is: how can I filter out all the wrong dates, or at least force
Pig to ignore them instead of failing?

Re: Working with date converter

Posted by pablomar <pa...@gmail.com>.
Sorry for the delay!

There must be a better option, but I wrote a simple loader extending PigStorage
(I reused a lot of code from PigStorage, especially its parse/split method).
You need to complete the 'process' method: take the field or fields you need,
convert your date, and then set the right field (0?).

To compile it, you have to put pig-core.jar and hadoop-core.jar on your
classpath, something like:

javac -cp /usr/lib/pig/pig-core.jar:/usr/lib/hadoop/hadoop-core.jar myPackage/MyLoader.java

Any questions, just let me know.
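
The loader mentioned above is not reproduced in this archive, so what follows is only a rough sketch of the kind of per-line logic such a loader might apply: split the comma-delimited line, try to convert the last field from ISO to Unix time, and store a null when the conversion fails. It is plain Java with no Pig dependency so that it runs on its own. The class name DateFieldFixer, the helper isoToUnixOrNull, and the use of java.time (Java 8+) rather than whatever the original loader used are all assumptions, and the date format is assumed to be ISO-8601 with an offset or 'Z'. In a real PigStorage subclass this logic would most likely sit on the load side, where the method that produces each tuple is getNext(); putNext(Tuple) is the store-side method.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the loader from the original message.
class DateFieldFixer {

    /** Converts an ISO-8601 timestamp to Unix millis, or returns null if it is malformed. */
    public static Long isoToUnixOrNull(String iso) {
        try {
            return OffsetDateTime.parse(iso.trim()).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            return null; // malformed date: report it as null instead of failing
        }
    }

    /** Splits one line on commas and replaces the last field with its Unix time (or null). */
    public static List<Object> process(String line) {
        // limit -1 keeps trailing empty fields, so the column count stays stable
        List<Object> fields = new ArrayList<>(Arrays.asList(line.split(",", -1)));
        int last = fields.size() - 1; // assumption: the date is the last column
        fields.set(last, isoToUnixOrNull((String) fields.get(last)));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(process("u1,click,2011-11-08T11:18:53Z"));
        System.out.println(process("u2,view,2011-11-08T11:1another string"));
    }
}
```

With a null in the date column, the bad rows can be dropped later in the script instead of aborting the job.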


Re: Working with date converter

Posted by pablomar <pa...@gmail.com>.
Sorry, I read "custom log" and thought you had a custom loader.
You can extend PigStorage and do the field replacement in its putNext method.

I'll write an example later.


Re: Working with date converter

Posted by Rauan Maemirov <ra...@maemirov.com>.
Yes, you've understood my task correctly. What is putNext? I'm new to Pig and
haven't written custom UDFs.


Re: Working with date converter

Posted by pablomar <pa...@gmail.com>.
Sorry, I didn't understand completely.

Do you want to read a line and, if the date is invalid (running IsoToUnix
directly, without a regex check first), skip it? Is that it?
If so, you can replace the field with your converted date (Unix format),
and if the conversion fails, put a null or nothing there.

I mean, in your overridden putNext you have your individual columns, so
you can try to convert the date there and put your Unix date in the output.

Sorry if I've misunderstood your problem again.


Re: Working with date converter

Posted by Rauan Maemirov <ra...@maemirov.com>.
Sure, but right now I'm just omitting the rows _after_ regex matching.
What I want to do is avoid the additional regex filtering and ignore
invalid rows right after an unsuccessful IsoToUnix().
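
As a rough illustration of "ignore the row right after a failed conversion", here is a minimal standalone Java sketch that attempts the conversion once per row and uses the parse failure itself as the filter, with no separate regex pass. It is not Pig code: the class name SkipBadRows and the method convertOrSkip are hypothetical, java.time stands in for the IsoToUnix UDF, and the date is assumed to be the last comma-separated field in ISO-8601 form with an offset or 'Z'.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: convert the date of each row, dropping rows that fail.
class SkipBadRows {

    /** Returns the rows with the last field rewritten as Unix millis; malformed rows are omitted. */
    public static List<String> convertOrSkip(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            int cut = line.lastIndexOf(',') + 1; // index where the last field starts
            try {
                long unixMillis = OffsetDateTime.parse(line.substring(cut).trim())
                        .toInstant().toEpochMilli();
                out.add(line.substring(0, cut) + unixMillis);
            } catch (DateTimeParseException e) {
                // malformed date: skip this row and carry on with the next one
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "u1,click,2011-11-08T11:18:53Z",
                "u2,view,2011-11-08T11:1another string");
        System.out.println(convertOrSkip(rows));
    }
}
```

In a Pig script the closest equivalent would probably be a conversion function that returns null on failure, followed by a filter on IS NOT NULL.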


Re: Working with date converter

Posted by pablomar <pa...@gmail.com>.
Can you write something else (a null, for example) in your putNext
method for that field when the date is invalid?


Re: Working with date converter

Posted by Rauan Maemirov <ra...@maemirov.com>.
Well, I solved this issue via regex matching, but I wonder if it's too
costly.
Is there any way to ignore exceptions and move on, just omitting
the wrong tuples?
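
For reference, the regex approach can be sketched in plain Java as below. This is only an assumed shape check, since the actual pattern used is not shown in the thread; the class name IsoShapeCheck and the pattern itself are hypothetical. Compiling the pattern once and reusing it is one common way to keep the per-row cost down. Note that a regex like this validates only the shape of the string, so a value such as month 13 would still pass and could still fail in the converter.

```java
import java.util.regex.Pattern;

// Hypothetical shape check for ISO-8601 timestamps such as 2011-11-08T11:18:53Z.
class IsoShapeCheck {

    // Compiled once and reused for every row.
    private static final Pattern ISO_SHAPE = Pattern.compile(
            "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?(Z|[+-]\\d{2}:\\d{2})");

    /** Returns true when the whole string has the expected ISO-8601 layout. */
    public static boolean looksLikeIso(String s) {
        return s != null && ISO_SHAPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIso("2011-11-08T11:18:53Z"));
        System.out.println(looksLikeIso("2011-11-08T11:1another string"));
    }
}
```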
