Posted to user@pig.apache.org by Bob Briski <bb...@raybeam.com> on 2012/07/03 02:37:46 UTC
How to find ranges between a field in a set of records
Hi,
I need to determine the number of days between dates on a running list
of records. The number of records associated with each key will be small
(less than 100), so I should be able to do it in one reducer. The data
would look something like this:
Say the headers are:
player_id, date, other_stuff
values would be:
2, 6/1/2012 ...
2, 6/3/2012 ...
2, 6/10/2012 ...
I want to add the number of days between this record and the previous
one to get:
player_id, date, range, other_stuff
2,6/1/2012,NULL, ...
2,6/3/2012,2, ...
2,6/10/2012,7, ...
Is there an easy way to do this in PIG? If not, is it something that
can be handled with a UDF?
Thanks,
Bob
Re: How to find ranges between a field in a set of records
Posted by Jonathan Coveney <jc...@gmail.com>.
There is not a way to do this in straight Pig, but it is easy with a UDF
(ideally an accumulative UDF, though with fewer than 100 records per key
it doesn't really matter). You'll do a nested sort in a foreach block, then
pass the sorted dates to the UDF. The docs should have an example of this.
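For what it's worth, here is a rough sketch of the logic such a UDF would carry, written as plain Python so it can be run on its own. The function name add_day_ranges, the DATE_FORMAT constant, and the assumption that dates are month/day/year strings like 6/1/2012 are mine, not from the original message. The Pig side (registering the UDF, declaring its output schema, and the nested foreach) is not shown.

```python
from datetime import datetime

# Assumption: dates are month/day/year strings such as "6/1/2012".
DATE_FORMAT = "%m/%d/%Y"


def add_day_ranges(records):
    """Add the day gap to the previous record for one player's records.

    records: iterable of (player_id, date_string, other_stuff) tuples,
             all belonging to the same key.
    returns: list of (player_id, date_string, range, other_stuff) tuples,
             where range is None for the earliest record.
    """
    # Sort on the parsed date. Sorting the raw strings would be wrong:
    # as text, "6/10/2012" sorts before "6/3/2012".
    ordered = sorted(records, key=lambda r: datetime.strptime(r[1], DATE_FORMAT))

    result = []
    previous = None
    for player_id, date_string, other_stuff in ordered:
        current = datetime.strptime(date_string, DATE_FORMAT)
        # No previous record for the first row, so the range stays None (NULL).
        gap = None if previous is None else (current - previous).days
        result.append((player_id, date_string, gap, other_stuff))
        previous = current
    return result


if __name__ == "__main__":
    rows = [
        (2, "6/10/2012", "..."),
        (2, "6/1/2012", "..."),
        (2, "6/3/2012", "..."),
    ]
    for row in add_day_ranges(rows):
        print(row)
```

The sketch sorts inside the function rather than relying on the caller, because a sort on the date as a plain string would put the records in the wrong order. With fewer than 100 records per key, holding the whole group in memory like this should be fine.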