Posted to user@pig.apache.org by Bob Briski <bb...@raybeam.com> on 2012/07/03 02:37:46 UTC

How to find ranges between a field in a set of records

Hi,

I need to determine the number of days between consecutive dates in a
running list of records.  The number of records associated with each key
will be small (fewer than 100), so I should be able to do it in one
reducer.  The data would look something like this:

Say the headers are:
player_id, date, other_stuff

values would be:
2, 6/1/2012 ...
2, 6/3/2012 ...
2, 6/10/2012 ...

I want to add the number of days between this record and the previous
one, to get:
player_id, date, range, other_stuff

2,6/1/2012,NULL, ...
2,6/3/2012,2, ...
2,6/10/2012,7, ...

Is there an easy way to do this in Pig?  If not, is it something that
can be handled with a UDF?

Thanks,
Bob

Re: How to find ranges between a field in a set of records

Posted by Jonathan Coveney <jc...@gmail.com>.
There is no way to do this in plain Pig, but it is easy with a UDF
(ideally an accumulative UDF, though with fewer than 100 records per key
it doesn't really matter). Do a nested sort in a foreach block, then
pass the sorted dates to the UDF. The docs should have an example of this.
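
As a rough illustration of the per-key logic such a UDF would carry out,
here is a minimal sketch in plain Python. The function name add_day_gaps,
the tuple layout (player_id, date, other_stuff) and the M/D/YYYY date
format are assumptions taken from the sample data in the question, not an
existing Pig API. It sorts on the parsed date rather than the raw string,
because "6/10/2012" sorts before "6/3/2012" as text.

```python
from datetime import datetime


def add_day_gaps(records, date_format="%m/%d/%Y"):
    """Return (player_id, date, range, other_stuff) tuples for one key.

    records: iterable of (player_id, date_string, other_stuff) tuples,
    all belonging to the same player_id (hypothetical layout).
    """
    # Parse each date once and sort chronologically on the parsed value.
    parsed = sorted(
        ((datetime.strptime(date, date_format), pid, date, other)
         for pid, date, other in records),
        key=lambda row: row[0],
    )

    out = []
    previous = None  # date of the previous record, None for the first one
    for when, pid, date, other in parsed:
        # The first record has no predecessor, so its range is None (NULL).
        gap = None if previous is None else (when - previous).days
        out.append((pid, date, gap, other))
        previous = when
    return out


if __name__ == "__main__":
    rows = [(2, "6/10/2012", "..."), (2, "6/1/2012", "..."), (2, "6/3/2012", "...")]
    for row in add_day_gaps(rows):
        print(row)
```

To use something like this from Pig, the function would still need to be
wrapped as a UDF (for example through Pig's Jython support, with an output
schema declared) and called on the bag produced by grouping on player_id.
That wiring is omitted here.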
