Posted to user@pig.apache.org by Eric Yang <er...@gmail.com> on 2011/01/01 21:19:30 UTC

Re: How to calculate delta in a column?

You are right: in my example, there should be a timestamp column.
Thanks, I will look into writing the UDF.

regards,
Eric

On Fri, Dec 31, 2010 at 1:16 AM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> Can't without a way of ordering the data for the same key.
>
> If you do have a way to do this (a timestamp or some such), you can group by
> key, order the resulting group inside the foreach, and then run it through a
> UDF (you can even make this UDF accumulative).
>
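> grouped = group data by key;
> deltas = foreach grouped {
>    -- the bag inside each group is named after the grouped relation (data)
>    ordered_tuples = order data by ordinal;
>    -- the grouping key is exposed as 'group'
>    generate group, FLATTEN(calculateDeltas(ordered_tuples));
> }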
>
>
> -D
>
>
> On Thu, Dec 30, 2010 at 10:12 PM, Eric Yang <er...@gmail.com> wrote:
>
>> Hi,
>>
>> What is the most efficient method to calculate the delta of columns?  Consider
>> this:
>>
>> (key1, 1, 2, 3)
>> (key1, 2, 4, 5)
>> (key2, 1, 2, 4)
>> (key1, 3, 6, 9)
>> (key2, 2, 4, 6)
>>
>> The expected transformation output should look like this:
>>
>> (key1, 1, 2, 2)
>> (key1, 1, 2, 4)
>> (key2, 1, 2, 2)
>>
>> The idea is to group by f0, and compute f1 (current value) - f1
>> (previous value).  How can I write this in Pig?
>>
>> If there is an underflow value, it should reset to 0, for example:
>>
>> (key1, 1, 2, 3)
>> (key1, 0, 0, 0)
>> (key1, 2, 3, 4)
>>
>> The output should be:
>>
>> (key1, 0, 0, 0)
>> (key1, 2, 3, 4)
>>
>> I haven't been able to find a solution on Google.  Anyone?
>>
>> regards,
>> Eric
>>
>
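
The approach discussed in this thread (group by key, order within each group,
then subtract consecutive rows) can be sketched outside Pig to make the
intended per-key logic concrete. This is only an illustration in plain Python,
not a Pig UDF. It rests on two assumptions that are not in the original
messages: each row carries a hypothetical ordinal column right after the key
(standing in for the timestamp), and "reset to 0" means that any negative
per-column delta is clamped to 0. The function name calculate_deltas is
likewise illustrative.

```python
from collections import defaultdict


def calculate_deltas(rows):
    """Return per-key deltas between consecutive rows.

    rows: iterable of (key, ordinal, v1, v2, ...) tuples.
    The ordinal column is an assumption; the thread only says that some
    ordering column (such as a timestamp) is needed.
    """
    # Group the value columns by key, remembering each row's ordinal.
    groups = defaultdict(list)
    for key, ordinal, *values in rows:
        groups[key].append((ordinal, values))

    out = []
    for key in sorted(groups):
        # Order the rows of this key by the ordinal column.
        ordered = sorted(groups[key], key=lambda item: item[0])
        # Pair every row with its successor and subtract column by column.
        for (_, prev), (_, curr) in zip(ordered, ordered[1:]):
            # Assumption: a negative delta (underflow) is clamped to 0.
            deltas = tuple(max(c - p, 0) for p, c in zip(prev, curr))
            out.append((key, *deltas))
    return out


# Sample data from the thread, with an ordinal column added for ordering.
rows = [
    ("key1", 1, 1, 2, 3),
    ("key1", 2, 2, 4, 5),
    ("key2", 1, 1, 2, 4),
    ("key1", 3, 3, 6, 9),
    ("key2", 2, 2, 4, 6),
]
for row in calculate_deltas(rows):
    print(row)
```

In Pig, the same loop body would live inside the UDF that receives the ordered
bag for one key, so the grouping and sorting steps shown here would be done by
the group and nested order statements instead.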