Posted to user@pig.apache.org by pranjal rajput <fi...@gmail.com> on 2013/03/15 18:03:39 UTC
Aggregation for chronologically ordered dataset
Hi,
I am new to Pig.
I have a dataset from a time-tracker application.
It records the time that users spend on various activities.
For example:
UserId | Activity | Tool | BeginTime | EndTime | DurationMinute
1 | development | tool1 | 10:00 | 10:15 | 15
1 | development | tool2 | 10:15 | 10:30 | 15
1 | other | tool3 | 10:30 | 11:00 | 30
1 | development | tool1 | 11:00 | 11:20 | 20
1 | other | tool4 | 11:20 | 12:00 | 40
1 | development | tool1 | 12:00 | 12:15 | 15
2 | other | tool3 | 10:00 | 11:00 | 60
2 | development | tool1 | 11:00 | 11:20 | 20
2 | development | tool2 | 11:20 | 11:30 | 10
I want to find the uninterrupted time slots spent on an activity
such as Activity=development, like this:
UserId | Activity | SumDurationMinutes
1 | development | 30 /*notice that two slots are summed*/
1 | other | 30
1 | development | 20
1 | other | 40
1 | development | 15
2 | other | 60
2 | development | 30 /*again sum*/
How can this be done in Pig?
I am open to writing a UDF for this, or to any other workaround.
Thanks in anticipation,
--
Best Regards
Pranjal Rajput
Re: Aggregation for chronologically ordered dataset
Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
I'd use the RANK function to join each row with its previous and next rows,
then filter out the middle rows, then join the first row of each run to the
last and calculate the time.
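The steps in this reply can be sketched outside Pig to make the logic concrete. The following is a minimal Python sketch, not Pig Latin and not the poster's code. It assumes BeginTime and EndTime are zero-padded HH:MM strings on the same day and that the slots inside a run are contiguous, so the run length equals the last EndTime minus the first BeginTime. The names ROWS, minutes_between and uninterrupted_slots are hypothetical.

```python
from datetime import datetime

# Sample rows from the question: (UserId, Activity, Tool, BeginTime, EndTime).
ROWS = [
    (1, "development", "tool1", "10:00", "10:15"),
    (1, "development", "tool2", "10:15", "10:30"),
    (1, "other", "tool3", "10:30", "11:00"),
    (1, "development", "tool1", "11:00", "11:20"),
    (1, "other", "tool4", "11:20", "12:00"),
    (1, "development", "tool1", "12:00", "12:15"),
    (2, "other", "tool3", "10:00", "11:00"),
    (2, "development", "tool1", "11:00", "11:20"),
    (2, "development", "tool2", "11:20", "11:30"),
]


def minutes_between(begin, end):
    """Minutes from begin to end, both 'HH:MM' strings on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(begin, fmt)
    return int(delta.total_seconds() // 60)


def uninterrupted_slots(rows):
    """Return (UserId, Activity, minutes) for each uninterrupted run."""
    # Rank rows chronologically per user (the role RANK would play in Pig).
    ranked = sorted(rows, key=lambda r: (r[0], r[3]))
    firsts, lasts = [], []
    for i, row in enumerate(ranked):
        key = (row[0], row[1])
        # "Join" each row to its previous and next neighbour by rank.
        prev_key = (ranked[i - 1][0], ranked[i - 1][1]) if i > 0 else None
        next_key = (ranked[i + 1][0], ranked[i + 1][1]) if i + 1 < len(ranked) else None
        if key != prev_key:
            firsts.append(row)  # first row of a run
        if key != next_key:
            lasts.append(row)   # last row of a run
    # Middle rows land in neither list; the k-th first pairs with the k-th last.
    return [(f[0], f[1], minutes_between(f[3], l[4])) for f, l in zip(firsts, lasts)]


if __name__ == "__main__":
    for slot in uninterrupted_slots(ROWS):
        print(slot)
```

If the slots in a run can have gaps between them, the first-to-last time difference overstates the time spent, and summing DurationMinute per run would be the safer choice.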
Re: Aggregation for chronologically ordered dataset
Posted by pranjal rajput <fi...@gmail.com>.
Hello everyone,
It's like a local SUM operation.
Any pointers or hints would be much appreciated.
Let me know if any additional info is required.
Thanks,
--
Best Regards
Pranjal Rajput
+91-81090-71747
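
The "local SUM" described in this thread amounts to summing DurationMinute over consecutive rows that share the same UserId and Activity. A minimal Python sketch of that idea follows; it is an illustration of the logic only, not Pig Latin. It assumes the rows are already in chronological order per user, and the names local_sum and SAMPLE are hypothetical. Wiring such logic into Pig, for instance as a UDF applied to each user's ordered bag, is not shown here.

```python
from itertools import groupby

# (UserId, Activity, DurationMinute) in chronological order per user,
# taken from the sample data in the question.
SAMPLE = [
    (1, "development", 15),
    (1, "development", 15),
    (1, "other", 30),
    (1, "development", 20),
    (1, "other", 40),
    (1, "development", 15),
    (2, "other", 60),
    (2, "development", 20),
    (2, "development", 10),
]


def local_sum(rows):
    """Sum durations over each run of consecutive rows with the same key."""
    result = []
    # groupby only merges adjacent rows with equal keys, which is exactly
    # the "local" behaviour wanted here (no global GROUP BY).
    for (user_id, activity), run in groupby(rows, key=lambda r: (r[0], r[1])):
        result.append((user_id, activity, sum(r[2] for r in run)))
    return result


if __name__ == "__main__":
    for row in local_sum(SAMPLE):
        print(row)
```

Because only adjacent rows are merged, the input order matters: unsorted input would split or misplace runs.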