Posted to user@pig.apache.org by Pradeep Kanchgar <Pr...@lntinfotech.com> on 2014/07/30 09:02:30 UTC

Row level operation in pig

Hi,
Below is a sample table.
INPUT
id  col1  start     End       totaltime
1   1     9:13:22   9:43:11   00:30:11
2   0.5   9:23:22   9:43:11   00:19:49
1   1     9:45:20   10:45:11  ......
2   0.5   10:50:44  11:30:01  ......
Here the "start" column is sorted. I need to compare the start of the current row to the end of the previous row. If the condition is satisfied, the values in the column "col1" need to be added for both rows.
Something like this:
col1
1.5
Is this row-level operation possible in Pig? In SQL, row-level calculations are possible with a self join or with the LAG and LEAD functions. What are the options in Pig, or do I need to write a UDF for this? This is sample data; the actual data is around 40 million rows.
Please suggest.
If a UDF is the only way, please give me some pointers on constructing one for the above scenario.

Regards,
Pradeep Kanchgar


Re: Row level operation in pig

Posted by Nitin Pawar <ni...@gmail.com>.
Check the windowing function usage described here:

http://linuxandryan.wordpress.com/2014/03/17/windowing-functions-in-pig/
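
If a UDF turns out to be needed, a Python (Jython) UDF could look roughly like the sketch below. It is only an illustration under assumptions not stated in the question: the "condition" is taken to be that the current row starts before the previous row ends (which gives 1.5 for the second sample row), the times are "H:MM:SS" strings, and the names add_overlaps, to_seconds and combined are made up for this example. The bag passed in is expected to be ordered by start already.

```python
# Hypothetical Pig Python (Jython) UDF. The overlap condition and all
# names here are assumptions made for illustration only.
try:
    outputSchema  # expected to be provided by Pig when the script is registered
except NameError:
    def outputSchema(schema):
        # No-op stand-in so the file can also be run outside Pig.
        def wrap(func):
            return func
        return wrap


def to_seconds(hms):
    """Convert an 'H:MM:SS' string to seconds since midnight."""
    h, m, s = [int(part) for part in hms.split(":")]
    return h * 3600 + m * 60 + s


@outputSchema("rows:bag{t:(id:int, col1:double, start:chararray, "
              "end:chararray, combined:double)}")
def add_overlaps(rows):
    """rows: bag of (id, col1, start, end) tuples, already ordered by start.

    Emits each row with an extra field: prev.col1 + col1 when the row
    starts before the previous row ends, otherwise null (None).
    """
    out = []
    prev = None
    for row in rows:
        combined = None
        if prev is not None and to_seconds(row[2]) < to_seconds(prev[3]):
            combined = prev[1] + row[1]
        out.append((row[0], row[1], row[2], row[3], combined))
        prev = row
    return out
```

A script like this is registered with REGISTER '<file>.py' USING jython AS <namespace>; and then called on a bag that has been ordered inside a nested FOREACH. With 40 million rows, putting everything into one bag is unlikely to scale, so grouping by some key first (a date or session id, if the real data has one) and calling the UDF per group would be the safer route.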





-- 
Nitin Pawar