You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by "Ambastha, Abhishek" <aa...@transunion.com> on 2013/09/10 07:07:17 UTC

Create Table + Join + Max 'String' Date

Hi,

I want to write a pig query for the below SQL:

                CREATE TABLE MAXDATE AS
                SELECT SERIAL_NUM, MAX(REPORTING_DT) AS REPORTING_DT
                FROM Table_1
                WHERE REPORTING_DT <= '01APR2013' AND
                SERIAL_NUM IN (SELECT SERIAL_NUM FROM TABLE_2) AND
                SERIAL_NUM NOT IN (SELECT SERIAL_NUM FROM TABLE_3)
                GROUP BY SERIAL_NUM;

REPORTING_DT is a string but I want Maximum Date. Probably, I could use CustomFormatToISO for Date conversion and get the Maximum Value.

Join is on Table_1, Table_2 and Table_3.

Please suggest how to do this.

Regards,
Abhishek


Re: Create Table + Join + Max 'String' Date

Posted by Jacob Perkins <ja...@gmail.com>.
Abhishek,

The cogroup operator and a filter should get you what you want:

--
-- First, filter table1
--
t1_filtered = filter table1 by reporting_dt <= '01APR2013';

--
-- Cogroup, this results in 'bags' you can check
--
grouped = cogroup
            t1_filtered by serial_num,
            table2        by serial_num,
            table3        by serial_num;

--
-- Do outer join and max in same projection
--
maxdate = foreach (filter grouped by not IsEmpty(table2) and IsEmpty(table3)) generate
            group as serial_num,
            MAX(t1_filtered.reporting_dt) as reporting_dt;


Note that I haven't actually ran this yet but it's pretty straightforward pig.

--jacob
@thedatachef

On Sep 10, 2013, at 12:07 AM, "Ambastha, Abhishek" <aa...@transunion.com> wrote:

> Hi,
> 
> I want to write a pig query for the below SQL:
> 
>                CREATE TABLE MAXDATE AS
>                SELECT SERIAL_NUM, MAX(REPORTING_DT) AS REPORTING_DT
>                FROM Table_1
>                WHERE REPORTING_DT <= '01APR2013' AND
>                SERIAL_NUM IN (SELECT SERIAL_NUM FROM TABLE_2) AND
>                SERIAL_NUM NOT IN (SELECT SERIAL_NUM FROM TABLE_3)
>                GROUP BY SERIAL_NUM;
> 
> REPORTING_DT is a string but I want Maximum Date. Probably, I could use CustomFormatToISO for Date conversion and get the Maximum Value.
> 
> Join is on Table_1, Table_2 and Table_3.
> 
> Please suggest how to do this.
> 
> Regards,
> Abhishek
>