Posted to user@pig.apache.org by Avram Aelony <Av...@eharmony.com> on 2009/02/19 00:19:53 UTC

date treatment & date level aggregations

Hello,

I have a question regarding treatment of dates with PIG.  

My input is a comma-delimited file containing a timestamp field in 'yyyymmdd hh:mm:ss' format (e.g. 20090201 14:42:00).  I want to aggregate to day level using only the date portion of the timestamp (yyyymmdd, e.g. 20090201).  I have been experimenting with the tokenize function, but I am unclear on how to accomplish an aggregation by date.

What am I doing wrong? How can I get a date-level aggregation?
Is there a 'Date' data type?


Here are the details:


Input Data:

4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1

Desired Output:
20090201, 122
20090202, 130

--My attempt in Pig:
A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
describe A;
B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
describe B;
C = group B by B.date;
describe C;
D = foreach C generate B.date, SUM(A.v3);
dump D;


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
        ... 5 more

2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
grunt>


Thanks in advance,
Avram

RE: date treatment & date level aggregations

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
Use TOKENIZE instead of tokenize (the name is case sensitive).
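
The error trace quoted below reports "Could not resolve tokenize using imports: [, org.apache.pig.builtin., ...]", which suggests that the function name is appended to each import prefix and looked up as a Java class name. A rough Python model of that lookup, assuming a small stand-in set of class names (KNOWN_CLASSES and resolve are illustrative names, not Pig APIs), shows why the lowercase spelling finds nothing:

```python
# Hypothetical model of resolving a function name against import prefixes.
# KNOWN_CLASSES stands in for the classes available on the Java classpath.
KNOWN_CLASSES = {
    "org.apache.pig.builtin.TOKENIZE",
    "org.apache.pig.builtin.SUM",
}

# Import prefixes as listed in the error message from the thread.
IMPORTS = [
    "",
    "org.apache.pig.builtin.",
    "com.yahoo.pig.yst.sds.ULT.",
    "org.apache.pig.impl.builtin.",
]


def resolve(name):
    """Return the first prefix + name that matches a known class, else None."""
    for prefix in IMPORTS:
        candidate = prefix + name
        if candidate in KNOWN_CLASSES:  # exact, case-sensitive comparison
            return candidate
    return None


print(resolve("TOKENIZE"))  # org.apache.pig.builtin.TOKENIZE
print(resolve("tokenize"))  # None
```

This is only a model of the behaviour implied by the trace; the real lookup is done by Pig's Java class loading.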


-----Original Message-----
From: Avram Aelony [mailto:AvramAelony@eharmony.com] 
Sent: Thursday, February 19, 2009 10:51 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Unfortunately, step B of the solution you proposed fails for me.  Any
thoughts on how to remedy?


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
        ... 5 more
Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
        at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
        at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
        ... 21 more
Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
        at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
        ... 24 more
Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
        ... 25 more

2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
grunt>


thanks,
Avram


-----Original Message-----
From: Alan Gates [mailto:gates@yahoo-inc.com] 
Sent: Thursday, February 19, 2009 9:49 AM
To: pig-user@hadoop.apache.org
Subject: Re: date treatment & date level aggregations

Date is not a separate type in pig.

If you want to group on date, I think what you want is this:

A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
B = foreach A generate tokenize(A.v2) as (date,time), v3;
C = foreach B generate date, v3;
D = group C by date;
E = foreach D generate group, SUM(C.v3);
dump E;

This script will first tokenize the datestamp into date and time, then  
project just the date and data you're going to sum, and then do the  
grouping.
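
For readers who want to check the intended result independently of Pig, the same three steps (split the timestamp, keep the date and the value, group and sum) can be sketched in Python. This is only a minimal model run on the sample rows from the original question; SAMPLE and sum_by_date are illustrative names, and it is not a substitute for the Pig script:

```python
import csv
import io
from collections import defaultdict

# Sample rows from the original question, in the order (v1, v2, v3, v4).
SAMPLE = """\
4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1
"""


def sum_by_date(text):
    """Group rows by the date part of the timestamp and sum the third column."""
    totals = defaultdict(int)
    for v1, v2, v3, v4 in csv.reader(io.StringIO(text)):
        date, _time = v2.split(" ", 1)  # "20090201 23:59:56" -> "20090201"
        totals[date] += int(v3)
    return dict(totals)


for date, total in sorted(sum_by_date(SAMPLE).items()):
    print(f"{date}, {total}")
# 20090201, 122
# 20090202, 130
```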

Alan.


RE: date treatment & date level aggregations

Posted by Avram Aelony <Av...@eharmony.com>.
Hi Olga,

Thanks for your message.  I will have fairly particular needs, so I will take a leap into learning what it takes to develop needed UDFs.  If it works out well and they are generic enough that they might be useful to others, I will see if I can get authorization to contribute back to piggybank.

-Avram


-----Original Message-----
From: Olga Natkovich [mailto:olgan@yahoo-inc.com]
Sent: Thursday, February 19, 2009 11:28 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

TOKENIZE is not broken. It has particular semantics that might not work
for this query but are used in other contexts.

If a function with different semantics is needed, it can be written and
contributed to piggybank.

Olga

> -----Original Message-----
> From: Avram Aelony [mailto:AvramAelony@eharmony.com]
> Sent: Thursday, February 19, 2009 11:08 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
>
> Thanks for identifying that the TOKENIZE builtin needs a fix
> and filing the bug report.
> I should have noted in my original email that I had tried
> uppercase and that uppercase had also failed.
>
> Thanks for everyone's help & I look forward to the fix.
>
> Regards,
> -Avram
>
>
> -----Original Message-----
> From: Santhosh Srinivasan [mailto:sms@yahoo-inc.com]
> Sent: Thursday, February 19, 2009 11:01 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
>
> Hi Avram,
>
> A few things to note:
>
> 1. The builtin functions in Pig are Java UDFs, making them
> case sensitive. You should use TOKENIZE instead of tokenize.
> 2. It looks like the builtin TOKENIZE has to be fixed to
> support your current usage. I have filed a bug report to
> track this: PIG-683
> (https://issues.apache.org/jira/browse/PIG-683)
>
> When PIG-683 is fixed, you should then be able to do the following:
>
>
> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> B = foreach A generate flatten(TOKENIZE(v2)) as (date,time), v3;
> C = foreach B generate date, v3;
> D = group C by date;
> E = foreach D generate group, SUM(C.v3);
> dump E;
>
> Thanks,
> Santhosh
>
> -----Original Message-----
> From: Avram Aelony [mailto:AvramAelony@eharmony.com]
> Sent: Thursday, February 19, 2009 10:59 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
>
> I tried the capitalized version; that still leads to an
> error. Now it appears to be a problem with the alias.
>
>
>
> grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
> 2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
>         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
>         at org.apache.pig.Main.main(Main.java:270)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
>         ... 5 more
>
>
> -----Original Message-----
> From: Olga Natkovich [mailto:olgan@yahoo-inc.com]
> Sent: Thursday, February 19, 2009 10:54 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
>
> Functions in pig are case sensitive. The function name is TOKENIZE.
> Please refer to the PigLatin Manual for details:
> http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
>
> Olga
>

RE: date treatment & date level aggregations

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
TOKENIZE is not broken. It has particular semantics that might not work
for this query but are used in other contexts.

If a function with different semantics is needed, it can be written and
contributed to piggybank.
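
As far as the Pig documentation describes it, TOKENIZE splits a string and returns a bag of single-field tuples, one per word, and flattening a bag yields one output row per element rather than extra columns. A small Python model of that behaviour (tokenize and flatten_bag are illustrative stand-ins, not Pig code) suggests why flatten(TOKENIZE(v2)) as (date,time) would not put the date and time side by side on one row:

```python
def tokenize(value):
    """Model of TOKENIZE: a bag (list) of one-field tuples, one per word."""
    return [(word,) for word in value.split()]


def flatten_bag(row, bag_index):
    """Model of flattening a bag: emit one output row per tuple in the bag."""
    before = row[:bag_index]
    bag = row[bag_index]
    after = row[bag_index + 1:]
    return [before + item + after for item in bag]


row = (tokenize("20090201 23:59:56"), 8)  # (bag of tokens, v3)
for out in flatten_bag(row, 0):
    print(out)
# ('20090201', 8)
# ('23:59:56', 8)
```

Under this model each input row becomes two rows, one keyed by the date and one by the time, so a function that returns the date portion as a single value would fit the query better.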

Olga

> -----Original Message-----
> From: Avram Aelony [mailto:AvramAelony@eharmony.com] 
> Sent: Thursday, February 19, 2009 11:08 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
> 
> Thanks for identifying that the TOKENIZE builtin needs a fix 
> and filing the bug report.
> I should have noted in my original email that I had tried 
> uppercase and that uppercase had also failed.
> 
> Thanks for everyone's help & I look forward to the fix.
> 
> Regards,
> -Avram
> 
> 
> -----Original Message-----
> From: Santhosh Srinivasan [mailto:sms@yahoo-inc.com]
> Sent: Thursday, February 19, 2009 11:01 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
> 
> Hi Avram,
> 
> A few things to note:
> 
> 1. The builtin functions in Pig are Java UDFs, making them
> case sensitive. You should use TOKENIZE instead of tokenize.
> 2. It looks like the builtin TOKENIZE has to be fixed to
> support your current usage. I have filed a bug report to
> track this: PIG-683
> (https://issues.apache.org/jira/browse/PIG-683)
>
> When PIG-683 is fixed, you should then be able to do the following:
>
>
> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> B = foreach A generate flatten(TOKENIZE(v2)) as (date,time), v3;
> C = foreach B generate date, v3;
> D = group C by date;
> E = foreach D generate group, SUM(C.v3);
> dump E;
> 
> Thanks,
> Santhosh
> 
> -----Original Message-----
> From: Avram Aelony [mailto:AvramAelony@eharmony.com]
> Sent: Thursday, February 19, 2009 10:59 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
> 
> I tried the capitalized version, that still leads to an 
> error. Now it appears to be a problem with the alias.
> 
> 
> 
> grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
> 2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
>         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
>         at org.apache.pig.Main.main(Main.java:270)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
>         ... 5 more
> 
> 
> -----Original Message-----
> From: Olga Natkovich [mailto:olgan@yahoo-inc.com]
> Sent: Thursday, February 19, 2009 10:54 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
> 
> Functions in pig are case sensitive. The function name is TOKENIZE.
> Please refer to the Pig Latin Manual for details:
> http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
> 
> Olga
> 
> > -----Original Message-----
> > From: Avram Aelony [mailto:AvramAelony@eharmony.com]
> > Sent: Thursday, February 19, 2009 10:51 AM
> > To: pig-user@hadoop.apache.org
> > Subject: RE: date treatment & date level aggregations
> >
> > Unfortunately, step B of the solution you proposed fails for me.  Any
> > thoughts on how to remedy?
> >
> >
> > grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4); 
> > grunt> describe A;
> > A: (v1, v2, v3, v4 )
> > grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
> > 2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
> >         at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
> >         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
> >         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
> >         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
> >         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
> >         at org.apache.pig.Main.main(Main.java:270)
> > Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
> >         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
> >         at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
> >         ... 5 more
> > Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
> >         at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
> >         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
> >         at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
> >         ... 21 more
> > Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
> >         at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
> >         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
> >         at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
> >         ... 24 more
> > Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
> >         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
> >         ... 25 more
> >
> > 2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
> > grunt>
> >
> >
> > thanks,
> > Avram
> >
> >
> > -----Original Message-----
> > From: Alan Gates [mailto:gates@yahoo-inc.com]
> > Sent: Thursday, February 19, 2009 9:49 AM
> > To: pig-user@hadoop.apache.org
> > Subject: Re: date treatment & date level aggregations
> >
> > Date is not a separate type in pig.
> >
> > If you want to group on date, I think what you want is this:
> >
> > A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> > B = foreach A generate tokenize(A.v2) as (date,time), v3;
> > C = foreach B generate date, v3;
> > D = group C by date;
> > E = foreach D generate group, SUM(C.v3);
> > dump E;
> >
> > This script will first tokenize the datestamp into date and time, then
> > project just the date and data you're going to sum, and then do the
> > grouping.
> >
> > Alan.
> >
> > On Feb 18, 2009, at 3:19 PM, Avram Aelony wrote:
> >
> > > Hello,
> > >
> > > I have a question regarding treatment of dates with PIG.
> > >
> > > My input files contain a timestamp field in 'yyyymmdd hh:mm:ss'
> > > format (e.g. 20090201 14:42:00 ) within a comma delimited file.  I
> > > want to aggregate to day-level relying on extracting the date portion
> > > (e.g. yyyymmdd, so the 20090201 ) of the timestamp only.  I have been
> > > experimenting with the tokenize function but I am unclear how to
> > > accomplish an aggregation by date.
> > >
> > > What am I doing wrong? How can I get a date-level aggregation?
> > > Is there a 'Date' data type?
> > >
> > >
> > > Here are the details:
> > >
> > >
> > > Input Data:
> > >
> > > 4,20090201 23:59:56,8,1
> > > 3,20090202 23:59:56,101,1
> > > 4,20090201 23:59:56,114,1
> > > 5,20090202 23:59:56,29,1
> > >
> > > Desired Output:
> > > 20090201, 122
> > > 20090202, 130
> > >
> > > --My attempt in Pig:
> > > A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> > > describe A;
> > > B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
> > > describe B;
> > > C = group B by B.date;
> > > describe C;
> > > D = foreach C generate B.date, SUM(A.v3);
> > > dump D;
> > >
> > >
> > > grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> > > grunt> describe A;
> > > A: (v1, v2, v3, v4 )
> > > grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
> > > 2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
> > >         at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
> > >         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
> > >         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
> > >         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
> > >         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
> > >         at org.apache.pig.Main.main(Main.java:270)
> > > Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
> > >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
> > >         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
> > >         at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
> > >         ... 5 more
> > >
> > > 2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
> > > grunt>
> > >
> > >
> > > Thanks in advance,
> > > Avram
> >
> >
> 

RE: date treatment & date level aggregations

Posted by Avram Aelony <Av...@eharmony.com>.
Thanks for identifying that the TOKENIZE builtin needs a fix and filing the bug report.
I should have noted in my original email that I had tried uppercase and that uppercase had also failed.

Thanks for everyone's help & I look forward to the fix.

Regards,
-Avram


-----Original Message-----
From: Santhosh Srinivasan [mailto:sms@yahoo-inc.com]
Sent: Thursday, February 19, 2009 11:01 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Hi Avram,

A few things to note:

1. The builtin functions in Pig are Java UDFs, making them case
sensitive. You should use TOKENIZE instead of tokenize.
2. It looks like the builtin TOKENIZE has to be fixed to support your
current usage. I have filed a bug report to track this: PIG-683
(https://issues.apache.org/jira/browse/PIG-683)

When PIG-683 is fixed, you should then be able to do the following:


A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
B = foreach A generate flatten(TOKENIZE(v2)) as (date,time), v3;
C = foreach B generate date, v3;
D = group C by date;
E = foreach D generate group, SUM(C.v3);
dump E;

Thanks,
Santhosh

-----Original Message-----
From: Avram Aelony [mailto:AvramAelony@eharmony.com]
Sent: Thursday, February 19, 2009 10:59 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

I tried the capitalized version, that still leads to an error. Now it
appears to be a problem with the alias.



grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
        ... 5 more


-----Original Message-----
From: Olga Natkovich [mailto:olgan@yahoo-inc.com]
Sent: Thursday, February 19, 2009 10:54 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Functions in pig are case sensitive. The function name is TOKENIZE.
Please refer to the Pig Latin Manual for details:
http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm

Olga

> -----Original Message-----
> From: Avram Aelony [mailto:AvramAelony@eharmony.com]
> Sent: Thursday, February 19, 2009 10:51 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
>
> Unfortunately, step B of the solution you proposed fails for
> me.  Any thoughts on how to remedy?
>
>
> grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> grunt> describe A;
> A: (v1, v2, v3, v4 )
> grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
> 2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
>         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
>         at org.apache.pig.Main.main(Main.java:270)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
>         ... 5 more
> Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
>         at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
>         ... 21 more
> Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
>         at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
>         at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
>         ... 24 more
> Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
>         ... 25 more
>
> 2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
> grunt>
>
>
> thanks,
> Avram
>
>
> -----Original Message-----
> From: Alan Gates [mailto:gates@yahoo-inc.com]
> Sent: Thursday, February 19, 2009 9:49 AM
> To: pig-user@hadoop.apache.org
> Subject: Re: date treatment & date level aggregations
>
> Date is not a separate type in pig.
>
> If you want to group on date, I think what you want is this:
>
> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> B = foreach A generate tokenize(A.v2) as (date,time), v3;
> C = foreach B generate date, v3;
> D = group C by date;
> E = foreach D generate group, SUM(C.v3);
> dump E;
>
> This script will first tokenize the datestamp into date and
> time, then project just the date and data you're going to
> sum, and then do the grouping.
>
> Alan.
>
> On Feb 18, 2009, at 3:19 PM, Avram Aelony wrote:
>
> > Hello,
> >
> > I have a question regarding treatment of dates with PIG.
> >
> > My input files contain a timestamp field in 'yyyymmdd hh:mm:ss'
> > format (e.g. 20090201 14:42:00 ) within a comma delimited file.  I
> > want to aggregate to day-level relying on extracting the date portion
> > (e.g. yyyymmdd, so the 20090201 ) of the timestamp only.  I have been
> > experimenting with the tokenize function but I am unclear how to
> > accomplish an aggregation by date.
> >
> > What am I doing wrong? How can I get a date-level aggregation?
> > Is there a 'Date' data type?
> >
> >
> > Here are the details:
> >
> >
> > Input Data:
> >
> > 4,20090201 23:59:56,8,1
> > 3,20090202 23:59:56,101,1
> > 4,20090201 23:59:56,114,1
> > 5,20090202 23:59:56,29,1
> >
> > Desired Output:
> > 20090201, 122
> > 20090202, 130
> >
> > --My attempt in Pig:
> > A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> > describe A;
> > B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
> > describe B;
> > C = group B by B.date;
> > describe C;
> > D = foreach C generate B.date, SUM(A.v3);
> > dump D;
> >
> >
> > grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> > grunt> describe A;
> > A: (v1, v2, v3, v4 )
> > grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
> > 2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
> >         at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
> >         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
> >         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
> >         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
> >         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
> >         at org.apache.pig.Main.main(Main.java:270)
> > Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
> >         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
> >         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
> >         at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
> >         ... 5 more
> >
> > 2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
> > grunt>
> >
> >
> > Thanks in advance,
> > Avram
>
>

RE: date treatment & date level aggregations

Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.
Hi Avram,

A few things to note:

1. The builtin functions in Pig are Java UDFs, making them case
sensitive. You should use TOKENIZE instead of tokenize.
2. It looks like the builtin TOKENIZE has to be fixed to support your
current usage. I have filed a bug report to track this: PIG-683
(https://issues.apache.org/jira/browse/PIG-683)
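
As a rough illustration of point 1, the Python sketch below mimics the lookup suggested by the "Could not resolve tokenize using imports" message quoted in this thread: each import prefix is tried in turn and the match is exact, so case matters. This is only a model, not Pig's actual implementation. The KNOWN_CLASSES set is a hypothetical stand-in for the classes on the classpath, and resolve is a made-up helper name.

```python
# Hypothetical stand-in for the classes visible on the classpath.
KNOWN_CLASSES = {
    "org.apache.pig.builtin.TOKENIZE",
    "org.apache.pig.builtin.SUM",
}

# Import prefixes as listed in the error message quoted in this thread.
IMPORTS = [
    "",
    "org.apache.pig.builtin.",
    "com.yahoo.pig.yst.sds.ULT.",
    "org.apache.pig.impl.builtin.",
]


def resolve(name):
    """Try each prefix in order; the comparison is exact, so case matters."""
    for prefix in IMPORTS:
        candidate = prefix + name
        if candidate in KNOWN_CLASSES:
            return candidate
    raise LookupError(f"Could not resolve {name} using imports: {IMPORTS}")


print(resolve("TOKENIZE"))  # found under the org.apache.pig.builtin. prefix
try:
    resolve("tokenize")     # lowercase name matches no known class
except LookupError as err:
    print(err)
```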

When PIG-683 is fixed, you should then be able to do the following:


A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
B = foreach A generate flatten(TOKENIZE(v2)) as (date,time), v3;
C = foreach B generate date, v3;
D = group C by date;
E = foreach D generate group, SUM(C.v3);
dump E;
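
For reference, the result this script is meant to produce can be checked outside Pig. The Python sketch below is not Pig code: it takes the four sample rows from the original question, keeps the part of v2 before the first space as the date, and sums v3 per date. The names SAMPLE and sum_by_date are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the original question (v1, v2, v3, v4).
SAMPLE = """4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1
"""


def sum_by_date(lines):
    """Sum v3 per date, where the date is the part of v2 before the space."""
    totals = defaultdict(int)
    for v1, v2, v3, v4 in csv.reader(lines):
        date = v2.split(" ", 1)[0]
        totals[date] += int(v3)
    return dict(totals)


for date, total in sorted(sum_by_date(io.StringIO(SAMPLE)).items()):
    print(f"{date}, {total}")
```

On the sample data this gives 122 for 20090201 and 130 for 20090202, matching the desired output in the original question.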

Thanks,
Santhosh 

-----Original Message-----
From: Avram Aelony [mailto:AvramAelony@eharmony.com] 
Sent: Thursday, February 19, 2009 10:59 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

I tried the capitalized version, that still leads to an error. Now it
appears to be a problem with the alias.



grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
        ... 5 more


-----Original Message-----
From: Olga Natkovich [mailto:olgan@yahoo-inc.com] 
Sent: Thursday, February 19, 2009 10:54 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Functions in pig are case sensitive. The function name is TOKENIZE.
Please refer to the Pig Latin Manual for details:
http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm

Olga 

> -----Original Message-----
> From: Avram Aelony [mailto:AvramAelony@eharmony.com] 
> Sent: Thursday, February 19, 2009 10:51 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
> 
> Unfortunately, step B of the solution you proposed fails for 
> me.  Any thoughts on how to remedy?
> 
> 
> grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4); 
> grunt> describe A;
> A: (v1, v2, v3, v4 )
> grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
> 2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
>         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
>         at org.apache.pig.Main.main(Main.java:270)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
>         ... 5 more
> Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
>         at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
>         ... 21 more
> Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
>         at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
>         at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
>         ... 24 more
> Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
>         ... 25 more
> 
> 2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
> grunt>
> 
> 
> thanks,
> Avram
> 
> 
> -----Original Message-----
> From: Alan Gates [mailto:gates@yahoo-inc.com]
> Sent: Thursday, February 19, 2009 9:49 AM
> To: pig-user@hadoop.apache.org
> Subject: Re: date treatment & date level aggregations
> 
> Date is not a separate type in pig.
> 
> If you want to group on date, I think what you want is this:
> 
> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> B = foreach A generate tokenize(A.v2) as (date,time), v3;
> C = foreach B generate date, v3;
> D = group C by date;
> E = foreach D generate group, SUM(C.v3);
> dump E;
> 
> This script will first tokenize the datestamp into date and 
> time, then project just the date and data you're going to 
> sum, and then do the grouping.
> 
> Alan.
> 
> On Feb 18, 2009, at 3:19 PM, Avram Aelony wrote:
> 
> > Hello,
> >
> > I have a question regarding treatment of dates with PIG.
> >
> > My input files contain a timestamp field in 'yyyymmdd hh:mm:ss'
> > format (e.g. 20090201 14:42:00 ) within a comma delimited file.  I
> > want to aggregate to day-level relying on extracting the date portion
> > (e.g. yyyymmdd, so the 20090201 ) of the timestamp only.  I have been
> > experimenting with the tokenize function but I am unclear how to
> > accomplish an aggregation by date.
> >
> > What am I doing wrong? How can I get a date-level aggregation?
> > Is there a 'Date' data type?
> >
> >
> > Here are the details:
> >
> >
> > Input Data:
> >
> > 4,20090201 23:59:56,8,1
> > 3,20090202 23:59:56,101,1
> > 4,20090201 23:59:56,114,1
> > 5,20090202 23:59:56,29,1
> >
> > Desired Output:
> > 20090201, 122
> > 20090202, 130
> >
> > --My attempt in Pig:
> > A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> > describe A;
> > B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
> > describe B;
> > C = group B by B.date;
> > describe C;
> > D = foreach C generate B.date, SUM(A.v3);
> > dump D;
> >
> >
> > grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4); 
> > grunt> describe A;
> > A: (v1, v2, v3, v4 )
> > grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
> > 2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
> >        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
> >        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
> >        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
> >        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
> >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
> >        at org.apache.pig.Main.main(Main.java:270)
> > Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
> >        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
> >        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
> >        ... 5 more
> >
> > 2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
> > grunt>
> >
> >
> > Thanks in advance,
> > Avram
> 
> 

RE: date treatment & date level aggregations

Posted by Avram Aelony <Av...@eharmony.com>.
I tried the capitalized version, but that still leads to an error. Now it appears to be a problem with the alias.



grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
        ... 5 more
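
For reference while debugging the script, the day-level totals this thread is after can be checked outside Pig. The following is a minimal plain-Python sketch (not Pig Latin), assuming the four sample rows from the original post; the names ROWS and daily_totals are made up for illustration. It keeps only the yyyymmdd part of the timestamp field and sums the third field per day.

```python
from collections import defaultdict

# Sample rows from the original post: v1, 'yyyymmdd hh:mm:ss', v3, v4
ROWS = [
    "4,20090201 23:59:56,8,1",
    "3,20090202 23:59:56,101,1",
    "4,20090201 23:59:56,114,1",
    "5,20090202 23:59:56,29,1",
]


def daily_totals(lines):
    """Sum the third field (v3) per day, where the day is the date part of v2."""
    totals = defaultdict(int)
    for line in lines:
        v1, v2, v3, v4 = line.split(",")
        day = v2.split(" ", 1)[0]  # keep only the yyyymmdd part
        totals[day] += int(v3)
    return dict(totals)


if __name__ == "__main__":
    for day, total in sorted(daily_totals(ROWS).items()):
        print(f"{day}, {total}")
```

Whatever form the Pig script ends up taking, its dump should agree with these per-day sums for the sample data.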



RE: date treatment & date level aggregations

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Functions in pig are case sensitive. The function name is TOKENIZE.
Please refer to the Pig Latin Manual for details:
http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm

Olga 
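
The "Could not resolve tokenize using imports: [...]" message quoted below suggests how the name lookup behaves: each import prefix is tried in front of the function name, and the match on the class name is exact. The following is a toy Python model of that idea, not Pig's actual implementation. The prefix list is copied from the error message; KNOWN_CLASSES and resolve are hypothetical stand-ins for the classpath and the resolver.

```python
# Import prefixes copied from the error message in this thread.
IMPORTS = [
    "",
    "org.apache.pig.builtin.",
    "com.yahoo.pig.yst.sds.ULT.",
    "org.apache.pig.impl.builtin.",
]

# Hypothetical stand-in for the classes on the classpath; only one entry is modeled.
KNOWN_CLASSES = {"org.apache.pig.builtin.TOKENIZE"}


def resolve(name):
    """Return the first prefix + name that exactly matches a known class (case-sensitive)."""
    for prefix in IMPORTS:
        candidate = prefix + name
        if candidate in KNOWN_CLASSES:
            return candidate
    raise LookupError(f"Could not resolve {name} using imports: {IMPORTS}")
```

In this model resolve("TOKENIZE") finds org.apache.pig.builtin.TOKENIZE, while resolve("tokenize") runs through every prefix and fails, which mirrors the error in the quoted message.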

> -----Original Message-----
> From: Avram Aelony [mailto:AvramAelony@eharmony.com] 
> Sent: Thursday, February 19, 2009 10:51 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: date treatment & date level aggregations
> 
> Unfortunately, step B of the solution you proposed fails for 
> me.  Any thoughts on how to remedy?
> 
> 
> grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4); 
> grunt> describe A;
> A: (v1, v2, v3, v4 )
> grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
> 2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
>         at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
>         at org.apache.pig.Main.main(Main.java:270)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
>         ... 5 more
> Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
>         at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
>         ... 21 more
> Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
>         at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
>         at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
>         ... 24 more
> Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
>         ... 25 more
> 
> 2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
> grunt>
> 
> 
> thanks,
> Avram
> 
> 
> -----Original Message-----
> From: Alan Gates [mailto:gates@yahoo-inc.com]
> Sent: Thursday, February 19, 2009 9:49 AM
> To: pig-user@hadoop.apache.org
> Subject: Re: date treatment & date level aggregations
> 
> Date is not a separate type in pig.
> 
> If you want to group on date, I think what you want is this:
> 
> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> B = foreach A generate tokenize(A.v2) as (date,time), v3;
> C = foreach B generate date, v3;
> D = group C by date;
> E = foreach D generate group, SUM(C.v3);
> dump E;
> 
> This script will first tokenize the datestamp into date and 
> time, then project just the date and data you're going to 
> sum, and then do the grouping.
> 
> Alan.
> 
> On Feb 18, 2009, at 3:19 PM, Avram Aelony wrote:
> 
> > Hello,
> >
> > I have a question regarding treatment of dates with PIG.
> >
> > My input files contain a timestamp field in 'yyyymmdd hh:mm:ss'
> > format (e.g. 20090201 14:42:00 ) within a comma delimited file.  I
> > want to aggregate to day-level relying on extracting the date portion
> > (e.g. yyyymmdd, so the 20090201 ) of the timestamp only.  I have been
> > experimenting with the tokenize function but I am unclear how to
> > accomplish an aggregation by date.
> >
> > What am I doing wrong? How can I get a date-level aggregation?
> > Is there a 'Date' data type?
> >
> >
> > Here are the details:
> >
> >
> > Input Data:
> >
> > 4,20090201 23:59:56,8,1
> > 3,20090202 23:59:56,101,1
> > 4,20090201 23:59:56,114,1
> > 5,20090202 23:59:56,29,1
> >
> > Desired Output:
> > 20090201, 122
> > 20090202, 130
> >
> > --My attempt in Pig:
> > A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
> > describe A;
> > B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
> > describe B;
> > C = group B by B.date;
> > describe C;
> > D = foreach C generate B.date, SUM(A.v3);
> > dump D;
> >
> >
> > grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4); 
> > grunt> describe A;
> > A: (v1, v2, v3, v4 )
> > grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
> > 2009-02-18 15:11:44,278 [main] ERROR 
> > org.apache.pig.tools.grunt.GruntParser - java.io.IOException:
> > Invalid alias: group in A: (v1, v2, v3, v4 )
> >        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
> >        at
> > org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java: 
> > 475)
> >        at
> > org
> > .apache
> > .pig
> > .tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java: 
> > 233)
> >        at
> > org
> > .apache
> > .pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
> >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
> >        at org.apache.pig.Main.main(Main.java:270)
> > Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException:  
> > Invalid alias: group in A: (v1, v2, v3, v4 )
> >        at
> > org
> > .apache
> > .pig
> > .impl
> > .logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java: 
> > 3301)
> >        at
> > org
> > .apache
> > 
> .pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java: 
> > 3225)
> >        at
> > org
> > .apache
> > .pig
> > 
> .impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java: 
> > 2236)
> >        at
> > org
> > .apache
> > 
> .pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java: 
> > 2175)
> >        at
> > org
> > .apache
> > .pig
> > .impl
> > 
> .logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java: 
> > 2106)
> >        at
> > org
> > .apache
> > .pig
> > 
> .impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java: 
> > 2038)
> >        at
> > org
> > .apache
> > 
> .pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java: 
> > 2006)
> >        at
> > org
> > .apache
> > .pig
> > .impl
> > .logicalLayer
> > .parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
> >        at
> > org
> > .apache
> > .pig
> > .impl
> > .logicalLayer
> > .parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
> >        at
> > org
> > .apache
> > .pig
> > .impl
> > 
> .logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java: 
> > 1862)
> >        at
> > org
> > .apache
> > .pig
> > .impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java: 
> > 1604)
> >        at
> > org
> > .apache
> > .pig
> > 
> .impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java: 
> > 1569)
> >        at
> > org
> > .apache
> > 
> .pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java: 
> > 711)
> >        at
> > org
> > .apache
> > .pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
> >        at
> > org
> > .apache
> > 
> .pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
> >        at
> > org
> > .apache
> > .pig
> > 
> .impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java: 
> > 47)
> >        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
> >        ... 5 more
> >
> > 2009-02-18 15:11:44,279 [main] ERROR 
> > org.apache.pig.tools.grunt.GruntParser - java.io.IOException:
> > Invalid alias: group in A: (v1, v2, v3, v4 )
> > grunt>
> >
> >
> > Thanks in advance,
> > Avram
> 
> 

RE: date treatment & date level aggregations

Posted by Avram Aelony <Av...@eharmony.com>.
Unfortunately, step B of the solution you proposed fails for me.  Any thoughts on how to remedy this?


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
        ... 5 more
Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
        at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
        at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
        ... 21 more
Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
        at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
        ... 24 more
Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
        ... 25 more

2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
grunt>


thanks,
Avram




Re: date treatment & date level aggregations

Posted by Alan Gates <ga...@yahoo-inc.com>.
Date is not a separate type in Pig.

If you want to group on date, I think what you want is this:

A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
B = foreach A generate tokenize(A.v2) as (date,time), v3;
C = foreach B generate date, v3;
D = group C by date;
E = foreach D generate group, SUM(C.v3);
dump E;

This script will first tokenize the datestamp into date and time, then  
project just the date and data you're going to sum, and then do the  
grouping.

Alan.
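
A possible variant of the script above, offered only as a sketch. The "Could not resolve tokenize" error reported in this thread is consistent with Pig resolving function names case-sensitively (the built-in is spelled TOKENIZE). TOKENIZE also returns a bag of tokens rather than a two-field tuple, so "as (date,time)" would probably still not fit. The version below assumes a later Pig release that ships the built-in STRSPLIT, which may not exist in the 2009 build used in this thread. The typed load schema and the field names day_part and time_part are illustrative assumptions, not part of the original script.

```pig
-- Sketch only: assumes a Pig release that provides the built-in STRSPLIT.
-- The typed schema is an assumption; the original script loads untyped fields.
A = LOAD 'atest.csv' USING PigStorage(',')
    AS (v1:int, v2:chararray, v3:int, v4:int);

-- STRSPLIT returns a tuple of at most 2 strings here (split on the first space);
-- FLATTEN promotes the tuple's fields to top-level columns.
-- Fields are referenced as v2, not A.v2, inside FOREACH A.
B = FOREACH A GENERATE
        FLATTEN(STRSPLIT(v2, ' ', 2)) AS (day_part:chararray, time_part:chararray),
        v3;

-- Keep only the date part and the value to be summed.
C = FOREACH B GENERATE day_part, v3;

-- Group by date and sum v3 within each group.
D = GROUP C BY day_part;
E = FOREACH D GENERATE group, SUM(C.v3);
DUMP E;
```

On the four sample rows from the original question, grouping by the date part should give 122 for 20090201 (8 + 114) and 130 for 20090202 (101 + 29), which matches the desired output.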
