Posted to user@pig.apache.org by yogesh dhari <yo...@live.com> on 2012/09/30 00:02:27 UTC
how to perform GROUP BY in PIG for this case:
Hi all,
I have this data with the fields (Date, symbol, rate).
I want to group it by month and find the maximum rate value for each month.
The output should look like: (08, 36.3), (09, 36.4), (10, 36.8), (11, 37.5) ..
(2009-08-21,CLI,33.38)
(2009-08-24,CLI,33.03)
(2009-08-25,CLI,33.16)
(2009-08-26,CLI,32.78)
(2009-08-27,CLI,32.79)
(2009-08-28,CLI,33.37)
(2009-08-31,CLI,32.51)
(2009-09-11,CLI,34.08)
(2009-09-14,CLI,35.19)
(2009-09-15,CLI,35.82)
(2009-09-16,CLI,36.58)
(2009-09-24,CLI,33.98)
(2009-09-25,CLI,32.44)
(2009-09-28,CLI,33.34)
(2009-09-29,CLI,33.6)
(2009-09-30,CLI,33.24)
(2009-10-01,CLI,31.98)
(2009-10-02,CLI,31.21)
(2009-10-05,CLI,31.31)
(2009-10-21,CLI,32.86)
(2009-10-26,CLI,33.15)
(2009-10-27,CLI,32.71)
(2009-10-28,CLI,32.03)
(2009-10-29,CLI,32.05)
(2009-10-30,CLI,31.88)
(2009-11-02,CLI,31.88)
(2009-11-03,CLI,31.16)
(2009-11-04,CLI,31.47)
(2009-11-09,CLI,31.59)
(2009-11-25,CLI,30.58)
(2009-11-27,CLI,30.19)
(2009-11-30,CLI,30.86)
(2009-12-01,CLI,31.74)
(2009-12-02,CLI,32.62)
(2009-12-03,CLI,33.43)
(2009-12-04,CLI,34.12)
(2009-12-07,CLI,33.77)
(2009-12-08,CLI,33.8)
(2009-12-09,CLI,33.71)
Please help and suggest.
Thanks & Regards
Yogesh Kumar
RE: how to perform GROUP BY in PIG for this case:
Posted by yogesh dhari <yo...@live.com>.
Thanks Russell :-),
I have built Pig in /opt/pig-0.10.0/ and in /opt/pig-0.10.0/contrib/Piggybank/java/, and the jar files are now present there.
I registered them:
grunt> register /opt/pig-0.10.0/contrib/piggybank/java/piggybank.jar
grunt> register /opt/pig-0.10.0/build/ivy/lib/Pig/joda-time-1.6.jar
and also defined the UDFs:
grunt> define CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO() ;
grunt> define ISOToMonth org.apache.pig.piggybank.evaluation.datetime.convert.ISOToMonth();
Then I ran the query on NYSE_B:
grunt> describe NYSE_B;
NYSE_B: {exchange: chararray,symbol: chararray,date: chararray,divi: float}
ans = foreach (group NYSE_B by ISOToMonth(date)) generate group as monthh, MAX(NYSE_A.divi) as max_rt;
and again got an error :-(
2012-09-30 10:25:15,821 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.datetime.convert.ISOToMonth using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
2012-09-30 10:25:15,822 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2012-09-30 10:25:15,822 [main] ERROR org.apache.pig.tools.grunt.Grunt - Failed to parse: Pig script failed to parse:
<line 12, column 31> Failed to generate logical plan. Nested exception: java.lang.RuntimeException: Cannot instantiate: org.apache.pig.piggybank.evaluation.datetime.convert.ISOToMonth
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by:
<line 12, column 31> Failed to generate logical plan. Nested exception: java.lang.RuntimeException: Cannot instantiate: org.apache.pig.piggybank.evaluation.datetime.convert.ISOToMonth
at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:980)
at org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:7316)
at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:8857)
at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:8632)
at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:7984)
at org.apache.pig.parser.LogicalPlanGenerator.join_group_by_expr(LogicalPlanGenerator.java:12100)
at org.apache.pig.parser.LogicalPlanGenerator.join_group_by_clause(LogicalPlanGenerator.java:11921)
at org.apache.pig.parser.LogicalPlanGenerator.group_item(LogicalPlanGenerator.java:5440)
at org.apache.pig.parser.LogicalPlanGenerator.group_clause(LogicalPlanGenerator.java:5026)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1313)
at org.apache.pig.parser.LogicalPlanGenerator.inline_op(LogicalPlanGenerator.java:5739)
at org.apache.pig.parser.LogicalPlanGenerator.rel(LogicalPlanGenerator.java:5669)
at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:12350)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1577)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
... 15 more
Please suggest and help
Thanks & regards
Yogesh Kumar.
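A possible cause, judging only from what appears in this thread: the define above points at the datetime.convert package, while the earlier reply and the linked API page place ISOToMonth under datetime.truncate, which would explain "Cannot instantiate". The query also projects NYSE_A.divi although the grouped relation is NYSE_B. A minimal corrected sketch, assuming the jars are registered as shown above and that `date` already holds an ISO 8601 string:

```pig
-- ISOToMonth is defined from the 'truncate' package, as in the earlier reply.
DEFINE ISOToMonth org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToMonth();

-- Project from NYSE_B, the relation being grouped (the original used NYSE_A.divi).
ans = FOREACH (GROUP NYSE_B BY ISOToMonth(date)) GENERATE
    group AS monthh, MAX(NYSE_B.divi) AS max_rt;
```

This has not been run against the poster's data; it only aligns the class name and relation name with the rest of the thread.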
Re: how to perform GROUP BY in PIG for this case:
Posted by Russell Jurney <ru...@gmail.com>.
You'll need to build Pig. Assuming you have the source, run 'ant' in
the base directory and in contrib/piggybank/java.
Russell Jurney http://datasyndrome.com
RE: how to perform GROUP BY in PIG for this case:
Posted by yogesh dhari <yo...@live.com>.
Hi Russell,
I am using Pig 0.10.0 and I checked the directory /opt/Pig-0.10.0/contrib/piggybank/java/
There are no jar files in it. :-(
grunt> register /opt/pig-0.10.0/contrib/piggybank/java/piggybank.jar
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/opt/pig-0.10.0/contrib/piggybank/java/piggybank.jar' does not exist.
Details at logfile: /opt/pig-0.10.0/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/convert/pig_1348974384533.log
Similarly, there is no path /opt/build/ivy/lib/Pig/
Instead there is /opt/pig-0.10.0/ivy, but it has no /lib/Pig/
Please suggest & help
Thanks & regards
Yogesh Kumar
Re: how to perform GROUP BY in PIG for this case:
Posted by Russell Jurney <ru...@gmail.com>.
My bad - you will need to register the Piggybank and jodatime jars. Replace
/me/pig with your pig install path.
register /me/pig/contrib/piggybank/java/piggybank.jar
register /me/pig/build/ivy/lib/Pig/joda-time-1.6.jar
define CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();
define ISOToMonth org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToMonth();
That should take care of the error.
This example may help:
https://github.com/rjurney/Collecting-Data/blob/master/src/pig/rfc1123_to_iso8601.pig
Russell Jurney http://datasyndrome.com
RE: how to perform GROUP BY in PIG for this case:
Posted by yogesh dhari <yo...@live.com>.
Thanks Russell,
I am new to Pig. I tried this command and got this exception:
2012-09-30 04:53:22,995 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve ISOToMonth using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Is there something more I need to do, like an import or something like that?
Please suggest.
Thanks & regards
Yogesh Kumar
Re: how to perform GROUP BY in PIG for this case:
Posted by Russell Jurney <ru...@gmail.com>.
answer = foreach (group data by ISOToMonth(Date)) generate group as
month, MAX(data.rate) as max_rate;
Note that you will need your date in ISO 8601 format. You can use
CustomFormatToISO to convert it if it is a string, or UnixToISO if
your date is a long.
See:
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/convert/CustomFormatToISO.html
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/convert/UnixToISO.html
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/truncate/ISOToMonth.html
Russell Jurney http://datasyndrome.com
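Put together, the approach above might look like the following sketch. The input file name 'prices.csv', the comma delimiter, the field types, and the 'yyyy-MM-dd' pattern are assumptions made for illustration; the /me/pig paths are placeholders for your own Pig install path, and Piggybank must already be built so that the jars exist.

```pig
-- Placeholder paths: replace /me/pig with your Pig install path.
REGISTER /me/pig/contrib/piggybank/java/piggybank.jar;
REGISTER /me/pig/build/ivy/lib/Pig/joda-time-1.6.jar;

DEFINE CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();
DEFINE ISOToMonth org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToMonth();

-- Hypothetical input: one (date, symbol, rate) record per line, comma-separated.
data = LOAD 'prices.csv' USING PigStorage(',')
       AS (date:chararray, symbol:chararray, rate:double);

-- Convert the yyyy-MM-dd string to ISO 8601, then truncate it to the month.
with_month = FOREACH data GENERATE
    ISOToMonth(CustomFormatToISO(date, 'yyyy-MM-dd')) AS month, rate;

-- One output row per month, carrying the highest rate seen in that month.
answer = FOREACH (GROUP with_month BY month) GENERATE
    group AS month, MAX(with_month.rate) AS max_rate;

DUMP answer;
```

Because ISOToMonth truncates a timestamp rather than extracting a month number, the group key is a full ISO 8601 string that still includes the year, so August 2009 and August 2010 would land in separate groups. If only the two-digit month is wanted and the dates are always yyyy-MM-dd strings, grouping by the built-in SUBSTRING(date, 5, 7) may be a simpler alternative that needs no Piggybank jars.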