Posted to user@hive.apache.org by yogesh dhari <yo...@live.com> on 2012/10/14 16:54:20 UTC
NEED HELP in Hive Query
Hi all,
I have this file. I want to perform the following operation in Hive and Pig.
NAME DATE URL HITCOUNT
timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=tparchives 15
timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26 20
timesascent.in 2008-08-27 http://timesascent.in/ 37
timesascent.in 2008-08-27 http://timesascent.in/section/39/Job%20Wise 14
timesascent.in 2008-08-27 http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html 20
timesascent.in 2008-08-27 http://timesascent.in/ 17
timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 15
timesascent.in 2008-08-27 http://timesascent.in/ 17
timesascent.in 2008-08-27 http://timesascent.in/ 27
timesascent.in 2008-08-27 http://timesascent.in/ 37
timesascent.in 2008-08-27 http://timesascent.in/ 27
timesascent.in 2008-08-27 http://www.timesascent.in/ 16
timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 14
timesascent.in 2008-08-27 http://timesascent.in/ 14
timesascent.in 2008-08-27 http://timesascent.in/ 22
I want to add up all the HITCOUNT values for the same NAME, DATE & URL,
like
timesascent.in 2008-08-27 http://timesascent.in/ (addition of all hitcount under same name, date, url (37+17+17+27+....))
Please suggest whether there is any method to perform this query.
Thanks & Regards
Yogesh Kumar
Re: NEED HELP in Hive Query
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
B = group A by (name, date, url);
-- B now has 2 fields: "group", which is a tuple of (name, date, url),
-- and "A", which is a collection of tuples from A with the same
-- name-date-url
-- try "illustrate B" or "describe B" to see what that looks like
counts = foreach B generate flatten(group) as (name, date, url),
    COUNT_STAR(A) as num_entries;
Dmitriy
On Sun, Oct 14, 2012 at 10:57 AM, yogesh dhari <yo...@live.com> wrote:
>
> Thanks Chyikwei :-)
>
> I got it now :-). Is there another method without using flatten(A.name) and so on?
>
> A = load '/File/000000_0' using PigStorage('\u0001')
>     as (name, date, url, hit:INT);
> B = group A by ( name, date, url);
> C = foreach B generate flatten(A.name), flatten(A.date), flatten(A.url), SUM(A.hit) ;
> D = distinct C;
> Dump D;
>
> Thanks & Regards
> Yogesh Kumar Dhari
RE: NEED HELP in Hive Query
Posted by chyi-kwei yau <ch...@gmail.com>.
Hi,
Try
C = foreach B generate group, SUM(A.hit);
It should give a similar result.
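A minimal sketch of that suggestion as a complete script, assuming the same load statement Yogesh posted. The chararray types and the alias total_hits are illustrative assumptions, not something from the thread. Using flatten(group) instead of plain group turns the key tuple back into three ordinary columns, so neither flatten(A.name) nor the distinct step is needed:

```pig
-- Load the Ctrl-A delimited file; hit is declared as int so SUM adds numbers.
-- (Field types other than hit are an assumption for this sketch.)
A = LOAD '/File/000000_0' USING PigStorage('\u0001')
    AS (name:chararray, date:chararray, url:chararray, hit:int);

-- One group per distinct (name, date, url) combination.
B = GROUP A BY (name, date, url);

-- FLATTEN(group) unpacks the key tuple into three columns;
-- SUM(A.hit) adds up the hit counts inside each group's bag.
C = FOREACH B GENERATE FLATTEN(group) AS (name, date, url),
                       SUM(A.hit) AS total_hits;

DUMP C;
```

For the sample rows in this thread, the http://timesascent.in/ group works out to 37+17+17+27+37+27+14+22 = 198, and the http://timesascent.in/section/2/Interviews group to 15+14 = 29.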
Best,
Chyi-Kwei
On Oct 14, 2012 1:58 PM, "yogesh dhari" <yo...@live.com> wrote:
>
> Thanks Chyikwei :-)
>
> I got it now :-). Is there another method without using flatten(A.name) and so on?
>
> A = load '/File/000000_0' using PigStorage('\u0001')
>     as (name, date, url, hit:INT);
> B = group A by ( name, date, url);
> C = foreach B generate flatten(A.name), flatten(A.date), flatten(A.url), SUM(A.hit) ;
> D = distinct C;
> Dump D;
>
> Thanks & Regards
> Yogesh Kumar Dhari
RE: NEED HELP in Hive Query
Posted by yogesh dhari <yo...@live.com>.
Thanks Chyikwei :-)
I got it now :-). Is there another method without using flatten(A.name) and so on?
A = load '/File/000000_0' using PigStorage('\u0001')
as (name, date, url, hit:INT);
B = group A by ( name, date, url);
C = foreach B generate flatten(A.name), flatten(A.date), flatten(A.url), SUM(A.hit) ;
D = distinct C;
Dump D;
Thanks & Regards
Yogesh Kumar Dhari
> Date: Sun, 14 Oct 2012 13:24:27 -0400
> Subject: Re: NEED HELP in Hive Query
> From: chyikwei.yau@gmail.com
> To: user@pig.apache.org
>
> Hi yogesh,
>
> The result of "group by" should look like:
> {group: (group keys), { (instance1) , (instance2) } }
>
> For example:
>
> If A looks like:
> A: {name: chararray,age: int,gpa: float}
>
> And after "B = GROUP A BY age;"
>
> B will become:
> B: {group: int, A: {name: chararray,age: int,gpa: float}}
>
> Then you can use
> FOREACH B Generate.....
> To get the result you want.
>
> If my explanation is not clear, just take a look at
> http://pig.apache.org/docs/r0.10.0/basic.html#GROUP
>
> Hope this helps.
>
> Best,
> Chyi-Kwei
Re: NEED HELP in Hive Query
Posted by chyi-kwei yau <ch...@gmail.com>.
Hi yogesh,
The result of "group by" should look like:
{group: (group keys), { (instance1) , (instance2) } }
For example:
If A looks like:
A: {name: chararray,age: int,gpa: float}
And after "B = GROUP A BY age;"
B will become:
B: {group: int, A: {name: chararray,age: int,gpa: float}}
Then you can use
FOREACH B Generate.....
To get the result you want.
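To make the "FOREACH B GENERATE ..." step concrete for the hit-count data in this thread, here is a hedged sketch. The relation name stats and the output aliases are made up for illustration, and the field types other than hit are assumptions. Each aggregate runs over the bag A that GROUP attaches to every key, so several of them can be computed in one pass:

```pig
-- Same load as in the question; types other than hit are assumed.
A = LOAD '/File/000000_0' USING PigStorage('\u0001')
    AS (name:chararray, date:chararray, url:chararray, hit:int);

B = GROUP A BY (name, date, url);

-- Prints the schema of B: a "group" tuple plus a bag named A.
DESCRIBE B;

-- A.hit projects the hit column out of each group's bag,
-- and the aggregate functions then reduce that bag to one value.
stats = FOREACH B GENERATE
    FLATTEN(group) AS (name, date, url),
    COUNT(A)       AS num_rows,
    SUM(A.hit)     AS total_hits,
    MAX(A.hit)     AS max_hit;

DUMP stats;
```

COUNT gives the number of rows per key (what the first reply produced), while SUM gives the added-up hit count that was actually asked for.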
If my explanation is not clear, just take a look at
http://pig.apache.org/docs/r0.10.0/basic.html#GROUP
Hope this helps.
Best,
Chyi-Kwei
On Sun, Oct 14, 2012 at 1:03 PM, yogesh dhari <yo...@live.com> wrote:
>
> Hi CHyi-kwei,
>
> Thanks for the help. I think I wasn't able to make my question clear.
>
> The query you wrote will count the number of occurrences of the same NAME, DATE and URL, but it won't add up all the hitcounts under the same name, date and url.
>
> I want a result like this:
>
> timesascent.in, 2008-08-27, http://timesascent.in/ (/*addition of all hitcount under same name, date, url (37+17+17+27+....)*/ 198 )
> timesascent.in, 2008-08-27, http://timesascent.in/section/2/Interviews (/*addition of all hitcount under same name, date, url (15+14)*/ 29)
> .
> .
> .
>
> From this file below
>
> NAME DATE URL HITCOUNT
> timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=tparchives 15
> timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26 20
> timesascent.in 2008-08-27 http://timesascent.in/ 37
> timesascent.in 2008-08-27 http://timesascent.in/section/39/Job%20Wise 14
> timesascent.in 2008-08-27 http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html 20
> timesascent.in 2008-08-27 http://timesascent.in/ 17
> timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 15
> timesascent.in 2008-08-27 http://timesascent.in/ 17
> timesascent.in 2008-08-27 http://timesascent.in/ 27
> timesascent.in 2008-08-27 http://timesascent.in/ 37
> timesascent.in 2008-08-27 http://timesascent.in/ 27
> timesascent.in 2008-08-27 http://www.timesascent.in/ 16
> timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 14
> timesascent.in 2008-08-27 http://timesascent.in/ 14
> timesascent.in 2008-08-27 http://timesascent.in/ 22
>
>
> Please help and suggest how to write query for this in HIVE and PIG
>
> Thanks & Regards
> Yogesh Kumar Dhari
RE: NEED HELP in Hive Query
Posted by yogesh dhari <yo...@live.com>.
Hi CHyi-kwei,
Thanks for the help. I think I wasn't able to make my question clear.
The query you wrote will count the number of occurrences of the same NAME, DATE and URL, but it won't add up all the hitcounts under the same name, date and url.
I want a result like this:
timesascent.in, 2008-08-27, http://timesascent.in/ (/*addition of all hitcount under same name, date, url (37+17+17+27+....)*/ 198 )
timesascent.in, 2008-08-27, http://timesascent.in/section/2/Interviews (/*addition of all hitcount under same name, date, url (15+14)*/ 29)
.
.
.
From this file below
NAME DATE URL HITCOUNT
timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=tparchives 15
timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26 20
timesascent.in 2008-08-27 http://timesascent.in/ 37
timesascent.in 2008-08-27 http://timesascent.in/section/39/Job%20Wise 14
timesascent.in 2008-08-27 http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html 20
timesascent.in 2008-08-27 http://timesascent.in/ 17
timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 15
timesascent.in 2008-08-27 http://timesascent.in/ 17
timesascent.in 2008-08-27 http://timesascent.in/ 27
timesascent.in 2008-08-27 http://timesascent.in/ 37
timesascent.in 2008-08-27 http://timesascent.in/ 27
timesascent.in 2008-08-27 http://www.timesascent.in/ 16
timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 14
timesascent.in 2008-08-27 http://timesascent.in/ 14
timesascent.in 2008-08-27 http://timesascent.in/ 22
Please help and suggest how to write a query for this in Hive and Pig.
Thanks & Regards
Yogesh Kumar Dhari
> Date: Sun, 14 Oct 2012 11:31:00 -0400
> Subject: Re: NEED HELP in Hive Query
> From: chyikwei.yau@gmail.com
> To: user@pig.apache.org
>
> Hi,
>
> In pig, you can try
>
> GROUP data BY (NAME, DATE , URL)
>
> The detail is here:
> http://pig.apache.org/docs/r0.10.0/basic.html#GROUP
>
> Best,
> CHyi-kwei
Re: NEED HELP in Hive Query
Posted by chyi-kwei yau <ch...@gmail.com>.
Hi,
In pig, you can try
GROUP data BY (NAME, DATE , URL)
The details are here:
http://pig.apache.org/docs/r0.10.0/basic.html#GROUP
Best,
CHyi-kwei
On Sun, Oct 14, 2012 at 10:54 AM, yogesh dhari <yo...@live.com> wrote:
>
> Hi all,
>
> I have this file. I want this operation to perform in HIVE & PIG
>
> NAME DATE URL HITCOUNT
> timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=tparchives 15
> timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26 20
> timesascent.in 2008-08-27 http://timesascent.in/ 37
> timesascent.in 2008-08-27 http://timesascent.in/section/39/Job%20Wise 14
> timesascent.in 2008-08-27 http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html 20
> timesascent.in 2008-08-27 http://timesascent.in/ 17
> timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 15
> timesascent.in 2008-08-27 http://timesascent.in/ 17
> timesascent.in 2008-08-27 http://timesascent.in/ 27
> timesascent.in 2008-08-27 http://timesascent.in/ 37
> timesascent.in 2008-08-27 http://timesascent.in/ 27
> timesascent.in 2008-08-27 http://www.timesascent.in/ 16
> timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 14
> timesascent.in 2008-08-27 http://timesascent.in/ 14
> timesascent.in 2008-08-27 http://timesascent.in/ 22
>
>
> I want to add all HITCOUNT for the same NAME, DATE & URL
>
> like
>
> timesascent.in 2008-08-27 http://timesascent.in/ (addition of all hitcount under same name, date, url (37+17+17+27+....))
>
> Please suggest me is there any method to perform this query.
>
>
> Thanks & Regards
> Yogesh Kumar
>
>
>
>
RE: NEED HELP in Hive Query
Posted by yogesh dhari <yo...@live.com>.
Thanks John :-),
I got it now in Pig also :-).
A = load '/File/000000_0' using PigStorage('\u0001')
    as (name, date, url, hit:INT);
B = group A by (name, date, url);
C = foreach B generate flatten(A.name), flatten(A.date), flatten(A.url), SUM(A.hit) ;
D = distinct C;
Dump D;
Thanks & Regards
Yogesh Kumar Dhari
From: john@omernik.com
Date: Sun, 14 Oct 2012 12:29:23 -0500
Subject: Re: NEED HELP in Hive Query
To: user@hive.apache.org
select NAME, DATE, URL, SUM(HITCOUNT) as HITCOUNT from yourtable group by NAME, DATE, URL
That's the Hive answer. Not sure about the Pig answer.
Re: NEED HELP in Hive Query
Posted by John Omernik <jo...@omernik.com>.
select NAME, DATE, URL, SUM(HITCOUNT) as HITCOUNT from yourtable group by NAME, DATE, URL;
That's the Hive answer. Not sure about the Pig answer.
On Sun, Oct 14, 2012 at 9:54 AM, yogesh dhari <yo...@live.com> wrote:
> Hi all,
>
> I have this file. I want this operation to perform in *HIVE & PIG*
>
> NAME DATE URL HITCOUNT
> timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=tparchives 15
> timesascent.in 2008-08-27 http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26 20
> timesascent.in 2008-08-27 http://timesascent.in/ 37
> timesascent.in 2008-08-27 http://timesascent.in/section/39/Job%20Wise 14
> timesascent.in 2008-08-27 http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html 20
> timesascent.in 2008-08-27 http://timesascent.in/ 17
> timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 15
> timesascent.in 2008-08-27 http://timesascent.in/ 17
> timesascent.in 2008-08-27 http://timesascent.in/ 27
> timesascent.in 2008-08-27 http://timesascent.in/ 37
> timesascent.in 2008-08-27 http://timesascent.in/ 27
> timesascent.in 2008-08-27 http://www.timesascent.in/ 16
> timesascent.in 2008-08-27 http://timesascent.in/section/2/Interviews 14
> timesascent.in 2008-08-27 http://timesascent.in/ 14
> timesascent.in 2008-08-27 http://timesascent.in/ 22
>
>
> I want to *add all HITCOUNT for the same NAME, DATE & URL *
>
> like
>
> timesascent.in 2008-08-27 http://timesascent.in/ (addition of
> all hitcount under same name, date, url (37+17+17+27+....))
>
> Please suggest me is there any method to perform this query.
>
>
> Thanks & Regards
> Yogesh Kumar
>
>
>
>