Posted to user@hive.apache.org by yogesh dhari <yo...@live.com> on 2012/10/14 16:54:20 UTC

NEED HELP in Hive Query

Hi all, 

I have the file below and want to perform the following operation in Hive and Pig.

NAME              DATE          URL    HITCOUNT
timesascent.in    2008-08-27    http://timesascent.in/index.aspx?page=tparchives    15
timesascent.in    2008-08-27    http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26    20
timesascent.in    2008-08-27    http://timesascent.in/    37
timesascent.in    2008-08-27    http://timesascent.in/section/39/Job%20Wise    14
timesascent.in    2008-08-27    http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html    20
timesascent.in    2008-08-27    http://timesascent.in/    17
timesascent.in    2008-08-27    http://timesascent.in/section/2/Interviews    15
timesascent.in    2008-08-27    http://timesascent.in/    17
timesascent.in    2008-08-27    http://timesascent.in/    27
timesascent.in    2008-08-27    http://timesascent.in/    37
timesascent.in    2008-08-27    http://timesascent.in/    27
timesascent.in    2008-08-27    http://www.timesascent.in/    16
timesascent.in    2008-08-27    http://timesascent.in/section/2/Interviews    14
timesascent.in    2008-08-27    http://timesascent.in/    14
timesascent.in    2008-08-27    http://timesascent.in/    22


I want to add up all HITCOUNT values for rows with the same NAME, DATE and URL, like this:

timesascent.in    2008-08-27    http://timesascent.in/    (sum of all hitcounts under the same name, date, url: 37+17+17+27+....)

Is there a way to write this query?


Thanks & Regards
Yogesh Kumar

Re: NEED HELP in Hive Query

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
B = group A by (name, date, url);
-- B now has 2 fields: "group", which is a tuple of (name, date, url),
-- and "A", which is a bag of the tuples from A that share the same
-- name-date-url
-- try "illustrate B" or "describe B" to see what that looks like

-- COUNT_STAR counts the rows in each group; use SUM(A.hit) to total the hits
counts = foreach B generate flatten(group) as (name, date, url),
    COUNT_STAR(A) as num_entries;

Dmitriy
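
To total the hits rather than count rows, the same grouping can be combined with SUM. The following is a minimal sketch, assuming the load path, delimiter, and field names (name, date, url, hit) from the script posted in this thread. The chararray types and the aliases totals and total_hits are assumptions added for illustration.

```pig
-- Load path, delimiter, and field names follow the script posted in this thread;
-- the chararray types are an assumption (the original left them untyped).
A = LOAD '/File/000000_0' USING PigStorage('\u0001')
    AS (name:chararray, date:chararray, url:chararray, hit:int);

-- One group per distinct (name, date, url) combination.
B = GROUP A BY (name, date, url);

-- FLATTEN(group) turns the key tuple back into three top-level fields;
-- SUM(A.hit) adds up the hit column of every row in the group's bag.
totals = FOREACH B GENERATE FLATTEN(group) AS (name, date, url),
                            SUM(A.hit) AS total_hits;

DUMP totals;
```

Because each group already produces exactly one output row, this form should not need a DISTINCT step afterwards.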


RE: NEED HELP in Hive Query

Posted by chyi-kwei yau <ch...@gmail.com>.
Hi,

Try

C = FOREACH B GENERATE group, SUM(A.hit);

It should give a similar result.

Best,
Chyi-Kwei
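
With this statement the key stays nested: each output row has two fields, the group tuple and the sum. One way to get flat columns without FLATTEN is to project the fields out of the group tuple by name. The following is a sketch, assuming the load statement and field names from the script posted in this thread; the alias total_hits is illustrative.

```pig
-- Load statement and field names follow the script posted in this thread.
A = LOAD '/File/000000_0' USING PigStorage('\u0001')
    AS (name, date, url, hit:int);

-- Grouping on several keys makes "group" a tuple whose fields are
-- named name, date, and url.
B = GROUP A BY (name, date, url);

-- Project each key field out of the group tuple instead of flattening it.
C = FOREACH B GENERATE group.name AS name,
                       group.date AS date,
                       group.url  AS url,
                       SUM(A.hit) AS total_hits;

DESCRIBE C;  -- prints the schema of C so the flat layout can be checked
```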

RE: NEED HELP in Hive Query

Posted by yogesh dhari <yo...@live.com>.
Thanks Chyikwei :-)

I got it now :-). Is there another method that does not use flatten(A.name) and so on?

A = load '/File/000000_0' using PigStorage('\u0001')
    as (name, date, url, hit:INT);

B = group A by (name, date, url);

C = foreach B generate flatten(A.name), flatten(A.date), flatten(A.url), SUM(A.hit);

D = distinct C;

Dump D;

Thanks & Regards
Yogesh Kumar Dhari


Re: NEED HELP in Hive Query

Posted by chyi-kwei yau <ch...@gmail.com>.
Hi yogesh,

The result of "group by" should look like:
{group: (group keys), {(instance1), (instance2)}}

For example:

If A looks like:
A: {name: chararray,age: int,gpa: float}

then after "B = GROUP A BY age;"

B will become:
B: {group: int,A: {(name: chararray,age: int,gpa: float)}}

Then you can use
FOREACH B GENERATE ...
to get the result you want.

If my explanation is not clear, take a look at
http://pig.apache.org/docs/r0.10.0/basic.html#GROUP

Hope this helps.

Best,
Chyi-Kwei
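
The schema example above can be carried through to a FOREACH. The following is a minimal sketch, assuming a hypothetical tab-delimited input file 'students.txt' with the name, age, and gpa fields used in the example; the output aliases num_students and avg_gpa are illustrative.

```pig
-- 'students.txt' is a hypothetical input file; PigStorage defaults to tab-delimited.
A = LOAD 'students.txt' AS (name:chararray, age:int, gpa:float);

-- After grouping, each row of B is (group, A): the age key plus a bag
-- holding every original tuple with that age.
B = GROUP A BY age;
DESCRIBE B;

-- Aggregate functions take the bag or a projection of it, such as A.gpa.
C = FOREACH B GENERATE group AS age,
                       COUNT(A)   AS num_students,
                       AVG(A.gpa) AS avg_gpa;

DUMP C;
```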


RE: NEED HELP in Hive Query

Posted by yogesh dhari <yo...@live.com>.
Hi Chyi-Kwei,

Thanks for the help. I think I did not state my question clearly.

The query you wrote will count the number of occurrences of the same NAME, DATE and URL, but it won't add up the hitcounts under the same name, date and url.

I want a result like this:

timesascent.in,    2008-08-27,    http://timesascent.in/    198    /* sum of all hitcounts under the same name, date, url (37+17+17+27+37+27+14+22) */
timesascent.in,    2008-08-27,    http://timesascent.in/section/2/Interviews    29    /* sum of all hitcounts under the same name, date, url (15+14) */
.
.
.

from the file below:

NAME              DATE          URL    HITCOUNT
timesascent.in    2008-08-27    http://timesascent.in/index.aspx?page=tparchives    15
timesascent.in    2008-08-27    http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26    20
timesascent.in    2008-08-27    http://timesascent.in/    37
timesascent.in    2008-08-27    http://timesascent.in/section/39/Job%20Wise    14
timesascent.in    2008-08-27    http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html    20
timesascent.in    2008-08-27    http://timesascent.in/    17
timesascent.in    2008-08-27    http://timesascent.in/section/2/Interviews    15
timesascent.in    2008-08-27    http://timesascent.in/    17
timesascent.in    2008-08-27    http://timesascent.in/    27
timesascent.in    2008-08-27    http://timesascent.in/    37
timesascent.in    2008-08-27    http://timesascent.in/    27
timesascent.in    2008-08-27    http://www.timesascent.in/    16
timesascent.in    2008-08-27    http://timesascent.in/section/2/Interviews    14
timesascent.in    2008-08-27    http://timesascent.in/    14
timesascent.in    2008-08-27    http://timesascent.in/    22


Please suggest how to write a query for this in Hive and Pig.

Thanks & Regards
Yogesh Kumar Dhari


Re: NEED HELP in Hive Query

Posted by chyi-kwei yau <ch...@gmail.com>.
Hi,

In Pig, you can try

GROUP data BY (NAME, DATE, URL)

The details are here:
http://pig.apache.org/docs/r0.10.0/basic.html#GROUP

Best,
CHyi-kwei


RE: NEED HELP in Hive Query

Posted by yogesh dhari <yo...@live.com>.
Thanks John :-),

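I got it working in Pig as well :-).

A = load '/File/000000_0' using PigStorage('\u0001')
    as (name:chararray, date:chararray, url:chararray, hit:int);

-- one group per distinct (name, date, url)
B = group A by (name, date, url);

-- flatten(group) emits the key fields once per group, so no distinct is needed
C = foreach B generate flatten(group) as (name, date, url), SUM(A.hit) as hitcount;

dump C;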

Thanks & Regards
Yogesh Kumar Dhari

From: john@omernik.com
Date: Sun, 14 Oct 2012 12:29:23 -0500
Subject: Re: NEED HELP in Hive Query
To: user@hive.apache.org

select NAME, DATE, URL, SUM(HITCOUNT) as HITCOUNT from yourtable group by NAME, DATE, URL
That's the HIVE answer. Not sure the PIG answer. 






Re: NEED HELP in Hive Query

Posted by John Omernik <jo...@omernik.com>.
select NAME, DATE, URL, SUM(HITCOUNT) as HITCOUNT from yourtable group by NAME, DATE, URL

That's the Hive answer. I'm not sure about the Pig answer.
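
For anyone following along, here is a minimal sketch of how the query above might be run end to end. It assumes the data sits in the directory '/File' as Ctrl-A ('\001') delimited text, which matches the Pig load statement elsewhere in this thread; the table name hits, the column name dt, and the alias total_hits are hypothetical. The date column is called dt here because DATE is a type name and a reserved word in newer Hive releases.

```sql
-- Hypothetical external table over the existing files; adjust the path and names.
CREATE EXTERNAL TABLE hits (
  name     STRING,
  dt       STRING,   -- the DATE column from the sample, renamed to avoid the keyword
  url      STRING,
  hitcount INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
LOCATION '/File';

-- One output row per distinct (name, dt, url), with the hit counts added up.
SELECT name, dt, url, SUM(hitcount) AS total_hits
FROM hits
GROUP BY name, dt, url;
```

On the sample rows in the original post, the http://timesascent.in/ group would add up to 37+17+17+27+37+27+14+22 = 198, while http://www.timesascent.in/ stays a separate group with 16, since the URLs are compared as plain strings.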



