You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Cam Bazz <ca...@gmail.com> on 2011/02/21 12:07:31 UTC

calculating unique views based on ip, session_id

Hello,

So I have table of item views with item_sid, ip_number, session_id

I know it will not be that exact, but I want to get unique views per
item, and i will accept ip_number, session_id tuple as an unique view.

when I want to query just item hits I say: select item_sid, count(*)
from item_raw group by item_sid;

but if I say:

select item_sid, count(*) from item_raw group by item_sid, ip_number,
session_id;

it will give me duplicate item sids.

how can I query per unique tuple of ip_number, session_id per item_sid?

best regards,
c.b.

Re: calculating unique views based on ip, session_id

Posted by Cam Bazz <ca...@gmail.com>.
yes, this indeed solved my problem. thanks a bunch.

On Wed, Feb 23, 2011 at 4:33 AM, wd <wd...@wdicc.com> wrote:
> yes, ip_number and session_id should not be in group by clause.
>
> 2011/2/22 Viral Bajaria <vi...@gmail.com>
>>
>> I am guessing the following query should work too:
>> select item_sid, count(distinct ip_number, session_id) from item_raw where
>> date_day = '20110202' group by item_sid;
>> On Mon, Feb 21, 2011 at 9:42 PM, Cam Bazz <ca...@gmail.com> wrote:
>>>
>>> The query you have produced mulltiple item_sid's.
>>>
>>> This is rather what I have done:
>>>
>>> select u.item_sid, count(*) cc from (select distinct item_sid,
>>> ip_number, session_id from item_raw where date_day='20110202') u group
>>> by u.eser_sid
>>>
>>> date_day is a partition
>>>
>>> and this produced the results i wanted, but as you can see it is a
>>> double query. I dont know if there is a single query way of doing it.
>>>
>>> best regards.
>>> -c.b.
>>>
>>> On Tue, Feb 22, 2011 at 4:32 AM, wd <wd...@wdicc.com> wrote:
>>> > May be
>>> > select item_sid, count(distinct ip_number, session_id) from item_raw
>>> > group
>>> > by item_sid, ip_number, session_id (I've not test it, maybe it should
>>> > be
>>> > concat(ip_number, session_id) instead of ip_number, session_id )
>>> > is what you want.
>>> >
>>> > 2011/2/21 Cam Bazz <ca...@gmail.com>
>>> >>
>>> >> Hello,
>>> >>
>>> >> So I have table of item views with item_sid, ip_number, session_id
>>> >>
>>> >> I know it will not be that exact, but I want to get unique views per
>>> >> item, and i will accept ip_number, session_id tuple as an unique view.
>>> >>
>>> >> when I want to query just item hits I say: select item_sid, count(*)
>>> >> from item_raw group by item_sid;
>>> >>
>>> >> but if I say:
>>> >>
>>> >> select item_sid, count(*) from item_raw group by item_sid, ip_number,
>>> >> session_id;
>>> >>
>>> >> it will give me duplicate item sids.
>>> >>
>>> >> how can I query per unique tuple of ip_number, session_id per
>>> >> item_sid?
>>> >>
>>> >> best regards,
>>> >> c.b.
>>> >
>>> >
>>
>
>

Re: calculating unique views based on ip, session_id

Posted by wd <wd...@wdicc.com>.
yes, ip_number and session_id should not be in group by clause.

2011/2/22 Viral Bajaria <vi...@gmail.com>

> I am guessing the following query should work too:
>
> select item_sid, count(distinct ip_number, session_id) from item_raw where
> date_day = '20110202' group by item_sid;
>
> On Mon, Feb 21, 2011 at 9:42 PM, Cam Bazz <ca...@gmail.com> wrote:
>
>> The query you have produced mulltiple item_sid's.
>>
>> This is rather what I have done:
>>
>> select u.item_sid, count(*) cc from (select distinct item_sid,
>> ip_number, session_id from item_raw where date_day='20110202') u group
>> by u.eser_sid
>>
>> date_day is a partition
>>
>> and this produced the results i wanted, but as you can see it is a
>> double query. I dont know if there is a single query way of doing it.
>>
>> best regards.
>> -c.b.
>>
>> On Tue, Feb 22, 2011 at 4:32 AM, wd <wd...@wdicc.com> wrote:
>> > May be
>> > select item_sid, count(distinct ip_number, session_id) from item_raw
>> group
>> > by item_sid, ip_number, session_id (I've not test it, maybe it should be
>> > concat(ip_number, session_id) instead of ip_number, session_id )
>> > is what you want.
>> >
>> > 2011/2/21 Cam Bazz <ca...@gmail.com>
>> >>
>> >> Hello,
>> >>
>> >> So I have table of item views with item_sid, ip_number, session_id
>> >>
>> >> I know it will not be that exact, but I want to get unique views per
>> >> item, and i will accept ip_number, session_id tuple as an unique view.
>> >>
>> >> when I want to query just item hits I say: select item_sid, count(*)
>> >> from item_raw group by item_sid;
>> >>
>> >> but if I say:
>> >>
>> >> select item_sid, count(*) from item_raw group by item_sid, ip_number,
>> >> session_id;
>> >>
>> >> it will give me duplicate item sids.
>> >>
>> >> how can I query per unique tuple of ip_number, session_id per item_sid?
>> >>
>> >> best regards,
>> >> c.b.
>> >
>> >
>>
>
>

Re: calculating unique views based on ip, session_id

Posted by Viral Bajaria <vi...@gmail.com>.
I am guessing the following query should work too:

select item_sid, count(distinct ip_number, session_id) from item_raw where
date_day = '20110202' group by item_sid;

On Mon, Feb 21, 2011 at 9:42 PM, Cam Bazz <ca...@gmail.com> wrote:

> The query you have produced mulltiple item_sid's.
>
> This is rather what I have done:
>
> select u.item_sid, count(*) cc from (select distinct item_sid,
> ip_number, session_id from item_raw where date_day='20110202') u group
> by u.eser_sid
>
> date_day is a partition
>
> and this produced the results i wanted, but as you can see it is a
> double query. I dont know if there is a single query way of doing it.
>
> best regards.
> -c.b.
>
> On Tue, Feb 22, 2011 at 4:32 AM, wd <wd...@wdicc.com> wrote:
> > May be
> > select item_sid, count(distinct ip_number, session_id) from item_raw
> group
> > by item_sid, ip_number, session_id (I've not test it, maybe it should be
> > concat(ip_number, session_id) instead of ip_number, session_id )
> > is what you want.
> >
> > 2011/2/21 Cam Bazz <ca...@gmail.com>
> >>
> >> Hello,
> >>
> >> So I have table of item views with item_sid, ip_number, session_id
> >>
> >> I know it will not be that exact, but I want to get unique views per
> >> item, and i will accept ip_number, session_id tuple as an unique view.
> >>
> >> when I want to query just item hits I say: select item_sid, count(*)
> >> from item_raw group by item_sid;
> >>
> >> but if I say:
> >>
> >> select item_sid, count(*) from item_raw group by item_sid, ip_number,
> >> session_id;
> >>
> >> it will give me duplicate item sids.
> >>
> >> how can I query per unique tuple of ip_number, session_id per item_sid?
> >>
> >> best regards,
> >> c.b.
> >
> >
>

Re: calculating unique views based on ip, session_id

Posted by Cam Bazz <ca...@gmail.com>.
The query you have produced mulltiple item_sid's.

This is rather what I have done:

select u.item_sid, count(*) cc from (select distinct item_sid,
ip_number, session_id from item_raw where date_day='20110202') u group
by u.eser_sid

date_day is a partition

and this produced the results i wanted, but as you can see it is a
double query. I dont know if there is a single query way of doing it.

best regards.
-c.b.

On Tue, Feb 22, 2011 at 4:32 AM, wd <wd...@wdicc.com> wrote:
> May be
> select item_sid, count(distinct ip_number, session_id) from item_raw group
> by item_sid, ip_number, session_id (I've not test it, maybe it should be
> concat(ip_number, session_id) instead of ip_number, session_id )
> is what you want.
>
> 2011/2/21 Cam Bazz <ca...@gmail.com>
>>
>> Hello,
>>
>> So I have table of item views with item_sid, ip_number, session_id
>>
>> I know it will not be that exact, but I want to get unique views per
>> item, and i will accept ip_number, session_id tuple as an unique view.
>>
>> when I want to query just item hits I say: select item_sid, count(*)
>> from item_raw group by item_sid;
>>
>> but if I say:
>>
>> select item_sid, count(*) from item_raw group by item_sid, ip_number,
>> session_id;
>>
>> it will give me duplicate item sids.
>>
>> how can I query per unique tuple of ip_number, session_id per item_sid?
>>
>> best regards,
>> c.b.
>
>

Re: calculating unique views based on ip, session_id

Posted by wd <wd...@wdicc.com>.
May be
select item_sid, count(distinct ip_number, session_id) from item_raw group
by item_sid, ip_number, session_id (I've not test it, maybe it should be
concat(ip_number, session_id) instead of ip_number, session_id )
is what you want.

2011/2/21 Cam Bazz <ca...@gmail.com>

> Hello,
>
> So I have table of item views with item_sid, ip_number, session_id
>
> I know it will not be that exact, but I want to get unique views per
> item, and i will accept ip_number, session_id tuple as an unique view.
>
> when I want to query just item hits I say: select item_sid, count(*)
> from item_raw group by item_sid;
>
> but if I say:
>
> select item_sid, count(*) from item_raw group by item_sid, ip_number,
> session_id;
>
> it will give me duplicate item sids.
>
> how can I query per unique tuple of ip_number, session_id per item_sid?
>
> best regards,
> c.b.
>

Re: calculating unique views based on ip, session_id

Posted by Ajo Fod <aj...@gmail.com>.
Oh, I think I see what you are getting at .. basically you are getting
duplicate item_sids because they represent different views.

... try this:

select item_sid, ip_number,
session_id, count(*) from item_raw group by item_sid, ip_number,
session_id;



On Mon, Feb 21, 2011 at 11:54 AM, Cam Bazz <ca...@gmail.com> wrote:

> Hello,
>
> I did not understand this:
>
> when I do a:
>
> select item_sid, count(*) from item_raw group by item_sid
>
> i get hits per item.
>
> how do we join this to the master table?
>
> best regards,
> -c.b.
>
> On Mon, Feb 21, 2011 at 6:28 PM, Ajo Fod <aj...@gmail.com> wrote:
> > You can group by item_sid (drop session_id and ip_number from group by
> > clause) and then join with the parent table to get session_id  and
> > ip_number.
> >
> > -Ajo
> >
> > On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz <ca...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> So I have table of item views with item_sid, ip_number, session_id
> >>
> >> I know it will not be that exact, but I want to get unique views per
> >> item, and i will accept ip_number, session_id tuple as an unique view.
> >>
> >> when I want to query just item hits I say: select item_sid, count(*)
> >> from item_raw group by item_sid;
> >>
> >> but if I say:
> >>
> >> select item_sid, count(*) from item_raw group by item_sid, ip_number,
> >> session_id;
> >>
> >> it will give me duplicate item sids.
> >>
> >> how can I query per unique tuple of ip_number, session_id per item_sid?
> >>
> >> best regards,
> >> c.b.
> >
> >
>

Re: calculating unique views based on ip, session_id

Posted by Cam Bazz <ca...@gmail.com>.
Hello,

I did not understand this:

when I do a:

select item_sid, count(*) from item_raw group by item_sid

i get hits per item.

how do we join this to the master table?

best regards,
-c.b.

On Mon, Feb 21, 2011 at 6:28 PM, Ajo Fod <aj...@gmail.com> wrote:
> You can group by item_sid (drop session_id and ip_number from group by
> clause) and then join with the parent table to get session_id  and
> ip_number.
>
> -Ajo
>
> On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz <ca...@gmail.com> wrote:
>>
>> Hello,
>>
>> So I have table of item views with item_sid, ip_number, session_id
>>
>> I know it will not be that exact, but I want to get unique views per
>> item, and i will accept ip_number, session_id tuple as an unique view.
>>
>> when I want to query just item hits I say: select item_sid, count(*)
>> from item_raw group by item_sid;
>>
>> but if I say:
>>
>> select item_sid, count(*) from item_raw group by item_sid, ip_number,
>> session_id;
>>
>> it will give me duplicate item sids.
>>
>> how can I query per unique tuple of ip_number, session_id per item_sid?
>>
>> best regards,
>> c.b.
>
>

Re: calculating unique views based on ip, session_id

Posted by Ajo Fod <aj...@gmail.com>.
You can group by item_sid (drop session_id and ip_number from group by
clause) and then join with the parent table to get session_id  and
ip_number.

-Ajo

On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz <ca...@gmail.com> wrote:

> Hello,
>
> So I have table of item views with item_sid, ip_number, session_id
>
> I know it will not be that exact, but I want to get unique views per
> item, and i will accept ip_number, session_id tuple as an unique view.
>
> when I want to query just item hits I say: select item_sid, count(*)
> from item_raw group by item_sid;
>
> but if I say:
>
> select item_sid, count(*) from item_raw group by item_sid, ip_number,
> session_id;
>
> it will give me duplicate item sids.
>
> how can I query per unique tuple of ip_number, session_id per item_sid?
>
> best regards,
> c.b.
>