Posted to user@hive.apache.org by Cam Bazz <ca...@gmail.com> on 2011/03/01 13:33:26 UTC

counting impressions strategy

Hello,

I would like to count impressions per item. To do this, I wrote a
logger: whenever a user opens a category or search page and some items
are listed, I log a line such as:

CATPAGE   CAT1    1,2,3,4,5
CATPAGE   CAT2    6,7,8,9,10
SEARCH     keyword 1,6


Basically, I am logging all the displayed items in a comma-separated list.

I need to calculate and store daily impressions from this, such as:

1, 2
6, 2

(the first number is the item sid, the second is the number of
impressions, totalled across the different impression types)

Now I have a couple of questions:

First, considering that the system will produce at least one line per
item per day, what kind of table should I store this in? Previously I
have been using text files for everything; I never needed to query
Hive directly, only to export results from it. Now I will probably need
to run queries like "select * from myimpression table where sid = xx",
giving me a timeline of impressions per item.

Second question:

What kind of query do I need in order to count impressions as above?

Thank you very much,
C.B.

Re: counting impressions strategy

Posted by Dave Viner <da...@vinertech.com>.
I am not super familiar with list-valued columns in Hive, but they might
let you define a table with a schema of "page-type, page-name,
items-displayed" and then query for a count of individual items (see
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL and
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF). Possibly a Map
type would be best; I am not sure.
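
A minimal sketch of the counting step itself, written in Python rather
than HiveQL so the logic is easy to check by hand. It assumes the item
list is always the last whitespace-separated field of each log line; the
function name count_impressions and the variable names are illustrative
only and are not part of any Hive API. In Hive, the built-in split() and
explode() functions (used with LATERAL VIEW) cover the same two steps,
split the list and emit one row per item, followed by a GROUP BY on the
item id.

```python
from collections import Counter


def count_impressions(log_lines):
    """Count how many times each item id appears across all log lines."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the last whitespace-separated field is the item list.
        items_field = line.rsplit(None, 1)[-1]
        # Split the comma-separated list and count each item once per line.
        for sid in items_field.split(","):
            sid = sid.strip()
            if sid:
                counts[int(sid)] += 1
    return counts


# The three sample lines from the original message.
sample_log = [
    "CATPAGE   CAT1    1,2,3,4,5",
    "CATPAGE   CAT2    6,7,8,9,10",
    "SEARCH     keyword 1,6",
]

daily_counts = count_impressions(sample_log)
for sid, impressions in sorted(daily_counts.items()):
    print(f"{sid}, {impressions}")
```

On the three sample lines this prints "1, 2" and "6, 2" for the two
items that were shown twice, and a count of 1 for each of the others.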

HTH
Dave Viner

