Posted to user@pig.apache.org by Steve Bernstein <St...@deem.com> on 2012/09/01 01:14:46 UTC

RE: group by clickstream

Nope, tried that. FLATTEN breaks the bag back into one tuple per record, which is not what I want.
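
To make the distinction concrete, here is a minimal sketch in Python (not Pig) that models the two behaviours on made-up sample data. The names `sessions`, `flatten`, and `bag_to_tuple` are hypothetical and exist only for illustration; they are not Pig functions. The sketch assumes the order of tuples inside each bag is meaningful and preserved, which Pig does not guarantee for bags in general.

```python
from collections import Counter

# Hypothetical sample data: each session is a "bag" of single-element tuples,
# mirroring {(homepage),(pg1),(pg2),...,(pgN)}.
sessions = [
    [("homepage",), ("pg1",), ("pg2",)],
    [("homepage",), ("pg1",)],
    [("homepage",), ("pg1",), ("pg2",)],
]

def flatten(bag):
    """Model of FLATTEN applied to a bag: one output record per inner tuple."""
    return [page for (page,) in bag]

def bag_to_tuple(bag):
    """Collapse {(pg),(pg),...,(pg)} into a single tuple (pg,pg,...,pg)."""
    return tuple(page for (page,) in bag)

# FLATTEN-style result: every page becomes its own record, so the
# per-session sequence can no longer be used as a grouping key.
flattened = [page for bag in sessions for page in flatten(bag)]

# Desired result: one hashable key per session, which can be grouped and counted.
counts = Counter(bag_to_tuple(bag) for bag in sessions)

# Clickstreams sorted from most to least common.
most_common = counts.most_common()
```

In Pig itself, later releases document built-ins named BagToTuple and BagToString that look like the closest analogue of `bag_to_tuple` here; whether they are available depends on the Pig version in use, so treat that as something to check rather than a given.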

-----Original Message-----
From: Віталій Тимчишин [mailto:tivv00@gmail.com] 
Sent: Friday, August 31, 2012 1:49 PM
To: user@pig.apache.org
Subject: Re: group by clickstream

Hello.

Doesn't FLATTEN do exactly this?

Best regards, Vitalii Tymchyshyn

2012/8/30 Steve Bernstein <St...@deem.com>

> Some clarification on the below.  Ignore the outer bag, I'd removed 
> some data elements for clarity and simplicity.  Basically, I'm trying 
> to find a way to go from:
>
> {(pg),(pg),...,(pg)}
> to
> {(pg,pg,...,pg)}
>
> For an arbitrary number of "pg" tuples.
>
> SB
>
> -----Original Message-----
> From: Steve Bernstein [mailto:Steve.Bernstein@deem.com]
> Sent: Wednesday, August 29, 2012 4:28 PM
> To: user@pig.apache.org
> Subject: group by clickstream
>
> Hi all,
> I have a bag, clickstreams: {clickStream: {pageName: chararray}}, in
> which each row represents a sequence of pages and events in a single
> session on a website.  The interior bag, clickStream, represents this
> as a sequence of one or more single-element tuples, e.g.,
>
> {(homepage),(pg1),(pg2),...,(pgN)}
>
> I'd like to group by the sequences so I can get counts and ultimately 
> sort to find the most common clickstreams.  A bag can't be a key for 
> grouping, I've discovered, but it seems like it ought to be easy to 
> flatten the clickstream bag into some other form such that the 
> sequences can be used as keys for grouping.  But I can't figure it out.
>
> Any ideas?
>
> Thanks!
> Steve
>
>

