Posted to user@pig.apache.org by Steve Bernstein <St...@deem.com> on 2012/09/01 01:14:46 UTC
RE: group by clickstream
Nope, tried that; it breaks the bag back into one tuple per record, which is not what I want.
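The difference between what FLATTEN does and what is wanted can be modeled in a few lines of Python. This is a sketch, not Pig code. Bags are modeled as lists, tuples as Python tuples, and flatten_rows and collapse are hypothetical helper names. It also assumes the list order is the click order, which Pig does not generally guarantee for the tuples inside a bag.

```python
# Plain-Python model of the data shapes in this thread, NOT Pig code.
# A bag is modeled as a list; each page is a single-field tuple like ("pg1",).
from typing import List, Tuple

Page = Tuple[str]


def flatten_rows(session_id: str, click_stream: List[Page]) -> List[Tuple[str, ...]]:
    """Mimic FLATTEN on a bag: one output row per element of the bag."""
    return [(session_id,) + page for page in click_stream]


def collapse(click_stream: List[Page]) -> Tuple[str, ...]:
    """The shape wanted instead: all page names in one tuple, order kept."""
    return tuple(page[0] for page in click_stream)


stream = [("homepage",), ("pg1",), ("pg2",)]
print(flatten_rows("s1", stream))  # three rows, one per page
print(collapse(stream))            # a single tuple of three page names
```

flatten_rows turns one session into three records, matching the complaint above, while collapse produces the {(pg,pg,...,pg)} shape described later in the thread.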
-----Original Message-----
From: Віталій Тимчишин [mailto:tivv00@gmail.com]
Sent: Friday, August 31, 2012 1:49 PM
To: user@pig.apache.org
Subject: Re: group by clickstream
Hello.
Doesn't FLATTEN do exactly this?
Best regards, Vitalii Tymchyshyn
2012/8/30 Steve Bernstein <St...@deem.com>
> Some clarification on the message below. Ignore the outer bag; I'd removed
> some data elements for clarity and simplicity. Basically, I'm trying
> to find a way to go from:
>
> {(pg),(pg),...,(pg)}
> to
> {(pg,pg,...,pg)}
>
> For an arbitrary number of "pg" tuples.
>
> SB
>
> -----Original Message-----
> From: Steve Bernstein [mailto:Steve.Bernstein@deem.com]
> Sent: Wednesday, August 29, 2012 4:28 PM
> To: user@pig.apache.org
> Subject: group by clickstream
>
> Hi all,
> I have a bag, clickstreams: {clickStream: {pageName: chararray}}, for
> which each row represents a sequence of pages and events in a single
> session on a website. The interior bag, clickStream, represents this
> as a sequence of one or more single-element tuples, e.g.,
>
> {(homepage),(pg1),(pg2),...,(pgN)}
>
> I'd like to group by the sequences so I can get counts and ultimately
> sort to find the most common clickstreams. A bag can't be a key for
> grouping, I've discovered, but it seems like it ought to be easy to
> flatten the clickstream bag into some other form such that the
> sequences can be used as keys for grouping. But I can't figure it out.
>
> Any ideas?
>
> Thanks!
> Steve
>
>
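One way to express the grouping idea is to turn each inner bag into an ordered, hashable key and then count identical keys. The Python sketch below models that outside Pig. stream_key, most_common_streams and the sample sessions are hypothetical. In Pig itself this would roughly mean converting the bag to a chararray or tuple (for example with a UDF) and grouping on that value. That mapping is an assumption, not a tested Pig script.

```python
# Plain-Python sketch (not Pig) of grouping sessions by their whole click
# sequence. A Python tuple is hashable, so it can serve as a grouping key,
# the role a bag cannot play in a Pig GROUP BY.
from collections import Counter
from typing import Iterable, List, Tuple


def stream_key(click_stream: List[Tuple[str]]) -> Tuple[str, ...]:
    # Collapse [("homepage",), ("pg1",), ...] into ("homepage", "pg1", ...).
    return tuple(page[0] for page in click_stream)


def most_common_streams(sessions: Iterable[List[Tuple[str]]], top: int = 3):
    # Count identical sequences, then return them sorted by count, highest first.
    counts = Counter(stream_key(s) for s in sessions)
    return counts.most_common(top)


sessions = [
    [("homepage",), ("pg1",), ("pg2",)],
    [("homepage",), ("pg1",)],
    [("homepage",), ("pg1",), ("pg2",)],
]
for seq, n in most_common_streams(sessions):
    print(n, " > ".join(seq))
```

A tuple, or a delimited string, works as a key because it compares by value and preserves order. As a result, {(homepage),(pg1)} and {(pg1),(homepage)} stay distinct sequences, provided the page order was reliable to begin with.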