Posted to user@pig.apache.org by Mohit Anchlia <mo...@gmail.com> on 2012/06/18 20:19:44 UTC

help with Map Type

I am trying to parse a URL using Pig's map type. My query string is:

https://mail.google.com/mail/?tab=wm#drafts/13800c4ea3d11511&mail=123

My very simple test script is below, but when I look at the part file
it contains null.

A = LOAD '/examples/map/input/params.dat' USING PigStorage('&') AS
(M:map[]);

rmf '/examples/map/output/';

STORE B INTO '/examples/map/output/';

I am working on analyzing clickstream data. For this I need to first parse
these strings into files representing dimensions and also do sessionization
on them before loading them into an RDBMS.

Re: help with Map Type

Posted by Chun Yang <cy...@contractor.salesforce.com>.


On 6/19/12 12:27 PM, "Mohit Anchlia" <mo...@gmail.com> wrote:

> On Tue, Jun 19, 2012 at 10:46 AM, Subir S <su...@gmail.com> wrote:
> 
>> I think content in the end of this link
>> http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/do-reports.pigwill
>> help you!!
>
> thanks! I get 404 when I click on that link.

There's a missing space. You want this:
http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/do-reports.pig


Re: help with Map Type

Posted by Mohit Anchlia <mo...@gmail.com>.
On Tue, Jun 19, 2012 at 10:46 AM, Subir S <su...@gmail.com> wrote:

> I think content in the end of this link
> http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/do-reports.pigwill
> help you!!

Thanks! I get a 404 when I click on that link.



Re: help with Map Type

Posted by Subir S <su...@gmail.com>.
I think content in the end of this link
http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/do-reports.pigwill
help you!!

On Tue, Jun 19, 2012 at 10:50 PM, Subir S <su...@gmail.com> wrote:

> I suggest you load with 2 fields. (uri, query) split at '?' delimiter.
>
> Then use regex_extract to extract abc.com and regex_extract_all to
> extract query parameters.
>
> Use foreach...generate to make query into a map.

Re: help with Map Type

Posted by Subir S <su...@gmail.com>.
I suggest you load two fields, (uri, query), split at the '?' delimiter.

Then use REGEX_EXTRACT to extract abc.com and REGEX_EXTRACT_ALL to extract
the query parameters.

Use FOREACH ... GENERATE to make the query into a map.
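
The steps above can be sketched as follows. This is only a minimal
illustration, not a tested solution: it assumes every line of params.dat
looks like http://abc.com/?a=v1&b=v2, with exactly the parameters a and b in
that order (taken from the example in this thread). The relation names and
the regular expressions are my own assumptions.

```pig
-- Split each line at '?' into the part before it and the query string.
urls = LOAD '/examples/map/input/params.dat' USING PigStorage('?')
       AS (uri:chararray, query:chararray);

-- REGEX_EXTRACT returns one capture group; REGEX_EXTRACT_ALL must match the
-- whole query string and returns all capture groups as a tuple.
parts = FOREACH urls GENERATE
            REGEX_EXTRACT(uri, '^https?://([^/]+)', 1) AS host,
            FLATTEN(REGEX_EXTRACT_ALL(query, 'a=([^&]*)&b=([^&]*)'))
                AS (a:chararray, b:chararray);

-- TOMAP builds a map from alternating key/value arguments.
parsed = FOREACH parts GENERATE host, TOMAP('a', a, 'b', b) AS params;

STORE parsed INTO '/examples/map/output/';
```

Because REGEX_EXTRACT_ALL has to match the entire string, this approach only
fits a fixed, known set of parameters. Query strings with arbitrary or
reordered parameters would probably be easier to handle in a UDF.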


On Tue, Jun 19, 2012 at 3:33 AM, Mohit Anchlia <mo...@gmail.com> wrote:

> sorry that wasn't a link. It's my input to the pig. Basically what's inside
> params.dat. When I run those 3 pig lines I get empty output. What I want is
> something like this:
>
> http://abc.com/?a=v1&b=v2
>
> broken down into a map and also be able to preserve abc.com. Otherwise if
> it's complex I can write UDFs

Re: help with Map Type

Posted by Mohit Anchlia <mo...@gmail.com>.
Sorry, that wasn't a link. It's my input to Pig, i.e. what's inside
params.dat. When I run those 3 Pig lines I get empty output. What I want is
for something like this:

http://abc.com/?a=v1&b=v2

to be broken down into a map, while also preserving abc.com. Otherwise, if
it's too complex, I can write UDFs.


On Mon, Jun 18, 2012 at 1:04 PM, Subir S <su...@gmail.com> wrote:

> I think link Mohit mentioned was his input. Not sure if i understood
> correctly.
>
> I suspect something related to the schema.
>
> http://pig.apache.org/docs/r0.9.1/basic.html#map-schema
>
> http://stackoverflow.com/a/8238591
>
> So when you load with delimiter '&', what will happen to the first field?
> and how will the second field automatically become a map...I mean in your
> schema... you mention only one field...not two fields..URL&QUERY
>
> Thanks, Subir

Re: help with Map Type

Posted by Subir S <su...@gmail.com>.
I think the link Mohit mentioned was his input. Not sure if I understood
correctly.

I suspect something related to the schema.

http://pig.apache.org/docs/r0.9.1/basic.html#map-schema

http://stackoverflow.com/a/8238591

So when you load with delimiter '&', what will happen to the first field?
And how will the second field automatically become a map? I mean, in your
schema you mention only one field, not two fields (URL&QUERY).

Thanks, Subir
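
A possible explanation for the null, offered as a sketch rather than a
confirmed diagnosis: PigStorage('&') splits each line at '&', and since the
schema names only one field, only the text before the first '&' reaches M.
Pig can only convert loaded text to a map when it is written in Pig's map
notation, [key#value,key#value]. A URL is not in that form, so the
conversion yields null. The script below assumes a hypothetical file,
map_literals.dat, whose lines look like [a#v1,b#v2]. Neither the file name
nor its contents come from this thread.

```pig
-- Hypothetical input, one map per line in Pig's text notation, e.g.:
--   [a#v1,b#v2]
A = LOAD '/examples/map/input/map_literals.dat' AS (M:map[]);

-- Look up individual values by key with the # operator.
B = FOREACH A GENERATE M#'a' AS a, M#'b' AS b;

DUMP B;
```

If the data cannot be rewritten into that notation, the map has to be built
inside the script (or in a UDF) after loading the line as plain text.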

On Tue, Jun 19, 2012 at 12:20 AM, Jonathan Coveney <jc...@gmail.com> wrote:

> Your link does not work, I recommend using pastebin.

Re: help with Map Type

Posted by Jonathan Coveney <jc...@gmail.com>.
Your link does not work; I recommend using pastebin.
