Posted to user@pig.apache.org by yonghu <yo...@gmail.com> on 2012/06/11 18:07:11 UTC

How can I use load function to load bag field?

Dear All,

How can I define a UDF load function to load a bag field, such as A = LOAD
'location' AS (field_name: bag{})? Can anyone show me some example code?

Regards!

Yong

Re: How can I use load function to load bag field?

Posted by yonghu <yo...@gmail.com>.
Thanks, guys. I tried the code and found the right pattern for a bag that
can be loaded.

Regards!

Yong

On Mon, Jun 11, 2012 at 10:32 PM, Russell Jurney <ru...@gmail.com> wrote:

> my_data = LOAD 'location' AS (name:chararray, val1:int, val2:int);
> by_name = foreach (group my_data by name) generate group as name,
> my_data.(val1, val2) as my_data;
> store by_name into 'new_location';
>
> grouped_data = LOAD 'new_location' AS (name:chararray,
> my_bag:bag{T2:tuple(val1:int, val2:int)});
> -- Voilà!

Re: How can I use load function to load bag field?

Posted by Russell Jurney <ru...@gmail.com>.
my_data = LOAD 'location' AS (name:chararray, val1:int, val2:int);
by_name = foreach (group my_data by name) generate group as name,
my_data.(val1, val2) as my_data;
store by_name into 'new_location';

grouped_data = LOAD 'new_location' AS (name:chararray,
my_bag:bag{T2:tuple(val1:int, val2:int)});
-- Voilà!
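
To illustrate what this round trip produces: assuming 'location' held the
tab-separated student rows quoted elsewhere in this thread, the default
PigStorage writer stores each group as a name, a tab, and a bag literal of the
form {(val1,val2),...}. The rows in the comments below are a hand-written
illustration, not captured output; <TAB> stands for a tab character, and the
order of records and of tuples inside each bag is not guaranteed.

```pig
-- Hypothetical contents of a part file under 'new_location':
--   elsie<TAB>{(3,45)}
--   fred<TAB>{(2,120)}
--   henrietta<TAB>{(1,25)}
--   sally<TAB>{(1,82),(2,87)}
--   tom<TAB>{(1,82),(4,98)}
-- Text in this shape can be loaded directly into a bag-typed field:
grouped_data = LOAD 'new_location'
               AS (name:chararray, my_bag:bag{T2:tuple(val1:int, val2:int)});
DESCRIBE grouped_data;
```

A flat file without the braces and parentheses does not match a bag schema,
which is why the flat data has to be grouped first.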

On Mon, Jun 11, 2012 at 1:15 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> Yong,
>
> If your data is not in the form of a bag, then there is no reason to load
> it in as a bag. You should load it in as chararray, int, int, and then you
> can transform it into the form you want via the script itself.



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: How can I use load function to load bag field?

Posted by Jonathan Coveney <jc...@gmail.com>.
Yong,

If your data is not in the form of a bag, then there is no reason to load
it in as a bag. You should load it in as chararray, int, int, and then you
can transform it into the form you want via the script itself.
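
As a rough sketch of this suggestion (the path and field names are taken from
Yong's message; the aliases raw, by_name and students are made up for
illustration), load the flat file with a flat schema and let GROUP build the
bag:

```pig
-- Sketch only: aliases raw, by_name and students are hypothetical.
-- PigStorage is the default loader and splits fields on tabs by default.
raw = LOAD '/home/yonghu/test/student.txt'
      AS (name:chararray, id:int, result:int);

-- GROUP yields one record per name, carrying a bag of the matching rows.
by_name = GROUP raw BY name;

-- Keep only the (id, result) columns inside each bag.
students = FOREACH by_name GENERATE group AS name, raw.(id, result) AS B;

-- Print the resulting schema to confirm that B is a bag of tuples.
DESCRIBE students;
```

DESCRIBE should then report B as a bag of (id, result) tuples, one bag per
name.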

2012/6/11 yonghu <yo...@gmail.com>

> Dear Russell,
>
> My Pig version is 0.9.1. I experimented a little but ran into a problem. My
> data looks like this:
>
> henrietta    1    25
> sally    1    82
> fred    2    120
> elsie    3    45
> tom    1    82
> tom    4    98
> sally    2    87
>
> The delimiter is '\t'.
>
> I used this command to load the data:
>
> A = LOAD '/home/yonghu/test/student.txt' AS
> >> (name:chararray,B:{T1:(id:int,result:int)});
>
> Then I got the following error:
>
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 42>
> mismatched input ';' expecting RIGHT_PAREN
> Details at logfile: /home/yonghu/pig-0.9.1/bin/pig_1339440832010.log
>
> What does RIGHT_PAREN mean here? Are there any requirements on the input data?
>
> Thanks.
>
> Yong

Re: How can I use load function to load bag field?

Posted by yonghu <yo...@gmail.com>.
Dear Russell,

My Pig version is 0.9.1. I experimented a little but ran into a problem. My
data looks like this:

henrietta    1    25
sally    1    82
fred    2    120
elsie    3    45
tom    1    82
tom    4    98
sally    2    87

The delimiter is '\t'.

I used this command to load the data:

A = LOAD '/home/yonghu/test/student.txt' AS
>> (name:chararray,B:{T1:(id:int,result:int)});

Then I got the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 42>
mismatched input ';' expecting RIGHT_PAREN
Details at logfile: /home/yonghu/pig-0.9.1/bin/pig_1339440832010.log

What does RIGHT_PAREN mean here? Are there any requirements on the input data?

Thanks.

Yong

On Mon, Jun 11, 2012 at 8:56 PM, Russell Jurney <ru...@gmail.com> wrote:

> High five! o/\o

Re: How can I use load function to load bag field?

Posted by Russell Jurney <ru...@gmail.com>.
High five! o/\o

On Mon, Jun 11, 2012 at 11:51 AM, yonghu <yo...@gmail.com> wrote:

> Dear Russell,
>
> Thanks for your response.
>
> Yong




Re: How can I use load function to load bag field?

Posted by yonghu <yo...@gmail.com>.
Dear Russell,

Thanks for your response.

Yong

On Mon, Jun 11, 2012 at 7:33 PM, Russell Jurney <ru...@gmail.com> wrote:

> You don't need a UDF (if you are using PigStorage or another supported
> loader), just a cast.
>
> foo = LOAD 'location' as B:bag{T2:tuple(t1:float,t2:float)};
>
> Pulled from the docs: http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
>
> A = LOAD 'mydata' AS (T1:tuple(f1:int, f2:int),
> B:bag{T2:tuple(t1:float,t2:float)}, M:map[] );
>
> A = LOAD 'mydata' AS (T1:(f1:int, f2:int), B:{T2:(t1:float,t2:float)},
> M:[] );

Re: How can I use load function to load bag field?

Posted by Russell Jurney <ru...@gmail.com>.
You don't need a UDF (if you are using PigStorage or another supported
loader), just a cast.

foo = LOAD 'location' as B:bag{T2:tuple(t1:float,t2:float)};

Pulled from the docs: http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html

A = LOAD 'mydata' AS (T1:tuple(f1:int, f2:int),
B:bag{T2:tuple(t1:float,t2:float)}, M:map[] );

A = LOAD 'mydata' AS (T1:(f1:int, f2:int), B:{T2:(t1:float,t2:float)}, M:[] );



On Jun 11, 2012, at 9:07 AM, yonghu <yo...@gmail.com> wrote:

Dear All,

How can I define a UDF load function to load a bag field, such as A = LOAD
'location' AS (field_name: bag{})? Can anyone show me some example code?

Regards!

Yong