Posted to user@pig.apache.org by Huo Zhu <zh...@gmail.com> on 2012/09/04 13:16:58 UTC

schema of pig flatten

I recently ran into this problem at work. It concerns Pig's flatten, and
here is a simple example that reproduces it.

Two input files:
===file1===
1_a
2_b
4_d

===file2 (tab separated)===
1 a
2 b
3 c

I tried three scripts in Pig 0.9 and Pig 0.10, and two of them throw exceptions.

pig script 1:

a = load 'file1' as (str:chararray);
b = load 'file2' as (num:int, ch:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:int, ch:chararray);
c = join a1 by num, b by num;
dump c;   -- exception java.lang.String cannot be cast to java.lang.Integer

pig script 2:

a = load 'file1' as (str:chararray);
b = load 'file2' as (num:int, ch:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:int, ch:chararray);
a2 = foreach a1 generate (int)num as num, ch as ch;
c = join a2 by num, b by num;
dump c;   -- exception java.lang.String cannot be cast to java.lang.Integer

pig script 3:

a = load 'file1' as (str:chararray);
b = load 'file2' as (num:int, ch:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2));
a2 = foreach a1 generate (int)$0 as num, $1 as ch;
c = join a2 by num, b by num;
dump c;   -- works as expected

Could somebody explain why scripts 1 and 2 fail but script 3 succeeds?
Thanks!

Re: schema of pig flatten

Posted by Huo Zhu <zh...@gmail.com>.
I think that would be a wonderful improvement. I look forward to your good news.

On 5 September 2012 16:22, Gianmarco De Francisci Morales
<gd...@apache.org> wrote:

> Script 2 has " as (num:int, ch:chararray); " before the cast.
> I guess that's why you get the error.
> We discussed making this kind of syntax perform an implicit cast,
> but we never got around to implementing it (shame on me).
> See PIG-2315 <https://issues.apache.org/jira/browse/PIG-2315>
>
> Cheers,
> --
> Gianmarco



-- 
Best wishes

Re: schema of pig flatten

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Script 2 has " as (num:int, ch:chararray); " before the cast.
I guess that's why you get the error.
We discussed making this kind of syntax perform an implicit cast,
but we never got around to implementing it (shame on me).
See PIG-2315 <https://issues.apache.org/jira/browse/PIG-2315>
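
One possible way to sidestep the problem, offered as an untested sketch rather than a confirmed fix: declare the flattened fields with the type the values appear to have at runtime (chararray, judging by the String in the exception), then cast in a separate foreach. The alias num_str is made up for this example; file1 and file2 are the inputs from the original post.

```pig
a = load 'file1' as (str:chararray);
b = load 'file2' as (num:int, ch:chararray);
-- Name the flattened fields, but declare them as strings (assumed to match
-- what STRSPLIT produces) instead of claiming num is already an int.
a1 = foreach a generate flatten(STRSPLIT(str, '_', 2)) as (num_str:chararray, ch:chararray);
-- Cast explicitly in a later step, the same way script 3 casts $0.
a2 = foreach a1 generate (int)num_str as num, ch;
c = join a2 by num, b by num;
dump c;
```

This follows the same pattern as script 3 (cast first, then join on the int field) while keeping field names instead of positional references.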

Cheers,
--
Gianmarco



On Wed, Sep 5, 2012 at 4:08 AM, Huo Zhu <zh...@gmail.com> wrote:

> Scripts 2 and 3 both use an explicit cast, so why are the results different?

Re: schema of pig flatten

Posted by Huo Zhu <zh...@gmail.com>.
Scripts 2 and 3 both use an explicit cast, so why are the results different?

On 4 September 2012 22:04, Russell Jurney <ru...@gmail.com> wrote:

> You must cast explicitly:
>
> b = foreach a generate (int)foo as foo:int;
>
> Russell Jurney



-- 
Best wishes

Re: schema of pig flatten

Posted by Russell Jurney <ru...@gmail.com>.
You must cast explicitly:

b = foreach a generate (int)foo as foo:int;

Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com
