Posted to user@pig.apache.org by 施兴 <pa...@gmail.com> on 2008/10/22 08:01:03 UTC
The FLATTEN can't process multi columns in types-stable-1
code:

term2url_orig = LOAD 'term2url_orig' AS (term, termscore:double,
url:chararray, total:double);
set job.name 'same term';
term2url_termqueryscore_group = GROUP term2url_orig BY term PARALLEL 2;
term2url_termqueryscore = FOREACH term2url_termqueryscore_group GENERATE
FLATTEN(group) AS term, FLATTEN(term2url_orig.(termscore, url)),
SUM(term2url_orig.termscore) AS termallscore;
STORE term2url_termqueryscore into 'term2url_termqueryscore';

grunt> cat term2url_termqueryscore
<term> <termscore> <termallscore> (no url)
养 0.333333 1.666665
养 0.333333 1.666665
养 0.333333 1.666665
养 0.333333 1.666665
养 0.333333 1.666665
赛鸽 0.333333 1.666665
赛鸽 0.333333 1.666665
赛鸽 0.333333 1.666665
赛鸽 0.333333 1.666665
赛鸽 0.333333 1.666665
昆明理工 0.166667 0.333334
昆明理工 0.166667 0.333334

The url column is missing from the output. When I move url ahead of
termscore, i.e. FLATTEN(term2url_orig.(url, termscore)), the termscore
column is missing instead.

grunt> cat term2url_termqueryscore
<term> <url> <termallscore> (no termscore)
养 bird.intopet.com/98962.shtml 1.666665
养 gd-hxloft.com/html/10/n-1410.html 1.666665
养 wenfeng0083.bokee.com 1.666665
养 gdxh.chinaxinge.com/detail.asp?id=2303 1.666665
养 www.hpw-js.com/article/article2.aspx?id=153616 1.666665
赛鸽 bird.intopet.com/98962.shtml 1.666665
赛鸽 gdxh.chinaxinge.com/detail.asp?id=2303 1.666665
赛鸽 wenfeng0083.bokee.com 1.666665
赛鸽 gd-hxloft.com/html/10/n-1410.html 1.666665
赛鸽 www.hpw-js.com/article/article2.aspx?id=153616 1.666665
昆明理工 learning.sohu.com/20080715/n258160475.shtml 0.333334
昆明理工 www.henanart.com/gaokao/kaofen/kf2008/200804/14867.html 0.333334

Is FLATTEN unable to process multiple columns? I am using types-stable-1.
--
Best wishes!
My Friend~
Re: The FLATTEN can't process multi columns in types-stable-1
Posted by Daniel Dai <da...@gmail.com>.
Yes, FLATTEN on multiple columns is broken in types-stable-1. It is fixed in
https://issues.apache.org/jira/browse/PIG-495. The latest Pig snapshot should
be OK.
Daniel
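
For reference, the output the original script was expected to produce can be modeled outside Pig. The sketch below is plain Python, not Pig code and not the PIG-495 fix; the function name group_flatten_sum and the 0.0 values for the total column are made up for illustration (the thread never shows total). It assumes the usual FLATTEN semantics: each tuple in the grouped bag becomes one output row carrying both projected columns, next to the group key and the per-group sum.

```python
from collections import defaultdict


def group_flatten_sum(rows):
    """Model GROUP BY term, then FLATTEN of (termscore, url) plus SUM(termscore).

    rows: iterable of (term, termscore, url, total) tuples.
    Returns a list of (term, termscore, url, termallscore) tuples.
    """
    # GROUP term2url_orig BY term: collect the projected columns per key.
    groups = defaultdict(list)
    for term, termscore, url, _total in rows:
        groups[term].append((termscore, url))

    out = []
    for term, bag in groups.items():
        # SUM(term2url_orig.termscore) is computed once per group.
        termallscore = sum(score for score, _url in bag)
        # FLATTEN of the two-column bag: one row per tuple, keeping BOTH columns.
        for termscore, url in bag:
            out.append((term, termscore, url, termallscore))
    return out


# Two input rows taken from the thread; the last field (total) is a placeholder.
rows = [
    ("昆明理工", 0.166667, "learning.sohu.com/20080715/n258160475.shtml", 0.0),
    ("昆明理工", 0.166667,
     "www.henanart.com/gaokao/kaofen/kf2008/200804/14867.html", 0.0),
]

for row in group_flatten_sum(rows):
    print(row)
```

Each printed row has four fields (term, termscore, url, termallscore), whereas the two listings in the original post each show only three.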