Posted to user@pig.apache.org by 施兴 <pa...@gmail.com> on 2008/10/22 08:01:03 UTC

The FLATTEN can't process multi columns in types-stable-1

code:
term2url_orig = LOAD 'term2url_orig' AS (term, termscore:double,
url:chararray, total:double);

set job.name 'same term';

term2url_termqueryscore_group = GROUP term2url_orig BY term  PARALLEL 2;

term2url_termqueryscore = FOREACH term2url_termqueryscore_group GENERATE
FLATTEN(group) AS term, FLATTEN(term2url_orig.(termscore, url)),
SUM(term2url_orig.termscore) AS termallscore;
STORE term2url_termqueryscore into 'term2url_termqueryscore';

grunt>cat term2url_termqueryscore
<term>    <termscore>    <termallscore>    (no url)
养    0.333333    1.666665
养    0.333333    1.666665
养    0.333333    1.666665
养    0.333333    1.666665
养    0.333333    1.666665
赛鸽    0.333333    1.666665
赛鸽    0.333333    1.666665
赛鸽    0.333333    1.666665
赛鸽    0.333333    1.666665
赛鸽    0.333333    1.666665
昆明理工    0.166667    0.333334
昆明理工    0.166667    0.333334

The url column is missing. When I move url ahead of termscore, as in
FLATTEN(term2url_orig.(url, termscore)), the termscore column is missing instead.

grunt>cat term2url_termqueryscore
<term>    <url>    <termallscore>    (no termscore)
养    bird.intopet.com/98962.shtml    1.666665
养    gd-hxloft.com/html/10/n-1410.html    1.666665
养    wenfeng0083.bokee.com    1.666665
养    gdxh.chinaxinge.com/detail.asp?id=2303    1.666665
养    www.hpw-js.com/article/article2.aspx?id=153616    1.666665
赛鸽    bird.intopet.com/98962.shtml    1.666665
赛鸽    gdxh.chinaxinge.com/detail.asp?id=2303    1.666665
赛鸽    wenfeng0083.bokee.com    1.666665
赛鸽    gd-hxloft.com/html/10/n-1410.html    1.666665
赛鸽    www.hpw-js.com/article/article2.aspx?id=153616    1.666665
昆明理工    learning.sohu.com/20080715/n258160475.shtml    0.333334
昆明理工    www.henanart.com/gaokao/kaofen/kf2008/200804/14867.html    0.333334

Is FLATTEN unable to process multiple columns? I am using types-stable-1.

-- 
Best wishes!
My Friend~

Re: The FLATTEN can't process multi columns in types-stable-1

Posted by Daniel Dai <da...@gmail.com>.
Yes, FLATTEN on multiple columns is broken in types-stable-1. It is fixed in 
https://issues.apache.org/jira/browse/PIG-495. The latest Pig snapshot should 
be OK.
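
For reference, here is a rough plain-Python model of what the script in the question should produce once FLATTEN handles both columns. It is only a sketch of the expected semantics, not Pig code and not taken from the PIG-495 patch. The sample rows, the example.com URLs, and the helper names group_by_term and flatten_with_sum are made up for illustration.

```python
from collections import defaultdict

# Hypothetical input rows in the shape (term, termscore, url, total).
# These values are invented for illustration; they are not the poster's data.
rows = [
    ("a", 0.5, "example.com/1", 1.0),
    ("a", 0.25, "example.com/2", 1.0),
    ("b", 0.75, "example.com/3", 1.0),
]


def group_by_term(rows):
    """Model GROUP ... BY term: map each term to the bag of its full rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return groups


def flatten_with_sum(groups):
    """Model GENERATE FLATTEN(group), FLATTEN(bag.(termscore, url)), SUM(bag.termscore).

    Flattening a bag of two-field tuples should emit one output record per
    tuple in the bag, carrying BOTH projected fields plus the per-group sum.
    """
    out = []
    for term, bag in groups.items():
        term_all_score = sum(r[1] for r in bag)
        for _, termscore, url, _ in bag:
            out.append((term, termscore, url, term_all_score))
    return out


result = flatten_with_sum(group_by_term(rows))
for record in result:
    print("\t".join(str(field) for field in record))
```

Each output record has four fields (term, termscore, url, termallscore), whereas the output shown in the question has only three. The row order here follows Python's dictionary insertion order; Pig makes no such ordering promise across groups.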

Daniel
