Posted to user@pig.apache.org by Brian Adams <br...@chacha.com> on 2010/07/06 20:13:21 UTC
Placing 0 in position $1 when position $0 is empty/null

joined = JOIN webordered BY ngram FULL OUTER, smsordered BY ngram; 
These are basically (ngram,count) joined to (ngram,count).
In the case where dog is in webordered but not in smsordered, I still want to
put 0 in the count column so I can eventually do a sum column.

I get something like this.

dog,500,dog,10000
cat,500,(nothing)
(nothing),mouse,100

If I wanted to do a sum column like
FOREACH joined GENERATE $0,$1,$2,$3,($1+$3) AS thesum;

thesum will be blank in the case of cat or mouse above.

How do I work around this? 

I was trying conditionals like so:
FOREACH joined GENERATE $0 AS smsgram,($0=='null'?(int)0:$1) AS
smscount,$2 AS webgram,($2=='null'?(int)0:$3) AS webcount,($1+$3) AS
sumcount;
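
A minimal sketch of one possible workaround, not a confirmed answer from the list. It assumes both inputs are loaded with an explicit (ngram, cnt) schema; the file paths and the field name cnt are hypothetical. In Pig a missing value is a real null, so comparing against the string 'null' never matches, and arithmetic on a null yields null. Testing with IS NULL and substituting 0 before adding should avoid the blank sum:

```pig
-- Hypothetical paths and schemas; relation names follow the question.
webordered = LOAD 'web_ngrams' AS (ngram:chararray, cnt:int);
smsordered = LOAD 'sms_ngrams' AS (ngram:chararray, cnt:int);

joined = JOIN webordered BY ngram FULL OUTER, smsordered BY ngram;

-- After a full outer join, every field from the missing side is null.
-- Check with IS NULL (not == 'null') and substitute 0 before summing.
summed = FOREACH joined GENERATE
    (webordered::ngram IS NULL ? smsordered::ngram : webordered::ngram) AS ngram,
    (webordered::cnt IS NULL ? 0 : webordered::cnt) AS webcount,
    (smsordered::cnt IS NULL ? 0 : smsordered::cnt) AS smscount,
    (webordered::cnt IS NULL ? 0 : webordered::cnt)
        + (smsordered::cnt IS NULL ? 0 : smsordered::cnt) AS sumcount;
```

The sum here is built from the substituted values rather than from the raw $1 and $3, which is why ($1+$3) in the attempt above would still come out null even if the two conditionals worked.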

Thanks ahead of time guys and gals.


On Tue, 2010-07-06 at 13:30 +0800, Yuting Lin wrote:
> Thanks Chao
> 
> It works in the m/r code. Thanks.
> 
> Regards
> Yuting
> 
> On Tue, Jul 6, 2010 at 1:12 PM, Chao Wang <ch...@yahoo-inc.com> wrote:
> 
> > Make sure you call "BasicTableOutputFormat.close()" at the end of your
> > m/r job. It will create .meta for Zebra tables.
> >
> > Chao
> >
> >
> > -----Original Message-----
> > From: Yuting Lin [mailto:mikelin36@gmail.com]
> > Sent: Monday, July 05, 2010 9:03 PM
> > To: pig-user@hadoop.apache.org
> > Subject: zebra TableInputFormat errors: Missing Meta File .meta
> >
> > Hi all
> >
> > I am trying to load the Zebra file generated by BasicTableOutputFormat
> > in MapReduce code. The code is similar to
> > org.apache.hadoop.zebra.mapred.TableMapReduceExample, but it throws the
> > following exception while splitting the data in TableInputFormat:
> >
> > Exception in thread "main" java.io.IOException: BasicTable.Reader
> > constructor failed : Missing Meta File of t_table/CG0/.meta
> >    at
> > org.apache.hadoop.zebra.io.BasicTable$Reader.<init>(BasicTable.java:328)
> >    at
> > org.apache.hadoop.zebra.io.BasicTable$Reader.<init>(BasicTable.java:287)
> >    at
> > org.apache.hadoop.zebra.mapred.TableInputFormat.getSplits(TableInputFormat.java:883)
> >
> > The directory generated by BasicTableOutputFormat contains the
> > following files (without .meta):
> >
> > /table/.btschema
> > /table/CG0
> > /table/CG0/.schema
> > /table/CG0/part-0
> > /table/CG1
> > /table/CG1/.schema
> > /table/CG1/part-0
> > /table/_temporary
> > /table/_temporary/CG0
> > /table/_temporary/CG1
> >
> > The same error occurs if I store and then load the data through the Pig
> > interface (missing .meta file).
> >
> > How can I convert the raw data into Zebra format and then load it in
> > Pig or an MR program? Any suggestions would be appreciated!
> >
> > -
> > Regards
> > Yuting
> >