Posted to user@pig.apache.org by Sameer Tilak <ss...@live.com> on 2013/12/30 19:49:58 UTC
A question regarding schema
Hi All,
I have a UDF that returns a tuple. The number of elements in the tuple will differ for each user. For example:
(userid1, item1, item2, item100, item400)
(userid1, item1, item200)
(userid1, item1, item2, item100, item200, item250, item300, item400)
(userid1, item100, item200, item250, item300, item380, item400, item450, item480, item560, item800, item1000)
etc.
Pig script:
A = LOAD '/scratch/input.seq' USING $SEQFILE_LOADER ( '-c $TEXT_CONVERTER', '-c $TEXT_CONVERTER') AS (key: chararray, value: chararray);
UserItemAssoc = FOREACH A GENERATE myparser.myUDF(key, value) AS {(userid: chararray, itemtid: How to specify this???)};
If I want to specify the schema in the AS clause, how do I do it since the number of fields will differ in each row? Is it possible to somehow do this dynamically?
Re: A question regarding schema
Posted by centerqi hu <ce...@gmail.com>.
I ran into the same problem; you can handle it as follows.
There are two ways to add a field to a tuple:
one is the append(object) function, the other is the set(index, object) function.
My code:
DataBag inputBag = (DataBag)input.get(0);
for (Tuple t : inputBag) {
    if (isDetailUrl(t) > 0) {
        t.append(this.orderID);
        //int tupleSize = t.size();
        //t.set(tupleSize+1, this.orderID);
        outputBag.add(t);
        this.orderID = null;
    }
}
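
The difference between the two calls can be sketched without Pig on the classpath. The following is only an illustration: it uses a plain java.util.ArrayList as a stand-in for a tuple's field list (that a Pig tuple behaves like a list here is an assumption, and TupleAppendSketch, appendField and trySet are hypothetical names, not part of Pig). It shows that an append-style call grows the row by one field, while a set-style call can only overwrite a position that already exists, which may be why the set(...) lines above are commented out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for a tuple's field list. This is NOT Pig's Tuple class;
// it only models the append-versus-set distinction with a plain list.
final class TupleAppendSketch {

    // Mirrors the append(object) approach: returns a copy of the row
    // with one extra field at the end, leaving the input untouched.
    static List<Object> appendField(List<Object> fields, Object value) {
        List<Object> copy = new ArrayList<>(fields);
        copy.add(value);
        return copy;
    }

    // Mirrors the set(index, object) approach: overwrites an existing
    // position. Returns false when the index is outside the current row.
    static boolean trySet(List<Object> fields, int index, Object value) {
        try {
            fields.set(index, value);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Object> row = new ArrayList<>(Arrays.asList("userid1", "item1", "item200"));

        // Appending grows the row from three fields to four.
        System.out.println(appendField(row, "order42"));

        // Writing past the end of the row is rejected by the list.
        System.out.println(trySet(row, row.size() + 1, "order42"));

        // Writing to an existing position succeeds and replaces the value.
        System.out.println(trySet(row, 1, "item2"));
    }
}
```

In a real UDF the same idea would apply to the Tuple object itself: append when the row needs to grow, set only when replacing a field that is already there.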
--
centerqi@gmail.com|齐忠