Posted to user@pig.apache.org by Todd Lee <ro...@gmail.com> on 2011/01/15 10:52:56 UTC

Loop through records row by row?

Hi,

Newbie here. So let's say I have a file which contains the closing market
price of a stock in 2010, e.g.

<Date>, <Price>
===================
2010-1-1, 10.1
2010-1-2, 10.2
2010-1-3, 9.9
2010-1-4, 10.0
2010-1-7, 11.0
...

and all I want is to find out the max number of consecutive days in which
the stock has been in an UP trend (for the above example, the result should
be 3). It is fairly simple to solve in other programming languages using a
for-loop and a couple of temp variables, but is this possible to do in Pig?
The dataset is pretty big.

None of the examples and tutorials I found online had this kind of data
relationship between rows so I really could use your help.

Thanks a lot,
T

Re: Loop through records row by row?

Posted by Jonathan Coveney <jc...@gmail.com>.
You would have to write a UDF that takes the bag and calculates what you want. I'd use the Accumulator interface. It's a bit annoying to have to learn at first, but worth it, as it will turn Pig from useful to very powerful.

Your implementation would be quite similar to the accumulator implementation of max, except that the updating conditions would be trickier.
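
To make the "trickier updating conditions" concrete, here is a minimal sketch of that state-keeping logic in plain Java. It is not a real Pig UDF: the class name LongestUpRun is made up for illustration, and it only mirrors the method names of Pig's Accumulator interface (accumulate, getValue, cleanup) while taking a double[] instead of a Tuple wrapping a bag. It also assumes the prices arrive already sorted by date, and that a "run" counts the days in a rising streak, which is the counting that gives 3 for the example data.

```java
// Plain-Java sketch of the accumulator pattern. It mirrors the method names of
// Pig's Accumulator interface but does not depend on Pig (assumption: a real
// UDF would read the prices out of a Tuple/DataBag instead of a double[]).
class LongestUpRun {
    private double previous = Double.NaN; // last price seen; NaN until the first record
    private int current = 0;              // days in the rising run ending at the latest record
    private int longest = 0;              // longest rising run seen so far

    // Feed the next batch of prices. Assumes batches arrive in date order,
    // so state carries over from one call to the next.
    void accumulate(double[] prices) {
        for (double price : prices) {
            if (!Double.isNaN(previous) && price > previous) {
                current++;      // price rose: extend the current run
            } else {
                current = 1;    // first record, or price fell/stayed flat: start a new run
            }
            longest = Math.max(longest, current);
            previous = price;
        }
    }

    // Result after all batches have been fed.
    int getValue() {
        return longest;
    }

    // Reset state so the instance can be reused for the next group.
    void cleanup() {
        previous = Double.NaN;
        current = 0;
        longest = 0;
    }

    public static void main(String[] args) {
        LongestUpRun run = new LongestUpRun();
        // The example data from the question, split across two batches.
        run.accumulate(new double[] {10.1, 10.2, 9.9});
        run.accumulate(new double[] {10.0, 11.0});
        System.out.println(run.getValue()); // prints 3
    }
}
```

The point of keeping previous and current as fields is that a run can span two accumulate calls. In an actual UDF the bag would still have to be ordered by date before it reaches the function, since the logic depends entirely on row order.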

One nice thing about the UDF is that it could very easily handle a file laid out as "stock, date, price".

