Posted to dev@pig.apache.org by "Pi Song (JIRA)" <ji...@apache.org> on 2008/05/17 17:09:55 UTC

[jira] Commented: (PIG-242) Incremental operation

    [ https://issues.apache.org/jira/browse/PIG-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597737#action_12597737 ] 

Pi Song commented on PIG-242:
-----------------------------

I also like this concept. I will extend your thinking here to start a discussion.

incremental data = new records + updates

Let U(t1,t2) be a function that returns all the updates made between t1 and t2. The incremental update process would then be:

{noformat}
If U(t1,t2) is empty:
       Process( f(delta T) )
Else:  // there are updates
       Process( f(delta T) )
       Merge( f(T), U(t1,t2) )
{noformat}
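Below is a minimal Python sketch of this process, under assumptions the comment does not make: a table is a dict from record key to value, f is a per-record map function, delta T contains only new records, and U(t1,t2) is a dict from each updated key to its new value. The names `process`, `merge` and `incremental` are hypothetical. Note that `merge` also takes f, since updated records have to go through f before they can replace their entries in f(T).

```python
from typing import Callable, Dict, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def process(f: Callable[[V], R], delta: Dict[K, V]) -> Dict[K, R]:
    """Process(f(delta T)): apply the map function to the new records only."""
    return {k: f(v) for k, v in delta.items()}


def merge(f_t: Dict[K, R], f: Callable[[V], R], updates: Dict[K, V]) -> Dict[K, R]:
    """Merge(f(T), U(t1,t2)): recompute f only for the records that were updated."""
    merged = dict(f_t)
    for k, v in updates.items():
        merged[k] = f(v)
    return merged


def incremental(f: Callable[[V], R], f_t: Dict[K, R],
                delta: Dict[K, V], updates: Dict[K, V]) -> Dict[K, R]:
    """Combine the cached f(T) with f(delta T) and, if U(t1,t2) is non-empty, merge."""
    result = dict(f_t)
    result.update(process(f, delta))   # f(T) + f(delta T)
    if updates:                        # U(t1,t2) is not empty
        result = merge(result, f, updates)
    return result


def square(x: int) -> int:
    return x * x


f_t = {"a": 1, "b": 4}   # cached f(T) for T = {"a": 1, "b": 2}
delta = {"c": 3}         # new records
updates = {"a": 5}       # record "a" was changed to 5
print(incremental(square, f_t, delta, updates))  # {'a': 25, 'b': 4, 'c': 9}
```

Only the new and updated records go through f; everything else is reused from the cached f(T).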

That is, we need both U and Merge to have a complete solution.
If we assume there are no updates, the solution becomes much simpler, and that is the special case you mentioned.
However, I think simply assuming that is a bit dodgy. To be on the safe side, we should have a G(t1,t2) that tells us whether there has been any update between t1 and t2.
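One way G(t1,t2) could work, sketched in Python under the assumption (not from the comment) that changes are kept in a change log ordered by timestamp, with each entry marked as an insert or an update. `Change`, `u` and `g` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    """One entry in a hypothetical change log, kept in timestamp order."""
    ts: int      # time the change happened
    key: str     # record key
    value: int   # new value of the record
    kind: str    # "insert" for a new record, "update" for a changed one


def u(log: List[Change], t1: int, t2: int) -> Dict[str, int]:
    """U(t1,t2): all updates with t1 < ts <= t2; later entries overwrite earlier ones."""
    return {c.key: c.value for c in log if t1 < c.ts <= t2 and c.kind == "update"}


def g(log: List[Change], t1: int, t2: int) -> bool:
    """G(t1,t2): True if at least one update happened with t1 < ts <= t2."""
    return any(t1 < c.ts <= t2 and c.kind == "update" for c in log)


log = [
    Change(1, "a", 1, "insert"),
    Change(2, "b", 2, "insert"),
    Change(3, "c", 3, "insert"),
    Change(4, "a", 5, "update"),
]
print(g(log, 2, 3))  # False: only an insert, so the no-update path is safe
print(g(log, 2, 4))  # True: record "a" was updated, so Merge is needed
print(u(log, 2, 4))  # {'a': 5}
```

G can be cheaper than U: `any` stops at the first update it finds and never builds the set of changes, which makes it a reasonable guard before taking the no-update fast path.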

> Incremental operation
> ---------------------
>
>                 Key: PIG-242
>                 URL: https://issues.apache.org/jira/browse/PIG-242
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: John DeTreville
>
> Some Pig programs repeatedly perform operations on different versions of tables, where these versions differ only slightly. For example, a program may compute f(T) at one point and f(T + delta T) at a later point. In many such cases, having already computed f(T) can allow us to speed up the computation of f(T + delta T). For example, if f is a map operation, then f(T + delta T) is f(T) + f(delta T), which can be computed relatively rapidly if delta T is small and f(T) is already known.
> It is already possible, but often tedious, for Pig programmers to perform incremental operations. It might help if Pig provided syntax for incremental operations.
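
A small Python check of the map case from the issue description, assuming a table is a list of records and + is concatenation (Pig relations are unordered bags, so order would not matter there). `apply_map` and `incremental_map` are hypothetical helpers.

```python
from typing import Callable, List


def apply_map(f: Callable[[int], int], table: List[int]) -> List[int]:
    """f applied record by record, roughly what a Pig FOREACH ... GENERATE does."""
    return [f(x) for x in table]


def incremental_map(f: Callable[[int], int], cached: List[int],
                    delta: List[int]) -> List[int]:
    """f(T + delta T) computed as f(T) + f(delta T), reusing the cached f(T)."""
    return cached + apply_map(f, delta)


def double(x: int) -> int:
    return 2 * x


T = [1, 2, 3]
delta_T = [4, 5]
cached = apply_map(double, T)                      # computed once, earlier
print(incremental_map(double, cached, delta_T))    # [2, 4, 6, 8, 10]
assert incremental_map(double, cached, delta_T) == apply_map(double, T + delta_T)
```

Only the records in delta T are passed through f; the cached f(T) is reused as-is.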

-- 
This message is automatically generated by JIRA.