Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/08/04 19:38:44 UTC

[jira] Created: (PIG-357) progress reported on every tuple

progress reported on every tuple
--------------------------------

                 Key: PIG-357
                 URL: https://issues.apache.org/jira/browse/PIG-357
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Olga Natkovich
             Fix For: types_branch


Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.

We might want to go to a similar model.
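
A minimal sketch of the count-based model described above, for illustration only. The names ProgressSink and CountThrottledReporter are hypothetical and are not Pig or Hadoop classes; the interval of 1000 mentioned in the description would be passed in by the caller.

```java
/** Hypothetical stand-in for the progress callback; not Pig's or Hadoop's actual interface. */
interface ProgressSink {
    void progress();
}

/** Forwards one progress call for every `interval` tuples instead of one per tuple. */
final class CountThrottledReporter {
    private final ProgressSink sink;   // may be null when no reporter is set
    private final long interval;       // tuples between two progress calls
    private long sinceLastReport = 0;  // tuples seen since the last progress call

    CountThrottledReporter(ProgressSink sink, long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.sink = sink;
        this.interval = interval;
    }

    /** Call once per processed tuple. */
    void tupleProcessed() {
        if (sink == null) {
            return; // no reporter set, nothing to report
        }
        if (++sinceLastReport >= interval) {
            sinceLastReport = 0;
            sink.progress();
        }
    }
}
```

With an interval of 1000, the per-tuple cost is one increment and one comparison, and the reporter itself is invoked only on every 1000th tuple.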

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-357) progress reported on every tuple

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-357:
---------------------------

    Priority: Major  (was: Critical)

I tried commenting out all reporting and in runs of 200 million rows on 25 machines it made absolutely no measurable difference.  We may still need to fix this, but it isn't an immediate performance issue.

> progress reported on every tuple
> --------------------------------
>
>                 Key: PIG-357
>                 URL: https://issues.apache.org/jira/browse/PIG-357
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>             Fix For: types_branch
>
>
> Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.
> We might want to go to similar model.



[jira] Updated: (PIG-357) progress reported on every tuple

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-357:
-------------------------------

    Priority: Minor  (was: Major)

Lowering priority since it is not causing performance issues.

> progress reported on every tuple
> --------------------------------
>
>                 Key: PIG-357
>                 URL: https://issues.apache.org/jira/browse/PIG-357
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>            Priority: Minor
>             Fix For: types_branch
>
>
> Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.
> We might want to go to similar model.



[jira] Assigned: (PIG-357) progress reported on every tuple

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-357:
------------------------------

    Assignee: Alan Gates

> progress reported on every tuple
> --------------------------------
>
>                 Key: PIG-357
>                 URL: https://issues.apache.org/jira/browse/PIG-357
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>             Fix For: types_branch
>
>
> Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.
> We might want to go to similar model.



[jira] Commented: (PIG-357) progress reported on every tuple

Posted by "Sam Pullara (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619634#action_12619634 ] 

Sam Pullara commented on PIG-357:
---------------------------------

I would do this using a timer rather than an absolute number of tuples, due to the vagaries of how long processing might take.  Maybe every 10s?  On every tuple you could check a boolean to see whether the timer has gone off; if so, report and reset the timer.
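
A rough sketch of the timer idea in the comment above, under stated assumptions: the class name TimedProgressReporter, the use of Runnable as the progress callback, and the use of ScheduledExecutorService are illustrative choices, not existing Pig code. A background timer sets a flag, and the per-tuple path only tests and clears that flag.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical reporter that calls the sink at most once per timer period. */
final class TimedProgressReporter implements AutoCloseable {
    private final Runnable sink;                               // assumed progress callback
    private final AtomicBoolean due = new AtomicBoolean(false); // set by the timer thread
    private final ScheduledExecutorService timer;

    TimedProgressReporter(Runnable sink, long period, TimeUnit unit) {
        this.sink = sink;
        // Daemon thread so the timer never keeps the JVM alive on its own.
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-timer");
            t.setDaemon(true);
            return t;
        });
        // The fixed-rate schedule re-arms itself, so only the flag needs resetting.
        timer.scheduleAtFixedRate(this::markDue, period, period, unit);
    }

    /** Invoked by the timer; also callable directly, e.g. from a test. */
    void markDue() {
        due.set(true);
    }

    /** Call once per processed tuple: one atomic check, a report only if the timer fired. */
    void tupleProcessed() {
        if (due.compareAndSet(true, false)) {
            sink.run();
        }
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

One limitation of this sketch: the report is still issued from the tuple-processing path, so a single tuple that takes longer than the period delays the report until that tuple finishes.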

> progress reported on every tuple
> --------------------------------
>
>                 Key: PIG-357
>                 URL: https://issues.apache.org/jira/browse/PIG-357
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
>
>
> Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.
> We might want to go to similar model.



[jira] Commented: (PIG-357) progress reported on every tuple

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619635#action_12619635 ] 

Olga Natkovich commented on PIG-357:
------------------------------------

I agree that it is a better solution, but it adds more complexity to the system. We might choose to do this later. The count, while not optimal, worked reasonably well in the past.

> progress reported on every tuple
> --------------------------------
>
>                 Key: PIG-357
>                 URL: https://issues.apache.org/jira/browse/PIG-357
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
>
>
> Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.
> We might want to go to similar model.



[jira] Updated: (PIG-357) PERFORMANCE: progress reported on every tuple

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-357:
---------------------------

    Summary: PERFORMANCE: progress reported on every tuple  (was: progress reported on every tuple)

> PERFORMANCE: progress reported on every tuple
> ---------------------------------------------
>
>                 Key: PIG-357
>                 URL: https://issues.apache.org/jira/browse/PIG-357
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>            Priority: Minor
>             Fix For: types_branch
>
>
> Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.
> We might want to go to similar model.



[jira] Updated: (PIG-357) progress reported on every tuple

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-357:
-------------------------------

    Priority: Critical  (was: Major)

> progress reported on every tuple
> --------------------------------
>
>                 Key: PIG-357
>                 URL: https://issues.apache.org/jira/browse/PIG-357
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: types_branch
>
>
> Currently, if the reporter is set, we report progress on every tuple. This could be too expensive and impact performance. In the old code, we used to do it on every 1000th tuple or something like that.
> We might want to go to similar model.
