Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2012/08/02 07:30:02 UTC

[jira] [Created] (PIG-2855) Provide a method to measure time spent in UDFs

Dmitriy V. Ryaboy created PIG-2855:
--------------------------------------

             Summary: Provide a method to measure time spent in UDFs
                 Key: PIG-2855
                 URL: https://issues.apache.org/jira/browse/PIG-2855
             Project: Pig
          Issue Type: New Feature
            Reporter: Dmitriy V. Ryaboy


When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2855:
-----------------------------------

    Status: Patch Available  (was: Open)
    
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.


        

[jira] [Commented] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428415#comment-13428415 ] 

Dmitriy V. Ryaboy commented on PIG-2855:
----------------------------------------

I don't know a way around that one -- I could use the object id, but then the counters wouldn't get aggregated across mappers and we'd have a ridiculous counter explosion.
At least with FuncSpec (rather than class name) different invocations with different initialization args go to different groups.
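The trade-off between the three candidate keys can be made concrete with a small simulation. This is only a sketch in Python, not Pig code: the `Udf` class, the `group_key` helper, the `Scale` UDF and the string format of the FuncSpec-style key are all hypothetical, chosen to mirror the options discussed above (class name, FuncSpec string, object id).

```python
class Udf:
    """Hypothetical stand-in for one UDF instance created inside a task."""

    def __init__(self, class_name, ctor_args):
        self.class_name = class_name
        self.ctor_args = ctor_args


def group_key(udf, mode):
    """Return the counter group name under one of three keying schemes."""
    if mode == "class":
        # Class name only: every use of the UDF shares one group.
        return udf.class_name
    if mode == "funcspec":
        # Class name plus constructor args (illustrative format only).
        return f"{udf.class_name}({','.join(udf.ctor_args)})"
    if mode == "object":
        # Per-instance identity: never matches across tasks.
        return f"{udf.class_name}@{id(udf)}"
    raise ValueError(f"unknown mode: {mode}")


def distinct_groups(mode, mappers=2):
    """Count the counter groups a job would report under the given scheme."""
    # Hypothetical script: Scale('2') at two call sites, Scale('10') at one.
    call_sites = [("Scale", ("2",)), ("Scale", ("2",)), ("Scale", ("10",))]
    groups = set()
    live = []  # keep instances alive so their id() values stay distinct
    for _ in range(mappers):
        for class_name, ctor_args in call_sites:
            udf = Udf(class_name, ctor_args)
            live.append(udf)
            groups.add(group_key(udf, mode))
    return len(groups)


for mode in ("class", "funcspec", "object"):
    print(mode, distinct_groups(mode))
```

In this toy setup the class-name key collapses everything into one group, the FuncSpec-style key separates the two initializations but merges the two call sites of Scale('2'), and the per-object key yields one group per instance per mapper, so nothing aggregates.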
                
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2855.2.patch, PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.


        

[jira] [Updated] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2855:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.11
           Status: Resolved  (was: Patch Available)

Committed to trunk.
                
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.11
>
>         Attachments: PIG-2855.2.patch, PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.


        

[jira] [Commented] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431209#comment-13431209 ] 

Jonathan Coveney commented on PIG-2855:
---------------------------------------

+1
                
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2855.2.patch, PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.


        

[jira] [Updated] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2855:
-----------------------------------

    Attachment: PIG-2855.patch

Attaching patch, complete with docs.

Pasting usage here:

# Use this option to turn on UDF timers. This will cause two 
# counters to be tracked for every UDF and LoadFunc in your script:
# approx_microsecs measures approximate time spent inside a UDF
# approx_invocations reports the approximate number of times the UDF was invoked
# pig.udf.profile=false
                
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>         Attachments: PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.


        

[jira] [Commented] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428409#comment-13428409 ] 

Jonathan Coveney commented on PIG-2855:
---------------------------------------

Love the idea.

One potential issue: with POUserFunc, it looks like your counter group is FuncSpec#toString, which means multiple invocations of the same UDF in different parts of the code will go to the same counter.
                
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2855.2.patch, PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.


        

[jira] [Assigned] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy reassigned PIG-2855:
--------------------------------------

    Assignee: Dmitriy V. Ryaboy
    
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.


        

[jira] [Updated] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2855:
-----------------------------------

    Attachment: PIG-2855.2.patch

Forgot to add the new PigConfiguration file.
                
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2855.2.patch, PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.


        

[jira] [Updated] (PIG-2855) Provide a method to measure time spent in UDFs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2855:
-----------------------------------

    Release Note: 
New Feature: Timing your UDFs

The first step to improving performance and efficiency is measuring where the time is going. Pig provides a lightweight method for approximately measuring how much time is spent in different user-defined functions (UDFs) and loaders. Simply set the pig.udf.profile property to true. This will cause new counters to be tracked for all MapReduce jobs generated by your script: approx_microsecs measures the approximate amount of time spent in a UDF, and approx_invocations measures the approximate number of times the UDF was invoked. Note that this may produce a large number of counters (two per UDF). Too many counters can lead to poor JobTracker performance, so use this feature carefully, and preferably on a test cluster.
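One way counters like these can stay cheap and still be "approximate" is to time only a sample of calls and scale the result up. The sketch below illustrates that idea in Python; it is an assumption about the general technique, not a description of the committed patch. The `ApproxUdfTimer` class, the sampling interval of 100 and the `UPPER` group name are hypothetical; only the counter names approx_microsecs and approx_invocations come from the release note.

```python
import time
from collections import defaultdict

SAMPLE_EVERY = 100  # hypothetical sampling interval, not taken from the patch


class ApproxUdfTimer:
    """Times one call out of every `sample_every` and scales the result up."""

    def __init__(self, sample_every=SAMPLE_EVERY, clock=time.perf_counter_ns):
        self.sample_every = sample_every
        self.clock = clock                 # returns a timestamp in nanoseconds
        self.calls = defaultdict(int)      # exact local call count per group
        self.counters = defaultdict(int)   # (group, counter name) -> value

    def call(self, group, func, *args):
        n = self.calls[group]
        self.calls[group] = n + 1
        if n % self.sample_every != 0:
            return func(*args)             # untimed fast path
        start = self.clock()
        try:
            return func(*args)
        finally:
            elapsed_us = (self.clock() - start) // 1000
            # One sampled call stands in for `sample_every` real calls.
            self.counters[(group, "approx_microsecs")] += elapsed_us * self.sample_every
            self.counters[(group, "approx_invocations")] += self.sample_every


# Usage: wrap each UDF call; the group name plays the role of the counter group.
timer = ApproxUdfTimer()
for word in ["pig", "udf", "timer"] * 200:
    timer.call("UPPER", str.upper, word)
print(dict(timer.counters))
```

Because only sampled calls touch the clock and the counters, the per-call overhead stays small, at the price of invocation counts that are rounded to a multiple of the sampling interval and timings that are extrapolated from the sampled calls.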

    
> Provide a method to measure time spent in UDFs
> ----------------------------------------------
>
>                 Key: PIG-2855
>                 URL: https://issues.apache.org/jira/browse/PIG-2855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.11
>
>         Attachments: PIG-2855.2.patch, PIG-2855.patch
>
>
> When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework; we should let users optionally track these metrics.
