
Posted to dev@hive.apache.org by "Adam Kramer (JIRA)" <ji...@apache.org> on 2009/05/11 21:23:45 UTC

[jira] Created: (HIVE-478) Surface "processor time" for queries

Surface "processor time" for queries
------------------------------------

                 Key: HIVE-478
                 URL: https://issues.apache.org/jira/browse/HIVE-478
             Project: Hadoop Hive
          Issue Type: Wish
          Components: Logging, Query Processor
            Reporter: Adam Kramer


We currently list real-time (wall-clock) metrics of how long queries take--"finished in: 1min 13sec" appears on the job tracker. However, this is affected by a lot more than just the quality or implementation of the query. For example, the number of mappers used varies a lot when you use subqueries versus single-query aggregation, as does the amount of work necessary.

For implementation comparisons (e.g., "should I use this version of the query or that one"), it would be great to know the processor time used instead of the real time used, both in terms of "mapper cpu seconds" and "reducer cpu seconds."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-478) Surface "processor time" for queries

Posted by "Adam Kramer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729787#action_12729787 ] 

Adam Kramer commented on HIVE-478:
----------------------------------

Also, in case it was not obvious, the current system counts the time that passes while mappers/reducers are "pending." This request would tell me how much time I actually used, i.e., excluding time spent waiting for mappers or reducers.
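
The distinction between real time and processor time can be illustrated outside Hive with a minimal Python sketch. It is not Hive code and does not reflect how Hive or Hadoop would collect these numbers; the helper names `measure` and `pending_then_work` are hypothetical, and the sleep merely stands in for time spent "pending."

```python
import time


def measure(task):
    """Return (wall_seconds, cpu_seconds) consumed while running task()."""
    wall_start = time.perf_counter()   # wall-clock ("real") time
    cpu_start = time.process_time()    # CPU time of this process; excludes sleep
    task()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def pending_then_work():
    # Stand-in for a task waiting on a free mapper/reducer slot.
    time.sleep(0.2)
    # Stand-in for the actual computation.
    sum(i * i for i in range(200_000))


wall, cpu = measure(pending_then_work)
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
```

The wall-clock figure includes the waiting period, while the CPU figure covers only the computation, which is the number the request asks to have surfaced per mapper and per reducer.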



[jira] Commented: (HIVE-478) Surface "processor time" for queries

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797164#action_12797164 ] 

Namit Jain commented on HIVE-478:
---------------------------------

Can you set the configuration parameter hive.task.progress to true? It will dump the total time taken by each operator.
Please check whether this meets your requirements; we can enhance it to add more information.

