Posted to user@tez.apache.org by Grandl Robert <rg...@yahoo.com> on 2014/06/27 00:01:15 UTC

tez counters

Hi guys,

I would like to get some counters per task, like: 

1. CPU used per task (I can use the CPU_MILLISECONDS counter)

2. Peak physical memory used by the task at any point in time

I know there is a PHYSICAL_MEMORY_BYTES counter updated from ResourceCalculatorProcessTree, but the update is invoked only once, at the end of the task's lifetime, so it's hard to use it to get the peak memory used at any point in time. In MR, there is an updateProgressSplits() method in TaskAttemptImpl from which the PHYSICAL_MEMORY_BYTES counter was updated, and I could also grab the max value used at any point in time, but in Tez that code is gone.

Do you know the easiest way to get PEAK_PHYSICAL_MEMORY_BYTES for a task in Tez, or the easiest way to make counters update at runtime rather than only when the task finishes?

3. I would like to get (or at least closely approximate) the number of bytes read from and written to the network or disk by each task. Do you know the best way to pursue this? I saw there are Input/Output specs per task. Is there a way to get it from those?

Thanks,
robert

Re: tez counters

Posted by Siddharth Seth <ss...@apache.org>.
>
> Can you suggest where to inject some code to update the counters
> during the task's lifetime?

Look at TaskReporter. That's where a heartbeat thread runs, sending status
updates for a task to the AM, including counter updates. If you try making
this change, resetting the memory and CPU counters between tasks also needs
to be looked at. There's an open JIRA about CPU usage being reported
incorrectly, for example.
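A minimal sketch of the peak-memory idea, assuming the peak is tracked by sampling physical memory on every heartbeat tick and keeping a running maximum. `PeakMemoryTracker`, `sample()`, `reset()` and `startSampling()` are hypothetical names, not Tez APIs; the `LongSupplier` stands in for whatever reads the task's current physical memory (presumably backed by ResourceCalculatorProcessTree), and the fixed-rate executor stands in for the TaskReporter heartbeat.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper (not a Tez class): tracks the peak of a sampled
// physical-memory reading, e.g. one sample per heartbeat.
class PeakMemoryTracker {
    // Stand-in for whatever reads the task's current physical memory in bytes.
    private final LongSupplier physicalMemorySampler;
    // Largest value seen since construction or the last reset().
    private final AtomicLong peakBytes = new AtomicLong(0L);

    PeakMemoryTracker(LongSupplier physicalMemorySampler) {
        this.physicalMemorySampler = physicalMemorySampler;
    }

    // Take one reading and fold it into the running maximum; returns the new peak.
    long sample() {
        long current = physicalMemorySampler.getAsLong();
        return peakBytes.accumulateAndGet(current, Math::max);
    }

    long getPeakBytes() {
        return peakBytes.get();
    }

    // Clear the peak before the next task starts (e.g. in a reused container).
    void reset() {
        peakBytes.set(0L);
    }

    // Sample at a fixed period, standing in for the heartbeat thread.
    // The caller owns the returned executor and must shut it down.
    ScheduledExecutorService startSampling(long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(this::sample, 0L, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

In an actual patch it would probably be simpler to call `sample()` from the existing heartbeat thread than to start a second one. The `reset()` method reflects the point about resetting counters between tasks: a reused container can run several tasks, so a peak left over from one task would otherwise leak into the next.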


On Thu, Jun 26, 2014 at 4:33 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Thanks for your answer.
>
> I need to look at these values as soon as a task has finished (as in MR,
> where whenever a task finished I could get all the counters for that task
> in RMContainerAllocator, though I did not have anything for the network
> there either).
>
>> At the moment, I don't think there's a good way to access any of this data
>> other than the counters.
>
> Can you suggest where to inject some code to update the counters
> during the task's lifetime?
>
> Thanks,
> Robert
>
> On Thursday, June 26, 2014 4:08 PM, Siddharth Seth <ss...@apache.org> wrote:
>
> Hi Grandl,
> Could you please file JIRAs for updating these counters more often? At the
> moment, I don't think there's a good way to access any of this data other
> than the counters.
>
> For input bytes from local disk or HDFS, the task should have HDFS and
> LOCAL filesystem counters that provide this information. Nothing is
> currently available for the network.
>
> Do you need to look at these values while the DAG (and task) is running or
> after the DAG completes?

Re: tez counters

Posted by Grandl Robert <rg...@yahoo.com>.
Thanks for your answer.

I need to look at these values as soon as a task has finished (as in MR, where whenever a task finished I could get all the counters for that task in RMContainerAllocator, though I did not have anything for the network there either).

> At the moment, I don't think there's a good way to access any of this data other than the counters.

Can you suggest where to inject some code to update the counters during the task's lifetime?

Thanks,
Robert


Re: tez counters

Posted by Siddharth Seth <ss...@apache.org>.
Hi Grandl,
Could you please file JIRAs for updating these counters more often? At the
moment, I don't think there's a good way to access any of this data other
than the counters.

For input bytes from local disk or HDFS, the task should have HDFS and
LOCAL filesystem counters that provide this information. Nothing is
currently available for the network.

Do you need to look at these values while the DAG (and task) is running or
after the DAG completes?
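For the disk and HDFS part, a task's filesystem counters can be rolled up per filesystem scheme. This is a sketch under stated assumptions: the counters are assumed to have already been flattened into a name-to-value map, and the names are assumed to follow the `<SCHEME>_BYTES_READ` / `<SCHEME>_BYTES_WRITTEN` pattern (for example `HDFS_BYTES_READ`, or `FILE_BYTES_READ` for the local filesystem). `FsBytesByScheme` and `summarize()` are hypothetical. Network bytes are not covered, since no counter exists for them.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups a task's filesystem byte counters by scheme.
// Assumes counter names of the form <SCHEME>_BYTES_READ / <SCHEME>_BYTES_WRITTEN.
class FsBytesByScheme {
    private static final String READ_SUFFIX = "_BYTES_READ";
    private static final String WRITTEN_SUFFIX = "_BYTES_WRITTEN";

    // Returns scheme -> {bytesRead, bytesWritten}, sorted by scheme name.
    static Map<String, long[]> summarize(Map<String, Long> counters) {
        Map<String, long[]> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            String name = e.getKey();
            int slot;
            String scheme;
            if (name.endsWith(READ_SUFFIX)) {
                slot = 0;
                scheme = name.substring(0, name.length() - READ_SUFFIX.length());
            } else if (name.endsWith(WRITTEN_SUFFIX)) {
                slot = 1;
                scheme = name.substring(0, name.length() - WRITTEN_SUFFIX.length());
            } else {
                continue; // not a filesystem byte counter (e.g. CPU_MILLISECONDS)
            }
            out.computeIfAbsent(scheme, k -> new long[2])[slot] += e.getValue();
        }
        return out;
    }
}
```

With a map holding `HDFS_BYTES_READ`, `FILE_BYTES_READ`, `FILE_BYTES_WRITTEN` and unrelated counters, `summarize()` yields one entry for `HDFS` and one for `FILE`, ignoring everything else.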

