Posted to dev@giraph.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2013/12/04 17:15:36 UTC
[jira] [Created] (GIRAPH-810) Giraph should track aggregate statistics over lifetime of the computation
Rob Vesse created GIRAPH-810:
--------------------------------
Summary: Giraph should track aggregate statistics over lifetime of the computation
Key: GIRAPH-810
URL: https://issues.apache.org/jira/browse/GIRAPH-810
Project: Giraph
Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Rob Vesse
When Giraph completes a job it reports a set of information about the job like so:
{noformat}
Giraph Timers
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main): Superstep 3 TriangleFindingComputation (ms)=102234
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main): Superstep 2 TriangleFindingComputation (ms)=29419
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main): Superstep 1 TriangleFindingComputation (ms)=34397
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main): Input superstep (ms)=12642
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main): Total (ms)=208962
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main): Superstep 0 TriangleFindingComputation (ms)=4201
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main): Shutdown (ms)=2698
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main): Setup (ms)=23351
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Zookeeper server:port
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): ip-10-145-221-220.ec2.internal:22181=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Giraph Stats
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Aggregate edges=150000
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Sent message bytes=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Superstep=4
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Last checkpointed superstep=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Current workers=16
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Current master task partition=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Sent messages=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Aggregate finished vertices=1000
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main): Aggregate vertices=1000
{noformat}
The problem is that some of these statistics are not particularly helpful because they pertain only to the most recent superstep, namely Sent messages and Sent message bytes (both are reported as 0 in the output above).
I understand there is a reason for this: the number of sent messages helps determine whether a computation should halt at a given superstep. It would still be useful if these counters were also tracked in aggregate over the lifetime of the computation.
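To illustrate the kind of bookkeeping I have in mind, here is a minimal sketch. It is not existing Giraph code: the class name LifetimeMessageStats and all of its methods are hypothetical, and it assumes that something in the master can hand over the per-superstep message count and byte count once each superstep finishes. The per-superstep values stay available for the halting check, and running totals are kept alongside them.

```java
/**
 * Hypothetical helper (not part of Giraph): keeps the most recent
 * per-superstep message counters and also accumulates lifetime totals.
 */
final class LifetimeMessageStats {
    // Values for the most recent superstep only (what is reported today).
    private long lastSentMessages;
    private long lastSentMessageBytes;

    // Running totals over every superstep recorded so far.
    private long totalSentMessages;
    private long totalSentMessageBytes;
    private long superstepsRecorded;

    /**
     * Record the counters for one finished superstep.
     * Assumption: the caller supplies the counts for exactly that superstep.
     */
    void recordSuperstep(long sentMessages, long sentMessageBytes) {
        if (sentMessages < 0 || sentMessageBytes < 0) {
            throw new IllegalArgumentException("counters must be non-negative");
        }
        lastSentMessages = sentMessages;
        lastSentMessageBytes = sentMessageBytes;
        // addExact throws ArithmeticException on overflow instead of wrapping.
        totalSentMessages = Math.addExact(totalSentMessages, sentMessages);
        totalSentMessageBytes = Math.addExact(totalSentMessageBytes, sentMessageBytes);
        superstepsRecorded++;
    }

    long getLastSentMessages() { return lastSentMessages; }

    long getLastSentMessageBytes() { return lastSentMessageBytes; }

    long getTotalSentMessages() { return totalSentMessages; }

    long getTotalSentMessageBytes() { return totalSentMessageBytes; }

    long getSuperstepsRecorded() { return superstepsRecorded; }
}
```

With something like this, a job whose final superstep sends no messages would still report 0 for the last-superstep counters, while the totals would show how much traffic the computation generated overall.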