Posted to dev@giraph.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2013/12/03 12:32:35 UTC

[jira] [Commented] (GIRAPH-808) Giraph should report progress more accurately when running on Map/Reduce

    [ https://issues.apache.org/jira/browse/GIRAPH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837594#comment-13837594 ] 

Rob Vesse commented on GIRAPH-808:
----------------------------------

The middle segment of progress could be better approximated using the number of vertices to be processed, i.e.:

CurrentSuperstepProgress = VerticesProcessed / TotalVertices
SuperstepsProgress = (CurrentSuperstep + 1 / SuperstepsSoFar)
ComputationProgress = CurrentSuperstepProgress * (SuperstepsProgress / (N - 1))

This would mean that progress is no longer a linear trend: it would trend towards N during a super step and then drop back down at the start of the next super step. Provided Hadoop allows progress to change in this way, it would be a much better way of reporting the progression of a Giraph computation.
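
One possible reading of the scheme above, as a minimal Java sketch rather than a definitive implementation. The class and method names are hypothetical and nothing here calls Giraph or Hadoop APIs. The formulas above leave the exact scaling open, so the per-superstep cap of (superstep + 1) / (superstep + 2) is an assumption chosen only because it trends towards 1 without reaching it; superstep numbers are assumed to start at 0. The result is a fraction of the middle segment, which would still need scaling to whatever share of overall progress that segment is given.

```java
// Hypothetical sketch: progress of the compute phase as a fraction in [0, 1).
final class ProgressSketch {

    // Fraction of the current superstep that has been processed, clamped to [0, 1].
    static double currentSuperstepProgress(long verticesProcessed, long totalVertices) {
        if (totalVertices <= 0) {
            return 0.0; // nothing to process, so report no progress
        }
        return Math.min(1.0, (double) verticesProcessed / (double) totalVertices);
    }

    // ASSUMPTION: cap for the given zero-based superstep. It grows with each
    // superstep (1/2, 2/3, 3/4, ...) but never reaches 1.
    static double superstepCap(long superstep) {
        return (superstep + 1.0) / (superstep + 2.0);
    }

    // Rises towards the cap during a superstep and drops back to 0 when the
    // next superstep starts, so the reported value is not monotonic.
    static double computationProgress(long superstep, long verticesProcessed, long totalVertices) {
        return currentSuperstepProgress(verticesProcessed, totalVertices) * superstepCap(superstep);
    }

    public static void main(String[] args) {
        long totalVertices = 1000;
        for (long superstep = 0; superstep < 3; superstep++) {
            for (long processed = 0; processed <= totalVertices; processed += 500) {
                System.out.printf("superstep=%d processed=%d progress=%.3f%n",
                        superstep, processed,
                        computationProgress(superstep, processed, totalVertices));
            }
        }
    }
}
```

Running main prints the sawtooth shape described above: within each super step the value climbs, and it resets at the start of the next one. Whether Hadoop tolerates a progress value that decreases is not established here.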


> Giraph should report progress more accurately when running on Map/Reduce
> ------------------------------------------------------------------------
>
>                 Key: GIRAPH-808
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-808
>             Project: Giraph
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Rob Vesse
>
> The current way that Giraph reports progress when running on Map/Reduce seems rather flawed.  When running a Giraph program the map tasks are launched and after initialisation their progress almost immediately goes to 100% and stays there throughout.  So the only way to monitor progress is to 
> I appreciate that there is no way for Giraph to report accurate progress, since it does not know in advance how many super steps there will be, but it could report progress in a more useful way.
> For example:
> - The first N percent of progress is the input phase; this part could likely be calculated accurately using the standard Hadoop input APIs on which Giraph input is built
> - The next N percent of progress is an estimate that trends towards the final value but does not reach it until the computation has halted, i.e. (Superstep + 1) / (N - 1), so this will naturally trend towards N.  Once the computation halts, the value becomes N
> - The last N percent of progress is the output phase; again, this part could likely be calculated accurately and easily, since Giraph knows how many items it has to output
> What does anybody else think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)