Posted to user@spark.apache.org by Daniel Darabos <da...@lynxanalytics.com> on 2014/11/06 18:04:40 UTC

Task duration graph on Spark stage UI

Even though the stage UI shows the min, 25th percentile, median, 75th
percentile, and max durations, I am often still left clueless about the
distribution. For example, 100 out of 200 tasks (all started at the same
time) have completed in 1 hour. How much longer do I have to wait? I cannot
guess well based on the five numbers.

A graph of the durations will not answer the question either, but I think
it gives a better idea. I can hopefully see if the distribution is linearly
sloped or bimodal or exponentially slowing down, etc.

It's easy to draw this graph, so I set it up as a Chrome extension:

https://chrome.google.com/webstore/detail/spark-distributions/hhgnppbenlghmcimkmiccfiemdohdgoo

And here's the complete source code that you can throw in the JavaScript
console for the same results:

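// Read the Duration column (8th cell) of the task table as numbers.
var x = $('table:eq(2)').find('td:nth-child(8)').map(function (i, e) {
  return parseInt($(e).attr('sorttable_customkey'), 10);
}).get().filter(function (d) { return !isNaN(d); });
x.sort(function (a, b) { return a - b; });
var w = x.length;
var h = x[w - 1] || 1;  // longest duration; fall back to 1 to avoid dividing by zero
var W = 180;  // canvas width in pixels
var H = 80;   // canvas height in pixels
var canvas = $('<canvas width="' + W + '" height="' + H + '">');
canvas.css({ position: 'absolute', top: '100px', left: '500px' });
$('body').append(canvas);
var ctx = canvas[0].getContext('2d');
ctx.fillStyle = 'orange';
ctx.beginPath();
ctx.moveTo(0, H);
for (var i = 0; i < w; ++i) {
  // x-axis: rank of the task by duration; y-axis: duration scaled to the maximum.
  // With a single task there is no range to spread over, so place it at the right edge.
  ctx.lineTo(w > 1 ? i * W / (w - 1) : W, H - x[i] * H / h);
}
ctx.lineTo(W, H);
ctx.fill();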
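
For anyone who wants to adapt the drawing, the coordinate mapping can be
separated from the jQuery and canvas parts. This is only a minimal sketch:
durationCurvePoints is a hypothetical helper name, not something in the
extension or in Spark, and the durations passed to it are made up for
illustration.

```javascript
// Hypothetical helper (not part of the snippet above): maps task durations
// to the canvas coordinates of the sorted-duration curve.
function durationCurvePoints(durations, width, height) {
  // Sort a copy so the caller's array is left untouched.
  var sorted = durations.slice().sort(function (a, b) { return a - b; });
  var n = sorted.length;
  if (n === 0) { return []; }
  var max = sorted[n - 1] || 1; // avoid dividing by zero when all durations are 0
  return sorted.map(function (d, i) {
    return {
      // Rank along the x-axis; a single task is placed at the right edge.
      x: n > 1 ? i * width / (n - 1) : width,
      // Canvas y grows downward, so the longest task ends up at y = 0.
      y: height - d * height / max
    };
  });
}

// Made-up durations, for illustration only.
var points = durationCurvePoints([400, 100, 200, 300, 100], 180, 80);
console.log(points);
```

Read left to right, the curve is the sorted list of durations: a straight
slope means durations are spread evenly, a flat stretch followed by a sharp
rise at the right edge means a few stragglers, and a step in the middle
suggests two groups of tasks.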

It should not be much work to add this to the stage status page itself
either, if there is interest.