You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Brenden Matthews <br...@diddyinc.com> on 2013/05/14 02:19:58 UTC

Review Request: Fix TaskTracker pending tasks calculation.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/
-----------------------------------------------------------

Review request for mesos.


Description
-------

>From dc2c7f4116dec16bf0e22ab4c9fccac8f3a45020 Mon Sep 17 00:00:00 2001
From: Brenden Matthews <br...@airbnb.com>
Date: Mon, 13 May 2013 15:27:45 -0700
Subject: [PATCH 09/24] Fix TaskTracker pending tasks calculation.

Review: https://reviews.apache.org/r/11116
---
 .../org/apache/hadoop/mapred/MesosScheduler.java   |   33 ++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)


Diffs
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 

Diff: https://reviews.apache.org/r/11116/diff/


Testing
-------

Used in production at airbnb.


Thanks,

Brenden Matthews


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review20666
-----------------------------------------------------------



hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java
<https://reviews.apache.org/r/11116/#comment42675>

    How about a comment explaining why 'progress.pendingMaps()' and 'progress.pendingReduces()' is insufficient?


- Benjamin Hindman


On May 14, 2013, 5:57 p.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated May 14, 2013, 5:57 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> From dc2c7f4116dec16bf0e22ab4c9fccac8f3a45020 Mon Sep 17 00:00:00 2001
> From: Brenden Matthews <br...@airbnb.com>
> Date: Mon, 13 May 2013 15:27:45 -0700
> Subject: [PATCH 09/24] Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> ---
>  .../org/apache/hadoop/mapred/MesosScheduler.java   |   33 ++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request 11116: Fixed TaskTracker pending tasks calculation.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review23884
-----------------------------------------------------------

Ship it!


- Vinod Kone


On July 25, 2013, 1:31 a.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated July 25, 2013, 1:31 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixed TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java 279f84e0f0c43ad3cfd9e4442010e706ee3565d9 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request 11116: Fixed TaskTracker pending tasks calculation.

Posted by Brenden Matthews <br...@diddyinc.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/
-----------------------------------------------------------

(Updated July 25, 2013, 1:31 a.m.)


Review request for mesos.


Changes
-------

Rebase on master.


Summary (updated)
-----------------

Fixed TaskTracker pending tasks calculation.


Repository: mesos


Description (updated)
-------

Fixed TaskTracker pending tasks calculation.

Review: https://reviews.apache.org/r/11116


Diffs (updated)
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java 279f84e0f0c43ad3cfd9e4442010e706ee3565d9 

Diff: https://reviews.apache.org/r/11116/diff/


Testing
-------

Used in production at airbnb.


Thanks,

Brenden Matthews


Re: Review Request 11116: Fix TaskTracker pending tasks calculation.

Posted by Brenden Matthews <br...@diddyinc.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/
-----------------------------------------------------------

(Updated July 24, 2013, 10:59 p.m.)


Review request for mesos.


Changes
-------

Rebasing on master, updating as per Ben's comments.


Repository: mesos


Description
-------

Fix TaskTracker pending tasks calculation.

Review: https://reviews.apache.org/r/11116


Diffs (updated)
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java 279f84e0f0c43ad3cfd9e4442010e706ee3565d9 

Diff: https://reviews.apache.org/r/11116/diff/


Testing
-------

Used in production at airbnb.


Thanks,

Brenden Matthews


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21546
-----------------------------------------------------------



hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java
<https://reviews.apache.org/r/11116/#comment44589>

    Looks like you don't use these variables?


- Ben Mahler


On June 6, 2013, 2:08 a.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated June 6, 2013, 2:08 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Brenden Matthews <br...@diddyinc.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/
-----------------------------------------------------------

(Updated June 6, 2013, 2:08 a.m.)


Review request for mesos.


Changes
-------

Rebasing on master.


Description
-------

Fix TaskTracker pending tasks calculation.

Review: https://reviews.apache.org/r/11116


Diffs (updated)
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 

Diff: https://reviews.apache.org/r/11116/diff/


Testing
-------

Used in production at airbnb.


Thanks,

Brenden Matthews


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Brenden Matthews <br...@diddyinc.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/
-----------------------------------------------------------

(Updated June 3, 2013, 6:02 p.m.)


Review request for mesos.


Description (updated)
-------

Fix TaskTracker pending tasks calculation.

Review: https://reviews.apache.org/r/11116


Diffs (updated)
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 

Diff: https://reviews.apache.org/r/11116/diff/


Testing
-------

Used in production at airbnb.


Thanks,

Brenden Matthews


Re: Review Request 11116: Fixed TaskTracker pending tasks calculation.

Posted by Ben Mahler <be...@gmail.com>.

> On June 3, 2013, 5:36 p.m., Ben Mahler wrote:
> > What differences were you seeing between job.pendingMaps() and this new technique?
> > 
> > Looking at JobInProgress.pendingMaps():
> > 
> >   public synchronized int pendingMaps() {
> >     return numMapTasks - runningMapTasks - failedMapTIPs - 
> >     finishedMapTasks + speculativeMapTasks;
> >   }
> > 
> > vs. jobdetails_jsp.printTaskSummary() {
> >    int totalTasks = tasks.length;
> >     int runningTasks = 0;
> >     int finishedTasks = 0;
> >     int killedTasks = 0;
> >     int failedTaskAttempts = 0;
> >     int killedTaskAttempts = 0;
> >     for(int i=0; i < totalTasks; ++i) {
> >       TaskInProgress task = tasks[i];
> >       if (task.isComplete()) {
> >         finishedTasks += 1;
> >       } else if (task.isRunning()) {
> >         runningTasks += 1;
> >       } else if (task.wasKilled()) {
> >         killedTasks += 1;
> >       }
> >       failedTaskAttempts += task.numTaskFailures();
> >       killedTaskAttempts += task.numKilledTasks();
> >     }
> >     int pendingTasks = totalTasks - runningTasks - killedTasks - finishedTasks; 
> >     ...
> > }
> > 
> > It seems like the difference here might be between failed_ vs. killed_ and/or the fact that the latter case uses speculativeMapTasks?
> 
> Brenden Matthews wrote:
>     Yes, possibly.  To be honest I don't remember, it's been months since I fixed this.  All I remember specifically is that it was wrong.
> 
> Ben Mahler wrote:
>     Ah, yes that's what I'm getting at, wrong in what way?
> 
> Ben Mahler wrote:
>     Hey Brenden, I would really like to get this change in as I trust that you've found an issue with the code. But for posterity, it would be nice to have a clearer explanation of what was wrong with the old code (so if we run into bugs again we can understand why we choose to do this instead of just call pendingMaps/Reduces). Perhaps we need a different term for "pending", i.e., we're looking for tasks that don't have a slot open to run on.
>     
>     Do you remember how you noticed there was a problem? Did you notice in production? Was it because pending was 0, despite there being map tasks that needed to be run? Was the symptom that there were tasks that were not running because pending was 0?
> 
> Brenden Matthews wrote:
>     Hi Ben,
>     
>     As I recall, I was finding that the values Mesos was reporting in the log did not match the Hadoop JobTracker web UI.  In fact, on many occasions the values for the pending tasks were negative.  Since it's not possible to have a negative count, it was obviously broken.
>     
>     I thought perhaps it was related to this issue:
>     
>     https://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201204.mbox/%3C20120410201424.A7FE02388962@eris.apache.org%3E
>     
>     But the build of Hadoop I was running had this patch.  I decided that rather than patching Hadoop, I should just use the same calculation that the web UI was using.
> 
> Ben Mahler wrote:
>     Perfect! Can you mention that negative values were observed, and reference https://issues.apache.org/jira/browse/MAPREDUCE-1238? It would be great to mention that you've seen discrepancies with getPendingMaps/Reduces and what the webUI reports as pending.

Following up here, can you make the changes to the comment as a separate review?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21326
-----------------------------------------------------------


On July 25, 2013, 1:31 a.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated July 25, 2013, 1:31 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixed TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java 279f84e0f0c43ad3cfd9e4442010e706ee3565d9 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Ben Mahler <be...@gmail.com>.

> On June 3, 2013, 5:36 p.m., Ben Mahler wrote:
> > What differences were you seeing between job.pendingMaps() and this new technique?
> > 
> > Looking at JobInProgress.pendingMaps():
> > 
> >   public synchronized int pendingMaps() {
> >     return numMapTasks - runningMapTasks - failedMapTIPs - 
> >     finishedMapTasks + speculativeMapTasks;
> >   }
> > 
> > vs. jobdetails_jsp.printTaskSummary() {
> >    int totalTasks = tasks.length;
> >     int runningTasks = 0;
> >     int finishedTasks = 0;
> >     int killedTasks = 0;
> >     int failedTaskAttempts = 0;
> >     int killedTaskAttempts = 0;
> >     for(int i=0; i < totalTasks; ++i) {
> >       TaskInProgress task = tasks[i];
> >       if (task.isComplete()) {
> >         finishedTasks += 1;
> >       } else if (task.isRunning()) {
> >         runningTasks += 1;
> >       } else if (task.wasKilled()) {
> >         killedTasks += 1;
> >       }
> >       failedTaskAttempts += task.numTaskFailures();
> >       killedTaskAttempts += task.numKilledTasks();
> >     }
> >     int pendingTasks = totalTasks - runningTasks - killedTasks - finishedTasks; 
> >     ...
> > }
> > 
> > It seems like the difference here might be between failed_ vs. killed_ and/or the fact that the latter case uses speculativeMapTasks?
> 
> Brenden Matthews wrote:
>     Yes, possibly.  To be honest I don't remember, it's been months since I fixed this.  All I remember specifically is that it was wrong.
> 
> Ben Mahler wrote:
>     Ah, yes that's what I'm getting at, wrong in what way?
> 
> Ben Mahler wrote:
>     Hey Brenden, I would really like to get this change in as I trust that you've found an issue with the code. But for posterity, it would be nice to have a clearer explanation of what was wrong with the old code (so if we run into bugs again we can understand why we choose to do this instead of just call pendingMaps/Reduces). Perhaps we need a different term for "pending", i.e., we're looking for tasks that don't have a slot open to run on.
>     
>     Do you remember how you noticed there was a problem? Did you notice in production? Was it because pending was 0, despite there being map tasks that needed to be run? Was the symptom that there were tasks that were not running because pending was 0?
> 
> Brenden Matthews wrote:
>     Hi Ben,
>     
>     As I recall, I was finding that the values Mesos was reporting in the log did not match the Hadoop JobTracker web UI.  In fact, on many occasions the values for the pending tasks were negative.  Since it's not possible to have a negative count, it was obviously broken.
>     
>     I thought perhaps it was related to this issue:
>     
>     https://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201204.mbox/%3C20120410201424.A7FE02388962@eris.apache.org%3E
>     
>     But the build of Hadoop I was running had this patch.  I decided that rather than patching Hadoop, I should just use the same calculation that the web UI was using.

Perfect! Can you mention that negative values were observed, and reference https://issues.apache.org/jira/browse/MAPREDUCE-1238? It would be great to mention that you've seen discrepancies with getPendingMaps/Reduces and what the webUI reports as pending.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21326
-----------------------------------------------------------


On June 6, 2013, 2:08 a.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated June 6, 2013, 2:08 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Brenden Matthews <br...@diddyinc.com>.

> On June 3, 2013, 5:36 p.m., Ben Mahler wrote:
> > What differences were you seeing between job.pendingMaps() and this new technique?
> > 
> > Looking at JobInProgress.pendingMaps():
> > 
> >   public synchronized int pendingMaps() {
> >     return numMapTasks - runningMapTasks - failedMapTIPs - 
> >     finishedMapTasks + speculativeMapTasks;
> >   }
> > 
> > vs. jobdetails_jsp.printTaskSummary() {
> >    int totalTasks = tasks.length;
> >     int runningTasks = 0;
> >     int finishedTasks = 0;
> >     int killedTasks = 0;
> >     int failedTaskAttempts = 0;
> >     int killedTaskAttempts = 0;
> >     for(int i=0; i < totalTasks; ++i) {
> >       TaskInProgress task = tasks[i];
> >       if (task.isComplete()) {
> >         finishedTasks += 1;
> >       } else if (task.isRunning()) {
> >         runningTasks += 1;
> >       } else if (task.wasKilled()) {
> >         killedTasks += 1;
> >       }
> >       failedTaskAttempts += task.numTaskFailures();
> >       killedTaskAttempts += task.numKilledTasks();
> >     }
> >     int pendingTasks = totalTasks - runningTasks - killedTasks - finishedTasks; 
> >     ...
> > }
> > 
> > It seems like the difference here might be between failed_ vs. killed_ and/or the fact that the latter case uses speculativeMapTasks?
> 
> Brenden Matthews wrote:
>     Yes, possibly.  To be honest I don't remember, it's been months since I fixed this.  All I remember specifically is that it was wrong.
> 
> Ben Mahler wrote:
>     Ah, yes that's what I'm getting at, wrong in what way?
> 
> Ben Mahler wrote:
>     Hey Brenden, I would really like to get this change in as I trust that you've found an issue with the code. But for posterity, it would be nice to have a clearer explanation of what was wrong with the old code (so if we run into bugs again we can understand why we choose to do this instead of just call pendingMaps/Reduces). Perhaps we need a different term for "pending", i.e., we're looking for tasks that don't have a slot open to run on.
>     
>     Do you remember how you noticed there was a problem? Did you notice in production? Was it because pending was 0, despite there being map tasks that needed to be run? Was the symptom that there were tasks that were not running because pending was 0?

Hi Ben,

As I recall, I was finding that the values Mesos was reporting in the log did not match the Hadoop JobTracker web UI.  In fact, on many occasions the values for the pending tasks were negative.  Since it's not possible to have a negative count, it was obviously broken.

I thought perhaps it was related to this issue:

https://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201204.mbox/%3C20120410201424.A7FE02388962@eris.apache.org%3E

But the build of Hadoop I was running had this patch.  I decided that rather than patching Hadoop, I should just use the same calculation that the web UI was using.


- Brenden


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21326
-----------------------------------------------------------


On June 6, 2013, 2:08 a.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated June 6, 2013, 2:08 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Ben Mahler <be...@gmail.com>.

> On June 3, 2013, 5:36 p.m., Ben Mahler wrote:
> > What differences were you seeing between job.pendingMaps() and this new technique?
> > 
> > Looking at JobInProgress.pendingMaps():
> > 
> >   public synchronized int pendingMaps() {
> >     return numMapTasks - runningMapTasks - failedMapTIPs - 
> >     finishedMapTasks + speculativeMapTasks;
> >   }
> > 
> > vs. jobdetails_jsp.printTaskSummary() {
> >    int totalTasks = tasks.length;
> >     int runningTasks = 0;
> >     int finishedTasks = 0;
> >     int killedTasks = 0;
> >     int failedTaskAttempts = 0;
> >     int killedTaskAttempts = 0;
> >     for(int i=0; i < totalTasks; ++i) {
> >       TaskInProgress task = tasks[i];
> >       if (task.isComplete()) {
> >         finishedTasks += 1;
> >       } else if (task.isRunning()) {
> >         runningTasks += 1;
> >       } else if (task.wasKilled()) {
> >         killedTasks += 1;
> >       }
> >       failedTaskAttempts += task.numTaskFailures();
> >       killedTaskAttempts += task.numKilledTasks();
> >     }
> >     int pendingTasks = totalTasks - runningTasks - killedTasks - finishedTasks; 
> >     ...
> > }
> > 
> > It seems like the difference here might be between failed_ vs. killed_ and/or the fact that the latter case uses speculativeMapTasks?
> 
> Brenden Matthews wrote:
>     Yes, possibly.  To be honest I don't remember, it's been months since I fixed this.  All I remember specifically is that it was wrong.

Ah, yes that's what I'm getting at, wrong in what way?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21326
-----------------------------------------------------------


On June 3, 2013, 6:02 p.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated June 3, 2013, 6:02 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Ben Mahler <be...@gmail.com>.

> On June 3, 2013, 5:36 p.m., Ben Mahler wrote:
> > What differences were you seeing between job.pendingMaps() and this new technique?
> > 
> > Looking at JobInProgress.pendingMaps():
> > 
> >   public synchronized int pendingMaps() {
> >     return numMapTasks - runningMapTasks - failedMapTIPs - 
> >     finishedMapTasks + speculativeMapTasks;
> >   }
> > 
> > vs. jobdetails_jsp.printTaskSummary() {
> >    int totalTasks = tasks.length;
> >     int runningTasks = 0;
> >     int finishedTasks = 0;
> >     int killedTasks = 0;
> >     int failedTaskAttempts = 0;
> >     int killedTaskAttempts = 0;
> >     for(int i=0; i < totalTasks; ++i) {
> >       TaskInProgress task = tasks[i];
> >       if (task.isComplete()) {
> >         finishedTasks += 1;
> >       } else if (task.isRunning()) {
> >         runningTasks += 1;
> >       } else if (task.wasKilled()) {
> >         killedTasks += 1;
> >       }
> >       failedTaskAttempts += task.numTaskFailures();
> >       killedTaskAttempts += task.numKilledTasks();
> >     }
> >     int pendingTasks = totalTasks - runningTasks - killedTasks - finishedTasks; 
> >     ...
> > }
> > 
> > It seems like the difference here might be between failed_ vs. killed_ and/or the fact that the latter case uses speculativeMapTasks?
> 
> Brenden Matthews wrote:
>     Yes, possibly.  To be honest I don't remember, it's been months since I fixed this.  All I remember specifically is that it was wrong.
> 
> Ben Mahler wrote:
>     Ah, yes that's what I'm getting at, wrong in what way?

Hey Brenden, I would really like to get this change in as I trust that you've found an issue with the code. But for posterity, it would be nice to have a clearer explanation of what was wrong with the old code (so if we run into bugs again we can understand why we choose to do this instead of just call pendingMaps/Reduces). Perhaps we need a different term for "pending", i.e., we're looking for tasks that don't have a slot open to run on.

Do you remember how you noticed there was a problem? Did you notice in production? Was it because pending was 0, despite there being map tasks that needed to be run? Was the symptom that there were tasks that were not running because pending was 0?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21326
-----------------------------------------------------------


On June 6, 2013, 2:08 a.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated June 6, 2013, 2:08 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Brenden Matthews <br...@diddyinc.com>.

> On June 3, 2013, 5:36 p.m., Ben Mahler wrote:
> > What differences were you seeing between job.pendingMaps() and this new technique?
> > 
> > Looking at JobInProgress.pendingMaps():
> > 
> >   public synchronized int pendingMaps() {
> >     return numMapTasks - runningMapTasks - failedMapTIPs - 
> >     finishedMapTasks + speculativeMapTasks;
> >   }
> > 
> > vs. jobdetails_jsp.printTaskSummary() {
> >    int totalTasks = tasks.length;
> >     int runningTasks = 0;
> >     int finishedTasks = 0;
> >     int killedTasks = 0;
> >     int failedTaskAttempts = 0;
> >     int killedTaskAttempts = 0;
> >     for(int i=0; i < totalTasks; ++i) {
> >       TaskInProgress task = tasks[i];
> >       if (task.isComplete()) {
> >         finishedTasks += 1;
> >       } else if (task.isRunning()) {
> >         runningTasks += 1;
> >       } else if (task.wasKilled()) {
> >         killedTasks += 1;
> >       }
> >       failedTaskAttempts += task.numTaskFailures();
> >       killedTaskAttempts += task.numKilledTasks();
> >     }
> >     int pendingTasks = totalTasks - runningTasks - killedTasks - finishedTasks; 
> >     ...
> > }
> > 
> > It seems like the difference here might be between failed_ vs. killed_ and/or the fact that the latter case uses speculativeMapTasks?

Yes, possibly.  To be honest I don't remember, it's been months since I fixed this.  All I remember specifically is that it was wrong.


- Brenden


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21326
-----------------------------------------------------------


On May 22, 2013, 7:49 p.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated May 22, 2013, 7:49 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> From dc2c7f4116dec16bf0e22ab4c9fccac8f3a45020 Mon Sep 17 00:00:00 2001
> From: Brenden Matthews <br...@airbnb.com>
> Date: Mon, 13 May 2013 15:27:45 -0700
> Subject: [PATCH 09/24] Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> ---
>  .../org/apache/hadoop/mapred/MesosScheduler.java   |   33 ++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21326
-----------------------------------------------------------


What differences were you seeing between job.pendingMaps() and this new technique?

Looking at JobInProgress.pendingMaps():

  public synchronized int pendingMaps() {
    return numMapTasks - runningMapTasks - failedMapTIPs - 
    finishedMapTasks + speculativeMapTasks;
  }

vs. jobdetails_jsp.printTaskSummary() {
   int totalTasks = tasks.length;
    int runningTasks = 0;
    int finishedTasks = 0;
    int killedTasks = 0;
    int failedTaskAttempts = 0;
    int killedTaskAttempts = 0;
    for(int i=0; i < totalTasks; ++i) {
      TaskInProgress task = tasks[i];
      if (task.isComplete()) {
        finishedTasks += 1;
      } else if (task.isRunning()) {
        runningTasks += 1;
      } else if (task.wasKilled()) {
        killedTasks += 1;
      }
      failedTaskAttempts += task.numTaskFailures();
      killedTaskAttempts += task.numKilledTasks();
    }
    int pendingTasks = totalTasks - runningTasks - killedTasks - finishedTasks; 
    ...
}

It seems like the difference here might be between failed_ vs. killed_ and/or the fact that the latter case uses speculativeMapTasks?

- Ben Mahler


On May 22, 2013, 7:49 p.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated May 22, 2013, 7:49 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> From dc2c7f4116dec16bf0e22ab4c9fccac8f3a45020 Mon Sep 17 00:00:00 2001
> From: Brenden Matthews <br...@airbnb.com>
> Date: Mon, 13 May 2013 15:27:45 -0700
> Subject: [PATCH 09/24] Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> ---
>  .../org/apache/hadoop/mapred/MesosScheduler.java   |   33 ++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Brenden Matthews <br...@diddyinc.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/
-----------------------------------------------------------

(Updated May 22, 2013, 7:49 p.m.)


Review request for mesos.


Description
-------

>From dc2c7f4116dec16bf0e22ab4c9fccac8f3a45020 Mon Sep 17 00:00:00 2001
From: Brenden Matthews <br...@airbnb.com>
Date: Mon, 13 May 2013 15:27:45 -0700
Subject: [PATCH 09/24] Fix TaskTracker pending tasks calculation.

Review: https://reviews.apache.org/r/11116
---
 .../org/apache/hadoop/mapred/MesosScheduler.java   |   33 ++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)


Diffs (updated)
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 

Diff: https://reviews.apache.org/r/11116/diff/


Testing
-------

Used in production at airbnb.


Thanks,

Brenden Matthews


Re: Review Request: Fix TaskTracker pending tasks calculation.

Posted by Brenden Matthews <br...@diddyinc.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/
-----------------------------------------------------------

(Updated May 14, 2013, 5:57 p.m.)


Review request for mesos.


Description
-------

>From dc2c7f4116dec16bf0e22ab4c9fccac8f3a45020 Mon Sep 17 00:00:00 2001
From: Brenden Matthews <br...@airbnb.com>
Date: Mon, 13 May 2013 15:27:45 -0700
Subject: [PATCH 09/24] Fix TaskTracker pending tasks calculation.

Review: https://reviews.apache.org/r/11116
---
 .../org/apache/hadoop/mapred/MesosScheduler.java   |   33 ++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)


Diffs (updated)
-----

  hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java afe401f5265e3d9494af7eace42eec45943184a3 

Diff: https://reviews.apache.org/r/11116/diff/


Testing
-------

Used in production at airbnb.


Thanks,

Brenden Matthews