Posted to dev@oozie.apache.org by Robert Kanter <rk...@cloudera.com> on 2015/08/14 01:42:55 UTC

Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37452/
-----------------------------------------------------------

Review request for oozie.


Bugs: OOZIE-2277
    https://issues.apache.org/jira/browse/OOZIE-2277


Repository: oozie-git


Description
-------

https://issues.apache.org/jira/browse/OOZIE-2277


Diffs
-----

  core/src/main/java/org/apache/oozie/service/SparkConfigurationService.java 1b7cf4a 
  core/src/test/java/org/apache/oozie/service/TestSparkConfigurationService.java b2c499d 
  sharelib/spark/pom.xml 6f7e74a 
  sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java b18a0b9 
  sharelib/spark/src/test/java/org/apache/oozie/action/hadoop/TestSparkActionExecutor.java f271abc 

Diff: https://reviews.apache.org/r/37452/diff/


Testing
-------

- Ran unit tests with Hadoop 1 and Hadoop 2
- Ran in a Hadoop 2 cluster with local, yarn-client, and yarn-cluster modes


Thanks,

Robert Kanter


Re: Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

Posted by Robert Kanter <rk...@cloudera.com>.

> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > core/src/main/java/org/apache/oozie/service/SparkConfigurationService.java, line 85
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039661#file1039661line85>
> >
> >     Should we be doing this? According to http://spark.apache.org/docs/latest/running-on-yarn.html, it refers to an HDFS location or a local installation on the task node. Since it applies to other clients, should we retain that in Oozie as well, or are we saying that Oozie is only going to use Spark libraries via the sharelib? At the least, retaining this setting should be configurable, to support users who have a local installation on the task nodes.

In CDH 5.4, we were shipping the assembly jar in the sharelib, and this would override the sharelib jar. This config is also loaded on the Oozie Server's host, where the location can differ from the one on the Launcher Job's host, which is where Spark is run. We saw a problem when they didn't match. With the dependency changes in this patch, the assembly jar isn't required in the sharelib (or at all), which is good because it's not published to Maven. When testing the different modes, at least one of them (I forget which) also hit a weird serialization error because the assembly jar was from a different build than the sharelib jars.

I'll add a config to enable/disable removing it to be flexible just in case someone wants to use it, but I think we should remove it by default.
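A rough Java sketch of what such an enable/disable switch could look like. This is not the code from the patch. The thread never names the Spark setting, so spark.yarn.jar is an assumption based on the linked Spark-on-YARN page, and the class, method, and flag names are hypothetical:

```java
import java.util.Properties;

class AssemblySettingSketch {
    // Assumption: the thread does not name the property; spark.yarn.jar is a guess.
    static final String SPARK_YARN_JAR = "spark.yarn.jar";

    // Returns a copy of the loaded Spark configuration. When the (hypothetical)
    // ignore flag is true, the assembly jar setting is dropped so that the
    // sharelib jars are used instead; otherwise the setting is retained.
    static Properties filter(Properties loaded, boolean ignoreAssemblySetting) {
        Properties result = new Properties();
        result.putAll(loaded);
        if (ignoreAssemblySetting) {
            result.remove(SPARK_YARN_JAR);
        }
        return result;
    }
}
```

Defaulting the flag to true would match the "remove it by default" behavior proposed above, while setting it to false would keep a locally installed assembly jar usable.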


> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, lines 60-61
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line60>
> >
> >     Robert,
> >        In our chat, you mentioned the ability to specify this alternatively as -master yarn -mode client and -master yarn -mode cluster. We will have to handle that as well.

Good point.  Also, I had the argument name wrong: it's --deploy-mode instead of --mode, but the behavior is as I described to you.
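A minimal Java sketch of the mode detection being discussed, covering both the combined form (yarn-client, yarn-cluster) and the split form (--master yarn with --deploy-mode). The class and method names are hypothetical and not taken from SparkMain.java. Falling back to client mode when no deploy mode is given is an assumption based on spark-submit's default:

```java
class SparkModeSketch {
    enum Mode { YARN_CLIENT, YARN_CLUSTER, OTHER }

    // master:     value passed as --master (e.g. "yarn-client", "yarn", "local[*]")
    // deployMode: value passed as --deploy-mode, or null if it was not given
    static Mode detect(String master, String deployMode) {
        if ("yarn-cluster".equals(master)) {
            return Mode.YARN_CLUSTER;
        }
        if ("yarn-client".equals(master)) {
            return Mode.YARN_CLIENT;
        }
        if ("yarn".equals(master)) {
            // Assumption: a missing --deploy-mode is treated as client mode.
            return "cluster".equals(deployMode) ? Mode.YARN_CLUSTER : Mode.YARN_CLIENT;
        }
        // local, standalone, mesos, and anything else
        return Mode.OTHER;
    }
}
```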


> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, line 78
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line78>
> >
> >     If local mode will ever be used in Oozie, then all the new code can go into an if (yarnClusterMode || yarnClientMode) block so that it is done only for non-local modes.

local mode still requires setting --jars, so I'd have to duplicate that part of the new code in an else statement which I think might be harder to follow/maintain.  Can we leave this as is?


> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, line 113
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line113>
> >
> >     Can you also place local files in spark.yarn.dist.files, and will Spark take care of shipping them as it does for --jars? Asking because you are adding files from java.classpath to sparkJars. At least in Hadoop, mapreduce.cache.files entries have to be HDFS paths.

I don't think spark.yarn.dist.files actually sends the jars anywhere.  I'm pretty sure the combination of master/modes and the many jar-related Spark configs that I'm using is the only way that will work for each master/mode type.  It took a lot of trial-and-error and checking with our Spark team, who weren't 100% sure of the necessary configs either.  (I don't know why they had to make this so complicated.)


> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, line 123
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line123>
> >
> >     Can you just add a comment here saying this is redundant for yarnClientMode, as the driver is the launcher JVM and it is already launched?

Surprisingly, IIRC, this is actually required even though you'd think it would be able to use the JVM's classpath (I think they must do something funny with classloaders).  I'll double check and add a comment if it's not.


- Robert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37452/#review95987
-----------------------------------------------------------




Re: Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

Posted by Robert Kanter <rk...@cloudera.com>.

> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, line 123
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line123>
> >
> >     Can you just add a comment here saying this is redundant for yarnClientMode, as the driver is the launcher JVM and it is already launched?
> 
> Robert Kanter wrote:
>     Surprisingly, IIRC, this is actually required even though you'd think it would be able to use the JVM's classpath (I think they must do something funny with classloaders).  I'll double check and add a comment if it's not.

Ya, it's necessary.


- Robert




Re: Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

Posted by Rohini Palaniswamy <ro...@gmail.com>.

> On Aug. 27, 2015, 10:03 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java, line 78
> > <https://reviews.apache.org/r/37452/diff/1/?file=1039664#file1039664line78>
> >
> >     If local mode will ever be used in Oozie, then all the new code can go into an if (yarnClusterMode || yarnClientMode) block so that it is done only for non-local modes.
> 
> Robert Kanter wrote:
>     local mode still requires setting --jars, so I'd have to duplicate that part of the new code in an else statement which I think might be harder to follow/maintain.  Can we leave this as is?

Sure.


- Rohini




Re: Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37452/#review95987
-----------------------------------------------------------



core/src/main/java/org/apache/oozie/service/SparkConfigurationService.java (line 85)
<https://reviews.apache.org/r/37452/#comment151180>

    Should we be doing this? According to http://spark.apache.org/docs/latest/running-on-yarn.html, it refers to an HDFS location or a local installation on the task node. Since it applies to other clients, should we retain that in Oozie as well, or are we saying that Oozie is only going to use Spark libraries via the sharelib? At the least, retaining this setting should be configurable, to support users who have a local installation on the task nodes.



sharelib/spark/pom.xml (line 121)
<https://reviews.apache.org/r/37452/#comment151181>

    Nitpick: can you put spark-core at the beginning, followed by the other Spark feature dependencies, as that is the main one?



sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java (lines 60 - 61)
<https://reviews.apache.org/r/37452/#comment152402>

    Robert,
       In our chat, you mentioned the ability to specify this alternatively as -master yarn -mode client and -master yarn -mode cluster. We will have to handle that as well.



sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java (line 78)
<https://reviews.apache.org/r/37452/#comment152398>

    If local mode will ever be used in Oozie, then all the new code can go into an if (yarnClusterMode || yarnClientMode) block so that it is done only for non-local modes.



sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java (line 103)
<https://reviews.apache.org/r/37452/#comment152407>

    DELIM is a space, which means we expect the user to specify exactly one space between the arguments. We should split on \s+, or we can try something like http://stackoverflow.com/questions/6049470/can-apache-commons-cli-options-parser-ignore-unknown-command-line-options/8613949#8613949 to parse the arguments more cleanly.
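    To illustrate the \s+ suggestion, here is a small Java sketch that splits an options string on any run of whitespace rather than on a single space. The class and method names are made up for this example and are not from SparkMain.java. It deliberately does not handle quoted arguments that contain spaces:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OptionSplitSketch {
    // Splits on one or more whitespace characters (spaces, tabs, newlines),
    // so extra spaces between arguments do not produce empty tokens.
    static List<String> splitOpts(String opts) {
        String trimmed = opts.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }
}
```

    Trimming first avoids the empty leading token that String.split would otherwise return when the input starts with whitespace.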



sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java (line 113)
<https://reviews.apache.org/r/37452/#comment152417>

    Can you also place local files in spark.yarn.dist.files, and will Spark take care of shipping them as it does for --jars? Asking because you are adding files from java.classpath to sparkJars. At least in Hadoop, mapreduce.cache.files entries have to be HDFS paths.



sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java (line 123)
<https://reviews.apache.org/r/37452/#comment152408>

    Can you just add a comment here saying this is redundant for yarnClientMode, as the driver is the launcher JVM and it is already launched?


Robert did enlighten me that

"in local mode, everything runs in the launcher job. in yarn-client mode, the driver runs in the launcher and the executor in Yarn. in yarn-cluster mode, the driver and executor run in Yarn"

Can we add that to the code comments and also to the documentation, as it will be confusing for users as well?
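As one possible shape for such a code comment, the quoted summary could be encoded roughly as follows. This is only a sketch with hypothetical names, not code from the patch:

```java
class RunLocationSketch {
    // Where the Spark driver and executors run for each mode, per the summary
    // quoted above. "Launcher" means the Oozie launcher job; when a flag is
    // false, that component runs in Yarn instead.
    enum SparkRunLocation {
        LOCAL(true, true),          // everything runs in the launcher job
        YARN_CLIENT(true, false),   // driver in the launcher, executors in Yarn
        YARN_CLUSTER(false, false); // driver and executors both in Yarn

        final boolean driverInLauncher;
        final boolean executorsInLauncher;

        SparkRunLocation(boolean driverInLauncher, boolean executorsInLauncher) {
            this.driverInLauncher = driverInLauncher;
            this.executorsInLauncher = executorsInLauncher;
        }
    }
}
```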

- Rohini Palaniswamy




Re: Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37452/#review97167
-----------------------------------------------------------

Ship it!


- Rohini Palaniswamy




Re: Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

Posted by Robert Kanter <rk...@cloudera.com>.

> On Aug. 31, 2015, 10:16 p.m., Rohini Palaniswamy wrote:
> > sharelib/spark/pom.xml, line 137
> > <https://reviews.apache.org/r/37452/diff/1-2/?file=1039663#file1039663line137>
> >
> >     Are we removing spark-core altogether as other dependencies will bring it?

I think the previous version of the patch was moving the spark-core dependency definition in the pom file, but now it's not, so RB is showing it funny like this.  If you look at the patch on the JIRA, it's not removing spark-core.  In any case, we don't want to remove it and I'll make sure it's not when committing.


- Robert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37452/#review97166
-----------------------------------------------------------




Re: Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37452/#review97166
-----------------------------------------------------------



sharelib/spark/pom.xml 
<https://reviews.apache.org/r/37452/#comment152949>

    Are we removing spark-core altogether as other dependencies will bring it?


- Rohini Palaniswamy




Re: Review Request 37452: OOZIE-2277: Honor oozie.action.sharelib.for.spark in Spark jobs

Posted by Robert Kanter <rk...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37452/
-----------------------------------------------------------

(Updated Aug. 28, 2015, 5:03 a.m.)


Review request for oozie.


Bugs: OOZIE-2277
    https://issues.apache.org/jira/browse/OOZIE-2277


Repository: oozie-git


Description
-------

https://issues.apache.org/jira/browse/OOZIE-2277


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/service/SparkConfigurationService.java 1b7cf4a 
  core/src/main/resources/oozie-default.xml 32a1df0 
  core/src/test/java/org/apache/oozie/service/TestSparkConfigurationService.java b2c499d 
  docs/src/site/twiki/DG_SparkActionExtension.twiki 32ebe12 
  sharelib/spark/pom.xml 6f7e74a 
  sharelib/spark/src/main/java/org.apache.oozie.action.hadoop/SparkMain.java b18a0b9 
  sharelib/spark/src/test/java/org/apache/oozie/action/hadoop/TestSparkActionExecutor.java f271abc 

Diff: https://reviews.apache.org/r/37452/diff/


Testing
-------

- Ran unit tests with Hadoop 1 and Hadoop 2
- Ran in a Hadoop 2 cluster with local, yarn-client, and yarn-cluster modes


Thanks,

Robert Kanter