
Posted to dev@mahout.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2015/05/01 18:11:02 UTC

dependency-reduced jar

There is an assembly descriptor at mahout/spark/src/main/assembly/dependency-reduced.xml. It lists dependencies that are external to Mahout but required by either the client or the distributed backend executor code.

Guava was recently removed, but scopt is still used by the client. For some reason the following artifacts were added to the assembly, and I’m not sure why. The assembly is only used with Spark.

<includes>
  <include>com.github.scopt</include>
  <include>com.tdunning:t-digest</include>
  <include>org.apache.commons:commons-math3</include>
</includes>

Are these all used? Does anyone know where t-digest and math3 came from?
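
One way to trace where these come from is Maven's dependency tree, filtered to the two artifacts in question. A minimal sketch, assuming it is run from the spark module directory; it only prints the command so it can be reviewed before running:

```shell
# Coordinates questioned above, as groupId:artifactId.
artifacts="com.tdunning:t-digest org.apache.commons:commons-math3"

# Join them with commas, the form the dependency plugin's
# "includes" parameter expects (word splitting is intentional).
includes=$(printf '%s,' $artifacts)
includes=${includes%,}

# Print the command instead of executing it.
echo "mvn dependency:tree -Dincludes=$includes"
```

The filtered tree shows which direct dependency pulls each artifact in transitively, if any does.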

I’d also like to propose that we create two jars, one for the client and one for the backend executors. There are three configurations we need to work in: Spark standalone, yarn-client, and yarn-cluster. All of these modes separate the needs of the client from those of the backend executors, but each has a slightly different way of getting the required classes to each side. I think splitting the dependencies into client and backend jars will cover all cases, but we’ll have to explain how to launch code in each mode.
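
To make the difference between the three modes concrete, here is a rough sketch of how the two jars might be passed to spark-submit in each one. The jar names, the master host, and the helper function are all hypothetical, and the master values are the Spark 1.x-era ones. The assumption is that in the two client-side modes the driver runs locally and can take the client jar on --driver-class-path, while in yarn-cluster the driver runs on the cluster, so the client jar has to be shipped along with the backend jar via --jars. The function only prints the command; it does not launch anything.

```shell
# Hypothetical jar names; the proposal above does not name the jars.
CLIENT_JAR="mahout-spark-client-deps.jar"
BACKEND_JAR="mahout-spark-backend-deps.jar"

# Print (do not run) a spark-submit command line for one mode.
# $1 = mode, $2 = application jar, $3 = main class (placeholders).
submit_cmd() {
  mode=$1; app_jar=$2; main_class=$3
  case "$mode" in
    standalone)
      # Driver runs locally: client jar on the driver classpath only.
      echo "spark-submit --master spark://master-host:7077 --driver-class-path $CLIENT_JAR --jars $BACKEND_JAR --class $main_class $app_jar" ;;
    yarn-client)
      # Same split as standalone, but executors are scheduled by YARN.
      echo "spark-submit --master yarn-client --driver-class-path $CLIENT_JAR --jars $BACKEND_JAR --class $main_class $app_jar" ;;
    yarn-cluster)
      # Driver runs on the cluster: ship both jars with --jars.
      echo "spark-submit --master yarn-cluster --jars $BACKEND_JAR,$CLIENT_JAR --class $main_class $app_jar" ;;
    *)
      echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

submit_cmd yarn-client app.jar com.example.Main
```

Whether Mahout's own launcher script would wrap spark-submit this way is an open question; the sketch is only meant to show which jar each side needs in each mode.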

Re: dependency-reduced jar

Posted by Suneel Marthi <su...@gmail.com>.

T-digest is being used in Mahout-MR; I believe it's also packaged as part of
Spark -> AddThis jar.
