Posted to dev@mahout.apache.org by "Pat Ferrel (JIRA)" <ji...@apache.org> on 2014/12/23 17:54:13 UTC

[jira] [Created] (MAHOUT-1636) Class dependencies for the spark module are put in a job.jar, which is very inefficient

Pat Ferrel created MAHOUT-1636:
----------------------------------

             Summary: Class dependencies for the spark module are put in a job.jar, which is very inefficient
                 Key: MAHOUT-1636
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1636
             Project: Mahout
          Issue Type: Bug
          Components: spark
    Affects Versions: 1.0-snapshot
            Reporter: Pat Ferrel
             Fix For: 1.0-snapshot


Using a Maven plugin and an assembly job.xml, a job.jar is created with all dependencies, including transitive ones. This job.jar is in mahout/spark/target and is included in the classpath when a Spark job is run. This allows dependency classes to be found at runtime, but the job.jar includes a great deal of unneeded material that duplicates classes already found in the main mrlegacy job.jar. If the job.jar is removed, drivers will not find needed classes. A better way of including class dependencies needs to be implemented.

I'm not sure what that better way is, so I am leaving the assembly alone for now. Whoever picks up this JIRA will have to remove it after deciding on a better method.
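To get a rough measure of how much the spark job.jar duplicates the mrlegacy job.jar, one could list the .class entries that both archives contain. The script below is a minimal sketch, not part of Mahout or its build. It assumes only that both job.jars are ordinary zip archives (which jars are). The jar paths are passed on the command line because the exact file names the build produces are not given here. The function names are hypothetical.

```python
import sys
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names inside a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def duplicated_classes(jar_a, jar_b):
    """Return the class entries packaged in both jars, sorted for stable output."""
    return sorted(class_entries(jar_a) & class_entries(jar_b))


def duplicated_bytes(jar_path, names):
    """Sum the compressed sizes of the given entries as stored in jar_path."""
    with zipfile.ZipFile(jar_path) as jar:
        return sum(jar.getinfo(name).compress_size for name in names)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python dupcheck.py <spark job.jar> <mrlegacy job.jar>
    spark_jar, mrlegacy_jar = sys.argv[1], sys.argv[2]
    dups = duplicated_classes(spark_jar, mrlegacy_jar)
    print(f"{len(dups)} classes appear in both jars, "
          f"{duplicated_bytes(spark_jar, dups)} compressed bytes in {spark_jar}")
    for name in dups[:20]:  # show a sample of the overlap
        print(name)
```

Comparing entry names alone does not prove the class bytes are identical, since two jars could carry different versions of the same dependency. For a first look at the overlap, though, the name-level intersection is usually enough.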



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [jira] [Created] (MAHOUT-1636) Class dependencies for the spark module are put in a job.jar, which is very inefficient

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Are we talking about classpath problems in the front end or the back end?