Posted to dev@mahout.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/02/13 00:06:13 UTC

[jira] [Commented] (MAHOUT-1636) Class dependencies for the spark module are put in a job.jar, which is very inefficient

    [ https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319188#comment-14319188 ] 

ASF GitHub Bot commented on MAHOUT-1636:
----------------------------------------

Github user pferrel commented on the pull request:

    https://github.com/apache/mahout/pull/69#issuecomment-74173166
  
    Nothing on the assembly; I'll wait to see your refactoring.
    
    Did lots of comment and scaladoc cleanup to take care of a fair bit of untidiness. Didn't remove scopt, which is just too small to be worth the effort. 
    
    Replaced use of o.a.m.Pair with Scala tuples. There may be some unneeded imports, since I can't trust my IDE to clean those up. I'll have to take a longer look to remove any that are unnecessary.
    
    Still testing against the latest master but hope to push soon.
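
For illustration only, the Pair-to-tuple replacement mentioned in the comment above might look roughly like the sketch below. The `Pair` class here is a hypothetical stand-in written for this example, not Mahout's actual o.a.m.Pair, and `minMaxPair` / `minMax` are invented names; the real call sites in the pull request may differ.

```scala
// Hypothetical stand-in for a Java-style pair class.
// This is an assumption for illustration, NOT Mahout's o.a.m.Pair.
final class Pair[A, B](val first: A, val second: B)

object PairToTuple {
  // Before: the result is wrapped in a dedicated Pair type.
  def minMaxPair(xs: Seq[Int]): Pair[Int, Int] =
    new Pair(xs.min, xs.max)

  // After: the same result as a plain Scala tuple, no extra class needed.
  def minMax(xs: Seq[Int]): (Int, Int) =
    (xs.min, xs.max)

  def main(args: Array[String]): Unit = {
    val p = minMaxPair(Seq(3, 1, 2))
    println(s"${p.first} ${p.second}")

    // Tuples can be destructured directly at the call site.
    val (lo, hi) = minMax(Seq(3, 1, 2))
    println(s"$lo $hi")
  }
}
```

A tuple drops the dependency on a helper class and allows pattern-matching destructuring, at the cost of less descriptive field names (`_1`, `_2`).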


> Class dependencies for the spark module are put in a job.jar, which is very inefficient
> ---------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1636
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1636
>             Project: Mahout
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.0-snapshot
>            Reporter: Pat Ferrel
>            Assignee: Ted Dunning
>             Fix For: 1.0-snapshot
>
>
> Using a Maven plugin and an assembly job.xml, a job.jar is created with all dependencies, including transitive ones. This job.jar is in mahout/spark/target and is included in the classpath when a Spark job is run. This allows dependency classes to be found at runtime, but the job.jar includes a great deal that is not needed and duplicates classes found in the main mrlegacy job.jar. If the job.jar is removed, drivers will not find needed classes. A better way needs to be implemented for including class dependencies.
> I'm not sure what that better way is, so I am leaving the assembly alone for now. Whoever picks up this Jira will have to remove it after deciding on a better method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)