Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/05/28 04:54:17 UTC

[jira] [Comment Edited] (SPARK-7529) Java compatibility check for MLlib 1.4

    [ https://issues.apache.org/jira/browse/SPARK-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562199#comment-14562199 ] 

Joseph K. Bradley edited comment on SPARK-7529 at 5/28/15 2:53 AM:
-------------------------------------------------------------------

*spark.mllib: Issues found in a pass through the spark.mllib package*
* _The pass is complete, but each item still needs to be annotated with the fix it requires._

h3. Classification

LogisticRegressionModel + SVMModel
* scala.Option<Object>	getThreshold()
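
_A minimal standalone sketch of what a Java caller sees (class name, weights, and intercept are placeholders for illustration only): the Scala return type is Option[Double], but Java sees scala.Option<Object> and has to cast the value back to Double._

{code:java}
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.linalg.Vectors;

public class GetThresholdFromJava {
  public static void main(String[] args) {
    // Placeholder weights/intercept, only so there is a model instance to call.
    LogisticRegressionModel model =
        new LogisticRegressionModel(Vectors.dense(0.5, -0.5), 0.0);
    model.setThreshold(0.5);

    // Scala declares Option[Double]; Java sees Option<Object>, so the caller
    // must cast the unwrapped value back to Double.
    double threshold = (Double) model.getThreshold().get();
    System.out.println(threshold);
  }
}
{code}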

NaiveBayesModel
* "Java-friendly constructor": NaiveBayesModel(Iterable<Object> labels, Iterable<Object> pi, Iterable<Iterable<Object>> theta)

h3. Clustering

DistributedLDAModel
* RDD<scala.Tuple2<Object,Vector>>	topicDistributions()

GaussianMixtureModel + KMeansModel + NaiveBayesModel
* RDD<Object>	predict(RDD<Vector> points)
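
_A hedged sketch of the unboxing a Java caller has to do here (KMeansModel with hand-built cluster centers, local[2] master, toy points): the Scala RDD[Int] surfaces as RDD<Object>. Where a JavaRDD overload of predict exists, it avoids the cast._

{code:java}
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.rdd.RDD;

public class PredictReturnsObjectRdd {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("predict-sketch").setMaster("local[2]"));

    KMeansModel model = new KMeansModel(new Vector[] {
        Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0)});
    JavaRDD<Vector> points = sc.parallelize(
        Arrays.asList(Vectors.dense(0.1, 0.1), Vectors.dense(0.9, 0.8)));

    // The Scala return type is RDD[Int], but Java sees RDD<Object>,
    // so each prediction has to be unboxed manually.
    RDD<Object> predictions = model.predict(points.rdd());
    for (Object p : predictions.toJavaRDD().collect()) {
      System.out.println((Integer) p);
    }
    sc.stop();
  }
}
{code}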

StreamingKMeans
* DStream<Object>	predictOn(DStream<Vector> data)
* <K> DStream<scala.Tuple2<K,Object>>	predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K> evidence$1)
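
_The trailing ClassTag evidence parameter is also awkward from Java, since there is no implicit resolution; the caller has to construct the tag by hand. A small standalone sketch (the same pattern applies to the RankingMetrics constructor below):_

{code:java}
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

public class ClassTagFromJava {
  public static void main(String[] args) {
    // Java has no implicits, so the evidence$1 argument must be built explicitly.
    ClassTag<String> keyTag = ClassTag$.MODULE$.apply(String.class);
    System.out.println(keyTag);
  }
}
{code}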

h3. Evaluation

AreaUnderCurve
* static double	of(scala.collection.Iterable<scala.Tuple2<Object,Object>> curve)
* static double	of(RDD<scala.Tuple2<Object,Object>> curve)

BinaryClassificationMetrics
* LOTS (everything taking/returning an RDD)
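
_The usual Java workaround (toy (score, label) pairs below) is to build a JavaRDD of Tuple2<Object, Object> and pass its .rdd() to the constructor, since the Doubles erase to Object in the Java view:_

{code:java}
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
import scala.Tuple2;

public class BinaryMetricsFromJava {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("metrics-sketch").setMaster("local[2]"));

    // (score, label) pairs; Scala's (Double, Double) appears as Tuple2<Object, Object>.
    JavaRDD<Tuple2<Object, Object>> scoreAndLabels = sc.parallelize(Arrays.asList(
        new Tuple2<Object, Object>(0.9, 1.0),
        new Tuple2<Object, Object>(0.2, 0.0)));

    BinaryClassificationMetrics metrics =
        new BinaryClassificationMetrics(scoreAndLabels.rdd());
    System.out.println(metrics.areaUnderROC());
    sc.stop();
  }
}
{code}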

RankingMetrics constructor
* RankingMetrics(RDD<scala.Tuple2<Object,Object>> predictionAndLabels, scala.reflect.ClassTag<T> evidence$1)

h3. Feature

Word2VecModel
* scala.Tuple2<String,Object>[]	findSynonyms

h3. Linalg

SparseMatrix
* static SparseMatrix	fromCOO(int numRows, int numCols, scala.collection.Iterable<scala.Tuple3<Object,Object,Object>> entries)

Vectors
* static Vector	sparse(int size, scala.collection.Seq<scala.Tuple2<Object,Object>> elements)
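
_Java callers can sidestep the Seq of boxed tuples by using the (int, int[], double[]) overload; a minimal sketch:_

{code:java}
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class SparseVectorFromJava {
  public static void main(String[] args) {
    // The Seq[(Int, Double)] overload surfaces as Seq<Tuple2<Object,Object>> in Java;
    // the (size, indices, values) overload avoids the boxed-tuple Seq entirely.
    Vector v = Vectors.sparse(4, new int[] {1, 3}, new double[] {2.0, 4.0});
    System.out.println(v);
  }
}
{code}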

BlockMatrix
* RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>>	blocks()
** _This issue appears in the constructors too._

h3. Optimization

Optimizer
* Vector	optimize(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights)
* _Same issue appears elsewhere, wherever Double is used in a tuple._

Gradient
* scala.Tuple2<Vector,Object>	compute(Vector data, double label, Vector weights)
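
_A small sketch (LogisticGradient with toy values) of the Double component of the returned tuple erasing to Object; the same boxing affects the (Double, Vector) tuples that Optimizer.optimize consumes:_

{code:java}
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.optimization.LogisticGradient;
import scala.Tuple2;

public class GradientComputeFromJava {
  public static void main(String[] args) {
    // The Scala result type is (Vector, Double); Java sees Tuple2<Vector, Object>,
    // so the loss component has to be cast back to Double.
    Tuple2<Vector, Object> gradAndLoss = new LogisticGradient().compute(
        Vectors.dense(1.0, 2.0),   // features
        1.0,                       // label
        Vectors.dense(0.1, 0.1));  // weights
    double loss = (Double) gradAndLoss._2();
    System.out.println(loss);
  }
}
{code}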

h3. Recommendation

MatrixFactorizationModel
* _constructor_: MatrixFactorizationModel(int rank, RDD<scala.Tuple2<Object,double[]>> userFeatures, RDD<scala.Tuple2<Object,double[]>> productFeatures)
* RDD<scala.Tuple2<Object,double[]>>	productFeatures()
* RDD<scala.Tuple2<Object,Rating[]>>	recommendProductsForUsers(int num)
* RDD<scala.Tuple2<Object,Rating[]>>	recommendUsersForProducts(int num)
* RDD<scala.Tuple2<Object,double[]>>	userFeatures()
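
_A hedged sketch of driving the constructor from Java (toy one-user / one-product factors, local[2] master): the Int keys in the feature tuples erase to Object:_

{code:java}
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import scala.Tuple2;

public class MatrixFactorizationFromJava {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("mf-sketch").setMaster("local[2]"));

    // The Scala type is RDD[(Int, Array[Double])]; the Int key appears as Object in Java.
    JavaRDD<Tuple2<Object, double[]>> userFeatures = sc.parallelize(Arrays.asList(
        new Tuple2<Object, double[]>(1, new double[] {0.5, 0.5})));
    JavaRDD<Tuple2<Object, double[]>> productFeatures = sc.parallelize(Arrays.asList(
        new Tuple2<Object, double[]>(10, new double[] {0.3, 0.7})));

    MatrixFactorizationModel model =
        new MatrixFactorizationModel(2, userFeatures.rdd(), productFeatures.rdd());
    System.out.println(model.predict(1, 10));
    sc.stop();
  }
}
{code}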

h3. Regression

GeneralizedLinearModel
* RDD<Object>	predict(RDD<Vector> testData)

h3. Stats

Statistics
* static double	corr(RDD<Object> x, RDD<Object> y)
* static double	corr(RDD<Object> x, RDD<Object> y, String method)
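
_One way this happens to work from Java (toy data): corr is declared on RDD[Double], which erases to RDD<Object>, and that is the type JavaDoubleRDD.srdd() returns:_

{code:java}
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.stat.Statistics;

public class CorrFromJava {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("corr-sketch").setMaster("local[2]"));

    JavaDoubleRDD x = sc.parallelizeDoubles(Arrays.asList(1.0, 2.0, 3.0));
    JavaDoubleRDD y = sc.parallelizeDoubles(Arrays.asList(1.1, 2.1, 3.4));

    // corr expects RDD[Double], seen from Java as RDD<Object>;
    // JavaDoubleRDD.srdd() already has that type.
    double r = Statistics.corr(x.srdd(), y.srdd());
    System.out.println(r);
    sc.stop();
  }
}
{code}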

h3. Trees

DecisionTreeModel
* JavaRDD<Object>	predict(JavaRDD<Vector> features)
** _This is because we use Scala's Double instead of java.lang.Double (unlike in, e.g., TreeEnsembleModel)._

Split
* scala.collection.immutable.List<Object>	categories()

h3. util

DataValidators
* static scala.Function1<RDD<LabeledPoint>,Object>	binaryLabelValidator()
* static scala.Function1<RDD<LabeledPoint>,Object>	multiLabelValidator(int k)
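
_Calling the validator from Java (toy data, local[2] master): the Boolean result of apply() comes back as Object and needs a cast:_

{code:java}
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.DataValidators;

public class ValidatorFromJava {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("validator-sketch").setMaster("local[2]"));

    JavaRDD<LabeledPoint> data = sc.parallelize(Arrays.asList(
        new LabeledPoint(1.0, Vectors.dense(1.0)),
        new LabeledPoint(0.0, Vectors.dense(2.0))));

    // The Scala type is RDD[LabeledPoint] => Boolean; from Java the result of
    // apply() is Object and must be cast back to Boolean.
    boolean valid = (Boolean) DataValidators.binaryLabelValidator().apply(data.rdd());
    System.out.println(valid);
    sc.stop();
  }
}
{code}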




> Java compatibility check for MLlib 1.4
> --------------------------------------
>
>                 Key: SPARK-7529
>                 URL: https://issues.apache.org/jira/browse/SPARK-7529
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Assignee: Joseph K. Bradley
>
> Check Java compatibility for MLlib 1.4. We should create separate JIRAs for each possible issue.
> Checking compatibility means:
> * comparing with the Scala doc
> * verifying that the Java docs are not messed up by Scala type incompatibilities (e.g., check for generic "Object" types where Java cannot understand complex Scala types; also check Scala objects, especially with nesting, carefully)
> * If needed for complex issues, create small Java unit tests which execute each method.  (The correctness can be checked in Scala.)


