Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/05/28 04:54:17 UTC
[jira] [Comment Edited] (SPARK-7529) Java compatibility check for MLlib 1.4
[ https://issues.apache.org/jira/browse/SPARK-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562199#comment-14562199 ]
Joseph K. Bradley edited comment on SPARK-7529 at 5/28/15 2:53 AM:
-------------------------------------------------------------------
*spark.mllib: Issues found in a pass through the spark.mllib package*
* _The pass is complete, but each item still needs to be annotated with the fix required._
h3. Classification
LogisticRegressionModel + SVMModel
* scala.Option<Object> getThreshold()
NaiveBayesModel
* "Java-friendly constructor": NaiveBayesModel(Iterable<Object> labels, Iterable<Object> pi, Iterable<Iterable<Object>> theta)
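The `Option<Object>` items above stem from Scala's primitive `Double` erasing to `Object` in generic signatures. A minimal sketch of what the Java caller experiences, with no Spark dependency (`ThresholdModel` below is a hypothetical stand-in for a Scala class exposing `Option[Double]`, and `java.util.Optional` stands in for `scala.Option`):

```java
import java.util.Optional;

// Hypothetical stand-in for a Scala class whose Option[Double] erases
// to Optional<Object> from Java's point of view.
class ThresholdModel {
    private final Object threshold = 0.5; // boxed java.lang.Double

    Optional<Object> getThreshold() {
        return Optional.of(threshold);
    }
}

public class OptionObjectDemo {
    public static void main(String[] args) {
        ThresholdModel model = new ThresholdModel();
        // The Java caller cannot read the value type from the signature
        // and must cast Object back to Double by hand:
        double t = (Double) model.getThreshold().get();
        System.out.println(t); // prints 0.5
    }
}
```

A Java-friendly wrapper would instead expose `Optional<Double>` (or a primitive-returning accessor) so no cast is needed.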
h3. Clustering
DistributedLDAModel
* RDD<scala.Tuple2<Object,Vector>> topicDistributions()
GaussianMixtureModel + KMeansModel + NaiveBayesModel
* RDD<Object> predict(RDD<Vector> points)
StreamingKMeans
* DStream<Object> predictOn(DStream<Vector> data)
* <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K> evidence$1)
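The `RDD<Object> predict(RDD<Vector> points)` signatures have the same root cause at the element level: `RDD[Double]` loses its element type. A sketch without Spark (`ClusterModel` and its `List`-based `predict` are hypothetical stand-ins for the RDD-based API):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: a Scala RDD[Double] surfaces in Java as
// RDD<Object>, so every returned prediction needs a per-element cast.
class ClusterModel {
    List<Object> predict(List<double[]> points) {
        // Pretend predictions; each element is really a boxed Double.
        return Arrays.asList((Object) 0.0, (Object) 1.0);
    }
}

public class RddObjectDemo {
    public static void main(String[] args) {
        ClusterModel model = new ClusterModel();
        List<double[]> points = Arrays.asList(new double[]{1.0}, new double[]{2.0});
        for (Object p : model.predict(points)) {
            double label = (Double) p; // cast required for every element
            System.out.println(label);
        }
    }
}
```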
h3. Evaluation
AreaUnderCurve
* static double of(scala.collection.Iterable<scala.Tuple2<Object,Object>> curve)
* static double of(RDD<scala.Tuple2<Object,Object>> curve)
BinaryClassificationMetrics
* Many methods are affected (everything taking or returning an RDD).
RankingMetrics constructor
* RankingMetrics(RDD<scala.Tuple2<Object,Object>> predictionAndLabels, scala.reflect.ClassTag<T> evidence$1)
h3. Feature
Word2VecModel
* scala.Tuple2<String,Object>[] findSynonyms
h3. Linalg
SparseMatrix
* static SparseMatrix fromCOO(int numRows, int numCols, scala.collection.Iterable<scala.Tuple3<Object,Object,Object>> entries)
Vectors
* static Vector sparse(int size, scala.collection.Seq<scala.Tuple2<Object,Object>> elements)
BlockMatrix
* RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks()
** _This issue appears in the constructors too._
h3. Optimization
Optimizer
* Vector optimize(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights)
* _Same issue appears elsewhere, wherever Double is used in a tuple._
Gradient
* scala.Tuple2<Vector,Object> compute(Vector data, double label, Vector weights)
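The tuple case is the same erasure one slot deeper: in Scala's `(Vector, Double)` return type, the `Double` slot becomes `Object`. A sketch with no Scala dependency (`Pair` is a hypothetical stand-in for `scala.Tuple2`, and `GradientSketch.compute` is an invented toy, not the real MLlib gradient):

```java
// Hypothetical stand-in for scala.Tuple2: two public fields _1 and _2.
class Pair<A, B> {
    final A _1;
    final B _2;
    Pair(A a, B b) { _1 = a; _2 = b; }
}

class GradientSketch {
    // What Java effectively sees from a Scala (Vector, Double) result:
    // the loss slot is typed Object, not Double.
    static Pair<double[], Object> compute(double[] data, double label, double[] weights) {
        double loss = 0.0;
        for (int i = 0; i < data.length; i++) {
            loss += data[i] * weights[i] - label; // toy loss, for illustration only
        }
        return new Pair<>(data, (Object) loss);
    }
}

public class TupleObjectDemo {
    public static void main(String[] args) {
        Pair<double[], Object> g = GradientSketch.compute(
            new double[]{1.0, 2.0}, 1.0, new double[]{0.5, 0.5});
        double loss = (Double) g._2; // cast Object -> Double again
        System.out.println(loss); // prints -0.5
    }
}
```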
h3. Recommendation
MatrixFactorizationModel
* _constructor_: MatrixFactorizationModel(int rank, RDD<scala.Tuple2<Object,double[]>> userFeatures, RDD<scala.Tuple2<Object,double[]>> productFeatures)
* RDD<scala.Tuple2<Object,double[]>> productFeatures()
* RDD<scala.Tuple2<Object,Rating[]>> recommendProductsForUsers(int num)
* RDD<scala.Tuple2<Object,Rating[]>> recommendUsersForProducts(int num)
* RDD<scala.Tuple2<Object,double[]>> userFeatures()
h3. Regression
GeneralizedLinearModel
* RDD<Object> predict(RDD<Vector> testData)
h3. Stats
Statistics
* static double corr(RDD<Object> x, RDD<Object> y)
* static double corr(RDD<Object> x, RDD<Object> y, String method)
h3. Trees
DecisionTreeModel
* JavaRDD<Object> predict(JavaRDD<Vector> features)
** _This is because we use Double instead of java.lang.Double (unlike in, e.g., TreeEnsembleModel)._
Split
* scala.collection.immutable.List<Object> categories()
h3. util
DataValidators
* static scala.Function1<RDD<LabeledPoint>,Object> binaryLabelValidator()
* static scala.Function1<RDD<LabeledPoint>,Object> multiLabelValidator(int k)
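The validator signatures show the same problem with `Boolean` rather than `Double`: Scala's `RDD[LabeledPoint] => Boolean` erases its result type, so Java sees `Function1<..., Object>`. A sketch with `java.util.function.Function` over a plain `List` as a hypothetical stand-in:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in: the validator's Boolean result erases to
// Object, so the Java caller must cast it back before branching on it.
public class ValidatorDemo {
    static Function<List<Double>, Object> binaryLabelValidator() {
        // Returns a boxed Boolean typed as Object, as a Java caller sees it.
        return labels -> (Object) labels.stream().allMatch(l -> l == 0.0 || l == 1.0);
    }

    public static void main(String[] args) {
        boolean ok = (Boolean) binaryLabelValidator().apply(List.of(0.0, 1.0, 1.0));
        System.out.println(ok); // prints true
    }
}
```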
> Java compatibility check for MLlib 1.4
> --------------------------------------
>
> Key: SPARK-7529
> URL: https://issues.apache.org/jira/browse/SPARK-7529
> Project: Spark
> Issue Type: Sub-task
> Components: ML, MLlib
> Affects Versions: 1.4.0
> Reporter: Xiangrui Meng
> Assignee: Joseph K. Bradley
>
> Check Java compatibility for MLlib 1.4. We should create separate JIRAs for each possible issue.
> Checking compatibility means:
> * comparing with the Scala doc
> * verifying that Java docs are not messed up by Scala type incompatibilities (e.g., check for generic "Object" types where Java cannot understand complex Scala types; also check Scala objects, especially with nesting, carefully)
> * If needed for complex issues, create small Java unit tests which execute each method. (The correctness can be checked in Scala.)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)