Posted to dev@spark.apache.org by Benjii519 <be...@gmail.com> on 2016/05/27 01:54:37 UTC

Creation of SparkML Estimators in Java broken?

Hello, 

Let me preface this by saying that I am completely new to Spark and
Scala, so I may be missing something basic.

I have been looking at implementing a clustering algorithm on top of SparkML
using Java, and ran into immediate problems. As a sanity check, I went to
the Java API example, but encountered the same behavior: I am unable to set
parameters on a Java-defined Estimator.

Focusing on the JavaDeveloperApiExample, since I trust it more than my own
code, I get the exception pasted at the end of this post.

Digging around the Spark code, it looks like adding parameters through Java
is broken because the Scala params implementation uses reflection to
determine the valid parameters. This works fine for the Scala Estimators,
since they appear to mix in implementation-specific params traits. In the
Java case, the params machinery is just a generic base class, and the
reflection finds nothing to populate, because the params are all defined on
the concrete Estimator class. As a result, when I try to set a parameter on
the estimator, validation rejects it as an unknown parameter.
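
For concreteness, here is a stripped-down sketch of the failing pattern.
The class is hypothetical (not the exact example code), and I have cut the
Estimator boilerplate down to a bare JavaParams subclass, but the param
handling mirrors what JavaDeveloperApiExample does:

    import org.apache.spark.ml.param.IntParam;
    import org.apache.spark.ml.param.JavaParams;
    import org.apache.spark.ml.param.ParamMap;
    import org.apache.spark.ml.util.Identifiable$;

    // Illustrative repro: a minimal Java Params holder that, as far as I
    // can tell, trips the same "Param ... does not belong to ..." check.
    public class MyJavaParamsRepro extends JavaParams {
      private final String uid_ = Identifiable$.MODULE$.randomUID("myJavaRepro");

      // The param is declared as a field on the concrete Java class, as in
      // the example. Params.params() is built by reflection over public
      // no-arg methods whose return type is a Param subtype, so this field
      // is never discovered and hasParam("maxIter") stays false.
      final IntParam maxIter =
          new IntParam(this, "maxIter", "maximum number of iterations");

      @Override
      public String uid() {
        return uid_;
      }

      @Override
      public MyJavaParamsRepro copy(ParamMap extra) {
        return defaultCopy(extra);
      }

      public MyJavaParamsRepro setMaxIter(int value) {
        // set() calls shouldOwn(), which requires hasParam(param.name);
        // that requirement fails here with the exception pasted below.
        set(maxIter, value);
        return this;
      }
    }

If I instead expose the param through a public no-arg method, e.g.
public IntParam maxIter() { return this.maxIter; }, the reflection does
seem to pick it up, which is what makes me think this is the root cause.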

Any feedback / suggestions? Is this a known issue? 

Thanks! 

Exception in thread "main" java.lang.IllegalArgumentException: requirement
failed: Param myJavaLogReg_d3e770dacdc9__maxIter does not belong to
myJavaLogReg_d3e770dacdc9.
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:740)
        at org.apache.spark.ml.param.Params$class.set(params.scala:618)
        at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:43)
        at org.apache.spark.ml.param.Params$class.set(params.scala:604)
        at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:43)
        at org.apache.spark.examples.ml.MyJavaLogisticRegression.setMaxIter(JavaDeveloperApiExample.java:144)
        at org.apache.spark.examples.ml.MyJavaLogisticRegression.init(JavaDeveloperApiExample.java:139)
        at org.apache.spark.examples.ml.MyJavaLogisticRegression.<init>(JavaDeveloperApiExample.java:111)
        at org.apache.spark.examples.ml.JavaDeveloperApiExample.main(JavaDeveloperApiExample.java:68)



Re: Creation of SparkML Estimators in Java broken?

Posted by Yanbo Liang <yb...@gmail.com>.
Created https://issues.apache.org/jira/browse/SPARK-15605 to track this.


Re: Creation of SparkML Estimators in Java broken?

Posted by Yanbo Liang <yb...@gmail.com>.
This is because we do not have complete coverage of Java-friendly
wrappers. I found that we only implement JavaParams, which is the
Java-friendly wrapper of the Scala Params trait. We still need
Java-friendly wrappers for the other traits that extend Scala Params.

For example, in Scala we have:
    trait HasLabelCol extends Params
We should have a Java-friendly wrapper as follows:
    class JavaHasLabelCol extends JavaParams

Then each param of the Estimator would be registered correctly, and param
validation would succeed. I think this needs to be designed further.
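
To make this concrete, here is a rough sketch of what such a wrapper might
look like. JavaHasLabelCol does not exist in Spark today; the sketch just
assumes the current behavior of Params.params, which discovers params by
reflecting over public no-arg methods that return a Param subtype:

    import org.apache.spark.ml.param.JavaParams;
    import org.apache.spark.ml.param.Param;

    // Hypothetical wrapper (not part of Spark): mirrors the Scala trait
    // HasLabelCol, but exposes the param through a public no-arg method so
    // the reflection in Params.params() can discover it in Java subclasses.
    public abstract class JavaHasLabelCol extends JavaParams {
      private final String uid_;
      private final Param<String> labelCol;

      protected JavaHasLabelCol(String uid) {
        // Assign uid before constructing the Param, because the Param
        // constructor reads parent.uid.
        this.uid_ = uid;
        this.labelCol = new Param<>(this, "labelCol", "label column name");
        // Mirror the default set by the Scala HasLabelCol trait.
        setDefault(labelCol, "label");
      }

      @Override
      public String uid() {
        return uid_;
      }

      // Public no-arg accessor returning a Param: this is exactly what the
      // reflection in Params.params() looks for.
      public Param<String> labelCol() {
        return labelCol;
      }

      public String getLabelCol() {
        return getOrDefault(labelCol);
      }
    }

One open question is composition: the Scala shared-param traits mix freely
into an estimator, while Java classes have single inheritance, so these
wrappers would have to form a linear hierarchy or be generated per
estimator. That is part of what needs to be designed.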