Posted to issues@spark.apache.org by "Thomas Kwan (JIRA)" <ji...@apache.org> on 2017/08/04 21:35:00 UTC

[jira] [Created] (SPARK-21643) LR dataset worked in Spark 1.6.3, 2.0.2 stopped working in 2.1.0 onward

Thomas Kwan created SPARK-21643:
-----------------------------------

             Summary: LR dataset worked in Spark 1.6.3, 2.0.2 stopped working in 2.1.0 onward
                 Key: SPARK-21643
                 URL: https://issues.apache.org/jira/browse/SPARK-21643
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.2.0, 2.1.1, 2.1.0
         Environment: CentOS 7, 256G memory, and 52 CPUs VM
            Reporter: Thomas Kwan


This dataset trains and converges on 1.6.x and 2.0.x, but it does not converge on 2.1+.

a) Download the data set (https://s3.amazonaws.com/manage-partners/pipeline/di873-train.json.gz) and uncompress it; I placed it at /tmp/di873-train.json
b) Download the spark package to /usr/lib/spark/spark-*
c) cd sbin
d) start-master.sh
e) start-slave.sh <master-url>
f) cd ../bin
g) Start spark-shell with --master <master-url>
h) I pasted in the following Scala code:

import org.apache.spark.sql.types._
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType

val schema = StructType(Array(
  StructField("features", VectorType, nullable = true),
  StructField("label", DoubleType, nullable = true)))

val df = spark.read.schema(schema).json("file:///tmp/di873-train.json")
val trainer = new LogisticRegression()
  .setMaxIter(500)
  .setElasticNetParam(1.0)
  .setRegParam(0.00001)
  .setTol(0.00001)
  .setFitIntercept(true)
val model = trainer.fit(df)
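Note that with setElasticNetParam(1.0) the penalty above is pure L1 (lasso). Spark ML defines the elastic net penalty as regParam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2), so the effective L1/L2 weights implied by these settings can be computed directly (a standalone sketch; the variable names are illustrative):

```scala
// Elastic net penalty in Spark ML:
//   regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)
val regParam = 1e-5 // setRegParam(0.00001)
val alpha    = 1.0  // setElasticNetParam(1.0)

val l1Weight = regParam * alpha         // weight on the L1 term
val l2Weight = regParam * (1.0 - alpha) // weight on the L2 term

println(s"L1 = $l1Weight, L2 = $l2Weight") // pure L1: the L2 weight is 0
```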

i) Then I monitored the progress in the Spark UI under the Jobs tab.
With Spark 1.6.1 and 2.0.2, training (the treeAggregate stages) finished after around 25-30 jobs. With 2.1+, training did not converge and finished only because it hit the maximum number of iterations (500).
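A quick way to see whether a run actually converged (rather than just exhausting maxIter) is to inspect model.summary.objectiveHistory. The kind of relative-tolerance stopping check an iterative solver applies can be sketched in plain Scala; the helper name and the sample histories below are made up for illustration:

```scala
// Hypothetical helper: true if some consecutive pair of objective values
// improves by less than tol (relative to the previous value), i.e. the
// run reached a point where further iterations barely change the loss.
def converged(history: Seq[Double], tol: Double): Boolean =
  history.sliding(2).exists {
    case Seq(prev, cur) => math.abs(prev - cur) <= tol * math.max(math.abs(prev), 1.0)
    case _              => false
  }

// Made-up objective histories:
val finished = Seq(0.693, 0.512, 0.498, 0.497995) // tiny final step
val stalled  = Seq(0.693, 0.600, 0.500, 0.400)    // still moving each step

println(converged(finished, 1e-5)) // true
println(converged(stalled, 1e-5))  // false
```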





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
