Posted to user@spark.apache.org by Pete Prokopowicz <pp...@groupon.com.INVALID> on 2017/04/05 21:21:03 UTC

run-time exception trying to train MultilayerPerceptronClassifier with DataFrame

Hello,

I am trying to train a neural net using a dataframe constructed from an RDD
of LabeledPoints. The data frame's schema is:

[label: double, features: vector]

The actual feature values are SparseVectors. The runtime error I get when I call


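    import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

    val labeledPoints: RDD[LabeledPoint] = <generated earlier>

    val fields: Seq[StructField] = List[StructField](
      StructField("label", DoubleType, nullable = false),
      StructField("features", VectorType, nullable = false))

    val schema: StructType = StructType(fields)

    val labeledPointsAsRowRDD = labeledPoints.map(point =>
             Row(point.label, point.features))
    val trainingDataFrame =
             spark.createDataFrame(labeledPointsAsRowRDD, schema)

    trainer.fit(trainingDataFrame)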

is:

org.apache.spark.mllib.linalg.SparseVector is not a valid external type for schema of vector

I can't figure out whether the DataFrame doesn't conform to the schema,
whether the schema doesn't conform to what the ML library expects, or
something else. Any suggestions would be very helpful.
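The stack trace points at a type mismatch: the schema's `features` column is encoded with `org.apache.spark.ml.linalg.VectorUDT`, while the rows carry `org.apache.spark.mllib.linalg.SparseVector` values, the vector type from the older RDD-based package. A minimal sketch, assuming Spark 2.0 or later (where `mllib.linalg.Vector.asML` is available), of converting one such vector into the `spark.ml` type; the object name `AsMLSketch` and the sample values are hypothetical:

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}

object AsMLSketch {
  // A spark.mllib sparse vector: the same type LabeledPoint.features holds
  val oldFeatures = OldVectors.sparse(4, Array(0, 3), Array(1.0, 2.0))

  // asML converts it to the spark.ml vector type that the
  // VectorType (ml.linalg.VectorUDT) column in the schema expects
  val mlFeatures: MLVector = oldFeatures.asML

  def main(args: Array[String]): Unit = {
    println(oldFeatures.getClass.getName) // org.apache.spark.mllib.linalg.SparseVector
    println(mlFeatures.getClass.getName)  // org.apache.spark.ml.linalg.SparseVector
  }
}
```

Applied to the code in the question, this would mean building each row as `Row(point.label, point.features.asML)`, leaving the schema unchanged.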

Also, I'm confused about why MultilayerPerceptronClassifier doesn't work
directly with an RDD[LabeledPoint], as DecisionTree, RandomForest, etc.
do.
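A likely reason: MultilayerPerceptronClassifier exists only in the DataFrame-based `spark.ml` package (`org.apache.spark.ml.classification`), while the `DecisionTree` and `RandomForest` entry points that accept an `RDD[LabeledPoint]` belong to the older RDD-based `spark.mllib` package. A minimal sketch, assuming Spark 2.x and a local SparkSession, that maps the `spark.mllib` points to `org.apache.spark.ml.feature.LabeledPoint` (whose `features` are `ml.linalg` vectors) and fits the classifier. The object and method names, the toy data, and the layer sizes are hypothetical placeholders:

```scala
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.feature.{LabeledPoint => MLLabeledPoint}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object MlpFromRDDSketch {
  // Convert old-style (spark.mllib) LabeledPoints into a DataFrame spark.ml accepts.
  // The schema [label: double, features: vector] is inferred from the case class.
  def toTrainingFrame(spark: SparkSession, points: RDD[LabeledPoint]): DataFrame = {
    import spark.implicits._
    points.map(p => MLLabeledPoint(p.label, p.features.asML)).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("mlp-sketch").getOrCreate()

    // Hypothetical toy data standing in for the RDD[LabeledPoint] in the question
    val points = spark.sparkContext.parallelize(Seq(
      LabeledPoint(0.0, OldVectors.sparse(3, Array(0), Array(1.0))),
      LabeledPoint(1.0, OldVectors.sparse(3, Array(2), Array(1.0)))
    ))

    val training = toTrainingFrame(spark, points)
    training.printSchema()

    // Layers: 3 input features, one hidden layer of 4 units (arbitrary), 2 classes
    val trainer = new MultilayerPerceptronClassifier()
      .setLayers(Array(3, 4, 2))
      .setMaxIter(50)

    val model = trainer.fit(training)
    model.transform(training).select("label", "prediction").show()
    spark.stop()
  }
}
```

Going through the case class avoids the hand-written `StructType` entirely, so the vector type in the schema and in the data cannot drift apart.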

Caused by: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: org.apache.spark.mllib.linalg.SparseVector is not a valid external type for schema of vector
validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, label), DoubleType) AS label#0
+- validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, label), DoubleType)
   +- getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, label)
      +- assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object)
         +- input[0, org.apache.spark.sql.Row, true]

newInstance(class org.apache.spark.ml.linalg.VectorUDT).serialize AS features#1
+- newInstance(class org.apache.spark.ml.linalg.VectorUDT).serialize
   :- newInstance(class org.apache.spark.ml.linalg.VectorUDT)
   +- validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 1, features), org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7)
      +- getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 1, features)
         +- assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object)
            +- input[0, org.apache.spark.sql.Row, true]

-- 

Pete Prokopowicz, Sr. Engineer - BEMOD! Behavioral Modeling