Posted to user@spark.apache.org by Patrick <ti...@gmail.com> on 2020/01/10 04:01:21 UTC
Spark MLlib logistic regression setWeightCol IllegalArgumentException
Hi Spark Users,
I am trying to solve a class imbalance problem. I found that Spark supports
per-row weights in its API, but when I set the weight column I get an
IllegalArgumentException saying the weight column does not exist, even though
it does exist in the dataset. Any recommendation on how to go about this
problem? I am using the Pipeline API with a LogisticRegression model and
TrainValidationSplit.
LogisticRegression lr = new LogisticRegression();
lr.setWeightCol("weight");
Caused by: java.lang.IllegalArgumentException: Field "weight" does not exist.
    at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:267)
    at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:267)
    at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    at scala.collection.AbstractMap.getOrElse(Map.scala:59)
    at org.apache.spark.sql.types.StructType.apply(StructType.scala:266)
    at org.apache.spark.ml.util.SchemaUtils$.checkNumericType(SchemaUtils.scala:71)
    at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:58)
    at org.apache.spark.ml.classification.Classifier.org$apache$spark$ml$classification$ClassifierParams$$super$validateAndTransformSchema(Classifier.scala:58)
    at org.apache.spark.ml.classification.ClassifierParams$class.validateAndTransformSchema(Classifier.scala:42)
    at org.apache.spark.ml.classification.ProbabilisticClassifier.org$apache$spark$ml$classification$ProbabilisticClassifierParams$$super$validateAndTransformSchema(ProbabilisticClassifier.scala:53)
    at org.apache.spark.ml.classification.ProbabilisticClassifierParams$class.validateAndTransformSchema(ProbabilisticClassifier.scala:37)
    at org.apache.spark.ml.classification.LogisticRegression.org$apache$spark$ml$classification$LogisticRegressionParams$$super$validateAndTransformSchema(LogisticRegression.scala:278)
    at org.apache.spark.ml.classification.LogisticRegressionParams$class.validateAndTransformSchema(LogisticRegression.scala:265)
    at org.apache.spark.ml.classification.LogisticRegression.validateAndTransformSchema(LogisticRegression.scala:278)
    at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:144)
    at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184)
    at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
    at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:184)
    at org.apache.spark.ml.tuning.ValidatorParams$class.transformSchemaImpl(ValidatorParams.scala:77)
    at org.apache.spark.ml.tuning.TrainValidationSplit.transformSchemaImpl(TrainValidationSplit.scala:67)
    at org.apache.spark.ml.tuning.TrainValidationSplit.transformSchema(TrainValidationSplit.scala:180)
    at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
    at org.apache.spark.ml.tuning.TrainValidationSplit.fit(TrainValidationSplit.scala:121)
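For context, the values I intend to put into the "weight" column follow the
usual balancing heuristic for class imbalance: weight(c) = total / (numClasses
* count(c)). Here is a plain-Java sketch of that computation (the helper name
and the "weight" column name are mine, not anything Spark provides):

```java
import java.util.HashMap;
import java.util.Map;

public class ClassWeights {

    // Balancing heuristic: weight(c) = total / (numClasses * count(c)),
    // so under-represented classes get proportionally larger row weights.
    public static Map<Integer, Double> balancedWeights(int[] labels) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int label : labels) {
            counts.merge(label, 1L, Long::sum);
        }
        long total = labels.length;
        int numClasses = counts.size();
        Map<Integer, Double> weights = new HashMap<>();
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            weights.put(e.getKey(), (double) total / (numClasses * e.getValue()));
        }
        return weights;
    }

    public static void main(String[] args) {
        // 8 negatives, 2 positives: the minority class gets the larger weight.
        int[] labels = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1};
        System.out.println(balancedWeights(labels));
    }
}
```

Each row's weight would then be looked up from its label and written into the
"weight" column of the DataFrame before calling fit.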
Thanks in advance,