Posted to issues@spark.apache.org by "Raghuvarran V H (Jira)" <ji...@apache.org> on 2020/06/22 11:30:00 UTC

[jira] [Updated] (SPARK-32050) GBTClassifier not working with OneVsRest

     [ https://issues.apache.org/jira/browse/SPARK-32050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghuvarran V H updated SPARK-32050:
------------------------------------
    Description: 
I am trying to use the GBT classifier for multi-class classification via OneVsRest.

 
{code:python}
from pyspark.ml.classification import MultilayerPerceptronClassifier, OneVsRest, GBTClassifier
from pyspark.ml import Pipeline, PipelineModel

lr = GBTClassifier(featuresCol='features', labelCol='label', predictionCol='prediction',
                   maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                   maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10,
                   lossType='logistic', maxIter=20, stepSize=0.1, seed=None,
                   subsamplingRate=1.0, featureSubsetStrategy='auto')

classifier = OneVsRest(featuresCol='features', labelCol='label', predictionCol='prediction',
                       classifier=lr, weightCol=None, parallelism=1)

pipeline = Pipeline(stages=[str_indxr, ohe, vecAssembler, normalizer, classifier])
model = pipeline.fit(train_data)
{code}

When I run this, I get the following error:

{code}
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/spark/python/pyspark/ml/classification.py in _fit(self, dataset)
   1800         classifier = self.getClassifier()
   1801         assert isinstance(classifier, HasRawPredictionCol),\
-> 1802             "Classifier %s doesn't extend from HasRawPredictionCol." % type(classifier)
   1803
   1804         numClasses = int(dataset.agg({labelCol: "max"}).head()["max("+labelCol+")"]) + 1

AssertionError: Classifier <class 'pyspark.ml.classification.GBTClassifier'> doesn't extend from HasRawPredictionCol.
{code}
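
The assertion comes from OneVsRest._fit, which requires the inner classifier to mix in HasRawPredictionCol. A minimal check (my own sketch, assuming the Spark 2.4 API, not part of the original report) confirms that GBTClassifier does not mix it in on 2.4.x, while e.g. RandomForestClassifier does:

{code:python}
# Sketch, not from the original report: check which classifiers mix in
# HasRawPredictionCol on Spark 2.4.x.
from pyspark.ml.classification import GBTClassifier, RandomForestClassifier
from pyspark.ml.param.shared import HasRawPredictionCol

print(isinstance(GBTClassifier(), HasRawPredictionCol))           # False on 2.4.x
print(isinstance(RandomForestClassifier(), HasRawPredictionCol))  # True
{code}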

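As a workaround (a minimal sketch under the same Spark 2.4 assumptions, again not from the original report), any classifier that exposes rawPredictionCol, such as RandomForestClassifier or LogisticRegression, can be substituted into the OneVsRest stage:

{code:python}
from pyspark.ml.classification import RandomForestClassifier, OneVsRest
from pyspark.ml import Pipeline

# RandomForestClassifier mixes in HasRawPredictionCol, so OneVsRest accepts it.
rf = RandomForestClassifier(featuresCol='features', labelCol='label',
                            predictionCol='prediction', maxDepth=5, numTrees=20)

classifier = OneVsRest(featuresCol='features', labelCol='label',
                       predictionCol='prediction', classifier=rf, parallelism=1)

# str_indxr, ohe, vecAssembler, normalizer and train_data are assumed to be
# defined as in the reporter's snippet above.
pipeline = Pipeline(stages=[str_indxr, ohe, vecAssembler, normalizer, classifier])
model = pipeline.fit(train_data)
{code}

Upgrading may also resolve this: as far as I can tell, from Spark 3.0 on GBTClassifier extends ProbabilisticClassifier and exposes rawPredictionCol, so OneVsRest should accept it directly.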


> GBTClassifier not working with OneVsRest
> ----------------------------------------
>
>                 Key: SPARK-32050
>                 URL: https://issues.apache.org/jira/browse/SPARK-32050
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>         Environment: spark 2.4.0
>            Reporter: Raghuvarran V H
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org