Posted to issues@spark.apache.org by "David Kravitz (Jira)" <ji...@apache.org> on 2019/10/09 19:27:00 UTC

[jira] [Created] (SPARK-29418) Mismatched indices between input and featureImportances is at best extremely confusing

David Kravitz created SPARK-29418:
-------------------------------------

             Summary: Mismatched indices between input and featureImportances is at best extremely confusing
                 Key: SPARK-29418
                 URL: https://issues.apache.org/jira/browse/SPARK-29418
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.4.4
         Environment: I'm on AWS but I presume this is happening everywhere.  
            Reporter: David Kravitz


When you read in a "libsvm" file, the feature indices must be one-based, so lines look like this:

37.0 1:1.0 2:2.75
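To make the mismatch concrete, here is a minimal pure-Python sketch of parsing one such record. It assumes (my understanding, not stated in this report) that Spark's libsvm reader subtracts one from each index, so feature "1:" in the file ends up at position 0 in the resulting vector. The function name parse_libsvm_line is hypothetical and this is not Spark's actual code.

```python
def parse_libsvm_line(line):
    """Parse one libsvm record into (label, {zero_based_index: value}).

    Hypothetical sketch: mirrors the one-based -> zero-based shift that
    Spark's libsvm reader is assumed to apply; not Spark's implementation.
    """
    label_str, *pairs = line.split()
    features = {}
    for pair in pairs:
        idx_str, val_str = pair.split(":")
        idx = int(idx_str)
        # libsvm files are one-based, so index 0 is rejected
        if idx < 1:
            raise ValueError(f"libsvm indices must be one-based, got {idx}")
        # shift to the zero-based position used inside the feature vector
        features[idx - 1] = float(val_str)
    return float(label_str), features


label, feats = parse_libsvm_line("37.0 1:1.0 2:2.75")
print(label, feats)  # 37.0 {0: 1.0, 1: 2.75}
```

The shift happens once, at load time, which is why everything downstream (including featureImportances) reports zero-based positions.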

But when you then train a model such as RandomForestRegressor and inspect its feature importances, the indices are zero-based:

model.stages[-1].featureImportances

SparseVector(144, {0: 0.0292, 1: 0.0041})

I guess you can add one to each index to make them line up, but why force users to do that? Either accept zero-based indices in libsvm files (easiest) or have featureImportances report indices that match the one-based input file.
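The "add one" workaround can be written as a small helper. This is a sketch under assumptions: the helper name to_libsvm_indices is hypothetical, and it takes a plain {index: weight} dict, which could be built from a pyspark SparseVector with dict(zip(v.indices, v.values)).

```python
def to_libsvm_indices(importances):
    """Map zero-based feature-importance indices back to the one-based
    indices used in the libsvm input file.

    Hypothetical helper: `importances` is assumed to be a plain
    {zero_based_index: weight} dict, e.g. dict(zip(v.indices, v.values))
    for a pyspark SparseVector `v`.
    """
    return {int(i) + 1: w for i, w in importances.items()}


# values copied from the featureImportances output in the report
imp = {0: 0.0292, 1: 0.0041}
print(to_libsvm_indices(imp))  # {1: 0.0292, 2: 0.0041}
```

Keeping the conversion in one place at least makes the off-by-one explicit, rather than adding 1 by hand wherever importances are read.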



--
This message was sent by Atlassian Jira
(v8.3.4#803005)