Posted to issues@spark.apache.org by "David Kravitz (Jira)" <ji...@apache.org> on 2019/10/09 19:27:00 UTC
[jira] [Created] (SPARK-29418) Mismatched indices between input and
featureImportances is at best extremely confusing
David Kravitz created SPARK-29418:
-------------------------------------
Summary: Mismatched indices between input and featureImportances is at best extremely confusing
Key: SPARK-29418
URL: https://issues.apache.org/jira/browse/SPARK-29418
Project: Spark
Issue Type: Bug
Components: ML
Affects Versions: 2.4.4
Environment: I'm on AWS but I presume this is happening everywhere.
Reporter: David Kravitz
When you read in a "libsvm" file, the feature indices are required to be one-based, so lines look like this:
37.0 1:1.0 2:2.75
But when you finish fitting something like RandomForestRegressor and look at the feature importances, the indices are zero-based:
model.stages[-1].featureImportances
SparseVector(144, {0: 0.0292, 1: 0.0041})
I guess you can add one to make them line up, but why force us to do that? Either accept zero-based indices in libsvm files (easiest) or have featureImportances report indices that match the input.
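
For anyone hitting this in the meantime, here is a minimal sketch of the "add one" workaround in plain Python. It assumes the libsvm reader shifts the file's one-based indices down by one on load, which is consistent with the output above. The helper names parse_libsvm_line and to_one_based are hypothetical, not part of Spark. With a real model you would presumably pass the indices and values of the featureImportances vector instead of the literal lists used here.

```python
from typing import Dict, Iterable, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one libsvm line, keeping the one-based indices as written in the file."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return float(label), features


def to_one_based(indices: Iterable[int], values: Iterable[float]) -> Dict[int, float]:
    """Shift zero-based importance indices up by one so they match the libsvm file."""
    return {int(i) + 1: float(v) for i, v in zip(indices, values)}


# Sample line and importances taken from the report above.
label, features = parse_libsvm_line("37.0 1:1.0 2:2.75")
importances = to_one_based([0, 1], [0.0292, 0.0041])

# Print each file index next to its input value and its importance.
for index in sorted(importances):
    print(index, features.get(index), importances[index])
```

After the shift, importance index 1 refers to the feature written as 1:1.0 in the file, and index 2 to the feature written as 2:2.75.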
--
This message was sent by Atlassian Jira
(v8.3.4#803005)