Posted to issues@spark.apache.org by "Nick Pentreath (JIRA)" <ji...@apache.org> on 2017/08/15 08:20:00 UTC

[jira] [Commented] (SPARK-21723) Can't write LibSVM - key not found: numFeatures

    [ https://issues.apache.org/jira/browse/SPARK-21723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126939#comment-16126939 ] 

Nick Pentreath commented on SPARK-21723:
----------------------------------------

Yes, we should definitely be able to write LibSVM format regardless of whether the original data was read from that format or whether ML metadata is attached to the DataFrame. In the absence of the metadata, we should be able to inspect the vectors themselves to get the size.
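For example, a minimal sketch of that fallback (the helper name is illustrative, not the actual {{LibSVMFileFormat}} code), relying on the fact that every {{ml.linalg.Vector}} knows its own size:

{code:scala}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame

// Hypothetical fallback: derive the feature dimension from the data when the
// features column carries no numFeatures metadata. The first row suffices as
// long as all vectors were built with the same dimension.
def inferNumFeatures(df: DataFrame, featuresCol: String = "features"): Int =
  df.select(featuresCol).head().getAs[Vector](0).size
{code}

A more defensive version would aggregate the maximum size across all rows, at the cost of an extra pass over the data.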



> Can't write LibSVM - key not found: numFeatures
> -----------------------------------------------
>
>                 Key: SPARK-21723
>                 URL: https://issues.apache.org/jira/browse/SPARK-21723
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, ML
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Jan Vršovský
>
> Writing a dataset to LibSVM format raises an exception:
> {{java.util.NoSuchElementException: key not found: numFeatures}}
> This happens only when the dataset was NOT read from LibSVM format beforehand (otherwise {{numFeatures}} is already present in the column metadata). Steps to reproduce:
> {code:scala}
> import org.apache.spark.ml.linalg.Vectors
> import spark.implicits._  // needed for .toDF on the RDD of tuples
>
> val rawData = Seq((1.0, Vectors.sparse(3, Seq((0, 2.0), (1, 3.0)))),
>                   (4.0, Vectors.sparse(3, Seq((0, 5.0), (2, 6.0)))))
> val dfTemp = spark.sparkContext.parallelize(rawData).toDF("label", "features")
> dfTemp.coalesce(1).write.format("libsvm").save("...filename...")
> {code}
> A PR with a fix and a unit test is ready.
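> Until the fix lands, one possible workaround (assuming, from the exception message, that the writer looks up a {{numFeatures}} key in the features column metadata) is to attach that key manually before writing:
> {code:scala}
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.types.MetadataBuilder
>
> // Assumed workaround: attach the numFeatures key that the writer expects.
> val meta = new MetadataBuilder().putLong("numFeatures", 3).build()
> val dfWithMeta = dfTemp.withColumn("features",
>   col("features").as("features", meta))
> dfWithMeta.coalesce(1).write.format("libsvm").save("...filename...")
> {code}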


