Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/06/05 13:47:04 UTC

[jira] [Commented] (SPARK-20987) columns with names having dots cause issues with VectorAssembler

    [ https://issues.apache.org/jira/browse/SPARK-20987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036952#comment-16036952 ] 

Sean Owen commented on SPARK-20987:
-----------------------------------

Looks like a duplicate of one of several issues, like SPARK-12965.

> columns with names having dots cause issues with VectorAssembler
> ----------------------------------------------------------------
>
>                 Key: SPARK-20987
>                 URL: https://issues.apache.org/jira/browse/SPARK-20987
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>            Reporter: Maher Hattabi
>
> Hello,
> I used the code below on data whose column names actually contain dots. Here is the dataset:
> "col0.1","col1.2","col2.3","col3.4"
> 1,2,3,4
> 10,12,15,3
> 1,12,10,5
> Here is the code I used:
> import org.apache.spark.ml.feature.VectorAssembler
> import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector, Vectors}
> import org.apache.spark.mllib.linalg.distributed.RowMatrix
> import org.apache.spark.sql.SparkSession
> val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
> val df = spark.read.format("csv").options(Map("header" -> "true", "inferSchema" -> "true")).load("C:/Users/mhattabi/Desktop/donnee/test.txt")
> val rows = new VectorAssembler().setInputCols(df.columns).setOutputCol("vs").transform(df).select("vs").rdd
> // Convert the ml vectors to mllib vectors so they can feed a RowMatrix.
> val data = rows.map(_.getAs[org.apache.spark.ml.linalg.Vector](0)).map(Vectors.fromML)
> val mat: RowMatrix = new RowMatrix(data)
> // Compute the singular values and corresponding singular vectors.
> val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(mat.numCols().toInt, computeU = true)
> val U: RowMatrix = svd.U  // The U factor is a RowMatrix.
> val s: Vector = svd.s  // The singular values are stored in a local dense vector.
> val V: Matrix = svd.V  // The V factor is a local dense matrix.
> Here is the error I get:
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "col0.1" among (col0.1, col1.2, col2.3, col3.4);
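
One possible workaround, sketched below as a hedged example rather than a confirmed fix: rename the columns so the names contain no dots before handing them to VectorAssembler. The resolution failure happens because a dotted name such as col0.1 can be read as a reference to a field named 1 inside a struct column col0 rather than as a literal column name. The file path and the "vs" output column come from the report above; the underscore renaming assumes the original dotted names are not needed downstream.

    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.master("local").appName("dots-workaround").getOrCreate()

    val raw = spark.read
      .format("csv")
      .options(Map("header" -> "true", "inferSchema" -> "true"))
      .load("C:/Users/mhattabi/Desktop/donnee/test.txt")

    // Replace dots with underscores in every column name, e.g. "col0.1" becomes "col0_1".
    val df = raw.toDF(raw.columns.map(_.replace(".", "_")): _*)

    // VectorAssembler can now resolve the renamed columns without ambiguity.
    val assembled = new VectorAssembler()
      .setInputCols(df.columns)
      .setOutputCol("vs")
      .transform(df)

    assembled.select("vs").show(false)

From here, the rest of the snippet above (converting to mllib vectors and computing the SVD on a RowMatrix) should run unchanged on assembled.select("vs").rdd.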


