Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/09/09 02:22:45 UTC
[jira] [Closed] (SPARK-10467) Vector is converted to tuple when extracted from Row using __getitem__
[ https://issues.apache.org/jira/browse/SPARK-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley closed SPARK-10467.
-------------------------------------
Resolution: Fixed
Assignee: Davies Liu
Fix Version/s: 1.5.0
Confirmed that it's a duplicate.
> Vector is converted to tuple when extracted from Row using __getitem__
> ----------------------------------------------------------------------
>
> Key: SPARK-10467
> URL: https://issues.apache.org/jira/browse/SPARK-10467
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark, SQL
> Affects Versions: 1.4.1
> Reporter: Maciej Szymkiewicz
> Assignee: Davies Liu
> Priority: Minor
> Fix For: 1.5.0
>
>
> If we take a row from a data frame and try to extract a vector element by index, it is converted to a tuple:
> {code}
> from pyspark.ml.feature import HashingTF
> df = sqlContext.createDataFrame([(["foo", "bar"], )], ("keys", ))
> transformer = HashingTF(inputCol="keys", outputCol="vec", numFeatures=5)
> transformed = transformer.transform(df)
> row = transformed.first()
> row.vec # As expected
> ## SparseVector(5, {4: 2.0})
> row[1] # Returns tuple
> ## (0, 5, [4], [2.0])
> {code}
> The problem cannot be reproduced if we create and access a Row directly:
> {code}
> from pyspark.mllib.linalg import Vectors
> from pyspark.sql.types import Row
> row = Row(vec=Vectors.sparse(3, [(0, 1)]))
> row.vec
> ## SparseVector(3, {0: 1.0})
> row[0]
> ## SparseVector(3, {0: 1.0})
> {code}
> but if we use the row above to create a data frame and then extract the value:
> {code}
> df = sqlContext.createDataFrame([row], ("vec", ))
> df.first()[0]
> ## (0, 3, [0], [1.0])
> {code}
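On versions affected by this bug (before the 1.5.0 fix), code that already holds the raw tuple can recover the vector contents by hand. A minimal sketch, assuming the tuple layout is (type_tag, size, indices, values), which is what the output above suggests; `sparse_from_raw` is a hypothetical helper, not a PySpark API, and in real code the final step would hand the result to pyspark.mllib.linalg.Vectors.sparse:

```python
def sparse_from_raw(raw):
    """Rebuild (size, {index: value}) from the raw tuple that
    row[i] returns on affected versions.

    Assumed layout: (type_tag, size, indices, values), e.g. the
    (0, 3, [0], [1.0]) shown in the report for SparseVector(3, {0: 1.0}).
    """
    _type_tag, size, indices, values = raw
    return size, dict(zip(indices, values))

# The raw tuple from the report: df.first()[0] == (0, 3, [0], [1.0])
size, entries = sparse_from_raw((0, 3, [0], [1.0]))
print(size, entries)  # 3 {0: 1.0}
```

Upgrading to 1.5.0, or accessing the field by name (row.vec), avoids the need for this workaround entirely.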
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org