Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:16:12 UTC

[jira] [Resolved] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

     [ https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-19653.
----------------------------------
    Resolution: Incomplete

> `Vector` Type Should Be A First-Class Citizen In Spark SQL
> ----------------------------------------------------------
>
>                 Key: SPARK-19653
>                 URL: https://issues.apache.org/jira/browse/SPARK-19653
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib, SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Mike Dusenberry
>            Priority: Major
>              Labels: bulk-closed
>
> *Issue*: The {{Vector}} type in Spark MLlib (DataFrame-based API, informally "Spark ML") should be added as a first-class citizen to Spark SQL.
> *Current Status*: Spark MLlib currently adds a [{{Vector}} SQL datatype | https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.linalg.SQLDataTypes$] so that DataFrames/Datasets can hold {{Vector}} columns, which MLlib algorithms require. Although this lets a DataFrame/Dataset contain vectors, it does not give access to the full set of features provided by Spark SQL. For example, SQL functions such as {{avg}} and {{sum}} cannot be applied to a {{Vector}} column, and a DataFrame with a {{Vector}} column cannot be saved as a CSV file. Each of these cases fails with an error stating that the operation is not supported on the {{Vector}} type.
> *Benefit*: Allow users to make use of all Spark SQL features that can be reasonably applied to a vector.
> *Goal*:  Move the {{Vector}} type from Spark MLlib into Spark SQL as a first-class citizen.
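
For illustration only, the following plain-Python sketch (not Spark code) shows one possible reading of what {{avg}} and CSV export could mean for a vector column: an element-wise mean, and one CSV column per vector component. The issue does not specify these semantics; the helper names elementwise_avg and to_csv, the sample rows, and the v0, v1, ... column names are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for a DataFrame with a vector column: (id, vector) rows.
rows = [
    (1, [1.0, 2.0, 3.0]),
    (2, [3.0, 4.0, 5.0]),
]


def elementwise_avg(vectors):
    """Average equal-length vectors position by position (assumed semantics)."""
    vectors = list(vectors)
    if not vectors:
        raise ValueError("cannot average an empty collection of vectors")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all vectors must have the same length")
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def to_csv(rows):
    """Write rows as CSV text, flattening each vector into one column per component."""
    size = len(rows[0][1])
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id"] + [f"v{i}" for i in range(size)])
    for row_id, vector in rows:
        writer.writerow([row_id, *vector])
    return buffer.getvalue()


avg_vector = elementwise_avg(vector for _, vector in rows)
csv_text = to_csv(rows)
```

Flattening to one column per component is only one option; a first-class vector type could equally serialize the whole vector into a single delimited cell. The sketch assumes dense vectors of a fixed length and does not address sparse vectors.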



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
