Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/08/02 18:02:00 UTC

[jira] [Resolved] (SPARK-24988) Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType

     [ https://issues.apache.org/jira/browse/SPARK-24988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24988.
----------------------------------
    Resolution: Won't Fix

> Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24988
>                 URL: https://issues.apache.org/jira/browse/SPARK-24988
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: mahmoud mehdi
>            Priority: Minor
>
> The main goal of this user story is to extend the DataFrame API with a method that casts all the values of a DataFrame based on the DataTypes of a StructType.
> This feature is useful when a large DataFrame needs multiple casts. Instead of casting each column individually, we only have to pass a StructType with the required types to the castBySchema method. (In real-world cases this schema is generally provided by the client, as it was in mine.)
> I'll explain the new feature with an example. Let's create a DataFrame of strings:
> {code:java}
> val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")
> {code}
> Suppose we want to cast the values of the DataFrame's second column to integers. All we have to do is the following:
> {code:java}
> import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
> val schema = StructType(Seq(StructField("name", StringType, true), StructField("id", IntegerType, true)))
> {code}
> {code:java}
> df.castBySchema(schema)
> {code}
> I added several tests to make sure that castBySchema also works with nested StructTypes.
>  
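
Because this issue was resolved as Won't Fix, castBySchema is not part of the Spark API. A similar effect can be approximated with existing calls; the following is a minimal sketch, not the reporter's implementation. The object name CastBySchemaSketch and the helper castBySchema are hypothetical, and the sketch assumes every field name in the schema exists as a column of the DataFrame and contains no dots. Nested StructTypes rely on Spark's built-in struct-to-struct cast, which matches fields by position and requires the same number of fields.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CastBySchemaSketch {
  // Hypothetical helper (not a Spark API): select every field named in
  // `schema` and cast it to the data type declared for that field.
  // Columns of `df` that are absent from `schema` are dropped.
  def castBySchema(df: DataFrame, schema: StructType): DataFrame =
    df.select(schema.fields.map(f => col(f.name).cast(f.dataType).as(f.name)): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("castBySchema-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same data and target schema as in the issue description.
    val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("id", IntegerType, nullable = true)))

    val casted = castBySchema(df, schema)
    casted.printSchema()
    casted.show()

    spark.stop()
  }
}
```

The cast only changes the data types; the nullable flags in the StructType are not enforced. How values that cannot be cast are handled (null result or error) depends on the Spark version and its ANSI SQL setting.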


