Posted to issues@spark.apache.org by "mahmoud mehdi (JIRA)" <ji...@apache.org> on 2018/08/01 13:56:00 UTC

[jira] [Created] (SPARK-24988) Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType

mahmoud mehdi created SPARK-24988:
-------------------------------------

             Summary: Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType
                 Key: SPARK-24988
                 URL: https://issues.apache.org/jira/browse/SPARK-24988
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: mahmoud mehdi


The main goal of this User Story is to extend the DataFrame API with a method that casts all the values of a DataFrame based on the DataTypes of a StructType.

This feature is useful when we have a large DataFrame and need to perform many casts. Instead of casting each column independently, we simply pass a StructType with the desired types to castBySchema (in real-world scenarios, this schema is often provided by the client, as it was in my case).
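For context, this is what the status quo looks like: each column has to be cast one by one with withColumn. A minimal, self-contained sketch (the local SparkSession setup and object name are illustrative, not part of the proposal):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

object ManualCastExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("manual-cast").getOrCreate()
    import spark.implicits._

    val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")

    // Without castBySchema, every column that needs a new type
    // must be cast individually:
    val casted = df.withColumn("id", col("id").cast(IntegerType))

    casted.printSchema()
    spark.stop()
  }
}
```

With many columns (or nested structs) this per-column boilerplate grows quickly, which is exactly what a single schema-driven call would avoid.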

I'll illustrate the new feature with an example. First, let's create a DataFrame of strings:
{code:java}
val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")
{code}
Let's suppose we want to cast the second column's values to integers. All we have to do is the following:
{code:java}
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("name", StringType, true),
  StructField("id", IntegerType, true)))

df.castBySchema(schema)
{code}

I made sure that castBySchema works also with nested StructTypes by adding several tests.
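A possible implementation sketch of such a method, written as an implicit extension (the object name, the name-matching check, and the reliance on Spark's built-in struct-to-struct cast for nested StructTypes are my assumptions, not necessarily what the actual patch does):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

object CastBySchema {
  // Hypothetical extension adding castBySchema to DataFrame.
  implicit class RichDataFrame(df: DataFrame) {
    def castBySchema(schema: StructType): DataFrame = {
      // Illustrative precondition: the schema's field names must
      // line up with the DataFrame's columns.
      require(df.columns.toSeq == schema.fields.map(_.name).toSeq,
        "schema field names must match the DataFrame columns")

      // Cast every column to the DataType declared in the schema.
      // Column.cast also accepts StructType targets, so nested
      // structs would be cast recursively by Spark itself.
      df.select(schema.fields.map(f =>
        col(f.name).cast(f.dataType).as(f.name)): _*)
    }
  }
}
```

Usage would then match the example above: `import CastBySchema._` followed by `df.castBySchema(schema)`.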

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org