
Posted to issues@spark.apache.org by "RoopTeja Muppalla (JIRA)" <ji...@apache.org> on 2019/07/26 22:34:00 UTC

[jira] [Created] (SPARK-28533) Spark datatype error

RoopTeja Muppalla created SPARK-28533:
-----------------------------------------

             Summary: Spark datatype error
                 Key: SPARK-28533
                 URL: https://issues.apache.org/jira/browse/SPARK-28533
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.1
            Reporter: RoopTeja Muppalla


Hello,

I have faced an issue while casting the datatype of a column in pyspark 2.4.1.

Say I have the following DataFrame, in which column B is a string containing a list of arrays:

import ast
df = spark.createDataFrame([("row1", "[[12.46575,13.78697],[10.565,11]]"), ("row2", "[[1.2345,13.45454],[6.6868,0.234524]]")], schema=['A', 'B'])

Now I want to convert column B to an ArrayType, so I used the following code:

from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, DoubleType
to_array = udf(lambda x: ast.literal_eval(x.replace('\"', '')), ArrayType(ArrayType(DoubleType())))
df = df.withColumn('C', to_array(col('B')))

The new column C is an ArrayType of ArrayType with DoubleType elements. However, this code does not convert the integer value *11*: it is missing from the final output.
||A||B||C||
|row1|[[12.46575,13.78697],[10.565,*11*]]|[[12.46575, 13.78697], [10.565,]]|
|row2|[[1.2345,13.45454],[6.6868,0.234524]]|[[1.2345, 13.45454], [6.6868, 0.234524]]|

As the table shows, column C does not contain 11. If I replace DoubleType with FloatType, I get the same result, and if I replace it with DecimalType, the output is entirely empty.
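A plausible explanation, offered as an assumption rather than a confirmed diagnosis: `ast.literal_eval` returns a Python `int` for the literal `11` but a `float` for every other element, and PySpark's UDF return-value conversion does not appear to widen an `int` into a DoubleType (or FloatType) element, so that element comes back as null. The DecimalType result would be consistent with the same mechanism, since every element is a `float` rather than the `decimal.Decimal` that DecimalType maps to. The plain-Python check below needs no Spark and only shows the mixed element types:

```python
import ast

# Parse row1's string the same way the UDF's lambda does (there are no
# double quotes to strip in this value, so the .replace() call is omitted).
parsed = ast.literal_eval("[[12.46575,13.78697],[10.565,11]]")

# Record the Python type name of every element in the nested list.
# Assumption: the single 'int' element is the one PySpark turns into null
# when the declared element type is DoubleType.
element_types = [[type(v).__name__ for v in inner] for inner in parsed]
print(element_types)
# [['float', 'float'], ['float', 'int']]
```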

I am not sure whether this is an issue with my code or a bug.
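One possible workaround, assuming the missing value is caused by `11` being parsed as an `int` rather than a `float`, is to coerce every parsed element to `float` inside the UDF so the returned values always match the declared element type. The helper `parse_nested_doubles` below is hypothetical (it is not a PySpark API) and is written as plain Python so it can be checked without a cluster:

```python
import ast


def parse_nested_doubles(s):
    """Hypothetical helper (not part of PySpark): parse a string such as
    "[[1.5,2],[3,4.25]]" into a list of lists of Python floats.

    float() is applied to every element so that integer literals like 11
    become 11.0 and match a DoubleType element type.
    """
    if s is None:
        return None  # keep SQL NULL inputs as NULL
    rows = ast.literal_eval(s.replace('"', ''))
    return [[float(v) for v in inner] for inner in rows]


print(parse_nested_doubles("[[12.46575,13.78697],[10.565,11]]"))
# [[12.46575, 13.78697], [10.565, 11.0]]
```

If the helper behaves as intended, it could replace the lambda, for example `to_array = udf(parse_nested_doubles, ArrayType(ArrayType(DoubleType())))`; the same float values should also suit FloatType. For DecimalType, PySpark maps the type to Python `decimal.Decimal`, so a variant of the helper would presumably need to build `Decimal` values (with a precision and scale chosen to fit the data) instead of floats.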

I hope someone can provide some clarification on this. Thanks!

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
