Posted to issues@spark.apache.org by "Franklyn Dsouza (JIRA)" <ji...@apache.org> on 2016/03/07 20:59:40 UTC

[jira] [Created] (SPARK-13730) Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT

Franklyn Dsouza created SPARK-13730:
---------------------------------------

             Summary: Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT
                 Key: SPARK-13730
                 URL: https://issues.apache.org/jira/browse/SPARK-13730
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 2.0.0
            Reporter: Franklyn Dsouza
            Priority: Critical


Basically, I'm putting nulls into a non-nullable LongType column and applying a transformation to that column; the result is a column with the nulls converted to 0.

I haven't tested this on 1.6.1 or in Scala.

{code}
from pyspark.sql import functions as F, types

sql_schema = types.StructType([
  types.StructField("a", types.LongType(), True),
  types.StructField("b", types.StringType(),  True),
])

# sqlCtx is the SQLContext available in the PySpark shell.
df = sqlCtx.createDataFrame([
    (1, "one"),
    (None, "two"),
], sql_schema)

# Everything is fine here
df.collect() # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]

def assert_not_null(val):
    # Identity function; the name is only illustrative. Passing the column
    # through any Python UDF is enough to reproduce the null-to-0 conversion.
    return val

udf = F.udf(assert_not_null, types.LongType())

# Rebuild column 'a' by passing it through the UDF.
df = df.withColumnRenamed('a', "_tmp_col")
df = df.withColumn('a', udf(df._tmp_col))
df = df.drop("_tmp_col")

# None gets converted to 0
df.collect() # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
{code}
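
A quick way to check whether nullability is being lost in the UDF round-trip (just a diagnostic sketch, same session as above) is to print the schema after rebuilding the column:

{code}
# Diagnostic sketch: if the rebuilt 'a' field reports nullable = false here,
# that would explain the nulls collapsing to 0.
df.printSchema()
{code}

I haven't verified this against the snapshot, but as a possible workaround the UDF step can be replaced with a built-in cast, which goes through the analyzer and should preserve nulls:

{code}
# Hypothetical workaround sketch (untested on the 2.0 snapshot): use a
# built-in cast instead of a Python UDF when rebuilding the column.
df = df.withColumnRenamed('a', "_tmp_col")
df = df.withColumn('a', df._tmp_col.cast(types.LongType()))
df = df.drop("_tmp_col")
{code}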



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
