Posted to issues@spark.apache.org by "Franklyn Dsouza (JIRA)" <ji...@apache.org> on 2016/03/07 20:59:40 UTC
[jira] [Created] (SPARK-13730) Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT
Franklyn Dsouza created SPARK-13730:
---------------------------------------
Summary: Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT
Key: SPARK-13730
URL: https://issues.apache.org/jira/browse/SPARK-13730
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Franklyn Dsouza
Priority: Critical
I'm putting nulls into a LongType column (the schema below declares it nullable) and applying a UDF transformation to that column. The result is a column with the nulls converted to 0.
I haven't tested this on 1.6.1 or in Scala.
{code}
from pyspark.sql import types, functions as F

# sqlCtx is assumed to be an existing SQLContext (e.g. the one from the pyspark shell).
sql_schema = types.StructType([
    types.StructField("a", types.LongType(), True),  # nullable=True
    types.StructField("b", types.StringType(), True),
])

df = sqlCtx.createDataFrame([
    (1, "one"),
    (None, "two"),
], sql_schema)

# Everything is fine here
df.collect()  # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]

# Identity UDF: returns its input unchanged, so None should stay None
def assert_not_null(val):
    return val

udf = F.udf(assert_not_null, types.LongType())

# Replace column 'a' with the UDF applied to it
df = df.withColumnRenamed('a', "_tmp_col")
df = df.withColumn('a', udf(df._tmp_col))
df = df.drop("_tmp_col")

# None gets converted to 0
df.collect()  # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
{code}
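To check collected output for this symptom without eyeballing it, something like the following sketch might help. It is not part of Spark: nulls_turned_to_zero is a hypothetical helper, and it assumes the rows have been converted to plain dicts (for example with Row.asDict()) and that one column, here "b", uniquely identifies each row.

```python
def nulls_turned_to_zero(before, after, key, column):
    """Return the key values of rows whose `column` was None before and 0 after.

    `before` and `after` are lists of dicts; `key` names a column assumed to
    identify each row uniquely. Hypothetical helper, not a Spark API.
    """
    after_by_key = {row[key]: row for row in after}
    return [
        row[key]
        for row in before
        if row[column] is None
        and row[key] in after_by_key
        and after_by_key[row[key]][column] == 0
    ]


# Rows as plain dicts, e.g. [r.asDict() for r in df.collect()],
# using the values from the report above.
before = [{"a": 1, "b": "one"}, {"a": None, "b": "two"}]
after = [{"b": "one", "a": 1}, {"b": "two", "a": 0}]

affected = nulls_turned_to_zero(before, after, key="b", column="a")
print(affected)  # ['two']
```

An empty list would mean no null in the checked column was turned into 0.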