Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/06/23 12:33:16 UTC

[jira] [Updated] (SPARK-16170) Throw error when row is not schema-compatible

     [ https://issues.apache.org/jira/browse/SPARK-16170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-16170:
------------------------------
      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

I don't think that's a bug. You're asking for different behavior. Really it's a problem with your data or schema, right?
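For what it's worth, if you want the read to fail loudly instead of silently nulling the row, here is a minimal sketch (untested; it assumes Spark 1.6+, where the JSON reader accepts a "mode" option, and reuses the names from the snippet quoted below):

{code}
# Untested sketch, assuming Spark 1.6+ where the JSON data source accepts a
# "mode" option. FAILFAST is expected to raise on records that cannot be
# parsed against the supplied schema, instead of silently nulling the row.
df = sqlContext.read.option("mode", "FAILFAST").json(rdd, schema=sch)
{code}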

> Throw error when row is not schema-compatible
> ---------------------------------------------
>
>                 Key: SPARK-16170
>                 URL: https://issues.apache.org/jira/browse/SPARK-16170
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Federico Ponzi
>            Priority: Minor
>
> We are using Spark to import some data from MySQL.
> We just found that many of our imports were useless because our import function was wrongly declaring a float column as LongType.
> Consider this example:
> {code}
> from pyspark.sql import SQLContext
> from pyspark.sql.types import *
>
> sqlContext = SQLContext(sc)
> sch = StructType([StructField("id", LongType(), True), StructField("rol", StringType(), True)])
> # The second record's id is a float, which does not fit the declared LongType
> i = ['{"id": 1, "rol": "str"}', '{"id": 2.4, "rol": "str"}']
> rdd = sc.parallelize(i)
> df = sqlContext.read.json(rdd, schema=sch)
> print df.collect()
> {code}
> The output is:
> {code}
> [Row(id=1, rol=u'str'), Row(id=None, rol=None)]
> {code}
> Every column in the second row is null, not only id (which has the wrong datatype), and no error is raised.
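A hedged workaround sketch for the snippet above (untested; it assumes the JSON data source honours the default corrupt-record column name "_corrupt_record" when a StringType field of that name is declared in the schema):

{code}
# Untested sketch: assuming the JSON source stores the raw text of any record
# that does not fit the declared types in a "_corrupt_record" StringType field
# when one is present in the schema.
sch2 = StructType([StructField("id", LongType(), True),
                   StructField("rol", StringType(), True),
                   StructField("_corrupt_record", StringType(), True)])
df2 = sqlContext.read.json(rdd, schema=sch2)

# Rows that failed to parse can then be inspected or counted explicitly
bad = df2.filter(df2["_corrupt_record"].isNotNull())
print bad.collect()
{code}

With the example input, the second record should then surface with its original JSON text in _corrupt_record rather than vanishing into an all-null row.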


