Posted to issues@spark.apache.org by "Pablo Langa Blanco (Jira)" <ji...@apache.org> on 2023/02/26 22:57:00 UTC

[jira] [Commented] (SPARK-40525) Floating-point value with an INT/BYTE/SHORT/LONG type errors out in DataFrame but evaluates to a rounded value in SparkSQL

    [ https://issues.apache.org/jira/browse/SPARK-40525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693721#comment-17693721 ] 

Pablo Langa Blanco commented on SPARK-40525:
--------------------------------------------

Hi [~xsys],

When you are working with the Spark SQL interface, you can configure this behavior: there are three policies for the type coercion rules applied on table insertion ([https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html]). If you set {{spark.sql.storeAssignmentPolicy}} to "strict", you get the error you expected, but that is not the default policy.
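For example, a minimal sketch in {{spark-shell}} (assuming the {{int_floating_point_vals}} table from this report already exists):

{code:java}
// Switch store assignment to the strict policy for the current session.
spark.conf.set("spark.sql.storeAssignmentPolicy", "strict")

// Under the strict policy, the implicit DECIMAL -> INT cast is rejected,
// so this insert should fail during analysis instead of silently rounding to 1.
spark.sql("insert into int_floating_point_vals select 1.1")
{code}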

I hope this helps you.

> Floating-point value with an INT/BYTE/SHORT/LONG type errors out in DataFrame but evaluates to a rounded value in SparkSQL
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40525
>                 URL: https://issues.apache.org/jira/browse/SPARK-40525
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
>
> h3. Describe the bug
> Storing an invalid INT value {{1.1}} using DataFrames via {{spark-shell}} errors out, as expected. However, the value is silently rounded to {{1}} if it is inserted into the table via {{spark-sql}}.
> h3. Steps to reproduce:
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}:
> {code:java}
> $SPARK_HOME/bin/spark-sql {code}
> Execute the following:
> {code:java}
> spark-sql> create table int_floating_point_vals(c1 INT) stored as ORC;
> 22/09/19 16:49:11 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> Time taken: 0.216 seconds
> spark-sql> insert into int_floating_point_vals select 1.1;
> Time taken: 1.747 seconds
> spark-sql> select * from int_floating_point_vals;
> 1
> Time taken: 0.518 seconds, Fetched 1 row(s){code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination ({{INT}} and {{1.1}}).
> h4. Here is a simplified example in {{spark-shell}}, where insertion of the aforementioned value correctly raises an exception:
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-shell}}:
> {code:java}
> $SPARK_HOME/bin/spark-shell{code}
> Execute the following:
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types._
> val rdd = sc.parallelize(Seq(Row(1.1)))  // external value is a java.lang.Double
> val schema = new StructType().add(StructField("c1", IntegerType, true))  // schema declares c1 as INT
> val df = spark.createDataFrame(rdd, schema)
> df.write.mode("overwrite").format("orc").saveAsTable("int_floating_point_vals") {code}
> The following exception is raised:
> {code:java}
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Double is not a valid external type for schema of int{code}
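> The exception above suggests the {{spark-shell}} failure comes from encoding the external {{Row}} against the declared schema (a {{java.lang.Double}} is not a valid external type for {{IntegerType}}), independent of {{spark.sql.storeAssignmentPolicy}}. As a minimal sketch (continuing the session above, so {{schema}} is assumed to still be in scope), supplying a JVM type that matches the schema avoids the error:
> {code:java}
> // Row(1) carries a java.lang.Integer, a valid external type for IntegerType,
> // so encoding succeeds and the write completes.
> val okRdd = sc.parallelize(Seq(Row(1)))
> val okDf = spark.createDataFrame(okRdd, schema)
> okDf.write.mode("overwrite").format("orc").saveAsTable("int_floating_point_vals")
> {code}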


