Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 03:59:22 UTC

[jira] [Updated] (SPARK-21754) No Exception/Warn When Join Columns are Differing Types

     [ https://issues.apache.org/jira/browse/SPARK-21754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-21754:
---------------------------------
    Labels: bulk-closed  (was: )

> No Exception/Warn When Join Columns are Differing Types
> -------------------------------------------------------
>
>                 Key: SPARK-21754
>                 URL: https://issues.apache.org/jira/browse/SPARK-21754
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>         Environment: Ubuntu Xenial 16.04
>            Reporter: Ed Lee
>            Priority: Major
>              Labels: bulk-closed
>
> Spark raises no exception or warning when the join columns have differing types, which can silently produce wrong join results for unsuspecting users.
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as sf
> import pandas as pd
> spark = SparkSession.builder.master("local").appName("JoinTest").getOrCreate()
> Spark infers a LongType schema for keycol:
> left_df = pd.DataFrame({"keycol": [1], "col1": ["hello"]})
> left_sdf = spark.createDataFrame(left_df)
> left_sdf.schema
> left_sdf.show()
> right_df = pd.DataFrame({"keycol": ["1", "1", "01", "01"],
>                          "r_col2": ["alpha", "beta", "gamma", "theta"]
>                          })
> right_sdf = spark.createDataFrame(right_df)
> right_sdf.schema
> But the join gives no warning about the mismatched types, and '01' gets converted to 1:
> left_sdf.join(right_sdf, on="keycol", how="left").show()
> Result:
>  +------+-----+------+
>  |keycol| col1|r_col2|
>  +------+-----+------+
>  |     1|hello| alpha|
>  |     1|hello|  beta|
>  |     1|hello| gamma|
>  |     1|hello| theta|
>  +------+-----+------+
> I think it would be safer if the join failed in this case.
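
Until Spark itself warns or fails here, a caller can guard the join by comparing the key column types first. The following is a minimal sketch, not a PySpark API: the helper names mismatched_join_columns and check_join_columns are hypothetical, and the sketch assumes the (name, type) pairs come from DataFrame.dtypes. The sample type strings are what the example above is expected to report (bigint for the pandas integer column, string for the pandas string column).

```python
from typing import List, Sequence, Tuple


# Hypothetical helper, not part of PySpark: compares the (name, type) pairs
# that DataFrame.dtypes returns and reports join keys whose types differ.
def mismatched_join_columns(
    left_dtypes: Sequence[Tuple[str, str]],
    right_dtypes: Sequence[Tuple[str, str]],
    on: Sequence[str],
) -> List[Tuple[str, str, str]]:
    left = dict(left_dtypes)
    right = dict(right_dtypes)
    mismatches = []
    for col in on:
        # A key missing from either side raises KeyError here.
        if left[col] != right[col]:
            mismatches.append((col, left[col], right[col]))
    return mismatches


# Hypothetical helper: fail loudly instead of letting the join coerce types.
def check_join_columns(left_dtypes, right_dtypes, on):
    mismatches = mismatched_join_columns(left_dtypes, right_dtypes, on)
    if mismatches:
        details = ", ".join(f"{c}: {l} vs {r}" for c, l, r in mismatches)
        raise TypeError(f"join columns have differing types ({details})")


# Assumed dtypes for left_sdf and right_sdf from the report above.
left_dtypes = [("keycol", "bigint"), ("col1", "string")]
right_dtypes = [("keycol", "string"), ("r_col2", "string")]
print(mismatched_join_columns(left_dtypes, right_dtypes, ["keycol"]))
# prints [('keycol', 'bigint', 'string')]
```

With the DataFrames from the report, the guard would be called as check_join_columns(left_sdf.dtypes, right_sdf.dtypes, ["keycol"]) before left_sdf.join(...), and would raise TypeError instead of returning the four matched rows.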


