Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:15:25 UTC
[jira] [Resolved] (SPARK-21754) No Exception/Warn When Join Columns are Differing Types
[ https://issues.apache.org/jira/browse/SPARK-21754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21754.
----------------------------------
Resolution: Incomplete
> No Exception/Warn When Join Columns are Differing Types
> -------------------------------------------------------
>
> Key: SPARK-21754
> URL: https://issues.apache.org/jira/browse/SPARK-21754
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.2.0
> Environment: Ubuntu Xenial 16.04
> Reporter: Ed Lee
> Priority: Major
> Labels: bulk-closed
>
> No exception or warning is raised when join columns have differing types, which can silently produce incorrect join results for unsuspecting users.
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as sf
> import pandas as pd
> spark = SparkSession.builder.master("local").appName("JoinTest").getOrCreate()
>
> # Spark infers a LongType schema for keycol:
> left_df = pd.DataFrame({"keycol": [1], "col1": ["hello"]})
> left_sdf = spark.createDataFrame(left_df)
> left_sdf.schema
> left_sdf.show()
>
> # Here keycol is inferred as StringType:
> right_df = pd.DataFrame({"keycol": ["1", "1", "01", "01"],
>                          "r_col2": ["alpha", "beta", "gamma", "theta"]})
> right_sdf = spark.createDataFrame(right_df)
> right_sdf.schema
>
> # But when joining there is no warning about the mismatched types;
> # '01' gets coerced to 1:
> left_sdf.join(right_sdf, on="keycol", how="left").show()
> Output:
> +------+-----+------+
> |keycol| col1|r_col2|
> +------+-----+------+
> | 1|hello| alpha|
> | 1|hello| beta|
> | 1|hello| gamma|
> | 1|hello| theta|
> +------+-----+------+
> I think it would be safer if this failed with an error (or at least warned)?
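Until Spark itself raises, callers can guard against this themselves by comparing the key column's type on both sides before joining. A minimal sketch of such a check, operating on the `(column, type)` pairs that PySpark's `DataFrame.dtypes` returns (the helper name and error format are illustrative, not part of Spark):

```python
def check_join_key_types(left_dtypes, right_dtypes, on):
    """Raise TypeError if any join key column has differing types.

    left_dtypes / right_dtypes: lists of (column, type) pairs, the
    format returned by PySpark's DataFrame.dtypes.
    on: a single column name or a list of column names.
    """
    keys = [on] if isinstance(on, str) else list(on)
    left, right = dict(left_dtypes), dict(right_dtypes)
    for key in keys:
        lt, rt = left.get(key), right.get(key)
        if lt != rt:
            raise TypeError(
                f"Join key {key!r} has mismatched types: {lt} vs {rt}"
            )

# Mimicking df.dtypes from the report: keycol is bigint on the left,
# string on the right.
left_dtypes = [("keycol", "bigint"), ("col1", "string")]
right_dtypes = [("keycol", "string"), ("r_col2", "string")]

try:
    check_join_key_types(left_dtypes, right_dtypes, on="keycol")
except TypeError as e:
    print(e)  # Join key 'keycol' has mismatched types: bigint vs string
```

In real code this would be called as `check_join_key_types(left_sdf.dtypes, right_sdf.dtypes, "keycol")` just before the join; alternatively, one side can be cast explicitly (e.g. `sf.col("keycol").cast("string")`) so the coercion is deliberate rather than silent.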
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org