Posted to issues@spark.apache.org by "Jason Ferrell (JIRA)" <ji...@apache.org> on 2019/02/15 23:32:03 UTC

[jira] [Commented] (SPARK-26693) Large Numbers Truncated

    [ https://issues.apache.org/jira/browse/SPARK-26693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769856#comment-16769856 ] 

Jason Ferrell commented on SPARK-26693:
---------------------------------------

Perhaps this is an issue with Zeppelin's %sql interpreter.  Taking the example and adding these lines:

from pyspark.sql.types import *

sqlContext.sql('select * from global_temp.testTable').show(3)

Result:

+-------------------+-------------------+-------------------+
|         idAsString|         idAsBigint|           idAsLong|
+-------------------+-------------------+-------------------+
|4065453307562594031|4065453307562594031|4065453307562594031|
|7659957277770523059|7659957277770523059|7659957277770523059|
|1614560078712787995|1614560078712787995|1614560078712787995|
+-------------------+-------------------+-------------------+
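
Reading the view back through pyspark shows the full values, so the data itself appears intact and the rounding happens only at display time.  A likely mechanism, offered as an assumption rather than a confirmed diagnosis: if Zeppelin's %sql front-end passes result values through IEEE-754 doubles (e.g., as JavaScript numbers in the browser), any integer above 2^53 loses precision.  A minimal sketch in plain Python, with no Spark or Zeppelin involved:

# Integers above 2**53 cannot all be represented exactly as IEEE-754
# doubles, which is what a JSON/JavaScript front-end would hold them in.
val = 4065453307562594031        # 19-digit id from the report

print(2 ** 53)                   # 9007199254740992 -- only 16 digits
as_double = float(val)           # precision is lost at this conversion
print(val == int(as_double))     # False
print(repr(as_double))           # 4.065453307562594e+18

The shortest decimal that round-trips to that double is 4065453307562594000 (which is what JavaScript's number-to-string conversion prints), matching the %sql output reported below.  If that is the cause, the bug would be in Zeppelin's display/serialization rather than in Spark SQL itself.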

> Large Numbers Truncated 
> ------------------------
>
>                 Key: SPARK-26693
>                 URL: https://issues.apache.org/jira/browse/SPARK-26693
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: Code was run in Zeppelin using Spark 2.4.
>            Reporter: Jason Blahovec
>            Priority: Major
>
> We have a process that takes a file dumped from an external API and formats it for use in other processes.  These API dumps are brought into Spark with all fields read in as strings.  One of the fields is a 19 digit visitor ID.  Since implementing Spark 2.4 a few weeks ago, we have noticed that dataframes read all 19 digits correctly, but anything displayed through SQL appears to zero out the trailing digits (for example, ...594031 becomes ...594000).
> Our process converts these numbers to bigint, which worked before Spark 2.4.  We looked into data types, including changing to a "long" type, with no luck.  We then tried bringing in the string value as is, with the same result.  I've added code below that replicates the issue with a few 19 digit test cases and demonstrates the type conversions I tried.
> Results for the code below are shown here:
> dfTestExpanded.show:
> +-------------------+-------------------+-------------------+
> |         idAsString|         idAsBigint|           idAsLong|
> +-------------------+-------------------+-------------------+
> |4065453307562594031|4065453307562594031|4065453307562594031|
> |7659957277770523059|7659957277770523059|7659957277770523059|
> |1614560078712787995|1614560078712787995|1614560078712787995|
> +-------------------+-------------------+-------------------+
> Run this query in a paragraph:
> %sql
> select * from global_temp.testTable
> and see these results (all 3 columns):
> 4065453307562594000
> 7659957277770523000
> 1614560078712788000
>  
> Another notable observation: this issue does not appear to affect joins on the affected fields.  We see the problem when the fields are used in where clauses or as part of a select list.
>  
>  
> {code:python}
> %pyspark
> from pyspark.sql.functions import *
> from pyspark.sql.types import *  # StructField, StructType, StringType live here
>
> sfTestValue = StructField("testValue", StringType(), True)
> schemaTest = StructType([sfTestValue])
>
> listTestValues = []
> listTestValues.append(("4065453307562594031",))
> listTestValues.append(("7659957277770523059",))
> listTestValues.append(("1614560078712787995",))
>
> dfTest = spark.createDataFrame(listTestValues, schemaTest)
> dfTestExpanded = dfTest.selectExpr(
>     "testValue as idAsString",
>     "cast(testValue as bigint) as idAsBigint",
>     "cast(testValue as long) as idAsLong")
>
> dfTestExpanded.show()  # This shows three columns of data correctly.
> dfTestExpanded.createOrReplaceGlobalTempView('testTable')  # Viewed in a %sql paragraph, the values appear truncated.
> {code}
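
If the double-precision hypothesis above holds, a possible workaround (untested, offered only as a sketch) is to cast the id columns to string for display, so the values never pass through a double:

%sql
select cast(idAsBigint as string) as idAsBigintStr from global_temp.testTable

This would also be consistent with the join observation: joins run entirely inside Spark on the exact bigint values, and only the rendered output goes through the lossy conversion.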



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org