Posted to issues@spark.apache.org by "Tom Tang (JIRA)" <ji...@apache.org> on 2017/03/16 09:20:41 UTC
[jira] [Commented] (SPARK-19971) Wired SELECT equal behaviour.

    [ https://issues.apache.org/jira/browse/SPARK-19971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927714#comment-15927714 ] 

Tom Tang commented on SPARK-19971:
----------------------------------

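I think I've found the answer: the string column cid is cast to double for the comparison, so precision is lost. As the physical plans below show, two different integer literals are both folded to the same double, -1.00224910923912592E17.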

{code}
spark-sql> explain select * from T where cid = -100224910923912596;
== Physical Plan ==
*Project [cid#0, name#1]
+- *Filter (isnotnull(cid#0) && (cast(cid#0 as double) = -1.00224910923912592E17))
   +- *FileScan csv [cid#0,name#1] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/tmp/1.csv], PartitionFilters: [], PushedFilters: [IsNotNull(cid)], ReadSchema: struct<cid:string,name:string>
Time taken: 0.047 seconds, Fetched 1 row(s)
{code}

{code}
spark-sql> explain select * from T where cid = -100224910923912595;
== Physical Plan ==
*Project [cid#0, name#1]
+- *Filter (isnotnull(cid#0) && (cast(cid#0 as double) = -1.00224910923912592E17))
   +- *FileScan csv [cid#0,name#1] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/tmp/1.csv], PartitionFilters: [], PushedFilters: [IsNotNull(cid)], ReadSchema: struct<cid:string,name:string>
Time taken: 0.039 seconds, Fetched 1 row(s)
{code}
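To illustrate the precision loss outside Spark, here is a minimal Python sketch. It assumes that Python's float (an IEEE 754 double) rounds the same way as the cast to double in the plans above; the helper name `matches` and the `rows` list are hypothetical and only mirror the cid values from /tmp/1.csv. It emulates the filter, it does not run Spark.

```python
# cid values copied from /tmp/1.csv in the issue description.
rows = ["-100224910923912596", "-100224910923912595", "-1", "-2", "-100", "-101"]


def matches(literal):
    """Emulate `cast(cid as double) = <literal as double>` over the sample rows.

    Hypothetical helper for illustration only; this is not Spark code.
    """
    target = float(literal)  # the integer literal is rounded to the nearest double
    return [cid for cid in rows if float(cid) == target]


# Near 1e17 adjacent doubles are 16 apart, so nearby integers collapse
# onto the same double value.
nearest = int(float("-100224910923912596"))

print(matches(-100224910923912596))
print(matches(-100224910923912599))
print(matches(-100))
print(nearest)
```

Under that assumption, the first two calls select both large cid values, while -100 is small enough to be represented exactly and selects only its own row.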

> Wired SELECT equal behaviour. 
> ------------------------------
>
>                 Key: SPARK-19971
>                 URL: https://issues.apache.org/jira/browse/SPARK-19971
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>         Environment: macOS Sierra
>            Reporter: Tom Tang
>            Priority: Critical
>
> Let's say we have a CSV file /tmp/1.csv:
> {quote}
> cid,name
> -100224910923912596,jack
> -100224910923912595,tom
> -1,rose
> -2,marry
> -100,rose1
> -101,rose2
> {quote}
> Use the following SQL to define a view in Spark SQL:
> CREATE TEMPORARY VIEW T
> (
>   `cid` string,
>   `name` string
> )
> USING CSV
> OPTIONS (
>   path "/tmp/1.csv"
> );
> Statement 1:
> {quote}select * from T where cid = -100224910923912596; {quote}
> Returns:
> {quote}
> -100224910923912596	jack
> -100224910923912595	tom
> {quote}
> Statement 2:
> {quote}select * from T where cid = -100224910923912599;{quote}
> It also returns:
> {quote}
> -100224910923912596	jack
> -100224910923912595	tom
> {quote}
> Unless you quote the literal (statement 3):
> {quote}select * from T where cid = '-100224910923912596';{quote}
> It returns: 
> {quote}
> -100224910923912596	jack
> {quote}
> However, I think the behaviour of statements 1 and 2 is pretty weird.
> Statement 4:
> {quote}select * from T where cid = -100;{quote}
> Returns:
> {quote}-100	rose1{quote}
> This only affects large numbers; smaller ones seem to work fine.
> Does that look like a bug to you folks?
> Thanks.


