You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Franck Tago (JIRA)" <ji...@apache.org> on 2016/10/02 04:17:20 UTC

[jira] [Created] (SPARK-17758) Spark Aggregate function LAST returns null on an empty partition

Franck Tago created SPARK-17758:
-----------------------------------

             Summary: Spark Aggregate function  LAST returns null on an empty partition 
                 Key: SPARK-17758
                 URL: https://issues.apache.org/jira/browse/SPARK-17758
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 2.0.0
         Environment: Spark 2.0.0
            Reporter: Franck Tago


My Environment 
Spark 2.0.0  

I have included the physical plan of my application below.
Issue description
The result from  a query that uses the LAST function are incorrect. 

The output obtained for the column that corresponds to the last function is null .  

My input data contain 3 rows . 

The application resulted in  2 stages 

The first stage consisted of 3 tasks . 

The first task/partition contains 2 rows
The second task/partition contains 1 row
The last task/partition contain  0 rows

The result from the query executed for the LAST column call is NULL which I believe is due to the  PARTIAL_LAST on the last partition . 

I believe that this behavior is incorrect. The PARTIAL_LAST call on an empty partition should not return null .


== Physical Plan ==
InsertIntoHiveTable MetastoreRelation default, bdm_3449_tgt20, true, false
+- *Project [last(C3_1)#51 AS field#102, cast(round(max(C3_0)#50, 0) as int) AS field1#103, cast(round(max(C3_0)#50, 0) as int) AS field2#104]
   +- SortAggregate(key=[], functions=[max(C3_0#40),last(C3_1#41, false)], output=[max(C3_0)#50,last(C3_1)#51])
      +- SortAggregate(key=[], functions=[partial_max(C3_0#40),partial_last(C3_1#41, false)], output=[max#91,last#92])
         +- *Project [CAST(sum(C1_0) AS DOUBLE)#27 AS C3_0#40, last(C1_1)#28 AS C3_1#41]
            +- SortAggregate(key=[], functions=[sum(cast(C1_0#17 as bigint)),last(C1_1#18, false)], output=[CAST(sum(C1_0) AS DOUBLE)#27,last(C1_1)#28])
               +- Exchange SinglePartition
                  +- SortAggregate(key=[], functions=[partial_sum(cast(C1_0#17 as bigint)),partial_last(C1_1#18, false)], output=[sum#95L,last#96])
                     +- *Project [field1#7 AS C1_0#17, field#6 AS C1_1#18]
                        +- HiveTableScan [field1#7, field#6], MetastoreRelation default, bdm_3449_src, alias




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org