Posted to issues@spark.apache.org by "Pierre Borckmans (JIRA)" <ji...@apache.org> on 2015/12/22 09:50:46 UTC

[jira] [Updated] (SPARK-12477) [SQL] Tungsten projection fails for null values in array fields

     [ https://issues.apache.org/jira/browse/SPARK-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Borckmans updated SPARK-12477:
-------------------------------------
    Description: 
Accessing null elements in an array field fails when Tungsten is enabled.

The following code works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled:

{code}
// Array of String
case class AS( as: Seq[String] )
val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
dfAS.registerTempTable("T_AS")
for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}

// Array of Int
case class AI( ai: Seq[Option[Int]] )
val dfAI = sc.parallelize( Seq( AI ( Seq(Some(1),None,Some(2) ) ) ) ).toDF
dfAI.registerTempTable("T_AI")
for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select ai[$i] from T_AI").collect.mkString(","))}

// Array of struct[Int,String]
case class B(x: Option[Int], y: String)
case class A( b: Seq[B] )
val df1 = sc.parallelize( Seq( A ( Seq( B(Some(1),"a"),B(Some(2),"b"), B(None, "c"), B(Some(4),null), B(None,null), null ) ) ) ).toDF
df1.registerTempTable("T1")
val df2 = sc.parallelize( Seq( A ( Seq( B(Some(1),"a"),B(Some(2),"b"), B(None, "c"), B(Some(4),null), B(None,null), null ) ), A(null) ) ).toDF
df2.registerTempTable("T2")
for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select b[$i].x, b[$i].y from T1").collect.mkString(","))}
for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select b[$i].x, b[$i].y from T2").collect.mkString(","))}

// Struct[Int,String]
case class C(b: B)
val df3 = sc.parallelize( Seq( C ( B(Some(1),"test") ), C(null) ) ).toDF
df3.registerTempTable("T3")
sqlContext.sql("select b.x, b.y from T3").collect
{code}

With Tungsten enabled, the same code throws a NullPointerException.


  was:
Accessing null elements in an array field fails when Tungsten is enabled.

The following code works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled:

```
// Array of String
case class AS( as: Seq[String] )
val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
dfAS.registerTempTable("T_AS")
for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}

// Array of Int
case class AI( ai: Seq[Option[Int]] )
val dfAI = sc.parallelize( Seq( AI ( Seq(Some(1),None,Some(2) ) ) ) ).toDF
dfAI.registerTempTable("T_AI")
for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select ai[$i] from T_AI").collect.mkString(","))}

// Array of struct[Int,String]
case class B(x: Option[Int], y: String)
case class A( b: Seq[B] )
val df1 = sc.parallelize( Seq( A ( Seq( B(Some(1),"a"),B(Some(2),"b"), B(None, "c"), B(Some(4),null), B(None,null), null ) ) ) ).toDF
df1.registerTempTable("T1")
val df2 = sc.parallelize( Seq( A ( Seq( B(Some(1),"a"),B(Some(2),"b"), B(None, "c"), B(Some(4),null), B(None,null), null ) ), A(null) ) ).toDF
df2.registerTempTable("T2")
for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select b[$i].x, b[$i].y from T1").collect.mkString(","))}
for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select b[$i].x, b[$i].y from T2").collect.mkString(","))}

// Struct[Int,String]
case class C(b: B)
val df3 = sc.parallelize( Seq( C ( B(Some(1),"test") ), C(null) ) ).toDF
df3.registerTempTable("T3")
sqlContext.sql("select b.x, b.y from T3").collect
```

With Tungsten enabled, the same code throws a NullPointerException.



> [SQL] Tungsten projection fails for null values in array fields
> ---------------------------------------------------------------
>
>                 Key: SPARK-12477
>                 URL: https://issues.apache.org/jira/browse/SPARK-12477
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Pierre Borckmans
>
> Accessing null elements in an array field fails when Tungsten is enabled.
> The following code works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled:
> {code}
> // Array of String
> case class AS( as: Seq[String] )
> val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
> dfAS.registerTempTable("T_AS")
> for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
> // Array of Int
> case class AI( ai: Seq[Option[Int]] )
> val dfAI = sc.parallelize( Seq( AI ( Seq(Some(1),None,Some(2) ) ) ) ).toDF
> dfAI.registerTempTable("T_AI")
> for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select ai[$i] from T_AI").collect.mkString(","))}
> // Array of struct[Int,String]
> case class B(x: Option[Int], y: String)
> case class A( b: Seq[B] )
> val df1 = sc.parallelize( Seq( A ( Seq( B(Some(1),"a"),B(Some(2),"b"), B(None, "c"), B(Some(4),null), B(None,null), null ) ) ) ).toDF
> df1.registerTempTable("T1")
> val df2 = sc.parallelize( Seq( A ( Seq( B(Some(1),"a"),B(Some(2),"b"), B(None, "c"), B(Some(4),null), B(None,null), null ) ), A(null) ) ).toDF
> df2.registerTempTable("T2")
> for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select b[$i].x, b[$i].y from T1").collect.mkString(","))}
> for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select b[$i].x, b[$i].y from T2").collect.mkString(","))}
> // Struct[Int,String]
> case class C(b: B)
> val df3 = sc.parallelize( Seq( C ( B(Some(1),"test") ), C(null) ) ).toDF
> df3.registerTempTable("T3")
> sqlContext.sql("select b.x, b.y from T3").collect
> {code}
> With Tungsten enabled, the same code throws a NullPointerException.
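
For anyone reproducing this, here is a minimal spark-shell sketch of the "Tungsten disabled" case. It assumes that the `spark.sql.tungsten.enabled` setting described in the Spark 1.5 SQL migration notes is the switch the report refers to; the report does not name the setting, and this has not been checked against 1.6.0. `sc` and `sqlContext` are the values predefined by spark-shell.

```scala
// Already in scope in spark-shell; kept so the snippet is explicit about where toDF comes from.
import sqlContext.implicits._

// Assumption: this is the flag meant by "Tungsten disabled" in the report.
sqlContext.setConf("spark.sql.tungsten.enabled", "false")

// Same "Array of String" case as in the description.
case class AS(as: Seq[String])
val dfAS = sc.parallelize(Seq(AS(Seq("a", null, "b")))).toDF
dfAS.registerTempTable("T_AS")

// Index 1 is the null element in the array.
println(sqlContext.sql("select as[1] from T_AS").collect.mkString(","))
```

If that assumption holds, setting the flag back to "true" and re-running the last line should reproduce the failure on the affected versions listed above.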


