Posted to issues@spark.apache.org by "Charlie Feng (JIRA)" <ji...@apache.org> on 2018/11/21 08:47:00 UTC

[jira] [Updated] (SPARK-26136) Row.getAs return null value in some condition

     [ https://issues.apache.org/jira/browse/SPARK-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charlie Feng updated SPARK-26136:
---------------------------------
    Component/s: Spark Core

> Row.getAs return null value in some condition
> ---------------------------------------------
>
>                 Key: SPARK-26136
>                 URL: https://issues.apache.org/jira/browse/SPARK-26136
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.0, 2.3.2, 2.4.0
>         Environment: Windows 10
> JDK 1.8.0_181
> scala 2.11.12
> spark 2.4.0 / 2.3.2 / 2.3.0
>  
>            Reporter: Charlie Feng
>            Priority: Major
>
> Row.getAs("fieldName") returns null when all of the following conditions are met:
>  * It is used inside DataFrame.flatMap()
>  * There is another map() call inside the flatMap
>  * row.getAs("fieldName") is called directly inside a tuple.
> *Source code to reproduce the bug:*
> import org.apache.spark.sql.SparkSession
>
> object FlatMapGetAsBug {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder.appName("SparkUtil").master("local").getOrCreate
>     import spark.implicits._
>
>     val df = Seq(("a1", "b1", "x,y,z")).toDF("A", "B", "XYZ")
>     df.show()
>
>     val df2 = df.flatMap(row => row.getAs[String]("XYZ").split(",")
>       .map(xyz => {
>         val colA: String = row.getAs("A")   // approach 2: result stored in a typed variable
>         val col0: String = row.getString(0) // approach 4: result stored in a typed variable
>         // approach 1 is the row.getAs("A") call in the first tuple position
>         (row.getAs("A"), colA, row.getString(0), col0, row.getString(1), xyz)
>       })).toDF("ColumnA_API1", "ColumnA_API2", "ColumnA_API3", "ColumnA_API4", "ColumnB", "ColumnXYZ")
>     df2.show()
>
>     spark.close()
>   }
> }
> *Console Output:*
> +---+---+-----+
> |  A|  B|  XYZ|
> +---+---+-----+
> | a1| b1|x,y,z|
> +---+---+-----+
> +------------+------------+------------+------------+-------+---------+
> |ColumnA_API1|ColumnA_API2|ColumnA_API3|ColumnA_API4|ColumnB|ColumnXYZ|
> +------------+------------+------------+------------+-------+---------+
> |        null|          a1|          a1|          a1|     b1|        x|
> |        null|          a1|          a1|          a1|     b1|        y|
> |        null|          a1|          a1|          a1|     b1|        z|
> +------------+------------+------------+------------+-------+---------+
> We tried to read column "A" in four ways:
> 1) calling row.getAs("A") directly inside the tuple
> 2) calling row.getAs("A"), saving the result in a variable "colA", and adding the variable to the tuple
> 3) calling row.getString(0) directly inside the tuple
> 4) calling row.getString(0), saving the result in a variable "col0", and adding the variable to the tuple
> Approaches 2 to 4 return the value "a1" as expected, but approach 1 returns null.
> This issue exists in Spark 2.4.0, 2.3.2, and 2.3.0.
>  
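
A possible workaround, sketched under assumptions rather than confirmed as the fix: in the report, the one call that returns null is also the only row.getAs("A") with neither an explicit type parameter nor a typed variable to assign to. Passing the type parameter explicitly, as the reproduction already does for row.getAs[String]("XYZ"), may therefore avoid the problem. The object name FlatMapGetAsWorkaround, the helper explode, and the output column names are hypothetical and do not appear in the report.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FlatMapGetAsWorkaround {
  // Hypothetical helper (not from the report): splits XYZ and carries column A along.
  def explode(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df.flatMap { row =>
      row.getAs[String]("XYZ").split(",").map { xyz =>
        // The explicit [String] states the element type instead of
        // leaving it to type inference inside the tuple.
        (row.getAs[String]("A"), xyz)
      }
    }.toDF("ColumnA", "ColumnXYZ")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("FlatMapGetAsWorkaround")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    // Same sample row as in the reproduction above.
    val df = Seq(("a1", "b1", "x,y,z")).toDF("A", "B", "XYZ")
    explode(spark, df).show()

    spark.close()
  }
}
```

Calling df2.printSchema() in the original reproduction and comparing the type reported for ColumnA_API1 with the other columns would be one way to check whether type inference is involved.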


