Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/11/21 09:10:00 UTC
[jira] [Resolved] (SPARK-26136) Row.getAs return null value in some condition
[ https://issues.apache.org/jira/browse/SPARK-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26136.
----------------------------------
Resolution: Invalid
For questions, please ask on the mailing list next time.
When filing an issue, please make it as readable as possible.
> Row.getAs return null value in some condition
> ---------------------------------------------
>
> Key: SPARK-26136
> URL: https://issues.apache.org/jira/browse/SPARK-26136
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.3.0, 2.3.2, 2.4.0
> Environment: Windows 10
> JDK 1.8.0_181
> scala 2.11.12
> spark 2.4.0 / 2.3.2 / 2.3.0
>
> Reporter: Charlie Feng
> Priority: Major
>
> {{Row.getAs("fieldName")}} returns null when all of the following conditions are met:
> * It is used in {{DataFrame.flatMap()}}
> * Another {{map()}} call is nested inside {{flatMap}}
> * {{row.getAs("fieldName")}} is called directly inside a {{Tuple}}.
> Source code to reproduce the bug:
> {code}
> import org.apache.spark.sql.SparkSession
> object FlatMapGetAsBug {
>   def main(args: Array[String]) {
>     val spark = SparkSession.builder.appName("SparkUtil").master("local").getOrCreate
>     import spark.implicits._;
>     val df = Seq(("a1", "b1", "x,y,z")).toDF("A", "B", "XYZ")
>     df.show();
>     val df2 = df.flatMap { row =>
>       row.getAs[String]("XYZ").split(",").map { xyz =>
>         var colA: String = row.getAs("A");
>         var col0: String = row.getString(0);
>         (row.getAs("A"), colA, row.getString(0), col0, row.getString(1), xyz)
>       }
>     }.toDF("ColumnA_API1", "ColumnA_API2", "ColumnA_API3", "ColumnA_API4", "ColumnB", "ColumnXYZ")
>     df2.show();
>     spark.close()
>   }
> }
> {code}
> Console Output:
> {code}
> +---+---+-----+
> |  A|  B|  XYZ|
> +---+---+-----+
> | a1| b1|x,y,z|
> +---+---+-----+
> +------------+------------+------------+------------+-------+---------+
> |ColumnA_API1|ColumnA_API2|ColumnA_API3|ColumnA_API4|ColumnB|ColumnXYZ|
> +------------+------------+------------+------------+-------+---------+
> |        null|          a1|          a1|          a1|     b1|        x|
> |        null|          a1|          a1|          a1|     b1|        y|
> |        null|          a1|          a1|          a1|     b1|        z|
> +------------+------------+------------+------------+-------+---------+
> {code}
> We try to read column "A" using four approaches:
> 1. call {{row.getAs("A")}} inside a tuple
> 2. call {{row.getAs("A")}}, save result into a variable "colA", and add variable into the tuple
> 3. call {{row.getString(0)}} inside a tuple
> 4. call {{row.getString(0)}}, save result into a variable "col0", and add variable into the tuple
> We found that approaches 2 to 4 return the value "a1" as expected, but approach 1 returns null.
> This issue exists in Spark 2.4.0, 2.3.2 and 2.3.0.
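
The difference between approach 1 and approach 2 can be sketched as follows. This is a minimal sketch, not part of the original report, and the explanation is an assumption rather than something confirmed in this thread: when {{row.getAs("A")}} appears directly inside a tuple there is no expected type, so the Scala compiler probably infers the type parameter as Nothing, and the resulting column is likely encoded as null. Passing the type argument explicitly avoids relying on inference. The object name FlatMapGetAsTyped and the method name run are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical variant of the reproduction above; only the getAs call differs.
object FlatMapGetAsTyped {

  // Returns one (A, B, token) tuple per comma-separated token in column "XYZ".
  def run(spark: SparkSession): Seq[(String, String, String)] = {
    import spark.implicits._

    val df = Seq(("a1", "b1", "x,y,z")).toDF("A", "B", "XYZ")

    val ds = df.flatMap { row =>
      row.getAs[String]("XYZ").split(",").map { xyz =>
        // Explicit type argument: the tuple element is statically a String,
        // so the encoder does not depend on what the compiler would infer.
        (row.getAs[String]("A"), row.getString(1), xyz)
      }
    }

    ds.collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("FlatMapGetAsTyped")
      .master("local")
      .getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

Approach 2 in the report works for a similar reason: the annotation {{var colA: String}} gives the compiler an expected type, so the type parameter is resolved to String before the value is placed in the tuple.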