Posted to issues@spark.apache.org by "Enver Osmanov (Jira)" <ji...@apache.org> on 2021/02/14 12:08:00 UTC
[jira] [Created] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
Enver Osmanov created SPARK-34435:
-------------------------------------
Summary: ArrayIndexOutOfBoundsException when select in different case
Key: SPARK-34435
URL: https://issues.apache.org/jira/browse/SPARK-34435
Project: Spark
Issue Type: Bug
Components: Optimizer, SQL
Affects Versions: 3.0.1
Environment: Actual behavior:
Selecting a column with a different case after remapping fails with ArrayIndexOutOfBoundsException.
Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException.
Spark is case-insensitive by default, so the select should return the selected column.
Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}
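For context, the expected behavior follows from Spark's default case-insensitive column resolution: the requested name "aa" should resolve to the declared field "aA". A minimal standalone sketch of that resolution rule (no Spark dependency; names and values are illustrative only):

```scala
// Sketch of case-insensitive name resolution, as Spark applies it by
// default (spark.sql.caseSensitive=false). Field names are illustrative.
object ResolverSketch extends App {
  // Default resolver: names match regardless of case.
  val resolver: (String, String) => Boolean = _ equalsIgnoreCase _

  val schemaFields = Seq("aA", "bb") // fields declared by case class User
  val requested = "aa"               // column name used in select()

  // "aa" should resolve to the declared field "aA", not fail.
  val resolved = schemaFields.find(f => resolver(requested, f))
  assert(resolved.contains("aA"))
  println(s"'$requested' resolves to: $resolved")
}
```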
Additional notes:
The test case is reproducible with Spark 3.0.1. It works fine with Spark 2.4.7.
I believe the problem could be solved by changing the filter in the pruneDataSchema method of the SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}
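To illustrate the difference, here is a hypothetical standalone reproduction of the two filters (field names invented for this sketch; not the actual Spark internals). The case-sensitive lookup silently drops the field whose requested casing differs, shrinking the pruned schema and setting up the downstream ArrayIndexOutOfBoundsException; the lowercased lookup keeps it:

```scala
// Standalone sketch of the pruneDataSchema filter change. The schemas
// below are invented to mirror the SPARK-34435 test case.
object PruneFilterSketch extends App {
  val dataSchemaNames   = Seq("aA", "bb") // schema declared by case class User
  val mergedSchemaNames = Seq("aa", "bb") // pruned schema with requested casing

  // Current (case-sensitive) filter: "aa" is not in Set("aA", "bb"),
  // so the field is dropped from the merged schema.
  val dataSchemaFieldNames = dataSchemaNames.toSet
  val caseSensitive = mergedSchemaNames.filter(dataSchemaFieldNames.contains)

  // Proposed (case-insensitive) filter: compare lowercased names,
  // so "aa" matches "aA" and both fields survive.
  val lowered = dataSchemaNames.map(_.toLowerCase).toSet
  val caseInsensitive =
    mergedSchemaNames.filter(n => lowered.contains(n.toLowerCase))

  assert(caseSensitive == Seq("bb"))
  assert(caseInsensitive == Seq("aa", "bb"))
  println(s"case-sensitive keeps:   $caseSensitive")
  println(s"case-insensitive keeps: $caseInsensitive")
}
```

Note that an unconditional toLowerCase would ignore the spark.sql.caseSensitive setting; a complete fix would presumably need to respect that configuration.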
Reporter: Enver Osmanov
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org