Posted to commits@spark.apache.org by we...@apache.org on 2021/07/26 10:50:09 UTC
[spark] branch branch-3.1 updated: [SPARK-36269][SQL] Fix only set data columns to Hive column names config
This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.1 by this push:
new 213e62e [SPARK-36269][SQL] Fix only set data columns to Hive column names config
213e62e is described below
commit 213e62ed853778263768c39018ec98e37733616d
Author: Cheng Su <ch...@fb.com>
AuthorDate: Mon Jul 26 18:48:06 2021 +0800
[SPARK-36269][SQL] Fix only set data columns to Hive column names config
### What changes were proposed in this pull request?
When reading a Hive table, we set the Hive column ID and column name configs (`hive.io.file.readcolumn.ids` and `hive.io.file.readcolumn.names`). Both configs should list only non-partition columns (data columns), because Spark always [appends partition columns in its own Hive reader](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L240). The column ID config has only non-partition columns, but the column name config has both parti [...]
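
The mismatch can be illustrated with a minimal sketch that uses plain Scala collections instead of Spark's classes. `Attr`, `ReadColumnsSketch`, and the column names below are hypothetical stand-ins and not part of the patch; only the filtering logic mirrors the changed lines in `HiveTableScanExec`.

```scala
// Hypothetical stand-in for Spark's Attribute; for illustration only.
final case class Attr(name: String)

object ReadColumnsSketch {
  // Assumed table layout: three data columns and one partition column.
  val dataCols: Seq[Attr] = Seq(Attr("id"), Attr("value"), Attr("ts"))
  val partitionCols: Seq[Attr] = Seq(Attr("dt"))

  // Columns requested by the scan: two data columns and the partition column.
  val output: Seq[Attr] = Seq(dataCols(1), partitionCols.head, dataCols.head)

  // Ordinal of each data column; partition columns have no entry here.
  val columnOrdinals: Map[Attr, Int] = dataCols.zipWithIndex.toMap

  // IDs were already restricted to data columns before the change.
  val neededColumnIDs: Seq[Int] = output.flatMap(a => columnOrdinals.get(a))

  // Before the change: names were taken from every output column.
  val namesBefore: Seq[String] = output.map(_.name)

  // After the change: names are restricted to data columns, like the IDs.
  val namesAfter: Seq[String] =
    output.filter(a => columnOrdinals.contains(a)).map(_.name)

  def main(args: Array[String]): Unit = {
    println(s"ids          = $neededColumnIDs")
    println(s"names before = $namesBefore")
    println(s"names after  = $namesAfter")
  }
}
```

In this sketch the ID list has two entries while `namesBefore` has three, because it also contains the partition column `dt`. `namesAfter` has two entries that line up one-to-one with the IDs.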
### Why are the changes needed?
To make the code logic consistent: the column ID and column name configs should both contain only data columns.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing Hive tests.
Closes #33489 from c21/hive-col.
Authored-by: Cheng Su <ch...@fb.com>
Signed-off-by: Wenchen Fan <we...@databricks.com>
(cherry picked from commit e5616e32eecb516a6b46ae9bc5c2c850c18210a2)
Signed-off-by: Wenchen Fan <we...@databricks.com>
---
.../scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
index 41820b0..7e1dc29 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
@@ -117,8 +117,9 @@ case class HiveTableScanExec(
// Specifies needed column IDs for those non-partitioning columns.
val columnOrdinals = AttributeMap(relation.dataCols.zipWithIndex)
val neededColumnIDs = output.flatMap(columnOrdinals.get).map(o => o: Integer)
+ val neededColumnNames = output.filter(columnOrdinals.contains).map(_.name)
- HiveShim.appendReadColumns(hiveConf, neededColumnIDs, output.map(_.name))
+ HiveShim.appendReadColumns(hiveConf, neededColumnIDs, neededColumnNames)
val deserializer = tableDesc.getDeserializerClass.getConstructor().newInstance()
deserializer.initialize(hiveConf, tableDesc.getProperties)