Posted to commits@spark.apache.org by li...@apache.org on 2018/04/13 21:05:07 UTC

spark git commit: [SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table

Repository: spark
Updated Branches:
  refs/heads/master 25892f3cc -> 558f31b31


[SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table

## What changes were proposed in this pull request?

TableReader would get disproportionately slower as the number of columns in the query increased.

I fixed the way TableReader was looking up metadata for each column in the row. Previously, it had been looking up this data in linked lists, accessing each list by index (the column number); indexed access into a linked list is O(n), so touching all n columns of a row cost O(n^2) in total. Now it looks up this data in arrays, where indexing by column number is O(1). A brief sketch of the difference follows.
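
A minimal, self-contained sketch (not the Spark code itself) of why the access pattern matters: `touchEveryColumn` and its data are hypothetical stand-ins for the per-column metadata lookups in TableReader's row-conversion loop.

```scala
// Sketch: the same index-driven loop over a List vs. an Array.
// List.apply(i) walks i nodes, so the loop is O(n^2) overall;
// Array.apply(i) is constant time, so the loop is O(n).
object IndexingSketch {
  def touchEveryColumn(fieldRefs: Seq[String]): Int = {
    var checksum = 0
    var i = 0
    while (i < fieldRefs.length) {
      checksum += fieldRefs(i).length // O(i) on a List, O(1) on an Array
      i += 1
    }
    checksum
  }

  def main(args: Array[String]): Unit = {
    val columns = List.tabulate(10000)(i => s"col$i")
    touchEveryColumn(columns)         // linked-list traversal per access
    touchEveryColumn(columns.toArray) // constant-time access per column
  }
}
```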

## How was this patch tested?

- Manual testing
- All sbt unit tests
- Python SQL tests

Author: Bruce Robbins <be...@gmail.com>

Closes #21043 from bersprockets/tabreadfix.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/558f31b3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/558f31b3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/558f31b3

Branch: refs/heads/master
Commit: 558f31b31c73b7e9f26f56498b54cf53997b59b8
Parents: 25892f3
Author: Bruce Robbins <be...@gmail.com>
Authored: Fri Apr 13 14:05:04 2018 -0700
Committer: gatorsmile <ga...@gmail.com>
Committed: Fri Apr 13 14:05:04 2018 -0700

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/sql/hive/TableReader.scala     | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/558f31b3/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
index cc8907a..b5444a4 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
@@ -381,7 +381,7 @@ private[hive] object HadoopTableReader extends HiveInspectors with Logging {
 
     val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map { case (attr, ordinal) =>
       soi.getStructFieldRef(attr.name) -> ordinal
-    }.unzip
+    }.toArray.unzip
 
     /**
      * Builds specific unwrappers ahead of time according to object inspector

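For reference, a hedged sketch of what the one-line change above does to the result types; the names below are simplified stand-ins for the real values in TableReader (the actual code maps over `(Attribute, Int)` pairs and builds `StructField` references via `soi.getStructFieldRef`).

```scala
// attrs stands in for nonPartitionKeyAttrs: (column metadata, ordinal) pairs.
val attrs: Seq[(String, Int)] = List("a" -> 0, "b" -> 1, "c" -> 2)

// Before the fix: unzip on a List-backed Seq yields two Lists, so the
// later per-column lookups fieldRefs(i) cost O(i) each.
val (refsList, ordinalsList) = attrs.map { case (name, ordinal) =>
  name.toUpperCase -> ordinal
}.unzip

// After the fix: .toArray first, so unzip yields two Arrays and the
// same per-column lookups are O(1).
val (refsArray, ordinalsArray) = attrs.map { case (name, ordinal) =>
  name.toUpperCase -> ordinal
}.toArray.unzip
```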

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org