Posted to issues@spark.apache.org by "lithiumlee-_- (Jira)" <ji...@apache.org> on 2020/05/26 09:24:00 UTC
[jira] [Created] (SPARK-31822) Cost too much resources when read orc hive table to infer schema
lithiumlee-_- created SPARK-31822:
-------------------------------------
Summary: Cost too much resources when read orc hive table to infer schema
Key: SPARK-31822
URL: https://issues.apache.org/jira/browse/SPARK-31822
Project: Spark
Issue Type: Improvement
Components: Input/Output
Affects Versions: 2.4.3
Reporter: lithiumlee-_-
When reading a partitioned Hive ORC table that has no Spark schema properties, Spark reads every file in every partition to infer the schema.
Other settings: native ORC mode (_spark.sql.orc.impl = native_) and _spark.sql.hive.convertMetastoreOrc = true_.
I think this can be improved by passing *_partitionFilters_* to *_fileIndex.listFiles_* instead of an empty list, so that only the files of the relevant partitions are read.
{code:java}
// org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:238
val inferredSchema = fileFormat
  .inferSchema(
    sparkSession,
    options,
    fileIndex.listFiles(Nil, Nil).flatMap(_.files))
  .map(mergeWithMetastoreSchema(relation.tableMeta.dataSchema, _))
{code}
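The proposal amounts to pruning partitions before collecting the files whose footers are read for schema inference. Below is a minimal, self-contained Scala sketch of that idea under simplifying assumptions: `DataFile`, `Partition`, `PrunedSchemaInference`, and its methods are hypothetical stand-ins, not Spark's `FileIndex`, `PartitionDirectory`, or `FileFormat` API, and a "schema" is modeled as a list of column names.

{code:java}
// Hypothetical model of a partitioned table. These types are NOT Spark's
// FileIndex / PartitionDirectory; they only illustrate partition pruning.
final case class DataFile(path: String, columns: Seq[String])
final case class Partition(values: Map[String, String], files: Seq[DataFile])

object PrunedSchemaInference {
  // Hypothetical analogue of fileIndex.listFiles(partitionFilters, ...):
  // keep only partitions whose values satisfy every filter, then flatten
  // their files. An empty filter list keeps every partition, like Nil today.
  def listFiles(
      partitions: Seq[Partition],
      partitionFilters: Seq[Map[String, String] => Boolean]): Seq[DataFile] =
    partitions
      .filter(p => partitionFilters.forall(f => f(p.values)))
      .flatMap(_.files)

  // Hypothetical stand-in for fileFormat.inferSchema: the distinct column
  // names across the listed files, or None when no files were listed.
  def inferSchema(files: Seq[DataFile]): Option[Seq[String]] =
    if (files.isEmpty) None else Some(files.flatMap(_.columns).distinct)
}
{code}

With no filters every file is listed, as in the current code; with a partition filter only the matching partition's files are read. Whether partition filters are actually available at the point where Spark performs this inference is not addressed by this sketch.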
I think
--
This message was sent by Atlassian Jira
(v8.3.4#803005)