
Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2015/09/15 23:57:47 UTC

[jira] [Resolved] (SPARK-6632) Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself

     [ https://issues.apache.org/jira/browse/SPARK-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-6632.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

Starting with Spark 1.5, I believe all footer reading is delegated to a Spark job.

> Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6632
>                 URL: https://issues.apache.org/jira/browse/SPARK-6632
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Yash Datta
>             Fix For: 1.5.0
>
>
> Currently in ParquetRelation2, the schemas from all the part files are first merged and then reconciled with the metastore schema. This approach does not scale when the table has thousands of partitions. We can take a different approach: proceed with the metastore schema and reconcile the column names within each map task, using the ReadSupport hooks provided by Parquet.
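
A minimal sketch of the per-task reconciliation idea described above, in plain Python rather than Spark or Parquet code. It assumes reconciliation means a case-insensitive match of metastore column names against the column names found in one part file, with missing columns mapped to None. The function names reconcile_column_names and reconcile_per_task are hypothetical and are not part of Spark or of Parquet's ReadSupport API.

```python
from typing import Dict, List, Optional


def reconcile_column_names(
    metastore_columns: List[str],
    parquet_columns: List[str],
) -> Dict[str, Optional[str]]:
    """Map each metastore column to the matching column of ONE part file.

    Assumption: names are matched case-insensitively. A metastore column
    that is absent from the file maps to None.
    """
    # Index this file's columns by lowercased name.
    by_lower: Dict[str, str] = {}
    for name in parquet_columns:
        key = name.lower()
        if key in by_lower:
            # Two file columns differing only by case cannot be resolved.
            raise ValueError(
                f"ambiguous column names: {by_lower[key]!r} and {name!r}"
            )
        by_lower[key] = name

    # Columns that exist only in the file are ignored.
    return {col: by_lower.get(col.lower()) for col in metastore_columns}


def reconcile_per_task(
    metastore_columns: List[str],
    part_file_schemas: List[List[str]],
) -> List[Dict[str, Optional[str]]]:
    """Reconcile each part file independently, one call per simulated task.

    No merged schema across all part files is ever built here.
    """
    return [
        reconcile_column_names(metastore_columns, schema)
        for schema in part_file_schemas
    ]
```

Because each part file is handled on its own, the work per task depends only on that file's column count, not on the number of partitions in the table.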



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
