
Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/09/16 10:07:46 UTC

[jira] [Updated] (SPARK-6632) Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself

     [ https://issues.apache.org/jira/browse/SPARK-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-6632:
-----------------------------
    Assignee: Michael Armbrust

> Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6632
>                 URL: https://issues.apache.org/jira/browse/SPARK-6632
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Yash Datta
>            Assignee: Michael Armbrust
>             Fix For: 1.5.0
>
>
> Currently in ParquetRelation2, the schemas of all the part files are first merged, and the result is then reconciled with the metastore schema. This approach does not scale when the table has thousands of partitions. We can take a different approach: proceed with the metastore schema and reconcile the column names within each map task, using the ReadSupport hooks provided by Parquet.
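The per-task reconciliation proposed in this issue can be sketched in plain Python. This is a minimal illustration under stated assumptions. It is not Spark's ParquetRelation2 code or Parquet's ReadSupport API. Schemas are modeled as hypothetical lists of (name, type) pairs, and the reconciliation rule assumed here is case-insensitive name matching, because the Hive metastore stores lower-cased column names while Parquet files preserve the original case. Each simulated map task consults only its own file's schema, so no driver-side merge of every part file's schema is needed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a table/file schema: ordered (column_name, type_name) pairs.
Schema = List[Tuple[str, str]]


def reconcile_with_file(metastore: Schema, file_schema: Schema) -> List[Tuple[str, Optional[str]]]:
    """Map each metastore column to its physical name in one Parquet file.

    Assumption: matching is case-insensitive. Returns (metastore_name, file_name),
    where file_name is None when this file lacks the column.
    """
    by_lower: Dict[str, str] = {}
    for name, _type in file_schema:
        key = name.lower()
        if key in by_lower:
            # Two physical columns differ only by case: cannot map unambiguously.
            raise ValueError(f"ambiguous column {name!r} in file schema")
        by_lower[key] = name
    return [(name, by_lower.get(name.lower())) for name, _type in metastore]


def read_split(metastore: Schema, file_schema: Schema,
               rows: List[Dict[str, object]]) -> List[Tuple[object, ...]]:
    """Simulate one map task: reconcile its own file's schema, then project rows
    into metastore column order, yielding None for columns missing from the file."""
    mapping = reconcile_with_file(metastore, file_schema)
    return [
        tuple(row.get(phys) if phys is not None else None for _, phys in mapping)
        for row in rows
    ]
```

For example, with a metastore schema of `[("id", "bigint"), ("username", "string")]`, a part file that stores the column as `userName` resolves to `username`, while an older part file with only `id` yields `None` for `username`, which a real reader would surface as null. Because each task needs only the metastore schema plus its own file footer, the work is spread across tasks rather than concentrated on the driver.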



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
