Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/07/14 17:12:08 UTC

[jira] [Assigned] (SPARK-8125) Accelerate ParquetRelation2 metadata discovery

     [ https://issues.apache.org/jira/browse/SPARK-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-8125:
-----------------------------------

    Assignee: Apache Spark  (was: Cheng Lian)

> Accelerate ParquetRelation2 metadata discovery
> ----------------------------------------------
>
>                 Key: SPARK-8125
>                 URL: https://issues.apache.org/jira/browse/SPARK-8125
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>            Assignee: Apache Spark
>            Priority: Blocker
>
> For large Parquet tables (e.g., with thousands of partitions), it can be very slow to discover Parquet metadata for schema merging and generating splits for Spark jobs. We need to accelerate this process. One possible solution is to do the discovery via a distributed Spark job.
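
The issue does not say how the distributed discovery would be implemented. As a rough illustration only, the sketch below shows the general shape of the idea: split the file list into tasks, have each task read footers and merge schemas locally, then merge the partial results on the driver. It is plain Python with a thread pool standing in for Spark executors, not Spark code; `FAKE_FOOTERS`, `read_schema`, `merge_schemas`, `discover_schema`, and the file paths are all hypothetical names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical stand-in for Parquet footers: file path -> {column: type name}.
# A real reader would open each file and parse its footer instead.
FAKE_FOOTERS = {
    "t/p=1/part-0.parquet": {"id": "int64", "name": "string"},
    "t/p=2/part-0.parquet": {"id": "int64", "score": "double"},
    "t/p=3/part-0.parquet": {"id": "int64", "name": "string"},
}


def read_schema(path):
    """Return the schema recorded for one file (assumed, simulated lookup)."""
    return FAKE_FOOTERS[path]


def merge_schemas(left, right):
    """Union two schemas, rejecting columns whose types conflict."""
    merged = dict(left)
    for column, type_name in right.items():
        if merged.setdefault(column, type_name) != type_name:
            raise ValueError(f"conflicting types for column {column!r}")
    return merged


def discover_schema(paths, num_tasks=2):
    """Merge schemas per task in parallel, then merge the partial results."""
    # Round-robin the paths into at most num_tasks non-empty chunks.
    chunks = [paths[i::num_tasks] for i in range(num_tasks)]
    chunks = [chunk for chunk in chunks if chunk]

    def run_task(chunk):
        # Each task reads its own footers and merges them locally.
        return reduce(merge_schemas, map(read_schema, chunk), {})

    # The thread pool is an assumption standing in for Spark executors.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(run_task, chunks))

    # Final merge of the per-task results, as a driver would do.
    return reduce(merge_schemas, partials, {})


merged = discover_schema(sorted(FAKE_FOOTERS))
```

Because the merge here is associative and commutative, the way paths are grouped into tasks does not change the result. That property is what would let the per-file work be spread across many workers.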



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
