
Posted to issues@spark.apache.org by "Mick Davies (JIRA)" <ji...@apache.org> on 2016/04/11 18:06:25 UTC

[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

    [ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235370#comment-15235370 ] 

Mick Davies commented on SPARK-6910:
------------------------------------

Hi, 

We are seeing something similar, but in our case subsequent queries are still expensive. Looking at HiveMetastoreCatalog.lookupRelation (we are using 1.5, but 1.6 looks the same), it seems a new MetastoreRelation is created for each query. Part of the analysis phase tries to convert this to a ParquetRelation using convertToParquetRelation, which always calls metastoreRelation.getHiveQlPartitions() and so retrieves all partition information. As a result, every query incurs the cost of retrieving all partition info.

We don't understand how the code can make effective use of cachedDataSourceTables in the circumstances just described.

We changed HiveMetastoreCatalog.lookupRelation to use the cache even if the Hive table property "spark.sql.sources.provider" is unset. This caused subsequent queries to use the cached relation and therefore run more quickly.

E.g., we changed
{code}
if (table.properties.get("spark.sql.sources.provider").isDefined) 
{code}

to 
{code}
if (cachedDataSourceTables.getIfPresent(QualifiedTableName(databaseName, tblName).toLowerCase) != null ||
      table.properties.get("spark.sql.sources.provider").isDefined) 
{code}
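To make the cost difference concrete, here is a toy Scala model of the two guards. It is a sketch under stated assumptions, not Spark code: FakeMetastore, FakeParquetRelation, FakeCatalog, listAllPartitions and countFetches are hypothetical stand-ins, the cache is a plain mutable.Map keyed by the lower-cased table name rather than Spark's actual cache, and only tables without "spark.sql.sources.provider" are modelled. It assumes, as described above, that the conversion step lists every partition before the cache is consulted.

{code:scala}
import scala.collection.mutable

// Toy model only: every name below is a hypothetical stand-in, not Spark's API.
// Counts how often the "metastore" is asked for every partition of a table.
final class FakeMetastore(numPartitions: Int) {
  var fullPartitionFetches = 0
  def listAllPartitions(table: String): Seq[String] = {
    fullPartitionFetches += 1
    (0 until numPartitions).map(i => s"$table/part=$i")
  }
}

// Stand-in for the converted Parquet relation that ends up in the cache.
final case class FakeParquetRelation(table: String, partitions: Seq[String])

// checkCacheFirst = false mimics the original guard for a table without
// "spark.sql.sources.provider"; checkCacheFirst = true mimics the patched
// guard, which also consults the cache before building a new relation.
final class FakeCatalog(ms: FakeMetastore, checkCacheFirst: Boolean) {
  private val cachedDataSourceTables = mutable.Map.empty[String, FakeParquetRelation]

  // Assumed behaviour of the conversion step: all partitions are listed
  // before the cache is consulted, so every call pays the full cost.
  private def convertToParquetRelation(table: String): FakeParquetRelation = {
    val partitions = ms.listAllPartitions(table)
    cachedDataSourceTables.getOrElseUpdate(table.toLowerCase, FakeParquetRelation(table, partitions))
  }

  def lookupRelation(table: String): FakeParquetRelation = {
    val key = table.toLowerCase
    if (checkCacheFirst && cachedDataSourceTables.contains(key))
      cachedDataSourceTables(key)     // cache hit: no partition listing
    else
      convertToParquetRelation(table) // new relation: lists every partition
  }
}

// Runs the same query `queries` times and reports how many full listings occurred.
def countFetches(checkCacheFirst: Boolean, queries: Int): Int = {
  val ms = new FakeMetastore(numPartitions = 1000)
  val catalog = new FakeCatalog(ms, checkCacheFirst)
  (1 to queries).foreach(_ => catalog.lookupRelation("Sales"))
  ms.fullPartitionFetches
}

println(countFetches(checkCacheFirst = false, queries = 5)) // 5
println(countFetches(checkCacheFirst = true, queries = 5))  // 1
{code}

With the original guard each of the five lookups lists all partitions; with the patched guard only the first one does. The sketch ignores cache invalidation (for example, when new partitions are added), which a real change to lookupRelation would also have to handle.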

Are we doing something wrong?




> Support for pushing predicates down to metastore for partition pruning
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6910
>                 URL: https://issues.apache.org/jira/browse/SPARK-6910
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Cheolsoo Park
>            Priority: Critical
>             Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)