Posted to issues@spark.apache.org by "Yana Kadiyska (JIRA)" <ji...@apache.org> on 2015/04/17 15:57:58 UTC

[jira] [Commented] (SPARK-5923) Very slow query when using Oracle Hive metastore and table has lots of partitions

    [ https://issues.apache.org/jira/browse/SPARK-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499855#comment-14499855 ] 

Yana Kadiyska commented on SPARK-5923:
--------------------------------------

[~mtaylor] Can you please provide some information on how you debugged this? I just experienced a similar issue -- really poor performance on a large metastore even though I'm only touching a few partitions. I'm using a PostgreSQL metastore. However, I do not see "IN" queries logged to Postgres, and according to the Postgres log no individual query took longer than 50ms. So I'm hoping to get some debugging tips.
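In the meantime I will try full statement logging on the Postgres side, since a flood of fast ORM queries could add up without any single statement exceeding the duration threshold. These are standard postgresql.conf settings, nothing Spark-specific:

    log_min_duration_statement = 0   # log every completed statement with its duration
    log_statement = 'all'            # also log each statement as it is received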

> Very slow query when using Oracle Hive metastore and table has lots of partitions
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-5923
>                 URL: https://issues.apache.org/jira/browse/SPARK-5923
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Matthew Taylor
>
> This has two aspects:
> * The direct SQL support for Oracle is broken in Hive 0.13.1. It fails once a table has more than 1000 partitions, due to Oracle's limit of 1000 elements per IN clause (see the sketch below). This causes a fallback to the ORM path, which is very slow (20 minutes before the query even starts).
> * Hive itself does not suffer from this problem, because it pushes the filter terms that restrict the partitions down into the metastore query. Spark SQL always asks for all partitions, even when they are not all needed. Even after we patched Hive, the query was still taking 2 minutes.
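> A minimal sketch of the chunking workaround, in Scala -- illustrative only, not the actual Hive patch, and the names here are hypothetical:
>
>     // Split a large IN list into chunks of at most 1000 values (the
>     // maximum Oracle allows per IN clause) and OR the chunks together.
>     // Note: the values are not SQL-escaped here -- this is only a sketch.
>     def buildPartitionFilter(column: String, values: Seq[String]): String =
>       values.grouped(1000)
>         .map(chunk => chunk.map(v => s"'$v'").mkString(s"$column IN (", ", ", ")"))
>         .mkString(" OR ")
>
>     // buildPartitionFilter("PART_NAME", names)
>     //   => "PART_NAME IN ('a', ...) OR PART_NAME IN ('z', ...)"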



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org